AI · Robotics · Software · Hardware

PetBot: AI-Powered Social Robot with Local LLM Processing

February 1, 2025

Tech Stack

Raspberry Pi 5 · Flask · Socket.IO · Ollama · MQTT · PyQt6 · Servo Motors · 3D Printing

Creating an Autonomous Social Robot with Edge AI

PetBot represents the convergence of artificial intelligence, robotics, and human-computer interaction—a fully autonomous social robot capable of natural conversation, emotional expression, and adaptive learning. Built entirely with open-source technologies and local processing, it demonstrates how advanced AI can be deployed at the edge without relying on cloud services.

[Project demonstration video]

The Vision: Accessible Social Robotics

Social robots have traditionally been expensive, proprietary systems requiring constant internet connectivity. PetBot challenges this paradigm by providing:

  • Privacy-First Design: All AI processing happens locally
  • Open Source Architecture: Complete hardware and software designs available
  • Affordable Components: Built with off-the-shelf electronics under £300
  • Extensible Platform: Modular design supports various applications

Technical Architecture

Edge AI Processing

Local LLM Implementation:

  • Ollama Integration: Running Gemma 3 1B/4B models entirely on the Raspberry Pi 5
  • Real-Time Inference: Sub-2-second response times for natural conversation
  • Memory Management: Efficient context handling for extended interactions
  • No Internet Required: Complete functionality in offline environments

Speech Processing Pipeline:

Microphone → Vosk (Speech-to-Text) → Ollama (LLM) → Piper (Text-to-Speech) → Speaker
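
A minimal Python sketch of that loop, assuming Ollama is serving gemma3:1b on its default port; the Vosk model directory and Piper voice file named below are placeholders for whichever models are actually installed:

    import json
    import subprocess

    import requests
    import sounddevice as sd
    from vosk import Model, KaldiRecognizer

    SAMPLE_RATE = 16000
    OLLAMA_URL = "http://localhost:11434/api/generate"   # Ollama's default local endpoint

    stt = Model("vosk-model-small-en-us-0.15")           # placeholder Vosk model path
    rec = KaldiRecognizer(stt, SAMPLE_RATE)

    def listen() -> str:
        """Block until Vosk finalises one utterance, then return its text."""
        with sd.RawInputStream(samplerate=SAMPLE_RATE, blocksize=4000,
                               dtype="int16", channels=1) as stream:
            while True:
                data, _ = stream.read(4000)
                if rec.AcceptWaveform(bytes(data)):
                    return json.loads(rec.Result()).get("text", "")

    def think(prompt: str) -> str:
        """Single-turn completion against the local Ollama server."""
        r = requests.post(OLLAMA_URL, json={"model": "gemma3:1b",
                                            "prompt": prompt, "stream": False})
        return r.json()["response"]

    def speak(text: str) -> None:
        """Pipe the reply through the Piper CLI, then play the wav."""
        subprocess.run(["piper", "--model", "en_GB-alan-low.onnx",
                        "--output_file", "/tmp/reply.wav"],
                       input=text.encode(), check=True)
        subprocess.run(["aplay", "/tmp/reply.wav"], check=True)

    while True:
        heard = listen()
        if heard:
            speak(think(heard))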

Multi-Modal Interface System

Three Operational Modes (a dispatch sketch follows the list):

  1. Robot Mode: Autonomous social interaction

    • Natural conversation with personality traits
    • Emotional expression through servo movements
    • Adaptive responses based on interaction history
  2. Developer Mode: Technical configuration and monitoring

    • Real-time system diagnostics
    • Parameter adjustment interface
    • Debug logging and performance metrics
  3. Demo Mode: Presentation and showcase functionality

    • Scripted interactions for demonstrations
    • Performance optimization for public display
    • Simplified user interface for non-technical users
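
How the switch between these modes might be wired is sketched below; the handler bodies are stand-ins for the behaviours described above:

    from enum import Enum, auto

    class Mode(Enum):
        ROBOT = auto()      # autonomous social interaction
        DEVELOPER = auto()  # diagnostics and parameter tuning
        DEMO = auto()       # scripted showcase behaviour

    def robot_mode(event: str) -> str:
        return "conversational reply"        # full STT -> LLM -> TTS pipeline

    def developer_mode(event: str) -> str:
        return f"diagnostics for {event!r}"  # logs, metrics, live parameters

    def demo_mode(event: str) -> str:
        return "scripted response"           # canned interactions for the public

    HANDLERS = {Mode.ROBOT: robot_mode,
                Mode.DEVELOPER: developer_mode,
                Mode.DEMO: demo_mode}

    def dispatch(mode: Mode, event: str) -> str:
        """Route an incoming event to the handler for the current mode."""
        return HANDLERS[mode](event)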

Hardware Integration

Mechanical Design:

  • Custom 3D-Printed Chassis: Designed in Onshape CAD software
  • Servo Motor Array: 3 servos for expressive ears and arm movement (control sketch below)
  • DC Motor System: 4 TT gearbox motors for mobility
  • Compact Footprint: 20cm × 15cm × 18cm desktop-friendly form factor
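
As a rough idea of the servo side, gpiozero (which supports the Pi 5 through the lgpio backend) can drive hobby servos directly; the BCM pin numbers below are illustrative, not PetBot's actual wiring:

    from time import sleep
    from gpiozero import Servo   # on the Pi 5 this uses the lgpio pin factory

    # Placeholder BCM pins; the real harness may differ.
    left_ear = Servo(17)
    right_ear = Servo(27)
    arm = Servo(22)

    def perk_ears() -> None:
        """Raise both ears, hold briefly, then relax: a simple 'alert' gesture."""
        left_ear.max()
        right_ear.max()
        sleep(0.4)
        left_ear.mid()
        right_ear.mid()

    perk_ears()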

Electronic Systems:

  • Raspberry Pi 5: 8GB RAM for AI processing and system control
  • Computer Vision: ESP32 camera module with YOLOv8n object detection
  • Audio System: USB microphone and amplified speakers
  • Sensor Array: VL53L5CX Time-of-Flight sensor for spatial awareness
  • Display: Raspberry Pi Touch 2 (800×480) for user interaction
  • Power Management: Pi UPS Hat providing ~5 hours runtime

Software Engineering Excellence

Real-Time Communication Architecture

Flask Web Server with Socket.IO (minimal sketch below):

  • Asynchronous Communication: Real-time updates between components
  • WebSocket Protocol: Low-latency bidirectional communication
  • RESTful API: Standard HTTP endpoints for configuration
  • Session Management: Persistent connections across interactions
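
A stripped-down sketch of that server, pairing one REST endpoint with one Socket.IO event (route and event names are illustrative):

    from flask import Flask
    from flask_socketio import SocketIO, emit

    app = Flask(__name__)
    socketio = SocketIO(app, cors_allowed_origins="*")

    @app.route("/api/status")            # plain HTTP endpoint for configuration
    def status():
        return {"mode": "robot", "battery": "ok"}

    @socketio.on("user_message")         # low-latency WebSocket channel
    def on_user_message(data):
        # ...run the conversation pipeline here, then push the reply back...
        emit("robot_reply", {"text": f"echo: {data['text']}"})

    if __name__ == "__main__":
        socketio.run(app, host="0.0.0.0", port=5000)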

MQTT Integration (example below):

  • Distributed System Design: Decoupled component communication
  • Topic-Based Messaging: Organized data flow between subsystems
  • Quality of Service: Guaranteed message delivery for critical commands
  • Scalability: Easy addition of new sensors and actuators
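
In paho-mqtt terms the pattern could look like this; the topic names are invented for illustration, and QoS 1 provides the at-least-once delivery mentioned above:

    import paho.mqtt.client as mqtt

    def on_message(client, userdata, msg):
        print(f"{msg.topic}: {msg.payload.decode()}")

    # paho-mqtt 2.x requires an explicit callback API version.
    client = mqtt.Client(mqtt.CallbackAPIVersion.VERSION2)
    client.on_message = on_message
    client.connect("localhost", 1883)

    client.subscribe("petbot/sensors/tof", qos=0)           # best-effort telemetry
    client.publish("petbot/actuators/ears", "perk", qos=1)  # at-least-once command
    client.loop_forever()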

PyQt6 User Interface

Professional Desktop Application (skeleton below):

  • Native Performance: Optimized Qt framework for responsive UI
  • Multi-Window Management: Dedicated interfaces for different modes
  • Real-Time Visualization: Live data streaming and status monitoring
  • Cross-Platform Compatibility: Runs on Windows, macOS, and Linux
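
A skeleton of one such window in PyQt6, with a single status label standing in for the real telemetry widgets:

    import sys
    from PyQt6.QtWidgets import QApplication, QLabel, QMainWindow

    class StatusWindow(QMainWindow):
        """One of several mode-specific windows; shows live status text."""
        def __init__(self) -> None:
            super().__init__()
            self.setWindowTitle("PetBot Developer Mode")
            self.label = QLabel("Waiting for telemetry...", self)
            self.setCentralWidget(self.label)

        def update_status(self, text: str) -> None:
            self.label.setText(text)   # call via a Qt signal when new data arrives

    app = QApplication(sys.argv)
    window = StatusWindow()
    window.show()
    sys.exit(app.exec())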

AI and Machine Learning Integration

Conversational AI Implementation

Natural Language Understanding (context sketch below):

  • Context Awareness: Maintains conversation history and context
  • Personality Modeling: Consistent character traits across interactions
  • Emotional Intelligence: Recognizes and responds to user emotional states
  • Domain Adaptation: Specialized knowledge bases for specific applications
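
Context and personality can both live in the message list sent to Ollama's chat endpoint; below is a sketch with an invented system prompt and a simple turn cap standing in for the real context management:

    import requests

    OLLAMA_CHAT = "http://localhost:11434/api/chat"
    PERSONALITY = ("You are PetBot, a cheerful desktop robot. "
                   "Answer in one or two short sentences.")   # illustrative prompt

    history = [{"role": "system", "content": PERSONALITY}]

    def chat(user_text: str, max_turns: int = 8) -> str:
        """Send the rolling history to the local model, trimming old turns."""
        history.append({"role": "user", "content": user_text})
        del history[1:-2 * max_turns]   # keep the system prompt plus recent turns
        r = requests.post(OLLAMA_CHAT, json={"model": "gemma3:1b",
                                             "messages": history, "stream": False})
        reply = r.json()["message"]["content"]
        history.append({"role": "assistant", "content": reply})
        return reply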

Computer Vision Capabilities (tracking sketch below):

  • Person Tracking: Real-time human detection and following
  • Object Recognition: YOLOv8n for environmental understanding
  • Spatial Awareness: ToF sensor integration for navigation
  • Expression Recognition: Visual feedback for emotional responses
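
A person-tracking sketch using ultralytics' YOLOv8n, assuming the ESP32 camera exposes an MJPEG stream readable by OpenCV (the URL is a placeholder); the horizontal offset would feed the servo controller:

    import cv2
    from ultralytics import YOLO

    model = YOLO("yolov8n.pt")   # nano model, light enough for edge inference
    stream = cv2.VideoCapture("http://esp32.local:81/stream")  # placeholder URL

    while stream.isOpened():
        ok, frame = stream.read()
        if not ok:
            break
        results = model(frame, classes=[0], verbose=False)  # class 0 = "person"
        for box in results[0].boxes:
            x1, y1, x2, y2 = map(int, box.xyxy[0])
            # Offset of the person from frame centre drives the tracking servos.
            offset = (x1 + x2) / 2 - frame.shape[1] / 2
            print(f"person at offset {offset:+.0f}px")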

Performance Optimization

Edge Computing Constraints (model-selection sketch below):

  • Model Selection: Gemma 3 1B for real-time responsiveness versus 4B for accuracy
  • Memory Optimization: Efficient context window management
  • Thermal Management: CPU throttling prevention strategies
  • Battery Optimization: Power-aware processing modes
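
One way to make the 1B-versus-4B trade-off heat- and power-aware is to poll the Pi's own temperature sensor; the 70 °C threshold below is an illustrative guess rather than a measured tuning point:

    import subprocess

    def cpu_temp_c() -> float:
        """Read the SoC temperature via the Pi's vcgencmd utility."""
        out = subprocess.run(["vcgencmd", "measure_temp"],
                             capture_output=True, text=True).stdout
        return float(out.split("=")[1].split("'")[0])   # parses "temp=61.2'C"

    def pick_model() -> str:
        """Fall back to the smaller model when the SoC runs hot."""
        return "gemma3:1b" if cpu_temp_c() > 70.0 else "gemma3:4b"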

Collaborative Development Process

Team Coordination (35% Individual Contribution)

Primary Technical Leadership:

  • System Architecture: Designed overall software and hardware integration
  • AI Implementation: Developed LLM integration and conversation management
  • Real-Time Systems: Implemented communication protocols and timing
  • Hardware Integration: Managed servo control and sensor interfaces

Collaborative Elements:

  • Mechanical Design: Worked with team on chassis optimization
  • User Experience: Coordinated interface design across modes
  • Testing Protocol: Developed comprehensive validation procedures
  • Documentation: Created technical specifications and user guides

Development Methodology

Agile Practices:

  • Sprint Planning: Weekly development cycles with defined deliverables
  • Continuous Integration: Automated testing and deployment pipelines
  • Code Review: Peer review process for quality assurance
  • Version Control: Git-based workflow with feature branching

Innovation and Technical Achievements

Edge AI Breakthrough

Local LLM Deployment:

  • Successfully running 1-4B parameter models on embedded hardware
  • Achieved conversational AI without cloud dependencies
  • Demonstrated feasibility of privacy-preserving social robots
  • Optimized inference pipeline for real-time interaction

System Integration Excellence

Multi-Modal Fusion:

  • Seamless integration of speech, vision, and motor control
  • Real-time coordination between AI and physical systems
  • Robust error handling and recovery mechanisms
  • Scalable architecture supporting future enhancements

Open Source Contribution

Community Impact:

  • Complete project documentation and build instructions
  • Reusable components for other robotics projects
  • Educational resource for AI and robotics learning
  • Platform for research and development collaboration

Real-World Applications

Educational Technology

  • STEM Learning: Interactive programming and robotics education
  • Language Learning: Conversational practice with AI tutor
  • Special Needs Support: Assistive technology for communication
  • Research Platform: Academic research in human-robot interaction

Commercial Potential

  • Customer Service: Retail and hospitality applications
  • Elder Care: Companionship and monitoring systems
  • Entertainment: Interactive gaming and storytelling
  • Therapy: Social skills development and emotional support

Performance Metrics

System Specifications:

  • Response Time: <2 seconds for conversational AI
  • Battery Life: ~5 hours continuous operation
  • Processing Power: 8GB RAM with thermal management
  • Weight: 850g total system weight
  • Mobility: 4-wheel drive with differential steering

AI Performance:

  • Speech Recognition: >95% accuracy in quiet environments
  • LLM Inference: Context-aware responses with personality consistency
  • Computer Vision: Real-time object detection at 15 FPS
  • Motor Control: Precise servo positioning with emotional expression

Future Development Roadmap

Technical Enhancements

  • Advanced Navigation: SLAM implementation for autonomous mapping
  • Gesture Recognition: Hand tracking for enhanced interaction
  • Multi-Language Support: Conversation in multiple languages
  • Cloud Synchronization: Optional cloud backup while maintaining privacy

AI Capabilities

  • Emotion Recognition: Visual and audio emotion detection
  • Skill Learning: Dynamic capability acquisition through interaction
  • Personality Customization: User-defined character traits and behaviors
  • Social Learning: Group interaction and social behavior modeling

Impact and Recognition

PetBot demonstrates that sophisticated social robotics is accessible to individual developers and small teams. By combining cutting-edge AI with practical engineering, it proves that the future of human-robot interaction doesn't require massive corporate resources or cloud dependencies.

Technical Skills Demonstrated:

  • AI/ML Engineering: Local LLM deployment and optimization
  • Robotics Integration: Hardware-software system design
  • Full-Stack Development: Web services and desktop applications
  • Real-Time Systems: Low-latency communication and control
  • Mechanical Design: 3D modeling and manufacturing
  • Project Management: Collaborative development coordination

The success of PetBot opens new possibilities for privacy-preserving AI systems and demonstrates the potential for democratizing advanced robotics technology.


PetBot was developed as a collaborative project demonstrating the integration of modern AI technologies with practical robotics engineering. The complete source code, CAD files, and documentation are available for the open-source community.