AI · Robotics · Software · Hardware

PetBot: AI-Powered Social Robot with Local LLM Processing

February 1, 2025

Tech Stack

Raspberry Pi 5 · Flask · Socket.IO · Ollama · MQTT · PyQt6 · Servo Motors · 3D Printing

Creating an Autonomous Social Robot with Edge AI

PetBot represents the convergence of artificial intelligence, robotics, and human-computer interaction—a fully autonomous social robot capable of natural conversation, emotional expression, and adaptive learning. Built entirely with open-source technologies and local processing, it demonstrates how advanced AI can be deployed at the edge without relying on cloud services.

[Project demonstration video]

The Vision: Accessible Social Robotics

Social robots have traditionally been expensive, proprietary systems requiring constant internet connectivity. PetBot challenges this paradigm by providing:

  • Privacy-First Design: All AI processing happens locally
  • Open Source Architecture: Complete hardware and software designs available
  • Affordable Components: Built with off-the-shelf electronics under £300
  • Extensible Platform: Modular design supports various applications

Technical Architecture

Edge AI Processing

Local LLM Implementation:

  • Ollama Integration: Running Gemma 3 1B/4B models entirely on the Raspberry Pi 5
  • Real-Time Inference: Sub-2-second response times for natural conversation
  • Memory Management: Efficient context handling for extended interactions
  • No Internet Required: Complete functionality in offline environments

Speech Processing Pipeline:

Microphone → Vosk (Speech-to-Text) → Ollama (LLM) → Piper (Text-to-Speech) → Speaker
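
A minimal Python sketch of that loop, assuming Ollama is serving gemma3:1b on its default port; the Vosk model directory and Piper voice file named below are placeholders for whichever models are actually installed:

    import json
    import subprocess

    import requests
    import sounddevice as sd
    from vosk import Model, KaldiRecognizer

    SAMPLE_RATE = 16000
    OLLAMA_URL = "http://localhost:11434/api/generate"   # Ollama's default local endpoint

    stt = Model("vosk-model-small-en-us-0.15")           # placeholder Vosk model path
    rec = KaldiRecognizer(stt, SAMPLE_RATE)

    def listen() -> str:
        """Block until Vosk finalises one utterance, then return its text."""
        with sd.RawInputStream(samplerate=SAMPLE_RATE, blocksize=4000,
                               dtype="int16", channels=1) as stream:
            while True:
                data, _ = stream.read(4000)
                if rec.AcceptWaveform(bytes(data)):
                    return json.loads(rec.Result()).get("text", "")

    def think(prompt: str) -> str:
        """Single-turn completion against the local Ollama server."""
        r = requests.post(OLLAMA_URL, json={"model": "gemma3:1b",
                                            "prompt": prompt, "stream": False})
        return r.json()["response"]

    def speak(text: str) -> None:
        """Pipe the reply through the Piper CLI, then play the wav."""
        subprocess.run(["piper", "--model", "en_GB-alan-low.onnx",
                        "--output_file", "/tmp/reply.wav"],
                       input=text.encode(), check=True)
        subprocess.run(["aplay", "/tmp/reply.wav"], check=True)

    while True:
        heard = listen()
        if heard:
            speak(think(heard))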

Multi-Modal Interface System

Three Operational Modes (a dispatch sketch follows the list):

  1. Robot Mode: Autonomous social interaction

    • Natural conversation with personality traits
    • Emotional expression through servo movements
    • Adaptive responses based on interaction history
  2. Developer Mode: Technical configuration and monitoring

    • Real-time system diagnostics
    • Parameter adjustment interface
    • Debug logging and performance metrics
  3. Demo Mode: Presentation and showcase functionality

    • Scripted interactions for demonstrations
    • Performance optimization for public display
    • Simplified user interface for non-technical users
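
How the switch between these modes might be wired is sketched below; the handler bodies are stand-ins for the behaviours described above:

    from enum import Enum, auto

    class Mode(Enum):
        ROBOT = auto()      # autonomous social interaction
        DEVELOPER = auto()  # diagnostics and parameter tuning
        DEMO = auto()       # scripted showcase behaviour

    def robot_mode(event: str) -> str:
        return "conversational reply"        # full STT -> LLM -> TTS pipeline

    def developer_mode(event: str) -> str:
        return f"diagnostics for {event!r}"  # logs, metrics, live parameters

    def demo_mode(event: str) -> str:
        return "scripted response"           # canned interactions for the public

    HANDLERS = {Mode.ROBOT: robot_mode,
                Mode.DEVELOPER: developer_mode,
                Mode.DEMO: demo_mode}

    def dispatch(mode: Mode, event: str) -> str:
        """Route an incoming event to the handler for the current mode."""
        return HANDLERS[mode](event)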

Hardware Integration

Mechanical Design:

  • Custom 3D-Printed Chassis: Designed in Onshape CAD software
  • Servo Motor Array: 3 servos for expressive ears and arm movement (control sketch below)
  • DC Motor System: 4 TT gearbox motors for mobility
  • Compact Footprint: 20cm × 15cm × 18cm desktop-friendly form factor
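
As a rough idea of the servo side, gpiozero (which supports the Pi 5 through the lgpio backend) can drive hobby servos directly; the BCM pin numbers below are illustrative, not PetBot's actual wiring:

    from time import sleep
    from gpiozero import Servo   # on the Pi 5 this uses the lgpio pin factory

    # Placeholder BCM pins; the real harness may differ.
    left_ear = Servo(17)
    right_ear = Servo(27)
    arm = Servo(22)

    def perk_ears() -> None:
        """Raise both ears, hold briefly, then relax: a simple 'alert' gesture."""
        left_ear.max()
        right_ear.max()
        sleep(0.4)
        left_ear.mid()
        right_ear.mid()

    perk_ears()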

Electronic Systems:

  • Raspberry Pi 5: 8GB RAM for AI processing and system control
  • Computer Vision: ESP32 camera module with YOLOv8n object detection
  • Audio System: USB microphone and amplified speakers
  • Sensor Array: VL53L5CX Time-of-Flight sensor for spatial awareness
  • Display: Raspberry Pi Touch 2 (800×480) for user interaction
  • Power Management: Pi UPS Hat providing ~5 hours runtime

Software Engineering Excellence

Real-Time Communication Architecture

Flask Web Server with Socket.IO (minimal sketch below):

  • Asynchronous Communication: Real-time updates between components
  • WebSocket Protocol: Low-latency bidirectional communication
  • RESTful API: Standard HTTP endpoints for configuration
  • Session Management: Persistent connections across interactions
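
A stripped-down sketch of that server, pairing one REST endpoint with one Socket.IO event (route and event names are illustrative):

    from flask import Flask
    from flask_socketio import SocketIO, emit

    app = Flask(__name__)
    socketio = SocketIO(app, cors_allowed_origins="*")

    @app.route("/api/status")            # plain HTTP endpoint for configuration
    def status():
        return {"mode": "robot", "battery": "ok"}

    @socketio.on("user_message")         # low-latency WebSocket channel
    def on_user_message(data):
        # ...run the conversation pipeline here, then push the reply back...
        emit("robot_reply", {"text": f"echo: {data['text']}"})

    if __name__ == "__main__":
        socketio.run(app, host="0.0.0.0", port=5000)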

MQTT Integration (example below):

  • Distributed System Design: Decoupled component communication
  • Topic-Based Messaging: Organized data flow between subsystems
  • Quality of Service: Guaranteed message delivery for critical commands
  • Scalability: Easy addition of new sensors and actuators
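
In paho-mqtt terms the pattern could look like this; the topic names are invented for illustration, and QoS 1 provides the at-least-once delivery mentioned above:

    import paho.mqtt.client as mqtt

    def on_message(client, userdata, msg):
        print(f"{msg.topic}: {msg.payload.decode()}")

    # paho-mqtt 2.x requires an explicit callback API version.
    client = mqtt.Client(mqtt.CallbackAPIVersion.VERSION2)
    client.on_message = on_message
    client.connect("localhost", 1883)

    client.subscribe("petbot/sensors/tof", qos=0)           # best-effort telemetry
    client.publish("petbot/actuators/ears", "perk", qos=1)  # at-least-once command
    client.loop_forever()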

PyQt6 User Interface

Professional Desktop Application (skeleton below):

  • Native Performance: Optimized Qt framework for responsive UI
  • Multi-Window Management: Dedicated interfaces for different modes
  • Real-Time Visualization: Live data streaming and status monitoring
  • Cross-Platform Compatibility: Runs on Windows, macOS, and Linux
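
A skeleton of one such window in PyQt6, with a single status label standing in for the real telemetry widgets:

    import sys
    from PyQt6.QtWidgets import QApplication, QLabel, QMainWindow

    class StatusWindow(QMainWindow):
        """One of several mode-specific windows; shows live status text."""
        def __init__(self) -> None:
            super().__init__()
            self.setWindowTitle("PetBot Developer Mode")
            self.label = QLabel("Waiting for telemetry...", self)
            self.setCentralWidget(self.label)

        def update_status(self, text: str) -> None:
            self.label.setText(text)   # call via a Qt signal when new data arrives

    app = QApplication(sys.argv)
    window = StatusWindow()
    window.show()
    sys.exit(app.exec())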

AI and Machine Learning Integration

Conversational AI Implementation

Natural Language Understanding (context sketch below):

  • Context Awareness: Maintains conversation history and context
  • Personality Modeling: Consistent character traits across interactions
  • Emotional Intelligence: Recognizes and responds to user emotional states
  • Domain Adaptation: Specialized knowledge bases for specific applications
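
Context and personality can both live in the message list sent to Ollama's chat endpoint; below is a sketch with an invented system prompt and a simple turn cap standing in for the real context management:

    import requests

    OLLAMA_CHAT = "http://localhost:11434/api/chat"
    PERSONALITY = ("You are PetBot, a cheerful desktop robot. "
                   "Answer in one or two short sentences.")   # illustrative prompt

    history = [{"role": "system", "content": PERSONALITY}]

    def chat(user_text: str, max_turns: int = 8) -> str:
        """Send the rolling history to the local model, trimming old turns."""
        history.append({"role": "user", "content": user_text})
        del history[1:-2 * max_turns]   # keep the system prompt plus recent turns
        r = requests.post(OLLAMA_CHAT, json={"model": "gemma3:1b",
                                             "messages": history, "stream": False})
        reply = r.json()["message"]["content"]
        history.append({"role": "assistant", "content": reply})
        return reply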

Computer Vision Capabilities (tracking sketch below):

  • Person Tracking: Real-time human detection and following
  • Object Recognition: YOLOv8n for environmental understanding
  • Spatial Awareness: ToF sensor integration for navigation
  • Expression Recognition: Visual feedback for emotional responses
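
A person-tracking sketch using ultralytics' YOLOv8n, assuming the ESP32 camera exposes an MJPEG stream readable by OpenCV (the URL is a placeholder); the horizontal offset would feed the servo controller:

    import cv2
    from ultralytics import YOLO

    model = YOLO("yolov8n.pt")   # nano model, light enough for edge inference
    stream = cv2.VideoCapture("http://esp32.local:81/stream")  # placeholder URL

    while stream.isOpened():
        ok, frame = stream.read()
        if not ok:
            break
        results = model(frame, classes=[0], verbose=False)  # class 0 = "person"
        for box in results[0].boxes:
            x1, y1, x2, y2 = map(int, box.xyxy[0])
            # Offset of the person from frame centre drives the tracking servos.
            offset = (x1 + x2) / 2 - frame.shape[1] / 2
            print(f"person at offset {offset:+.0f}px")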

Performance Optimization

Edge Computing Constraints (model-selection sketch below):

  • Model Selection: Gemma 3 1B for real-time responsiveness versus 4B for accuracy
  • Memory Optimization: Efficient context window management
  • Thermal Management: CPU throttling prevention strategies
  • Battery Optimization: Power-aware processing modes
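
One way to make the 1B-versus-4B trade-off heat- and power-aware is to poll the Pi's own temperature sensor; the 70 °C threshold below is an illustrative guess rather than a measured tuning point:

    import subprocess

    def cpu_temp_c() -> float:
        """Read the SoC temperature via the Pi's vcgencmd utility."""
        out = subprocess.run(["vcgencmd", "measure_temp"],
                             capture_output=True, text=True).stdout
        return float(out.split("=")[1].split("'")[0])   # parses "temp=61.2'C"

    def pick_model() -> str:
        """Fall back to the smaller model when the SoC runs hot."""
        return "gemma3:1b" if cpu_temp_c() > 70.0 else "gemma3:4b"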

Collaborative Development Process

Team Coordination (35% Individual Contribution)

Primary Technical Leadership:

  • System Architecture: Designed overall software and hardware integration
  • AI Implementation: Developed LLM integration and conversation management
  • Real-Time Systems: Implemented communication protocols and timing
  • Hardware Integration: Managed servo control and sensor interfaces

Collaborative Elements:

  • Mechanical Design: Worked with team on chassis optimization
  • User Experience: Coordinated interface design across modes
  • Testing Protocol: Developed comprehensive validation procedures
  • Documentation: Created technical specifications and user guides

Development Methodology

Agile Practices:

  • Sprint Planning: Weekly development cycles with defined deliverables
  • Continuous Integration: Automated testing and deployment pipelines
  • Code Review: Peer review process for quality assurance
  • Version Control: Git-based workflow with feature branching

Innovation and Technical Achievements

Edge AI Breakthrough

Local LLM Deployment:

  • Successfully running 1-4B parameter models on embedded hardware
  • Achieved conversational AI without cloud dependencies
  • Demonstrated feasibility of privacy-preserving social robots
  • Optimized inference pipeline for real-time interaction

System Integration Excellence

Multi-Modal Fusion:

  • Seamless integration of speech, vision, and motor control
  • Real-time coordination between AI and physical systems
  • Robust error handling and recovery mechanisms
  • Scalable architecture supporting future enhancements

Open Source Contribution

Community Impact:

  • Complete project documentation and build instructions
  • Reusable components for other robotics projects
  • Educational resource for AI and robotics learning
  • Platform for research and development collaboration

Real-World Applications

Educational Technology

  • STEM Learning: Interactive programming and robotics education
  • Language Learning: Conversational practice with AI tutor
  • Special Needs Support: Assistive technology for communication
  • Research Platform: Academic research in human-robot interaction

Commercial Potential

  • Customer Service: Retail and hospitality applications
  • Elder Care: Companionship and monitoring systems
  • Entertainment: Interactive gaming and storytelling
  • Therapy: Social skills development and emotional support

Performance Metrics

System Specifications:

  • Response Time: <2 seconds for conversational AI
  • Battery Life: ~5 hours continuous operation
  • Processing Power: 8GB RAM with thermal management
  • Weight: 850g total system weight
  • Mobility: 4-wheel drive with differential steering

AI Performance:

  • Speech Recognition: >95% accuracy in quiet environments
  • LLM Inference: Context-aware responses with personality consistency
  • Computer Vision: Real-time object detection at 15 FPS
  • Motor Control: Precise servo positioning with emotional expression

Future Development Roadmap

Technical Enhancements

  • Advanced Navigation: SLAM implementation for autonomous mapping
  • Gesture Recognition: Hand tracking for enhanced interaction
  • Multi-Language Support: Conversation in multiple languages
  • Cloud Synchronization: Optional cloud backup while maintaining privacy

AI Capabilities

  • Emotion Recognition: Visual and audio emotion detection
  • Skill Learning: Dynamic capability acquisition through interaction
  • Personality Customization: User-defined character traits and behaviors
  • Social Learning: Group interaction and social behavior modeling

Impact and Recognition

PetBot demonstrates that sophisticated social robotics is accessible to individual developers and small teams. By combining cutting-edge AI with practical engineering, it proves that the future of human-robot interaction doesn't require massive corporate resources or cloud dependencies.

Technical Skills Demonstrated:

  • AI/ML Engineering: Local LLM deployment and optimization
  • Robotics Integration: Hardware-software system design
  • Full-Stack Development: Web services and desktop applications
  • Real-Time Systems: Low-latency communication and control
  • Mechanical Design: 3D modeling and manufacturing
  • Project Management: Collaborative development coordination

The success of PetBot opens new possibilities for privacy-preserving AI systems and demonstrates the potential for democratizing advanced robotics technology.


PetBot was developed as a collaborative project demonstrating the integration of modern AI technologies with practical robotics engineering. The complete source code, CAD files, and documentation are available for the open-source community.