PepperPerception is a lightweight perception service that provides object detection and face, hand, and pose tracking to the Pepper robot (or other applications) via a ZeroMQ interface.
It supports multiple backends:
- Ultralytics YOLOv8: For object detection.
- Google MediaPipe Holistic: For face, hand, and pose tracking.
- Combined: Runs both YOLO and MediaPipe simultaneously.
This service is designed to run in a Docker container, preferably on a machine with an NVIDIA GPU.
- Multi-Backend: Switch between YOLO (Object Detection), MediaPipe (Holistic Tracking), or Combined.
- ZeroMQ Interface: Exposes a fast, language-agnostic API using ZMQ (REQ/REP pattern).
- Dockerized: Easy to deploy with all dependencies encapsulated.
- GPU Accelerated: Configured to leverage NVIDIA GPUs for inference (YOLO).
- Docker
- Docker Compose
- NVIDIA Container Toolkit (Required for GPU support)
Start the service:
docker-compose up --build
The service will start and listen on port 5557.
Select the backend by setting the service's command in docker-compose.yml:
- YOLO Backend (Default):
command: [ "python", "-m", "perception_service.main", "--backend", "yolo", "--model", "yolov8m.pt" ]
- MediaPipe Backend:
command: [ "python", "-m", "perception_service.main", "--backend", "mediapipe" ]
- Combined Backend:
command: [ "python", "-m", "perception_service.main", "--backend", "combined" ]
Send a multipart ZMQ message where the last frame contains the encoded image bytes.
- Frame 0 (Optional): Metadata JSON string.
- Frame 1 (Last): Encoded Image Bytes.
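The frame layout above can be sketched in Python as follows. This is a minimal client sketch, not the project's own client: it assumes the `pyzmq` package, a service reachable at `tcp://localhost:5557`, and a reply that arrives as a single JSON frame. The function names, the timeout, and any metadata keys are hypothetical and chosen only for illustration.

```python
import json
from typing import List, Optional


def build_request_frames(image_bytes: bytes, metadata: Optional[dict] = None) -> List[bytes]:
    """Build the multipart frames: optional metadata JSON first, image bytes last."""
    if not image_bytes:
        raise ValueError("image_bytes must not be empty")
    frames: List[bytes] = []
    if metadata is not None:
        # Frame 0 (optional): metadata serialized as a JSON string.
        frames.append(json.dumps(metadata).encode("utf-8"))
    # Last frame: the encoded image bytes, passed through untouched.
    frames.append(image_bytes)
    return frames


def request_perception(
    image_bytes: bytes,
    metadata: Optional[dict] = None,
    endpoint: str = "tcp://localhost:5557",  # assumed host; the port comes from this README
    timeout_ms: int = 5000,                  # assumed timeout, not a service setting
) -> dict:
    """Send one request over a REQ socket and return the decoded JSON reply."""
    import zmq  # pyzmq, imported here so the frame builder works without it

    socket = zmq.Context.instance().socket(zmq.REQ)
    socket.setsockopt(zmq.LINGER, 0)          # do not block on close
    socket.setsockopt(zmq.SNDTIMEO, timeout_ms)
    socket.setsockopt(zmq.RCVTIMEO, timeout_ms)
    try:
        socket.connect(endpoint)
        socket.send_multipart(build_request_frames(image_bytes, metadata))
        # Raises zmq.Again if no reply arrives within timeout_ms.
        return json.loads(socket.recv())
    finally:
        socket.close()
```

To try it, read an encoded image from disk (for example with `open("frame.jpg", "rb").read()`) and pass the bytes to `request_perception`. Which image encodings the service accepts is not stated here, so treat JPEG as an assumption.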
The service replies with a JSON object.
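A minimal sketch of handling that reply, based on the response shapes documented in this section. It makes two assumptions that the README does not confirm: any `status` other than `"success"` is treated as a failure, and landmark `x`/`y` values are normalized to the image size (as MediaPipe itself reports them). The helper names and the 0.5 confidence threshold are hypothetical.

```python
import json
from typing import Any, Dict, List, Tuple


def parse_reply(raw: bytes) -> Dict[str, Any]:
    """Decode the JSON reply and fail loudly on a non-success status."""
    reply = json.loads(raw)
    # Assumption: anything other than "success" signals an error.
    if reply.get("status") != "success":
        raise RuntimeError(f"perception request failed: {reply}")
    return reply


def confident_detections(reply: Dict[str, Any], min_confidence: float = 0.5) -> List[dict]:
    """Return detections at or above min_confidence for YOLO or combined replies."""
    data = reply["data"]
    if isinstance(data, dict):
        # Combined replies nest the YOLO results under "detections";
        # MediaPipe-only replies have no such key.
        detections = data.get("detections", [])
    else:
        # YOLO replies carry a bare list of detections.
        detections = data
    return [d for d in detections if d["confidence"] >= min_confidence]


def pose_landmarks_to_pixels(
    reply: Dict[str, Any], width: int, height: int
) -> List[Tuple[int, int]]:
    """Convert pose landmarks to pixel coordinates (assumes normalized x/y)."""
    data = reply["data"]
    if not isinstance(data, dict):
        return []  # YOLO-only replies contain no landmarks
    landmarks = data.get("pose_landmarks") or []
    return [(int(round(lm["x"] * width)), int(round(lm["y"] * height))) for lm in landmarks]
```

The bounding box values are left uninterpreted on purpose, since the coordinate convention of `bbox` is not specified here.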
YOLO Response:
{
"status": "success",
"backend": "yolo",
"data": [
{
"class": "person",
"confidence": 0.92,
"bbox": [100.0, 50.0, 250.0, 400.0]
}
]
}
MediaPipe Response:
{
"status": "success",
"backend": "mediapipe",
"data": {
"pose_landmarks": [{"x": 0.5, "y": 0.5, "z": 0.0, "visibility": 0.9}, ...],
"face_landmarks": [...],
"left_hand_landmarks": [...],
"right_hand_landmarks": [...]
}
}
Combined Response:
{
"status": "success",
"backend": "combined",
"data": {
"detections": [...], // YOLO results
"pose_landmarks": [...], // MediaPipe results
"face_landmarks": [...],
...
}
}
The repository includes a benchmarking utility.
# Run benchmark (requires service to be running)
python benchmark.py
This project is licensed under the Apache License 2.0; see the LICENSE file for details.