
YOLOv8 Face Detection

Build a real-time face detection application with YOLOv8 and WebSocket streaming on WendyOS

Real-Time Face Detection with YOLOv8

Computer vision is one of the most powerful capabilities of edge devices. In this guide, we'll build a real-time face detection application using YOLOv8, one of the most popular object detection models. The application captures video from a webcam, runs inference on each frame, and streams the results to a web browser with bounding boxes drawn around the detections. Note that the stock YOLOv8n model detects the COCO person class (whole people) rather than faces specifically; swapping in a dedicated face model is covered under Next Steps.

Prerequisites

  • Wendy CLI installed on your development machine
  • Python 3.14 installed
  • Docker installed (see Docker Installation)
  • A WendyOS device plugged in over USB or connectable over Wi-Fi
  • A USB webcam connected to your WendyOS device

Setting Up Your Project

Create a New Directory

First, create a directory for your project:

mkdir yolov8-webcam
cd yolov8-webcam

Create requirements.txt

Create a requirements.txt file with the required dependencies:

fastapi
uvicorn[standard]
opencv-python-headless
ultralytics
numpy

Dependencies Explained:

  • fastapi - Modern web framework for building APIs
  • uvicorn - ASGI server for running FastAPI
  • opencv-python-headless - Computer vision library (headless for server environments)
  • ultralytics - YOLOv8 implementation
  • numpy - Numerical computing library
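If you want to experiment on your development machine before containerizing, you can install the same dependencies locally (assuming a working pip in your Python environment):

pip install -r requirements.txt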

Create the Application

Create an app.py file with the face detection logic:

#!/usr/bin/env python3
import asyncio
import base64
import json
from pathlib import Path

import cv2
import numpy as np
from fastapi import FastAPI, WebSocket, WebSocketDisconnect
from fastapi.responses import FileResponse
from ultralytics import YOLO

app = FastAPI()

# Webcam capture settings
CAMERA_INDEX = 0
FRAME_WIDTH = 640
FRAME_HEIGHT = 480
JPEG_QUALITY = 80
TARGET_FPS = 15
CONFIDENCE_THRESHOLD = 0.5

# Load YOLOv8 model
model = None
MODEL_TYPE = "general"


def load_model():
    """Load the YOLOv8 model for detection."""
    global model, MODEL_TYPE
    if model is None:
        model = YOLO("yolov8n.pt")
        MODEL_TYPE = "general"
    return model


class CameraManager:
    """Manages webcam capture with YOLOv8 detection."""

    def __init__(self):
        self._cap: cv2.VideoCapture | None = None
        self._lock = asyncio.Lock()
        self._clients: set[WebSocket] = set()
        self._running = False
        self._task: asyncio.Task | None = None
        self._model = None

    async def _init_camera(self) -> bool:
        """Initialize the camera if not already open."""
        if self._cap is None or not self._cap.isOpened():
            self._cap = cv2.VideoCapture(CAMERA_INDEX)
            self._cap.set(cv2.CAP_PROP_FRAME_WIDTH, FRAME_WIDTH)
            self._cap.set(cv2.CAP_PROP_FRAME_HEIGHT, FRAME_HEIGHT)
            self._cap.set(cv2.CAP_PROP_BUFFERSIZE, 1)

        if self._model is None:
            self._model = load_model()

        return self._cap.isOpened()

    async def _capture_loop(self):
        """Continuously capture, detect, and broadcast to clients."""
        frame_interval = 1.0 / TARGET_FPS
        loop = asyncio.get_running_loop()
        while self._running and self._clients:
            start_time = loop.time()

            result = await loop.run_in_executor(
                None, self._capture_and_detect
            )

            if result is not None:
                disconnected = set()
                for ws in self._clients.copy():
                    try:
                        await ws.send_text(result)
                    except Exception:
                        disconnected.add(ws)
                self._clients -= disconnected

            elapsed = loop.time() - start_time
            sleep_time = max(0, frame_interval - elapsed)
            if sleep_time > 0:
                await asyncio.sleep(sleep_time)

        self._running = False

    def _capture_and_detect(self) -> str | None:
        """Capture a frame, run detection, return JSON."""
        if self._cap is None or self._model is None:
            return None

        ret, frame = self._cap.read()
        if not ret:
            return None

        # Run YOLOv8 inference
        results = self._model(frame, verbose=False, conf=CONFIDENCE_THRESHOLD)

        # Extract detections
        detections = []
        for result in results:
            boxes = result.boxes
            if boxes is not None:
                for box in boxes:
                    x1, y1, x2, y2 = box.xyxy[0].tolist()
                    confidence = float(box.conf[0])
                    class_id = int(box.cls[0])
                    class_name = self._model.names.get(class_id, "unknown")

                    # Filter for person class
                    if class_name == "person":
                        detections.append({
                            "x1": x1, "y1": y1,
                            "x2": x2, "y2": y2,
                            "confidence": confidence,
                            "class": class_name
                        })

        # Encode frame as JPEG
        encode_params = [cv2.IMWRITE_JPEG_QUALITY, JPEG_QUALITY]
        _, buffer = cv2.imencode(".jpg", frame, encode_params)
        image_base64 = base64.b64encode(buffer.tobytes()).decode("utf-8")

        return json.dumps({
            "type": "frame",
            "image": image_base64,
            "detections": detections,
            "width": frame.shape[1],
            "height": frame.shape[0]
        })

    async def add_client(self, websocket: WebSocket) -> bool:
        """Add a client and start streaming."""
        async with self._lock:
            if not await self._init_camera():
                return False
            self._clients.add(websocket)
            if not self._running:
                self._running = True
                self._task = asyncio.create_task(self._capture_loop())
            return True

    async def remove_client(self, websocket: WebSocket):
        """Remove a client and cleanup if needed."""
        async with self._lock:
            self._clients.discard(websocket)
            if not self._clients:
                self._running = False
                if self._task:
                    await self._task
                    self._task = None
                if self._cap:
                    self._cap.release()
                    self._cap = None


camera_manager = CameraManager()


@app.websocket("/stream")
async def websocket_stream(websocket: WebSocket):
    """WebSocket endpoint for video streaming."""
    await websocket.accept()
    if not await camera_manager.add_client(websocket):
        await websocket.close(code=1011, reason="Failed to open camera")
        return
    try:
        while True:
            try:
                await asyncio.wait_for(websocket.receive(), timeout=30.0)
            except asyncio.TimeoutError:
                await websocket.send_json({"type": "ping"})
    except WebSocketDisconnect:
        pass
    finally:
        await camera_manager.remove_client(websocket)


@app.get("/status")
async def get_status():
    """Return status information."""
    return {
        "connected_clients": len(camera_manager._clients),
        "camera_active": camera_manager._running,
        "model": "YOLOv8 Detection",
        "settings": {
            "width": FRAME_WIDTH,
            "height": FRAME_HEIGHT,
            "target_fps": TARGET_FPS,
            "confidence_threshold": CONFIDENCE_THRESHOLD,
        },
    }


@app.get("/")
async def root():
    """Serve the index.html file."""
    return FileResponse(Path(__file__).parent / "index.html")
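Before containerizing, you can sanity-check the backend on your development machine by starting it with uvicorn app:app --port 8100 (assuming a webcam at index 0) and querying the status endpoint. A minimal check using only the standard library might look like this:

#!/usr/bin/env python3
# Hypothetical smoke test -- not part of the app itself. Assumes the server
# is already running on localhost:8100.
import json
from urllib.request import urlopen

with urlopen("http://localhost:8100/status") as resp:
    status = json.load(resp)

# Expect camera_active to be False until a WebSocket client connects
print(status["camera_active"], status["connected_clients"])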

Create the Frontend

Create an index.html file that displays the video stream with bounding boxes:

<!doctype html>
<html>
  <head>
    <meta charset="UTF-8" />
    <meta name="viewport" content="width=device-width, initial-scale=1.0" />
    <title>YOLOv8 Face Detection</title>
    <script src="https://cdn.jsdelivr.net/npm/@tailwindcss/browser@4"></script>
  </head>
  <body class="bg-gray-100 min-h-screen p-8">
    <div class="max-w-4xl mx-auto">
      <h1 class="text-3xl font-bold text-gray-800 mb-1">YOLOv8 Face Detection</h1>
      <p class="mb-2">
        <span id="status" class="inline-flex items-center gap-1.5 text-sm font-medium">
          <span id="status-dot" class="w-2 h-2 rounded-full bg-yellow-500"></span>
          <span id="status-text" class="text-yellow-600">Connecting...</span>
        </span>
      </p>
      <p class="text-gray-600 mb-6">Real-time detection on the edge device</p>

      <div class="bg-black rounded-lg shadow overflow-hidden relative">
        <div id="video-container" class="relative w-full" style="aspect-ratio: 4/3;">
          <img id="video-frame" class="absolute inset-0 w-full h-full object-contain bg-gray-900" />
          <canvas id="overlay-canvas" class="absolute inset-0 w-full h-full pointer-events-none"></canvas>
        </div>
        <div id="loading-overlay" class="absolute inset-0 flex items-center justify-center bg-gray-900">
          <div class="text-center">
            <svg class="animate-spin h-10 w-10 text-blue-500 mx-auto mb-3" fill="none" viewBox="0 0 24 24">
              <circle class="opacity-25" cx="12" cy="12" r="10" stroke="currentColor" stroke-width="4"></circle>
              <path class="opacity-75" fill="currentColor" d="M4 12a8 8 0 018-8V0C5.373 0 0 5.373 0 12h4zm2 5.291A7.962 7.962 0 014 12H0c0 3.042 1.135 5.824 3 7.938l3-2.647z"></path>
            </svg>
            <p class="text-gray-400">Loading YOLOv8 model...</p>
          </div>
        </div>
      </div>

      <div class="mt-4 flex flex-wrap items-center gap-6 text-sm text-gray-500">
        <span><span class="font-medium">FPS:</span> <span id="fps-counter">0</span></span>
        <span><span class="font-medium">Latency:</span> <span id="latency">--</span> ms</span>
        <span><span class="font-medium">Detections:</span> <span id="face-count">0</span></span>
      </div>
    </div>

    <script>
      const videoFrame = document.getElementById("video-frame");
      const overlayCanvas = document.getElementById("overlay-canvas");
      const ctx = overlayCanvas.getContext("2d");
      const loadingOverlay = document.getElementById("loading-overlay");
      const statusDot = document.getElementById("status-dot");
      const statusText = document.getElementById("status-text");
      const fpsCounter = document.getElementById("fps-counter");
      const latencyDisplay = document.getElementById("latency");
      const faceCountDisplay = document.getElementById("face-count");

      let frameCount = 0;
      let lastFpsUpdate = performance.now();
      let ws = null;
      let imageWidth = 640, imageHeight = 480;

      function setStatus(status, color) {
        statusDot.className = `w-2 h-2 rounded-full bg-${color}-500`;
        statusText.textContent = status;
        statusText.className = `text-${color}-600`;
      }

      function drawDetections(detections) {
        ctx.clearRect(0, 0, overlayCanvas.width, overlayCanvas.height);
        const scaleX = overlayCanvas.width / imageWidth;
        const scaleY = overlayCanvas.height / imageHeight;

        detections.forEach((det) => {
          const x = det.x1 * scaleX;
          const y = det.y1 * scaleY;
          const width = (det.x2 - det.x1) * scaleX;
          const height = (det.y2 - det.y1) * scaleY;

          ctx.strokeStyle = "#22c55e";
          ctx.lineWidth = 3;
          ctx.strokeRect(x, y, width, height);

          const label = `${det.class} ${(det.confidence * 100).toFixed(0)}%`;
          ctx.font = "bold 14px sans-serif";
          const textWidth = ctx.measureText(label).width;
          ctx.fillStyle = "#22c55e";
          ctx.fillRect(x, y - 24, textWidth + 8, 24);
          ctx.fillStyle = "#ffffff";
          ctx.fillText(label, x + 4, y - 6);
        });

        faceCountDisplay.textContent = detections.length;
      }

      function resizeCanvas() {
        const container = document.getElementById("video-container");
        overlayCanvas.width = container.clientWidth;
        overlayCanvas.height = container.clientHeight;
      }

      function connect() {
        const protocol = window.location.protocol === "https:" ? "wss:" : "ws:";
        ws = new WebSocket(`${protocol}//${window.location.host}/stream`);

        ws.onopen = () => setStatus("Connected", "green");

        ws.onmessage = (event) => {
          try {
            const frameStart = performance.now();
            const data = JSON.parse(event.data);

            if (data.type === "frame") {
              imageWidth = data.width;
              imageHeight = data.height;
              videoFrame.src = `data:image/jpeg;base64,${data.image}`;
              drawDetections(data.detections);

              if (loadingOverlay.style.display !== "none") {
                loadingOverlay.style.display = "none";
                resizeCanvas();
              }

              frameCount++;
              const now = performance.now();
              if (now - lastFpsUpdate >= 1000) {
                fpsCounter.textContent = Math.round((frameCount * 1000) / (now - lastFpsUpdate));
                frameCount = 0;
                lastFpsUpdate = now;
              }
              latencyDisplay.textContent = Math.round(performance.now() - frameStart);
            }
          } catch (e) {}
        };

        ws.onclose = () => {
          setStatus("Disconnected", "red");
          loadingOverlay.style.display = "flex";
          setTimeout(() => { setStatus("Reconnecting...", "yellow"); connect(); }, 2000);
        };
      }

      window.addEventListener("resize", resizeCanvas);
      resizeCanvas();
      connect();
    </script>
  </body>
</html>

How It Works: The frontend connects via WebSocket, receives JSON frames containing base64-encoded images and detection coordinates, then draws green bounding boxes on a canvas overlay.
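The same protocol is easy to consume outside the browser. The sketch below is a hypothetical headless client; it assumes the third-party websockets package, which is not in this project's requirements.txt:

#!/usr/bin/env python3
# Hypothetical headless consumer of the /stream endpoint. Assumes
# `pip install websockets` and a server running on localhost:8100.
import asyncio
import base64
import json

import websockets


async def consume():
    async with websockets.connect("ws://localhost:8100/stream") as ws:
        msg = json.loads(await ws.recv())
        if msg.get("type") == "frame":
            jpeg_bytes = base64.b64decode(msg["image"])
            print(f"{msg['width']}x{msg['height']} frame, "
                  f"{len(msg['detections'])} detections, "
                  f"{len(jpeg_bytes)} JPEG bytes")


asyncio.run(consume())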

Create the Dockerfile

Create a Dockerfile in the project root:

# Use uv with Python 3.14 (slim variant for better compatibility)
FROM ghcr.io/astral-sh/uv:python3.14-bookworm-slim

# Install build tools and OpenCV dependencies
RUN apt-get update && apt-get install -y --no-install-recommends \
    build-essential \
    libgl1 \
    libglib2.0-0 \
    libsm6 \
    libxext6 \
    libxrender1 \
    libv4l-0 \
    v4l-utils \
    libgomp1 \
    && rm -rf /var/lib/apt/lists/*

# Set working directory
WORKDIR /app

# Copy requirements first for better layer caching
COPY requirements.txt .

# Install Python dependencies
RUN uv pip install --system -r requirements.txt

# Copy application files
COPY app.py .
COPY index.html .

# Pre-download the YOLOv8 model so it ships inside the image
RUN python -c "from ultralytics import YOLO; YOLO('yolov8n.pt')"

# Create a non-root user for security and give it ownership of /app,
# including the model weights downloaded above
RUN useradd --create-home --shell /bin/bash app && \
    chown -R app:app /app

# Switch to non-root user
USER app

# Expose port 8100
EXPOSE 8100

# Run the application
CMD ["python", "-m", "uvicorn", "app:app", "--host", "0.0.0.0", "--port", "8100"]

Model Download: The Dockerfile pre-downloads the YOLOv8n model at build time, so the weights ship inside the container image and the app starts faster on the device.
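To verify that the baked-in weights load without touching the network, you can run a quick check inside the built container (a sketch, assuming you have a shell in the image):

# Hypothetical sanity check -- run inside the built container.
import numpy as np
from ultralytics import YOLO

model = YOLO("yolov8n.pt")             # resolves to the pre-downloaded file
blank = np.zeros((480, 640, 3), dtype=np.uint8)
results = model(blank, verbose=False)  # should run without downloading anything
print(model.names[0])                  # class 0 in COCO is "person"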

Create wendy.json

Create a wendy.json file to configure the application entitlements:

{
    "appId": "sh.wendy.examples.yolov8-webcam",
    "version": "1.0.0",
    "language": "python",
    "entitlements": [
        {
            "type": "network",
            "mode": "host"
        },
        {
            "type": "video"
        },
        {
            "type": "gpu"
        }
    ]
}

Entitlements Explained:

  • network with host mode allows the container to bind to ports directly
  • video grants access to webcam devices
  • gpu enables GPU acceleration for faster inference on NVIDIA Jetson devices

Learn more about available entitlements in the App Entitlements guide.

Deploy to WendyOS Device

Make sure your webcam is connected to your WendyOS device, then deploy:

wendy run

The CLI will build the container (which may take a few minutes the first time due to the model download), upload it to your device, and start the app, streaming its logs:

wendy run
✔︎ Searching for WendyOS devices [5.0s]
✔︎ Which device do you want to run this app on?: Humble Pepper (wendyos-humble-pepper.local) [USB, LAN]
✔︎ Builder ready [0.1s]
✔︎ Container built and uploaded successfully! [45.2s]
ℹ︎ Preparing app
✔︎ App ready to start [0.1s]
✔ Success
  Started app
INFO:     Started server process [41]
INFO:     Application startup complete.
INFO:     Uvicorn running on http://0.0.0.0:8100 (Press CTRL+C to quit)

View the Detection Stream

Open your browser and navigate to:

http://wendyos-humble-pepper.local:8100

Replace the hostname: Each WendyOS device has a unique hostname. Replace wendyos-humble-pepper with your device's actual hostname shown in the CLI output.

You should see a live video stream with green bounding boxes drawn around detected people. The interface displays real-time statistics including FPS, latency, and detection count.

Understanding the Architecture

The application uses a producer-consumer pattern:

  1. Camera Capture: OpenCV captures frames from the USB webcam
  2. YOLOv8 Inference: Each frame is processed by the YOLOv8 model
  3. WebSocket Streaming: Frames and detection data are sent as JSON to connected clients
  4. Canvas Overlay: The browser draws bounding boxes on a transparent canvas layer

This architecture allows multiple browser clients to view the same stream without increasing the inference load on the device.

Performance Tuning

You can adjust these parameters in app.py to optimize for your use case:

Parameter             Default  Description
TARGET_FPS            15       Target frames per second (lower = less CPU usage)
CONFIDENCE_THRESHOLD  0.5      Minimum confidence for detections (higher = fewer false positives)
JPEG_QUALITY          80       Image compression quality (lower = faster streaming, lower quality)
FRAME_WIDTH           640      Capture resolution width
FRAME_HEIGHT          480      Capture resolution height
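If you'd rather tune these without editing code, one option (a variation on the constants in app.py, not part of the guide's version) is to read them from environment variables:

# Optional variation on the constants in app.py: read tuning values from
# the environment, falling back to the guide's defaults.
import os

TARGET_FPS = int(os.environ.get("TARGET_FPS", "15"))
CONFIDENCE_THRESHOLD = float(os.environ.get("CONFIDENCE_THRESHOLD", "0.5"))
JPEG_QUALITY = int(os.environ.get("JPEG_QUALITY", "80"))
FRAME_WIDTH = int(os.environ.get("FRAME_WIDTH", "640"))
FRAME_HEIGHT = int(os.environ.get("FRAME_HEIGHT", "480"))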

Next Steps

Now that you have real-time object detection running:

  • Swap in a face-specific YOLOv8 model for dedicated face detection (see the sketch after this list)
  • Add multiple object class detection (vehicles, animals, etc.)
  • Implement object tracking across frames
  • Add alerts or notifications when specific objects are detected
  • Stream to multiple clients or record detections to storage
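For the first item, swapping models is a one-line change. The sketch below is hypothetical: "yolov8n-face.pt" stands in for whatever face-trained YOLOv8 weights you obtain (community-trained checkpoints exist, but this is not an official Ultralytics download), so adjust the filename to your actual model:

# Hypothetical model swap in load_model(). Adjust the filename to the
# face-trained weights you actually have.
from ultralytics import YOLO

model = YOLO("yolov8n-face.pt")  # assumed local weights file

# With a face-specific model every detection is a face, so the
# `if class_name == "person"` filter in _capture_and_detect can be removed.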