The science behind RehabVision

An explainable AI platform for pervasive, personalised knee rehabilitation with real-time biofeedback from consumer video — grounded in peer-reviewed research by Martin Štufi, Ph.D. (Solutia) and Boris Bačić, Ph.D. (AUT).

Featured publication

RehabVision: An Explainable AI Platform for Pervasive, Personalised Knee Rehabilitation with Real-Time Biofeedback from Consumer Video

Martin Štufi, Ph.D. and Boris Bačić, Ph.D.

ICIST 2026 — International Conference on Information Science and Technology

The platform integrates six synergistic innovations: an AI Motion Analyser that extracts kinematic features from monocular video without markers; confidence-aware multi-view evaluation; adaptive therapy personalisation; an explainable-by-design architecture providing interpretable rationales; healthcare-compliant governance; and real-time biofeedback at 15–30 FPS.

Research team

Led by two principal investigators across Europe and New Zealand

Martin Štufi, Ph.D.

Principal Investigator · Solutia s.r.o.

Prague, Czech Republic

Project coordinator for RehabVision under the TWIST programme. Leads the platform engineering, clinical integration, and deployment strategy. Responsible for the ICIST 2026 paper, system architecture, and the end-to-end validation pipeline.

https://www.stufi.cz

Prof. Boris Bačić, Ph.D.

Research Partner · Auckland University of Technology

Auckland, New Zealand

Key research partner and co-author. Leads the computer-vision and AI methodology at AUT, including the reference pose-estimation pipeline described in arXiv 2412.20733. Brings deep expertise in markerless motion analysis for sport and rehabilitation.

https://academics.aut.ac.nz/boris.bacic

Auckland University of Technology (AUT) — international research partner for the RehabVision AI pipeline

Research questions

Four questions spanning validation and clinical applicability

RQ1

Kinematic accuracy

What measurement accuracy (MAE, RMSE, Pearson correlation) can the monocular computer-vision pipeline achieve for joint flexion/extension angles, compared with a goniometric reference, across front-view and side-view configurations?

RQ2

Real-time performance

What end-to-end latency and throughput (FPS) are achievable on consumer hardware to support real-time biofeedback within clinically acceptable thresholds (< 100 ms)?

RQ3

Movement quality detection

How reliably can confidence-aware, view-specific rules detect clinically relevant issues (insufficient ROM < 60°, valgus deviation > 15°, irregular cadence, knee-past-toe violations) compared to physiotherapist annotations?

RQ4

Environmental robustness

Under home-use conditions (lighting 50–1000 lux, distance 1–3 m, partial occlusions), which factors most significantly affect tracking reliability?
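
The three agreement metrics named in RQ1 can be computed from paired angle series. The following is a minimal sketch, assuming the video-derived and goniometric angles are already time-aligned lists in degrees; the function name agreement_metrics is hypothetical and not taken from the RehabVision codebase.

```python
import math


def agreement_metrics(estimated, reference):
    """Return (MAE, RMSE, Pearson r) for two paired angle series in degrees."""
    if len(estimated) != len(reference) or len(estimated) < 2:
        raise ValueError("need two equal-length series with at least 2 samples")
    n = len(estimated)
    errors = [e - r for e, r in zip(estimated, reference)]

    # Mean absolute error and root-mean-square error of the paired differences.
    mae = sum(abs(d) for d in errors) / n
    rmse = math.sqrt(sum(d * d for d in errors) / n)

    # Pearson correlation from centred sums of products.
    mean_e = sum(estimated) / n
    mean_r = sum(reference) / n
    cov = sum((e - mean_e) * (r - mean_r) for e, r in zip(estimated, reference))
    var_e = sum((e - mean_e) ** 2 for e in estimated)
    var_r = sum((r - mean_r) ** 2 for r in reference)
    if var_e == 0 or var_r == 0:
        raise ValueError("Pearson r is undefined for a constant series")
    return mae, rmse, cov / math.sqrt(var_e * var_r)


if __name__ == "__main__":
    # Illustrative values only, not study data.
    video_deg = [10.0, 20.0, 30.0]
    goniometer_deg = [12.0, 18.0, 33.0]
    print(agreement_metrics(video_deg, goniometer_deg))
```

MAE and RMSE are reported in degrees, so they can be read directly against a clinical tolerance, while Pearson r only captures how well the two series move together and is insensitive to a constant offset.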

Preliminary results

Validated across five subjects, 150 repetitions

Kinematic accuracy

3.2° MAE

Side view: MAE = 3.2°, RMSE = 3.9°, r = 0.94. Front view: MAE = 4.7°, RMSE = 5.4°, r = 0.89. Both views keep MAE under the clinical threshold (< 5°), with the side view showing the wider margin.

Real-time performance

25–30 FPS

Inference latency is 12–18 ms per frame, within the < 100 ms threshold. Effective frame rate is 25–30 FPS (locked), scalable to 60 FPS. Memory use stays below 150 MB, and CPU load is 45% at 720p and 30 FPS.

Movement quality

0.87 F1

ROM detector F1 = 0.87; valgus detector F1 = 0.84; Knee-Past-Toe (KPT) Guard F1 = 0.89; overall κ = 0.74 (substantial agreement).

Environmental robustness

150–800 lux

Reliable tracking within 150–800 lux and 1.5–2.5 m distance. Performance degrades gracefully outside this envelope.
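
The movement-quality figures above compare detector output with physiotherapist annotations. As a rough sketch of how such scores can be derived, the code below computes F1 and a kappa statistic for binary per-repetition labels. It assumes κ refers to Cohen's kappa, which the page does not state, and the function names are hypothetical.

```python
def f1_score(predicted, annotated):
    """F1 for binary labels, where a truthy value means 'issue present'."""
    pairs = list(zip(predicted, annotated))
    tp = sum(1 for p, a in pairs if p and a)
    fp = sum(1 for p, a in pairs if p and not a)
    fn = sum(1 for p, a in pairs if not p and a)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def cohen_kappa(predicted, annotated):
    """Cohen's kappa for two binary raters (detector vs. physiotherapist)."""
    n = len(predicted)
    if n == 0 or n != len(annotated):
        raise ValueError("need two non-empty, equal-length label lists")
    observed = sum(1 for p, a in zip(predicted, annotated) if bool(p) == bool(a)) / n
    p_pos = sum(1 for p in predicted if p) / n
    a_pos = sum(1 for a in annotated if a) / n
    # Agreement expected by chance if both raters labelled independently.
    expected = p_pos * a_pos + (1 - p_pos) * (1 - a_pos)
    if expected == 1.0:
        raise ValueError("kappa is undefined when both raters use a single label")
    return (observed - expected) / (1 - expected)


if __name__ == "__main__":
    # Illustrative labels only, not study data.
    detector = [1, 1, 0, 0, 1, 0, 1, 0]
    therapist = [1, 0, 0, 0, 1, 0, 1, 1]
    print(f1_score(detector, therapist), cohen_kappa(detector, therapist))
```

F1 summarises how well a detector finds the positive cases, whereas kappa corrects raw agreement for what two raters would agree on by chance, which is why the two numbers can differ for the same labels.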

Platform details

Platform architecture

RehabVision implements a container-based microservices architecture comprising five components: (1) Computer Vision Service processing monocular video using MediaPipe Pose or Google ML Kit; (2) AI Motion Analysis Service applying interpretable rule-based detectors; (3) Clinical Backend API orchestrating workflows; (4) Web/Mobile Frontend rendering real-time biofeedback; and (5) AR/VR Runtime for future immersive guidance.
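
The page does not describe how joint angles are derived from pose landmarks. One common approach, sketched here under that assumption, takes the angle at the knee between the thigh (knee to hip) and shank (knee to ankle) vectors and skips frames where landmark confidence is low. The (x, y, confidence) tuple layout, the function names, and the 0.5 cut-off are all illustrative choices, not part of the described platform.

```python
import math

MIN_CONFIDENCE = 0.5  # assumed cut-off, not taken from the source


def included_angle(a, b, c):
    """Angle in degrees at point b formed by the 2D points a-b-c."""
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    n1 = math.hypot(v1[0], v1[1])
    n2 = math.hypot(v2[0], v2[1])
    if n1 == 0 or n2 == 0:
        raise ValueError("coincident landmarks: angle is undefined")
    cos_theta = (v1[0] * v2[0] + v1[1] * v2[1]) / (n1 * n2)
    # Clamp to guard against rounding slightly outside [-1, 1].
    cos_theta = max(-1.0, min(1.0, cos_theta))
    return math.degrees(math.acos(cos_theta))


def knee_flexion(hip, knee, ankle, min_conf=MIN_CONFIDENCE):
    """Knee flexion in degrees (0 = straight leg), or None if confidence is low.

    Each landmark is an (x, y, confidence) tuple in image coordinates.
    """
    if min(hip[2], knee[2], ankle[2]) < min_conf:
        return None
    return 180.0 - included_angle(hip[:2], knee[:2], ankle[:2])


if __name__ == "__main__":
    # Straight leg, then a right-angle bend, using made-up coordinates.
    print(knee_flexion((0, 0, 0.9), (0, 1, 0.9), (0, 2, 0.9)))
    print(knee_flexion((0, 0, 0.9), (0, 1, 0.9), (1, 1, 0.9)))
```

Returning None for low-confidence frames, rather than a guessed angle, keeps unreliable measurements out of downstream rules; a 2D angle of this kind is also view-dependent, which is one reason side-view and front-view results are reported separately.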

Privacy by design

Raw video never leaves the patient's device without explicit informed consent. Processing produces a stick-figure overlay with no faces or identifiable features. Session output is a CSV time series (schema v0.1) plus the anonymised overlay MP4 — no PII in file names, metadata, or telemetry.

What we measure

Per session: range of motion (ROM = maxAngle − minAngle), repetition count, mean and maximum knee angle, time in target band, frame-level exercise correctness (F1 > 0.83), and a confidence continuity check. In preliminary testing, mean absolute errors for knee angles are below 5°, with an inference latency of 12–18 ms per frame.
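
Several of these per-session measures follow directly from the knee-angle time series. The sketch below is one possible way to compute them, not the platform's implementation: the target band, the hysteresis thresholds used to count repetitions, and the output field names are assumptions made for illustration.

```python
def session_metrics(angles, fps, band=(60.0, 90.0), rep_high=70.0, rep_low=30.0):
    """Summarise one session from per-frame knee flexion angles in degrees.

    band, rep_high and rep_low are illustrative defaults, not source values.
    """
    if not angles:
        raise ValueError("angles must not be empty")
    if fps <= 0:
        raise ValueError("fps must be positive")

    # ROM = maxAngle - minAngle, as defined in the text above.
    rom = max(angles) - min(angles)

    # Time in target band: frames inside the band divided by frame rate.
    frames_in_band = sum(1 for a in angles if band[0] <= a <= band[1])

    # Count a repetition each time the angle rises above rep_high and then
    # returns below rep_low (hysteresis avoids double-counting jitter).
    reps = 0
    flexed = False
    for a in angles:
        if not flexed and a >= rep_high:
            flexed = True
        elif flexed and a <= rep_low:
            flexed = False
            reps += 1

    return {
        "rom_deg": rom,
        "mean_angle_deg": sum(angles) / len(angles),
        "max_angle_deg": max(angles),
        "time_in_band_s": frames_in_band / fps,
        "repetitions": reps,
    }


if __name__ == "__main__":
    # Two synthetic flex-and-return cycles sampled at 2 frames per second.
    demo = [10.0, 40.0, 80.0, 40.0, 10.0, 40.0, 80.0, 40.0, 10.0]
    print(session_metrics(demo, fps=2.0))
```

Using two thresholds for repetition counting means a noisy signal hovering around a single cut-off is not counted as several repetitions.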

Current scope — knee PoC

Phase 1 (F1) covers a knee-focused proof of concept with four rule-based detectors: Insufficient ROM (< 60°), Irregular Cadence (coefficient of variation > 0.4), Valgus Deviation (> 15°), and Knee-Past-Toe Guard. Extension to hip, ankle, and shoulder joints is planned for Phase 2.
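
Three of these detectors have explicit thresholds and can be illustrated as simple rules that return a readable rationale with each finding. This is a hedged sketch only: it assumes the cadence criterion is the coefficient of variation of repetition durations (population standard deviation divided by the mean), the function and constant names are hypothetical, and the Knee-Past-Toe Guard is left out because the page does not describe its geometry.

```python
import statistics

# Thresholds as quoted in the text above.
ROM_MIN_DEG = 60.0
CADENCE_CV_MAX = 0.4
VALGUS_MAX_DEG = 15.0


def evaluate_session(rom_deg, rep_durations_s, peak_valgus_deg):
    """Return a list of human-readable findings; an empty list means no issue."""
    findings = []

    if rom_deg < ROM_MIN_DEG:
        findings.append(
            f"Insufficient ROM: {rom_deg:.1f} deg is below {ROM_MIN_DEG:.0f} deg"
        )

    # Cadence needs at least two repetitions to measure variability.
    if len(rep_durations_s) >= 2:
        mean_s = statistics.fmean(rep_durations_s)
        if mean_s > 0:
            cv = statistics.pstdev(rep_durations_s) / mean_s
            if cv > CADENCE_CV_MAX:
                findings.append(
                    f"Irregular cadence: variation {cv:.2f} exceeds {CADENCE_CV_MAX}"
                )

    if peak_valgus_deg > VALGUS_MAX_DEG:
        findings.append(
            f"Valgus deviation: {peak_valgus_deg:.1f} deg exceeds {VALGUS_MAX_DEG:.0f} deg"
        )

    return findings


if __name__ == "__main__":
    # Illustrative inputs only, not study data.
    for line in evaluate_session(45.0, [1.0, 3.0, 1.0, 3.0], 20.0):
        print(line)
```

Because each rule carries its own threshold and message, a clinician can see exactly which measurement triggered a finding, which is the kind of interpretable rationale the explainable-by-design goal calls for.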