AI-Enhanced Real-Time Translation Systems in Mixed-Reality Spaces
DOI: https://doi.org/10.63345/

Keywords: Mixed Reality Translation, Neural Machine Translation, Real Time AI, User Study, Latency, Word Error Rate

Abstract
Real‑time translation within mixed‑reality (MR) spaces holds transformative potential for cross‑lingual collaboration, education, and social interaction by enabling participants to converse naturally without language barriers. Traditional approaches rely on tethered devices or mobile applications that disrupt immersion. In this work, we present an AI‑enhanced MR translation system that integrates optimized neural machine translation (NMT) models with head‑mounted displays (HMDs) and edge‑cloud inference. We detail the system’s architecture—including on‑device audio capture, low‑latency streaming to an edge server, and subtitle rendering in the user’s field of view—and describe model compression and streaming strategies that balance translation quality against computational constraints. To evaluate the approach, we conducted a within‑subjects user study with thirty bilingual participants across four language pairs (English–Spanish, English–Mandarin, English–Arabic, English–Hindi). Each participant completed six conversational tasks using both a baseline NMT‑MR prototype and the AI‑enhanced system. We measured translation accuracy via word error rate (WER), end‑to‑end latency from speaker utterance to subtitle display, and user satisfaction through Likert‑scale questionnaires. The AI‑enhanced system yielded a 15.2 percentage‑point reduction in WER (28.5% → 13.3%), a 0.35 s latency decrease (1.12 s → 0.77 s), and a 22.6% increase in satisfaction ratings (3.1 → 3.8 on a 5‑point scale). Paired t‑tests confirmed significance (p < .001) with large effect sizes. We discuss design guidelines for deploying real‑time NMT in MR, including trade‑offs in model size, streaming granularity, and network reliability. Finally, we outline future directions: fully on‑device quantized inference, support for additional low‑resource languages, and multimodal translations incorporating gesture and visual context.
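The evaluation reports translation accuracy as word error rate (WER), the ratio of word-level substitutions, insertions, and deletions to the number of reference words, computed via edit distance. As a minimal illustrative sketch (the example sentences are hypothetical, not drawn from the study data):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: edit distance over word tokens / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # Dynamic-programming edit-distance table over word tokens.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,          # deletion
                d[i][j - 1] + 1,          # insertion
                d[i - 1][j - 1] + cost,   # substitution (or match)
            )
    return d[len(ref)][len(hyp)] / len(ref)

# One deleted word out of a six-word reference -> WER = 1/6
print(round(wer("the cat sat on the mat", "the cat sat on mat"), 3))
```

A reported reduction from 28.5% to 13.3% WER, as in the abstract, corresponds to this metric dropping from 0.285 to 0.133 averaged over the evaluation utterances.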
Published
License

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.