Multimodal speech- and body gesture-based text input system ... with AEGIS input

At this link, you can download a paper (use the "View on wise..." link) on SpeeG, a system that combines a modified version of Dasher, Sphinx and Microsoft's Kinect sensor to provide multimodal text input. See also this link.

SpeeG is a multimodal speech- and body gesture-based text input system targeting media centres, set-top boxes and game consoles. Its controller-free zoomable user interface combines speech input with gesture-based real-time correction of the recognised voice input. While the open source CMU Sphinx voice recogniser transforms speech into written text, Microsoft's Kinect sensor tracks the user's hand gestures. A modified version of the zoomable Dasher interface (an AEGIS outcome) fuses the input from Sphinx and the Kinect sensor. In contrast to existing speech error correction solutions, which clearly separate a detection phase from a correction phase, SpeeG enables continuous real-time error correction. An evaluation of the SpeeG prototype revealed that low error rates can be achieved at a text input speed of about six words per minute after a minimal learning phase. Moreover, in a user study SpeeG was perceived as the fastest of all evaluated user interfaces and therefore represents a promising candidate for future controller-free text input.
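To give a feel for how such a fusion can work, here is a minimal, purely illustrative sketch: a speech recogniser emits ranked candidate words, and the user's hand height (as a Kinect-style skeleton tracker might report it) steers the selection among those candidates while the text is being produced, so correction happens inline rather than in a separate phase. All names and the selection scheme below are assumptions for illustration and do not reflect SpeeG's actual implementation or APIs.

```python
# Illustrative sketch (not SpeeG's code): fuse ranked word hypotheses
# from a speech recogniser with a gesture signal that picks among them.

from dataclasses import dataclass


@dataclass
class WordHypotheses:
    """Ranked alternatives for one spoken word, best guess first."""
    candidates: list


def select_by_hand_height(hyp: WordHypotheses, hand_y: float) -> str:
    """Map a normalised hand height (0.0 = top, 1.0 = bottom) onto the
    candidate list, so the user can steer towards a correction without
    entering a separate correction mode."""
    index = min(int(hand_y * len(hyp.candidates)), len(hyp.candidates) - 1)
    return hyp.candidates[index]


def transcribe(stream, hand_positions):
    """Combine one hand sample per recognised word into the final text."""
    return " ".join(
        select_by_hand_height(hyp, y) for hyp, y in zip(stream, hand_positions)
    )


# Example: the recogniser's top guess for the second word is wrong;
# the user's hand drops to select the second-ranked candidate inline.
stream = [
    WordHypotheses(["hello"]),
    WordHypotheses(["there", "their"]),
]
print(transcribe(stream, [0.0, 0.6]))  # hello their
```

In the real system the candidate space is navigated through Dasher's continuous zooming interface rather than a simple height mapping, but the core idea is the same: recognition and correction run concurrently on a single input stream.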