A single, fully integrated architecture
The modular Embedded ViaVoice architecture provides fully integrated automatic speech recognition (ASR), speech synthesis through text-to-speech (TTS) and other technology engines. Together, these engines support the full-feature requirements of an application with minimal processor utilization and memory requirements. A single architecture with consistent application interfaces enables Embedded ViaVoice to support solutions ranging from low-resource personal navigation devices to high-performance in-vehicle solutions to Java™ technology. This single-architecture implementation is a particular advantage for applications that need to span a broad range of platform capacities, as well as for solutions that require significant growth in capacity.
A broad language base
Embedded ViaVoice provides speech-recognition and speech-synthesis capabilities in a broad set of languages, supported by a worldwide network of IBM speech research and development laboratories. High-quality embedded concatenative TTS (eCTTS) capabilities provide more-human-sounding speech synthesis to support more-advanced applications. To learn more about IBM's continuing development of additional language models for automatic speech recognition (ASR) and voices for TTS, as well as its continuous improvement of existing languages, contact your IBM representative.
High recognition accuracy
The Embedded ViaVoice recognition engine is based on small units of speech, called phonemes. This phoneme-based model uses finite state grammars to support highly accurate and noise-robust, continuous speech recognition. Through a comprehensive and vigorous research and development effort, IBM has significantly reduced the word-error rate of Embedded ViaVoice over the past several years.
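Word-error rate, the metric cited above, is conventionally computed as the word-level edit distance between a reference transcript and the recognizer output, divided by the number of reference words. The following is a minimal sketch of that standard calculation. The function name and the sample sentences are illustrative assumptions and are not part of the Embedded ViaVoice SDK.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return (substitutions + deletions + insertions) / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] is the edit distance between the reference words processed so far
    # and the first j hypothesis words (classic Levenshtein dynamic program).
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from output
                curr[j - 1] + 1,     # insertion: extra word in output
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Hypothetical transcripts, used only to exercise the function.
    print(word_error_rate("call john smith at home", "call jon smith home"))
```

Because insertions are counted, the value can exceed 1.0 when the output is much longer than the reference.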
Large vocabulary recognition
The maximum vocabulary supported by Embedded ViaVoice has grown by a factor of 25 over the past four years. Embedded ViaVoice supports recognition of word lists of virtually unlimited size, bounded only by the platform's processing and memory resources.
Services and workshops
Porting and integration services include porting to a new operating system, recompiling for a different processor architecture or modifying the embedded audio layer to use a new driver or codec. Alternatively, with a device adaptation kit, IBM supplies the tools that enable you to perform and test the audio adaptations yourself.
IBM can provide on-site classes for application developers on the Embedded ViaVoice Software Developer Kit (SDK). Customized development workshops are also available to provide skills transfer and instruction on application development, evaluation methodology and tools, so you can design and tune your own system to suit your organization's business needs.
Support for multiple programming models
Many small-footprint, embedded applications use Embedded ViaVoice through its C/C++ language application interface.
IBM expertise in voice
IBM's sustained investment in speech recognition and synthesis research and development over more than 30 years has resulted in multiple advances, including Embedded ViaVoice. IBM Embedded ViaVoice software enables you to gain competitive advantage in today's fast-moving marketplace and offers a clear path to future growth through a single, fully integrated architecture.
Functionality
Portable, event-driven architecture
Fully integrated automatic speech recognition (ASR) and text-to-speech (TTS)
Low processor utilization
Small static and dynamic footprint
Scalable, modular architecture
Single-threading and multithreading support
Runtime event notification
Unsupervised adaptation to speakers
Optional speaker enrollment
Phoneme-based
Speaker-independent
Accuracy and robustness
Very large vocabulary recognition, exceeding 200 000 spoken words in real time
Freeform commands combining statistical language models and semantic interpretation
Tunable rejection to address nonspeech sounds and out-of-vocabulary words
Advanced front-end noise suppression
Support for vendor-supplied noise suppression
Enhanced speech and silence detection
Continuous and discrete digit recognition
Spell-mode capable
Word and phrase confidence scoring
Detection and adaptation for gender
Pronunciation confusability reporting
N-Best and homonym support
Grammar weights
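The way N-best results, confidence scoring and tunable rejection listed above typically fit together can be sketched as follows. This is an illustration under stated assumptions, not the Embedded ViaVoice API: the `Hypothesis` type, the `pick_result` function and the 0.0 to 1.0 confidence scale are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Hypothesis:
    text: str
    confidence: float  # assumed to be normalized to the range 0.0-1.0


def pick_result(n_best: List[Hypothesis], reject_below: float) -> Optional[Hypothesis]:
    """Return the most confident hypothesis, or None if it should be rejected.

    A rejected result usually means nonspeech noise or an out-of-vocabulary
    phrase, so the application can re-prompt instead of acting on a guess.
    """
    if not n_best:
        return None
    best = max(n_best, key=lambda h: h.confidence)
    return best if best.confidence >= reject_below else None


if __name__ == "__main__":
    # Hypothetical N-best list; homonym-like alternatives compete on confidence.
    candidates = [Hypothesis("call home", 0.82), Hypothesis("call Rome", 0.41)]
    print(pick_result(candidates, reject_below=0.5))  # accepted
    print(pick_result(candidates, reject_below=0.9))  # rejected
```

Raising the threshold trades more false rejections for fewer false accepts, which is the tuning decision the rejection setting exposes.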
Solution-development tools
Eclipse technology-based IBM Embedded Voice Toolkit, Version 6.0, including a customized integrated development environment (IDE) for embedded speech developers
Application-creation wizards
Grammar editor and templates
Vocabulary testing and analysis
Pronunciation compiler and variant generator
Gain-control tuning tool
Tracing and debugging interface
Device adaptation kit
Flexibility
Broad language coverage
Additional languages in development
Java Speech API (JSAPI) and extensions
Automatic gain adjustment
Multiple listening modes, including push to talk, push to activate and always listening
Run-time language switching
Run-time pronunciation manipulation
Scalable acoustic models
11, 16 and 22 kHz sampling rates
Signal-to-noise ratio (SNR) feedback
Voice tags from text or acoustic input
Embedded baseform generation
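The signal-to-noise feedback listed above refers to a quantity that is usually expressed in decibels as ten times the base-10 logarithm of speech power over noise power. The sketch below shows that general formula only. How Embedded ViaVoice measures or reports SNR is not described here, and the function names and sample data are assumptions.

```python
import math
from typing import Sequence


def mean_power(samples: Sequence[float]) -> float:
    """Average of the squared sample values."""
    if not samples:
        raise ValueError("need at least one sample")
    return sum(s * s for s in samples) / len(samples)


def snr_db(speech: Sequence[float], noise: Sequence[float]) -> float:
    """Signal-to-noise ratio in dB: 10 * log10(P_speech / P_noise)."""
    noise_power = mean_power(noise)
    if noise_power == 0.0:
        return math.inf  # no measurable noise
    speech_power = mean_power(speech)
    if speech_power == 0.0:
        return -math.inf  # no measurable speech
    return 10.0 * math.log10(speech_power / noise_power)


if __name__ == "__main__":
    # Synthetic samples: speech amplitude 1.0, noise amplitude 0.1.
    print(snr_db([1.0, -1.0] * 4, [0.1, -0.1] * 4))
```

An application could use such a value to ask the user to speak closer to the microphone when the ratio is low.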
Grammar and compiler support
Scalable vocabulary support
Built-in grammar compiler
Finite state grammars
Multiple grammar formats, including Speech Recognition Grammar Specification (SRGS), Backus-Naur Form (BNF) and Java Speech Grammar Format (JSGF)
Annotations
Statistical language models
Dynamic and unlimited vocabularies
Precompiled and runtime grammars
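To illustrate what a finite state grammar constrains, the sketch below encodes a tiny command grammar as a state-transition table and checks whether a word sequence is accepted. The grammar, state names and `accepts` function are hypothetical examples. They do not show how the Embedded ViaVoice grammar compiler represents grammars internally.

```python
from typing import Dict, Set, Tuple

# Hypothetical command grammar. In JSGF it could be written roughly as:
#   public <command> = (call (home | office)) | (play (music | radio));
TRANSITIONS: Dict[Tuple[str, str], str] = {
    ("start", "call"): "after_call",
    ("start", "play"): "after_play",
    ("after_call", "home"): "done",
    ("after_call", "office"): "done",
    ("after_play", "music"): "done",
    ("after_play", "radio"): "done",
}
ACCEPTING: Set[str] = {"done"}


def accepts(utterance: str) -> bool:
    """Return True if the word sequence follows a complete path in the grammar."""
    state = "start"
    for word in utterance.lower().split():
        next_state = TRANSITIONS.get((state, word))
        if next_state is None:
            return False  # no transition for this word: out of grammar
        state = next_state
    return state in ACCEPTING


if __name__ == "__main__":
    for phrase in ("call home", "play radio", "call music", "call"):
        print(phrase, "->", accepts(phrase))
```

Restricting the recognizer to paths like these is what lets a grammar-based system stay accurate with limited processing resources.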
Speech synthesis (TTS)
Unlimited pronunciation domain
Multiple voices
Customizable voices
Dictionary support
Indexing support, and pause-and-resume capabilities
Adjustable performance-tuning parameters
API for phoneme generation
Manual override of automatic synthesis
Speech Synthesis Markup Language (SSML) support
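SSML is a W3C XML format for annotating text that is sent to a synthesizer. As a minimal sketch, the code below builds a short SSML 1.0 prompt with a pause and an emphasized word. The `build_prompt` function and the navigation wording are assumptions, and which SSML elements Embedded ViaVoice honors is not stated here.

```python
import xml.etree.ElementTree as ET

SSML_NS = "http://www.w3.org/2001/10/synthesis"  # SSML 1.0 namespace


def build_prompt(street: str, distance_m: int) -> str:
    """Return an SSML document for a hypothetical navigation prompt."""
    # The namespace is written as plain attributes to keep the sketch short.
    speak = ET.Element("speak", {
        "version": "1.0",
        "xmlns": SSML_NS,
        "xml:lang": "en-US",
    })
    speak.text = f"In {distance_m} meters, turn left."
    pause = ET.SubElement(speak, "break", {"time": "300ms"})
    pause.tail = "Then continue on "
    emphasis = ET.SubElement(speak, "emphasis")
    emphasis.text = street  # ElementTree escapes characters such as & and <
    emphasis.tail = "."
    return ET.tostring(speak, encoding="unicode")


if __name__ == "__main__":
    print(build_prompt("Main Street", 200))
```

Building the markup with an XML library rather than string concatenation keeps user-supplied text, such as street names, from breaking the document.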
Processors currently supported*
Hitachi SH4
Motorola PowerPC
IBM PowerPC® processor
Intel® x86
Intel StrongARM
Intel XScale
Blackfin 539 DSP
MIPS
*Others can be added based on customer requirements.
Operating systems supported
Windows XP
Windows 2000
Windows CE / Windows Mobile
QNX
Linux
Embedded Linux
T-Engine
µITRON
VxWorks
RTXC
Languages offered
Automatic speech recognition (ASR)
US English
North American Spanish
Canadian French
UK English
French
Italian
German
Spanish
Dutch
Japanese
Mandarin Chinese
European Portuguese
Swedish
Korean
Embedded concatenative text-to-speech (eCTTS)
US English
North American Spanish
Canadian French
UK English
German
French
Italian
Spanish
Japanese
Dutch
Formant text-to-speech
US English
North American Spanish
Canadian French
UK English
German
French
Italian
Spanish
Japanese
Dutch
Simplified Chinese
Brazilian Portuguese
Korean
Traditional Chinese

