Speech AI | Skelter Labs
Speech AI
Speech recognition and synthesis
Skelter Labs’ speech recognition engine delivers high accuracy even on complex input, such as speech from multiple speakers or poorly structured sentences. For speech synthesis, Speech AI can learn the voice of a specific person or create a new voice, and then convert text into natural-sounding speech. Together, these technologies turn a text-based interface into a voice-based one and automate customer communication.

Speech-to-Text
Speech from a wide range of environments is accurately converted to text. The engine is tuned for noisy public places such as theaters, coffee shops and other public facilities. According to Skelter Labs, it achieved the highest accuracy among competing engines in environment-specific tests conducted during video playback and during phone calls.
The engine uses end-to-end deep learning, which has greatly improved recognition accuracy in difficult conditions such as multiple speakers and unclear pronunciation. It is commercially available as an API, and it can also be custom-trained for specialized fields such as finance and communication services. Skelter Labs is currently focusing on speaker diarization, voice filtering and speaker separation to explore new applications of speech recognition technology.
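The page does not say how accuracy is measured. A common metric for speech recognition is word error rate (WER): the word-level edit distance between a reference transcript and the engine's output, divided by the number of reference words. The following is a minimal sketch of that metric only, assuming whitespace-separated words. The function name `word_error_rate` is hypothetical and is not part of any Skelter Labs API.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER = (substitutions + deletions + insertions) / reference words.

    Illustrative only: words are split on whitespace, with no text
    normalization (casing, punctuation) applied.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion (reference word missing)
                curr[j - 1] + 1,     # insertion (extra hypothesis word)
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)
```

A lower value is better, and 0.0 means the output matches the reference word for word. Because insertions are counted, WER can exceed 1.0 when the output is much longer than the reference.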

Text-to-Speech
Text is converted into a human-like voice with natural intonation and pronunciation. Skelter Labs has created two adult female voices and one adult male voice, and supports the creation of similar voices from recorded data.
Skelter Labs intends to commercialize a more natural speech synthesis engine built on a GAN-based vocoder. In addition, a voice font service that can be produced from a small amount of training data is currently in research and development.
* The speech synthesis engine is provided as part of a solution and is not sold separately.
