In the mid to late 1990s, personal computers started to become powerful enough to enable users to speak to them and for the computers to speak back


Speech Recognition Using the Microsoft Speech Application Platform


Date: 28.10.2023


Speech recognition using the Microsoft Speech Application Platform (Speech Platform) is a process with two distinct phases.
In the first phase, the user's speech is recorded, and the audio recording is delivered together with an application grammar to an SR engine, which converts the speech in the recording to text as described earlier. The route by which the audio and grammar reach the SR engine differs slightly depending on the client that calls the speech application.
In a Telephony Scenario and in a Windows Mobile-based Pocket PC 2003 (Pocket PC) Multimodal Scenario, the audio and grammar are sent to the Speech Engine Services (SES) component of Microsoft Speech Server (MSS). SES loads the application grammar and then sends the audio and grammar to the Speech API (SAPI). SAPI parses the grammar into the appropriate rules, properties, and phrases, and passes the parsed grammar and audio to an available SR engine, which performs the actual recognition work.
In a Desktop Multimodal Scenario, the Speech Add-in for Microsoft Internet Explorer (Speech Add-in) loads the grammar and then instantiates a shared SAPI SR engine. The Speech Add-in passes the grammar and audio to SAPI; SAPI parses the grammar and passes the parsed grammar and audio to the SR engine, which then performs the recognition.
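The two delivery routes described above can be summarized in a small Python sketch. It is only an illustration of the ordering of components, not real Speech Platform code: the `Scenario` enum and the `delivery_route` function are hypothetical names introduced here, and no actual SES or SAPI API is called.

```python
from enum import Enum
from typing import List


class Scenario(Enum):
    """Client scenarios named in the text (enum itself is hypothetical)."""
    TELEPHONY = "telephony"
    POCKET_PC_MULTIMODAL = "pocket_pc_multimodal"
    DESKTOP_MULTIMODAL = "desktop_multimodal"


def delivery_route(scenario: Scenario) -> List[str]:
    """Return the ordered components the audio and grammar pass through."""
    if scenario in (Scenario.TELEPHONY, Scenario.POCKET_PC_MULTIMODAL):
        # Server-side route: SES loads the grammar and forwards to SAPI.
        loader = "Speech Engine Services (SES)"
    else:
        # Desktop route: the Speech Add-in loads the grammar itself.
        loader = "Speech Add-in for Microsoft Internet Explorer"
    # In both routes SAPI parses the grammar and the SR engine recognizes.
    return [loader, "SAPI", "SR engine"]


if __name__ == "__main__":
    for s in Scenario:
        print(s.name, "->", " -> ".join(delivery_route(s)))
```

In every scenario the last two steps are the same; only the component that loads the grammar differs.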
The second phase is semantic analysis of the resulting recognition text to determine its meaning. The recognizer iteratively compares the recognition text to the rules in the application's grammar. When the recognizer matches recognized text to a series of rules in a grammar, it produces an XML output stream in Semantic Markup Language (SML) that represents the semantic output. The semantic output contains recognition confidence values and the recognized text, and it can also contain semantic values that the developer assigns using semantic interpretation markup. Developers use the information in the SML output to infer the meaning of what the user said.
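As a rough illustration of how a developer might read such output, the following Python sketch parses an SML-like XML string with the standard library. The sample document is invented for this example: the element names `Action` and `Vehicle`, the `text` and `confidence` attributes, and all values are assumptions, not output captured from a real recognizer. The `read_sml` helper and its `threshold` parameter are likewise hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical SML-like output; element names, attributes, and values
# are assumptions made for illustration only.
SAMPLE_SML = """
<SML text="I want to buy a car" confidence="0.82">
  <Action confidence="0.90">buy</Action>
  <Vehicle confidence="0.75">car</Vehicle>
</SML>
"""


def read_sml(sml_text, threshold=0.5):
    """Extract recognized text, overall confidence, and semantic values.

    Semantic values whose own confidence is below `threshold` are dropped.
    """
    root = ET.fromstring(sml_text.strip())
    result = {
        "text": root.get("text"),
        "confidence": float(root.get("confidence", "0")),
        "values": {},
    }
    for child in root:
        conf = float(child.get("confidence", "0"))
        if conf >= threshold:
            result["values"][child.tag] = (child.text or "").strip()
    return result


if __name__ == "__main__":
    print(read_sml(SAMPLE_SML))
```

Filtering on per-value confidence is one possible design choice; an application could instead reject the whole utterance when the overall confidence is low and prompt the user again.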

A Simple Example Grammar


The following grammar is a simple English-language grammar for vehicle trading. The root rule, called ruleVTrade, defines the structure of the sentence or phrase that can be recognized using this grammar. It defines optional words and phrases like "I want" and "please," and references two other grammar rules. The first referenced rule is called ruleAction, and it accepts words like "buy" and "sell." The second rule, ruleVehicle, accepts the words "car," "auto" and "truck."
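The rule structure described above can be approximated in Python with a regular expression. This is a minimal sketch, not the original grammar and not a speech grammar format: the exact optional phrases ("I want to", an article before the vehicle, a trailing "please") and the word order are assumptions, and `match_vtrade` is a hypothetical helper. The rule names mirror ruleVTrade, ruleAction, and ruleVehicle from the text.

```python
import re

# Words accepted by the two referenced rules, as listed in the text.
RULE_ACTION = ("buy", "sell")
RULE_VEHICLE = ("car", "auto", "truck")

# Root rule: [I want to] <action> [a|an] <vehicle> [please]
# The optional parts and their order are assumptions for illustration.
RULE_VTRADE = re.compile(
    r"^(?:i want to\s+)?"
    r"(?P<action>" + "|".join(RULE_ACTION) + r")\s+"
    r"(?:a\s+|an\s+)?"
    r"(?P<vehicle>" + "|".join(RULE_VEHICLE) + r")"
    r"(?:\s+please)?$",
    re.IGNORECASE,
)


def match_vtrade(utterance):
    """Return the matched action and vehicle, or None if nothing matches."""
    m = RULE_VTRADE.match(utterance.strip())
    if m is None:
        return None
    return {
        "action": m.group("action").lower(),
        "vehicle": m.group("vehicle").lower(),
    }


if __name__ == "__main__":
    for phrase in ("I want to buy a car please", "sell truck", "I want a pizza"):
        print(repr(phrase), "->", match_vtrade(phrase))
```

The named groups play the role of the referenced rules: whichever alternative matches becomes the value an application would act on.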