Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


iphone - Building an OpenEars-compatible language model

I am doing some development on speech to text and text to speech and I found the OpenEars API very useful.

This API is based on cmu-slm: it uses a language model to map the speech heard by the iPhone device to text. So I decided to find a big English language model to feed the API's speech recognizer engine. But I failed to understand the format of the VoxForge English data model well enough to use it with OpenEars.

Does anyone have any idea how I can get the .languagemodel and .dic files for English to work with OpenEars?


1 Answer


Regarding LM Formats:

AFAIK most language models use the ARPA standard format, which is plain text. Sphinx / CMU language models are compiled into a binary format, so you'd need the source (text) format to convert a Sphinx LM into another format. Most other language models are in text format.

I'd recommend using the HTK Speech Recognition Toolkit; detailed documentation is here: http://htk.eng.cam.ac.uk/ftp/software/htkbook_html.tar.gz

Here's also a description of CMU's SLM Toolkit: http://www.speech.cs.cmu.edu/SLM/toolkit_documentation.html

Here's an example of a language model in ARPA format I found on the net: http://www.arborius.net/~jphekman/sphinx/full/index.html

You probably want to create an ARPA LM first, then convert it into any binary format if needed.
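To make the ARPA text layout concrete, here is a minimal sketch of a reader for it. The sample model is a hand-made toy (the numbers are illustrative, not from any real corpus), and `parse_arpa` is a hypothetical helper name, not part of Sphinx, HTK or OpenEars. Each n-gram line holds a log10 probability, the words, and an optional backoff weight.

```python
# Toy ARPA file: header with n-gram counts, one section per order, then \end\.
SAMPLE = r"""
\data\
ngram 1=4
ngram 2=3

\1-grams:
-0.6021 </s>
-99.0000 <s> -0.3010
-0.6021 hello -0.3010
-0.6021 world -0.3010

\2-grams:
-0.3010 <s> hello
-0.3010 hello world
-0.3010 world </s>

\end\
"""


def parse_arpa(text):
    """Return (declared counts per order, {order: {ngram: (logprob, backoff)}})."""
    counts = {}
    ngrams = {}
    order = None  # stays None until the first "\N-grams:" section starts
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line == "\\data\\":
            continue
        if line == "\\end\\":
            break
        if line.startswith("ngram "):
            n, count = line[len("ngram "):].split("=")
            counts[int(n)] = int(count)
        elif line.startswith("\\") and line.endswith("-grams:"):
            order = int(line[1:-len("-grams:")])
            ngrams[order] = {}
        elif order is not None:
            fields = line.split()
            logprob = float(fields[0])            # log10 probability
            words = tuple(fields[1:1 + order])    # the n-gram itself
            # A trailing extra field, if present, is the backoff weight.
            backoff = float(fields[1 + order]) if len(fields) > 1 + order else None
            ngrams[order][words] = (logprob, backoff)
    return counts, ngrams


counts, ngrams = parse_arpa(SAMPLE)
print(counts)
print(ngrams[2][("hello", "world")])
```

Because the format is plain text like this, it is easy to inspect or generate by hand before converting it to whatever binary format the recognizer wants.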

In General:

To build a language model, you need lots and lots of training data. The model uses it to determine the probability of each word in your vocabulary occurring next, given the input observed so far.

You can't "make" a language model just by adding the words you want to recognize; you also need a lot of training data (i.e. typical input you observe when running your speech recognition application).

A language model is not just a word list: it estimates the probability of the next token (word) in the input. To estimate those probabilities, you need to run a training process that goes over training data (e.g. historical data) and observes word frequencies there.
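As a rough illustration of what "observing word frequencies" means, the sketch below estimates bigram probabilities by plain counting (maximum likelihood, no smoothing). The three training sentences and the function name `train_bigram` are made up for the example; real toolkits such as the CMU SLM toolkit additionally apply discounting and backoff, which this leaves out.

```python
from collections import Counter


def train_bigram(sentences):
    """Estimate P(next | previous) as count(previous, next) / count(previous)."""
    context_counts = Counter()
    bigram_counts = Counter()
    for sentence in sentences:
        # <s> and </s> mark sentence start and end.
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        context_counts.update(tokens[:-1])          # every token that has a successor
        bigram_counts.update(zip(tokens, tokens[1:]))
    return {
        (prev, nxt): count / context_counts[prev]
        for (prev, nxt), count in bigram_counts.items()
    }


# Hypothetical training data: typical input for a small command application.
training_data = ["turn left", "turn right", "turn left now"]
probs = train_bigram(training_data)

for (prev, nxt), p in sorted(probs.items()):
    print(f"P({nxt} | {prev}) = {p:.3f}")
```

With so little data most word pairs never occur at all and get no probability, which is exactly why a usable model needs a large, representative corpus.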

For your problem, a quick solution may be to assume that all words have the same frequency / probability:

  1. Create a dictionary with the words you want to recognize (N words in the dictionary).

  2. Create a language model that assigns probability 1/N to each word (a uniform unigram language model).

You can then interpolate that unigram language model (LM) with another LM trained on a bigger corpus, using the HTK toolkit.
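The quick solution can be sketched as follows, assuming a four-word command vocabulary that I made up for the example. `uniform_unigram_arpa` and `interpolate` are hypothetical helper names; the second one shows plain linear interpolation of two unigram distributions, not what HTK does internally. The sketch also omits the `<s>` / `</s>` sentence markers that a real recognizer may expect, and it does not produce the pronunciation (.dic) file.

```python
import math


def uniform_unigram_arpa(words):
    """Write an ARPA unigram model giving each of the N distinct words probability 1/N."""
    vocab = sorted(set(words))
    if not vocab:
        raise ValueError("vocabulary must not be empty")
    logprob = math.log10(1.0 / len(vocab))  # ARPA stores log10 probabilities
    lines = ["\\data\\", f"ngram 1={len(vocab)}", "", "\\1-grams:"]
    lines += [f"{logprob:.4f} {word}" for word in vocab]
    lines += ["", "\\end\\", ""]
    return "\n".join(lines)


def interpolate(p_a, p_b, weight_a):
    """Linear interpolation: weight_a * P_a(w) + (1 - weight_a) * P_b(w)."""
    vocab = set(p_a) | set(p_b)
    return {
        w: weight_a * p_a.get(w, 0.0) + (1.0 - weight_a) * p_b.get(w, 0.0)
        for w in vocab
    }


# Hypothetical command vocabulary (N = 4).
arpa_text = uniform_unigram_arpa(["LEFT", "RIGHT", "GO", "STOP"])
print(arpa_text)

# Mixing a small uniform model with a toy "bigger corpus" unigram distribution.
small = {"GO": 0.5, "STOP": 0.5}
big = {"GO": 0.1, "THE": 0.9}
mixed = interpolate(small, big, weight_a=0.5)
print(mixed)
```

The interpolation weight decides how much you trust your own word list relative to the general-purpose model; in practice it is tuned on held-out data.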

