Regarding LM Formats:
AFAIK most language models use the ARPA format, which is plain text. Sphinx / CMU language models are compiled into a binary format, so to convert a Sphinx LM into another format you'd need its original text source. Most other language models are distributed in text format.
I'd recommend using the HTK Speech Recognition Toolkit; detailed documentation is here: http://htk.eng.cam.ac.uk/ftp/software/htkbook_html.tar.gz
Here's also a description of CMU's SLM Toolkit: http://www.speech.cs.cmu.edu/SLM/toolkit_documentation.html
Here's an example of a language model in ARPA format I found on the net: http://www.arborius.net/~jphekman/sphinx/full/index.html
You probably want to create an ARPA LM first, then convert it into any binary format if needed.
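To give a rough idea of what an ARPA file looks like, here is a minimal Python sketch that reads the unigram section of a tiny hand-written model. The words and values in `ARPA_TEXT` are made up for illustration, and `read_unigrams` is a hypothetical helper, not part of any toolkit. Real ARPA files usually also contain sentence markers, higher-order n-gram sections and backoff weights, which this sketch ignores.

```python
# A tiny hand-written ARPA-style model (illustrative values only).
# Each entry is: log10(probability) <TAB> word
ARPA_TEXT = """\\data\\
ngram 1=3

\\1-grams:
-0.3010\tyes
-0.6021\tno
-0.6021\tmaybe

\\end\\
"""

def read_unigrams(text):
    """Return {word: probability} from the 1-grams section of ARPA text."""
    probs = {}
    in_unigrams = False
    for line in text.splitlines():
        line = line.strip()
        if line == "\\1-grams:":
            in_unigrams = True
            continue
        if line.startswith("\\"):
            # Any other section header (\data\, \2-grams:, \end\) ends the section.
            in_unigrams = False
            continue
        if in_unigrams and line:
            fields = line.split()
            log10_prob, word = float(fields[0]), fields[1]
            probs[word] = 10 ** log10_prob  # ARPA stores base-10 logs
    return probs

unigrams = read_unigrams(ARPA_TEXT)
for word, p in unigrams.items():
    print(f"{word}: {p:.3f}")
```

The probabilities are stored as base-10 logarithms, which is why the sketch converts them back with `10 ** log10_prob`.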
In General:
To build a language model, you need lots and lots of training data. The model uses it to determine the probability of each word in your vocabulary coming next, given the input observed up to this point in time.
You can't "make" a language model just by adding the words you want to recognize - you also need a lot of training data (= typical input you observe when running your speech recognition application).
A Language Model is not just a word list -- it estimates the probability of the next token (word) in the input.
To estimate those probabilities, you need to run a training process, which goes over the training data (e.g. historic data) and counts word frequencies there to estimate the probabilities mentioned above.
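As a rough illustration of that training step, the following Python sketch counts word pairs in a toy corpus and turns the counts into next-word probabilities (a bigram model with plain maximum-likelihood estimates). The corpus, the `<s>` / `</s>` sentence markers and the helper name `bigram_prob` are assumptions for the example. Real toolkits also apply smoothing and backoff, which is left out here.

```python
from collections import Counter

# Toy training data; a real application needs far more text than this.
corpus = [
    "call my mother",
    "call my office",
    "open my mail",
]

history_counts = Counter()  # how often each word appears as a predecessor
bigram_counts = Counter()   # how often each (previous, next) pair appears
for sentence in corpus:
    words = ["<s>"] + sentence.split() + ["</s>"]
    history_counts.update(words[:-1])
    bigram_counts.update(zip(words, words[1:]))

def bigram_prob(prev, word):
    """Maximum-likelihood estimate of P(word | prev), without smoothing."""
    if history_counts[prev] == 0:
        return 0.0  # unseen history: nothing to estimate from
    return bigram_counts[(prev, word)] / history_counts[prev]

print(bigram_prob("my", "mother"))
print(bigram_prob("call", "my"))
```

With so little data, any word pair that never occurred gets probability zero, which is one reason a large training corpus matters.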
For your problem, a quick solution may be to simply assume that all words have the same frequency / probability:
Create a dictionary with the words you want to recognize (N words in the dictionary).
Create a language model which has 1/N as the probability for each word (a unigram language model).
You can then interpolate that unigram language model (LM) with another LM trained on a bigger corpus, using the HTK Toolkit.
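The quick solution above can be sketched in plain Python as follows. This is only an illustration of the arithmetic, not the HTK tools themselves: the function names, the word list, and the `background_lm` probabilities are all hypothetical. Some decoders also expect sentence markers or an unknown-word entry in the ARPA file, which are omitted here.

```python
import math

def uniform_unigram(words):
    """Assign probability 1/N to each of the N distinct words."""
    vocab = sorted(set(words))
    return {w: 1.0 / len(vocab) for w in vocab}

def to_arpa(unigrams):
    """Render a unigram-only model as ARPA text (log10 probabilities)."""
    lines = ["\\data\\", f"ngram 1={len(unigrams)}", "", "\\1-grams:"]
    for word, p in unigrams.items():
        lines.append(f"{math.log10(p):.4f}\t{word}")
    lines += ["", "\\end\\"]
    return "\n".join(lines) + "\n"

def interpolate(lm_a, lm_b, weight_a):
    """Linear interpolation: weight_a * P_a(w) + (1 - weight_a) * P_b(w)."""
    vocab = set(lm_a) | set(lm_b)
    return {
        w: weight_a * lm_a.get(w, 0.0) + (1 - weight_a) * lm_b.get(w, 0.0)
        for w in vocab
    }

# Words the application should recognize (example vocabulary, N = 4).
command_lm = uniform_unigram(["play", "pause", "stop", "next"])

# Made-up unigram probabilities standing in for a model of a bigger corpus.
background_lm = {"play": 0.1, "stop": 0.2, "the": 0.4, "music": 0.3}

mixed_lm = interpolate(command_lm, background_lm, weight_a=0.7)

print(to_arpa(command_lm))
```

The interpolation weight (0.7 here) is an arbitrary choice for the example. In practice it would be tuned on held-out data from your application.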