I'm writing a program that transcribes audio using CMU Sphinx. I'm not happy with the transcription quality, so I thought I might find a better model, but I don't really understand the differences between the available models. There are the models bundled in the sphinx4-data jar, and then there are the ones on this page, https://sourceforge.net/projects/cmusphinx/files/Acoustic%20and%20Language%20Models/US%20English/. I don't understand how those differ from the bundled ones, and I'm not even sure which files I should use.
There's the acoustic model, the dictionary, and the language model. I'd like my program to be as general as possible, i.e., able to transcribe any speech (English, to start with). Which models are best for that?