Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


signal processing - convert sound to list of phonemes in python

How do I convert any sound signal to a list of phonemes?

That is, I am looking for the actual methodology and/or code to go from a digital signal to the list of phonemes that the sound recording is made up of.
For example:

lPhonemes = audio_to_phonemes(aSignal)

where for example

from scipy.io.wavfile import read
iSampleRate, aSignal = read(sRecordingDir)

# aSignal is now a numpy array holding the recorded word 'hear'
lPhonemes = ['HH', 'IY1', 'R']

I need the function audio_to_phonemes.

Not all sounds are words in a language, so I cannot just use something built on the Google API, for example.

Edit
I don't want audio to words, I want audio to phonemes. Most libraries do not seem to output that. Any library you recommend needs to be able to output the ordered list of phonemes that the sound is made up of, and it needs to be in Python.

I would also love to know how the process of going from sound to phonemes works. If not for implementation purposes, then for interest's sake.


Fight with dragons for too long and you become a dragon yourself; gaze into the abyss for too long and the abyss will gaze back…

1 Answer


Accurate phoneme recognition is not easy to achieve because phonemes themselves are rather loosely defined. Even on good audio, the best systems today have a phoneme error rate of about 18% (you can check the LSTM-RNN results on TIMIT published by Alex Graves).

With CMUSphinx, phoneme recognition in Python is done like this:
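
To make the error-rate figure concrete: a phoneme error rate is usually computed as the edit distance (substitutions, deletions and insertions) between the recognized phoneme sequence and a reference transcription, divided by the reference length. The sketch below is a minimal illustration of that idea, not the scoring tool used for the published TIMIT results; the function names and the sample sequences are made up for the example.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two phoneme sequences."""
    # prev[j] holds the distance between the ref prefix seen so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def phoneme_error_rate(ref, hyp):
    """Edit distance normalized by the length of the reference sequence."""
    if not ref:
        raise ValueError("reference sequence must not be empty")
    return edit_distance(ref, hyp) / len(ref)


# Hypothetical example: one substituted phoneme out of three.
per = phoneme_error_rate(['HH', 'IY', 'R'], ['HH', 'IH', 'R'])
```

Here `per` is one error over three reference phonemes. An 18% rate therefore means roughly one phoneme in five or six is wrong, which is worth keeping in mind when reading the recognizer output further down.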

from os import path

from pocketsphinx.pocketsphinx import *
from sphinxbase.sphinxbase import *

MODELDIR = "../../../model"
DATADIR = "../../../test/data"

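# Configure the decoder: an acoustic model (-hmm) plus a phonetic
# language model (-allphone), which switches it to phoneme decoding.
config = Decoder.default_config()
config.set_string('-hmm', path.join(MODELDIR, 'en-us/en-us'))
config.set_string('-allphone', path.join(MODELDIR, 'en-us/en-us-phone.lm.dmp'))
# -lw is the language model weight; -beam and -pbeam are pruning beam widths.
config.set_float('-lw', 2.0)
config.set_float('-beam', 1e-10)
config.set_float('-pbeam', 1e-10)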

# Decode streaming data.
decoder = Decoder(config)

decoder.start_utt()
# Feed the audio to the decoder in 1024-byte chunks; the file is closed
# automatically when the with block ends.
with open(path.join(DATADIR, 'goforward.raw'), 'rb') as stream:
  while True:
    buf = stream.read(1024)
    if not buf:
      break
    decoder.process_raw(buf, False, False)
decoder.end_utt()

hypothesis = decoder.hyp()
# In allphone mode each segment's "word" is a phoneme label.
print('Best phonemes:', [seg.word for seg in decoder.seg()])

You need to check out the latest pocketsphinx from GitHub in order to run this example. The result should look like this:

  Best phonemes: ['SIL', 'G', 'OW', 'F', 'AO', 'R', 'W', 'ER', 'D', 'T', 'AE', 'N', 'NG', 'IY', 'IH', 'ZH', 'ER', 'Z', 'S', 'V', 'SIL']
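
The output above includes 'SIL' markers for silence, whereas the question asks for a plain list such as ['HH', 'IY1', 'R']. A small post-processing step can drop the non-speech labels. This is only a sketch: the helper name strip_non_speech is made up, and the set of labels treated as non-speech (silence markers plus filler tokens written like '+NSN+') is an assumption that should be checked against the filler dictionary of the acoustic model in use.

```python
# Assumed non-speech labels; verify against your model's filler dictionary.
NON_SPEECH = {'SIL', '<s>', '</s>', '<sil>'}


def strip_non_speech(segments, non_speech=NON_SPEECH):
    """Return the segment labels with silence and filler tokens removed."""
    kept = []
    for label in segments:
        # Assumption: filler labels are wrapped in plus signs, e.g. '+NSN+'.
        is_filler = label.startswith('+') and label.endswith('+')
        if label not in non_speech and not is_filler:
            kept.append(label)
    return kept


# Applied to the start and end of the result shown above.
phonemes = strip_non_speech(['SIL', 'G', 'OW', 'F', 'AO', 'R', 'W', 'ER', 'D', 'SIL'])
```

After this call, `phonemes` holds only the speech labels, in their original order.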

See also the wiki page
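
The question reads its audio from a WAV file, while the example above streams a headerless .raw file. One possible bridge, using only the standard-library wave module, is sketched below. The function name iter_pcm_chunks, the chunk size, and the expectation of 16 kHz, 16-bit mono audio are assumptions; check which format your acoustic model was trained on.

```python
import wave


def iter_pcm_chunks(wav_file, chunk_frames=512, expected_rate=16000):
    """Yield raw PCM byte chunks from a WAV path or file-like object."""
    with wave.open(wav_file, 'rb') as wav:
        # Assumption: the decoder wants 16-bit mono PCM at expected_rate Hz.
        if wav.getnchannels() != 1 or wav.getsampwidth() != 2:
            raise ValueError('expected 16-bit mono PCM audio')
        if wav.getframerate() != expected_rate:
            raise ValueError('expected a sample rate of %d Hz' % expected_rate)
        while True:
            buf = wav.readframes(chunk_frames)
            if not buf:
                break
            yield buf
```

Each yielded chunk could then be passed to decoder.process_raw(buf, False, False) between start_utt() and end_utt(), in place of the stream.read(1024) loop.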

