The aubio libraries have been wrapped with SWIG and can thus be used by Python. Among their many features include several methods for pitch detection/estimation including the YIN algorithm and some harmonic comb algorithms.
However, if you want something simpler, I wrote some code for pitch estimation some time ago and you can take it or leave it. It won't be as accurate as using the algorithms in aubio, but it might be good enough for your needs. I basically just took the FFT of the data times a window (a Blackman window in this case), squared the FFT values, found the bin that had the highest value, and used a quadratic interpolation around the peak using the log of the max value and its two neighboring values to find the fundamental frequency. The quadratic interpolation I took from some paper that I found.
It works fairly well on test tones, but it will not be as robust or as accurate as the other methods mentioned above. The accuracy can be increased by increasing the chunk size (or reduced by decreasing it). The chunk size should be a multiple of 2 to make full use of the FFT. Also, I am only determining the fundamental pitch for each chunk with no overlap. I used PyAudio to play the sound through while writing out the estimated pitch.
Source Code:
# Read in a WAV and find the freq's
import pyaudio
import wave
import numpy as np
chunk = 2048
# open up a wave
wf = wave.open('test-tones/440hz.wav', 'rb')
swidth = wf.getsampwidth()
RATE = wf.getframerate()
# use a Blackman window
window = np.blackman(chunk)
# open stream
p = pyaudio.PyAudio()
stream = p.open(format =
p.get_format_from_width(wf.getsampwidth()),
channels = wf.getnchannels(),
rate = RATE,
output = True)
# read some data
data = wf.readframes(chunk)
# play stream and find the frequency of each chunk
while len(data) == chunk*swidth:
# write data out to the audio stream
stream.write(data)
# unpack the data and times by the hamming window
indata = np.array(wave.struct.unpack("%dh"%(len(data)/swidth),
data))*window
# Take the fft and square each value
fftData=abs(np.fft.rfft(indata))**2
# find the maximum
which = fftData[1:].argmax() + 1
# use quadratic interpolation around the max
if which != len(fftData)-1:
y0,y1,y2 = np.log(fftData[which-1:which+2:])
x1 = (y2 - y0) * .5 / (2 * y1 - y2 - y0)
# find the frequency and output it
thefreq = (which+x1)*RATE/chunk
print "The freq is %f Hz." % (thefreq)
else:
thefreq = which*RATE/chunk
print "The freq is %f Hz." % (thefreq)
# read some more data
data = wf.readframes(chunk)
if data:
stream.write(data)
stream.close()
p.terminate()
与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…