I have long wanted to write a simple script that shows the spectrum of the system sound, but I never got past capturing the system sound. A few days ago I tried PyAudio again, and it looked promising. It hadn't worked before, probably because I hadn't configured my system correctly.
Note
The video has been remade and the code has been changed slightly so that the figure has an exact size. (2015-10-20T04:11:19Z)
1 Video and Code
After a few tests and tweaks, here is a video:
Yes, I am sure there are programs that already do this, but I just wanted to write a small one of my own, even though I don't plan to use or improve it.
You can get the code on Gist.
2 Explanation
First, the variables:
SAVE = 0.0
TITLE = ''
FPS = 25.0
nFFT = 512
BUF_SIZE = 4 * nFFT
FORMAT = pyaudio.paInt16
CHANNELS = 2
RATE = 44100
Most of them should be easy to guess. SAVE controls saving the video and audio files: if it is greater than 0, its value is how many seconds will be saved. FPS is the frame rate of the graph, in frames per second.
def main():
    fig = plt.figure()
    x_f = 1.0 * np.arange(-nFFT / 2 + 1, nFFT / 2) / nFFT * RATE
    ax = fig.add_subplot(111, title=TITLE, xlim=(x_f[0], x_f[-1]),
                         ylim=(0, 2 * np.pi * nFFT**2 / RATE))
    # Note: Matplotlib 3.3 and later spell this keyword linthresh.
    ax.set_yscale('symlog', linthreshy=nFFT**0.5)
    line, = ax.plot(x_f, np.zeros(nFFT - 1))
In this part, a figure is initialized and the frequency range of the FFT result is calculated for charting. Since the sampling RATE is 44,100 Hz, the frequency goes up to 22,050 Hz. Because two channels are plotted, with the left channel on the negative side and the right channel on the positive side, I use [-22,050, 22,050] as the X-axis range. The Y-axis is in symlog scale, so small values are plotted on a linear scale while the rest are plotted on a logarithmic scale.
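As a rough sketch of the axis calculation, the snippet below rebuilds x_f in pure Python, so it runs without NumPy. It only mirrors the arithmetic of the np.arange expression; the names bins and bin_width are introduced here for illustration and are not in the original script. Note that the outermost plotted value is one bin short of 22,050 Hz.

```python
RATE = 44100   # sampling rate in Hz, as in the script
nFFT = 512     # FFT size, as in the script

# Bin indices -255 .. 255, mirroring np.arange(-nFFT / 2 + 1, nFFT / 2).
bins = range(-nFFT // 2 + 1, nFFT // 2)
x_f = [1.0 * k / nFFT * RATE for k in bins]

# Spacing between neighbouring FFT bins, in Hz.
bin_width = RATE / float(nFFT)

print(len(x_f))         # number of plotted points, nFFT - 1
print(x_f[0], x_f[-1])  # lowest and highest value on the X-axis
print(bin_width)
```

The list has nFFT - 1 entries, which is why the line is created with np.zeros(nFFT - 1).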
    def change_xlabel(evt):
        labels = [label.get_text().replace(u'\u2212', '')
                  for label in ax.get_xticklabels()]
        ax.set_xticklabels(labels)
        fig.canvas.mpl_disconnect(drawid)
    drawid = fig.canvas.mpl_connect('draw_event', change_xlabel)
In this part, the X-axis labels are modified: the minus sign is removed, so all the numbers look positive, as they should. It's kind of tricky. I couldn't find a reliable way to get the initial labels, so I have to hook the draw_event to make sure the labels have already been generated before updating them.
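The string replacement itself can be tried without a figure. By default Matplotlib writes negative tick labels with the Unicode minus sign (U+2212) rather than the ASCII hyphen, which is why the code replaces u'\u2212'. The sketch below uses made-up label strings and a helper named strip_minus, both of which are assumptions for illustration only.

```python
# Example tick-label strings; hypothetical values, not read from a real figure.
raw_labels = [u'\u221220000', u'\u221210000', u'0', u'10000', u'20000']


def strip_minus(labels):
    """Drop the Unicode minus sign (U+2212) from each label."""
    return [label.replace(u'\u2212', u'') for label in labels]


clean_labels = strip_minus(raw_labels)
print(clean_labels)

# An ASCII hyphen is a different character and is left untouched.
print(strip_minus([u'-5']))
```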
    p = pyaudio.PyAudio()
    MAX_y = 2.0**(p.get_sample_size(FORMAT) * 8 - 1)

    frames = None
    wf = None
    if SAVE:
        frames = int(FPS * SAVE)
        wf = wave.open('temp.wav', 'wb')
        wf.setnchannels(CHANNELS)
        wf.setsampwidth(p.get_sample_size(FORMAT))
        wf.setframerate(RATE)

    stream = p.open(format=FORMAT, channels=CHANNELS, rate=RATE, input=True,
                    frames_per_buffer=BUF_SIZE)

    ani = animation.FuncAnimation(fig, animate, frames,
                                  init_func=lambda: init(line),
                                  fargs=(line, stream, wf, MAX_y),
                                  interval=1000.0/FPS, blit=True)
This initializes PyAudio, wave, and the graph. MAX_y is only used for normalizing the signal. With paFloat32 the samples would already be in the range [-1, 1], but because of the wave saving, paInt16 is easier to code with.
For the graph, Matplotlib animation is used. It's an easy way to generate an animated graph, and it can even save a video file. The animate and init functions are explained further down.
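To make the normalization concrete, here is a small sketch that computes the same full-scale value without PyAudio. It assumes a sample width of 2 bytes, which is what get_sample_size returns for paInt16; SAMPLE_WIDTH and the three hand-made samples are illustrative and not part of the original script.

```python
import struct

SAMPLE_WIDTH = 2  # bytes per 16-bit sample (assumed value for paInt16)
MAX_y = 2.0 ** (SAMPLE_WIDTH * 8 - 1)  # full scale of a signed 16-bit sample

# Three hand-made samples: most negative, zero, most positive.
raw = struct.pack('3h', -32768, 0, 32767)
samples = struct.unpack('3h', raw)

# Dividing by MAX_y maps the integers into the range [-1, 1).
normalized = [s / MAX_y for s in samples]

print(MAX_y)
print(normalized)
```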
    if SAVE:
        ani.save('temp.mp4', fps=FPS)
    else:
        plt.show()

    stream.stop_stream()
    stream.close()
    p.terminate()
    if SAVE:
        wf.close()
The graph is started. As you can see, it either saves the video or shows the graph. I couldn't find a way to do both, that is, to encode on the fly. You can call plt.show() afterwards, but that will just be a replay.
The rest is the clean-up, which properly closes everything.
def init(line):
    line.set_ydata(np.zeros(nFFT - 1))
    return line,
It's important to have init for the animation; otherwise the first frame generated by the animate function will be used as the clear frame. Setting the data to all zeros keeps that frame clean.
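def animate(i, line, stream, wf, MAX_y):
    # Read a whole number of nFFT-sized units, at least one.
    # Floor division keeps N and the slice indices integers on Python 3 too.
    N = max(stream.get_read_available() // nFFT, 1) * nFFT
    data = stream.read(N)
    if SAVE:
        wf.writeframes(data)

    # Unpack the interleaved 16-bit samples and normalize them.
    y = np.array(struct.unpack("%dh" % (N * CHANNELS), data)) / MAX_y
    y_L = y[::2]
    y_R = y[1::2]

    Y_L = np.fft.fft(y_L, nFFT)
    Y_R = np.fft.fft(y_R, nFFT)

    # Left channel on the negative side, right channel on the positive side.
    Y = abs(np.hstack((Y_L[-nFFT // 2:-1], Y_R[:nFFT // 2])))

    line.set_ydata(Y)
    return line,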
animate generates one frame. It reads frames from the audio stream in units of nFFT: as many whole units as are available, or at least one unit. Because FPS does not match RATE / nFFT, there will be some leftover data; once the leftover adds up to a full unit, it is grabbed as well.
The data, which is a byte string, is then unpacked, normalized, and split into two lists, one per channel, before the FFT is run on each. After that, the two results are combined into one list so it can be plotted against x_f on the X-axis.
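The rounding rule can be checked on its own. The sketch below wraps it in a helper named frames_to_read, which is a hypothetical name and not in the original script, and feeds it a few made-up values for the number of available frames.

```python
nFFT = 512   # FFT size, as in the script
RATE = 44100
FPS = 25.0


def frames_to_read(available, unit=nFFT):
    """Round `available` down to whole units, but never below one unit."""
    return max(available // unit, 1) * unit


# Audio frames that arrive between two animation frames at 25 FPS.
per_tick = int(RATE / FPS)

for available in (0, 700, per_tick, 2048):
    print(available, frames_to_read(available))

# What stays in the buffer after one tick's worth of audio has been read.
leftover = per_tick - frames_to_read(per_tick)
print(leftover)
```

With these numbers a tick delivers 1,764 frames, of which three units (1,536 frames) are read, and the remainder waits in the buffer until it grows to a full unit.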
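The unpack-and-split step can be illustrated without an audio device. In this sketch a handful of hand-made 16-bit values stands in for what stream.read(N) would return, assuming the usual interleaved layout with the left sample first; the values and the plain lists (instead of NumPy arrays) are choices made for the example.

```python
import struct

CHANNELS = 2
MAX_y = 32768.0  # full scale of a signed 16-bit sample

# Four hand-made stereo frames, interleaved as L0 R0 L1 R1 ...
interleaved = [1000, -1000, 2000, -2000, 3000, -3000, 4000, -4000]
N = len(interleaved) // CHANNELS

# Stand-in for stream.read(N): a byte string of 16-bit samples.
data = struct.pack('%dh' % (N * CHANNELS), *interleaved)

# Unpack, normalize, then split by position.
y = [s / MAX_y for s in struct.unpack('%dh' % (N * CHANNELS), data)]
y_L = y[::2]   # even positions: left channel
y_R = y[1::2]  # odd positions: right channel

print(y_L)
print(y_R)
```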
3 Performance
The FPS is 25 frames per second and it looks smooth. However, when I use save(), Matplotlib can only generate PNG images at 3.25 FPS through the figure save function. Matplotlib saves all the frames and then uses FFmpeg to encode them into one video file. I have modified the code to read a wave file, so it doesn't have to run in real time.
There might be some way to speed up the figure save function, but this is all for now.
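The modified script is not shown here, so the following is only a guess at how reading from a wave file might look, using the standard wave module. It writes one second of silence to a temporary file (the file name is hypothetical) and then reads it back in chunks of RATE / FPS frames, one chunk per video frame, with no real-time constraint.

```python
import os
import tempfile
import wave

RATE = 44100
CHANNELS = 2
SAMPLE_WIDTH = 2  # bytes per 16-bit sample
FPS = 25
FRAMES_PER_TICK = RATE // FPS  # audio frames per video frame

# Write one second of silence so the example has a file to read.
path = os.path.join(tempfile.mkdtemp(), 'example.wav')  # hypothetical name
wf = wave.open(path, 'wb')
wf.setnchannels(CHANNELS)
wf.setsampwidth(SAMPLE_WIDTH)
wf.setframerate(RATE)
wf.writeframes(b'\x00' * (RATE * CHANNELS * SAMPLE_WIDTH))
wf.close()

# Read it back one video frame's worth of audio at a time.
rf = wave.open(path, 'rb')
ticks = 0
while True:
    data = rf.readframes(FRAMES_PER_TICK)
    if len(data) < FRAMES_PER_TICK * CHANNELS * SAMPLE_WIDTH:
        break  # end of file; a final partial chunk is dropped in this sketch
    ticks += 1
rf.close()

print(ticks)
```

Each chunk could then be unpacked and transformed the same way as the data from stream.read(), while the frame saving takes as long as it needs.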
I was browsing your blog posts, and this interested me.
So you don't use scipy.io to read the wave file, but the wave module instead.
From my past experience (just a couple of times) with numpy/scipy/mpl, I found that IO is the main showstopper. If I need to analyze a real audio track, and not just a 2-second sample, it's all very slow, CPU-intensive, and memory-hungry. I guess I should try your code and deduce the answer, but maybe you won't mind commenting?
Did you also try something like numba, i.e. http://continuum.io/blog/simple-wave-simulation-with-numba-and-pygame?
Numba is very fast, with the same timing as Cython but much easier to use. I haven't tried the wave simulation above, but it's on my to-do list together with your example. Hopefully I can learn some trick to read real tracks in the future without nag
I have never used scipy.io or numba.
The thing is, the numpy usage in my code isn't heavy, because there are only two fft() calls; the rest is just simple operations. The bottleneck is the frame saving. If you just run my script without using SAVE, the FPS is fine: 25 fps is no problem on my computer for watching the animation, but when you try to save, it drops to 3.25 fps.
The code in that link doesn't seem to cost much, but 2 fps is really slow.