I have long wanted to write a simple script that shows the spectrum of the system sound, but I never got past capturing the system sound. A few days ago I tried PyAudio again, and it looked promising. It wasn't working before, probably because I hadn't configured my system correctly.

Note

The video has been remade and the code changed slightly so that the figure has an exact size. (2015-10-20T04:11:19Z)

1   Video and Code

After a few tests and tweaks, here is a video:

https://i.ytimg.com/vi/hiGB_AP6iTo/maxresdefault.jpg

Yes, I am sure there are programs that already do this, but I just wanted to write a small one on my own, even though I don't plan to use it or improve it.

You can get the code on Gist.

2   Explanation

First, the variables:

SAVE = 0.0
TITLE = ''
FPS = 25.0

nFFT = 512
BUF_SIZE = 4 * nFFT
FORMAT = pyaudio.paInt16
CHANNELS = 2
RATE = 44100

Most of them should be easy to guess. SAVE controls saving video and audio files: if it is greater than 0, its value is the length, in seconds, of the recording to save. FPS is the frame rate of the graph, in frames per second.

def main():

  fig = plt.figure()

  x_f = 1.0 * np.arange(-nFFT / 2 + 1, nFFT / 2) / nFFT * RATE
  ax = fig.add_subplot(111, title=TITLE, xlim=(x_f[0], x_f[-1]),
                       ylim=(0, 2 * np.pi * nFFT**2 / RATE))
  # Note: Matplotlib 3.3 and later name this keyword linthresh.
  ax.set_yscale('symlog', linthreshy=nFFT**0.5)

  line, = ax.plot(x_f, np.zeros(nFFT - 1))

In this part, a figure is initialized and the frequency range of the FFT result is calculated for the chart. Since the sampling RATE is 44,100 Hz, frequencies up to 22,050 Hz can be represented. Because two channels are plotted side by side, left on the negative side and right on the positive side, I use roughly [-22,050, 22,050] as the X-axis range. The Y-axis uses a symlog scale, so small values are plotted on a linear scale while the rest are plotted on a logarithmic scale.
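As a rough check of the axis described above, the bin frequencies can be computed without NumPy. This is only a sketch using the post's nFFT and RATE values under Python 3; the names bin_width and x_f_check are made up for the illustration.

```python
nFFT = 512
RATE = 44100

# Spacing between neighbouring FFT bins, in Hz.
bin_width = RATE / nFFT

# Same bins as np.arange(-nFFT / 2 + 1, nFFT / 2) / nFFT * RATE in the post,
# written with plain Python integers.
x_f_check = [k * bin_width for k in range(-nFFT // 2 + 1, nFFT // 2)]

print(len(x_f_check))                # nFFT - 1 points
print(x_f_check[0], x_f_check[-1])   # just inside -22,050 and 22,050 Hz
```

The axis stops one bin short of 22,050 Hz on each side, which is why the line has nFFT - 1 points rather than nFFT.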

def change_xlabel(evt):
  labels = [label.get_text().replace(u'\u2212', '')
            for label in ax.get_xticklabels()]
  ax.set_xticklabels(labels)
  fig.canvas.mpl_disconnect(drawid)
drawid = fig.canvas.mpl_connect('draw_event', change_xlabel)

In this part, the X-axis labels are modified: the minus sign is removed, so all of them look like positive numbers, as they should. It's kind of tricky, because I couldn't find a reliable way to get the initial labels. I have to hook into draw_event to make sure the labels have already been generated before updating them.
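The label clean-up itself is plain string handling. A minimal sketch, with made-up tick label strings standing in for what ax.get_xticklabels() would hold after the first draw:

```python
# Matplotlib renders negative tick labels with U+2212 (MINUS SIGN) by
# default, not the ASCII hyphen-minus, which is why u'\u2212' is replaced.
raw_labels = [u'\u221220000', u'\u221210000', u'0', u'10000', u'20000']

# Dropping the minus sign makes both halves of the axis read as
# positive frequencies.
clean_labels = [text.replace(u'\u2212', '') for text in raw_labels]

print(clean_labels)
```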

p = pyaudio.PyAudio()
MAX_y = 2.0**(p.get_sample_size(FORMAT) * 8 - 1)

frames = None
wf = None
if SAVE:
  frames = int(FPS * SAVE)
  wf = wave.open('temp.wav', 'wb')
  wf.setnchannels(CHANNELS)
  wf.setsampwidth(p.get_sample_size(FORMAT))
  wf.setframerate(RATE)

stream = p.open(format=FORMAT,
                channels=CHANNELS,
                rate=RATE,
                input=True,
                frames_per_buffer=BUF_SIZE)

ani = animation.FuncAnimation(fig, animate, frames,
    init_func=lambda: init(line), fargs=(line, stream, wf, MAX_y),
    interval=1000.0/FPS, blit=True)

These are the initializations of PyAudio, wave, and the graph. MAX_y is only used for normalizing the signal. With paFloat32, the samples would already be in the range [-1, 1], but paInt16 is easier to code with because of the wave saving.
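To make the role of MAX_y concrete, here is a small sketch of the normalization for 16-bit samples. It assumes a sample width of 2 bytes, which is what p.get_sample_size(pyaudio.paInt16) reports; SAMPLE_WIDTH and normalize are names invented for this example.

```python
import struct

SAMPLE_WIDTH = 2  # bytes per paInt16 sample (assumed, see above)
MAX_y = 2.0 ** (SAMPLE_WIDTH * 8 - 1)  # 32768.0 for 16-bit audio

def normalize(data):
    """Unpack raw 16-bit samples and scale them into [-1, 1)."""
    count = len(data) // SAMPLE_WIDTH
    return [v / MAX_y for v in struct.unpack("%dh" % count, data)]

# The extremes of the int16 range map to -1.0 and just under 1.0.
raw = struct.pack("4h", -32768, -16384, 0, 32767)
print(normalize(raw))
```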

For the graph, Matplotlib's animation module is used. It's an easy way to generate an animated graph, and it can even save a video file. The animate and init functions are explained below.

if SAVE:
  ani.save('temp.mp4', fps=FPS)
else:
  plt.show()

stream.stop_stream()
stream.close()
p.terminate()

if SAVE:
  wf.close()

The graph is started. As you can see, it either saves the video or shows the graph. I couldn't find a way to do both, that is, encoding on the fly. You can call plt.show() afterwards, but that will just be a replay.

The rest is the clean-up, which properly closes everything.

def init(line):

  line.set_ydata(np.zeros(nFFT - 1))
  return line,

It's important to have init for the animation; otherwise the first frame generated by the animate function will be used as the clear frame. Setting the data to all zeros keeps that frame clean.

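def animate(i, line, stream, wf, MAX_y):

  # Read a whole number of nFFT-sized units, at least one.
  N = max(stream.get_read_available() // nFFT, 1) * nFFT
  data = stream.read(N)
  if SAVE:
    wf.writeframes(data)

  # Unpack the interleaved 16-bit samples and normalize them.
  y = np.array(struct.unpack("%dh" % (N * CHANNELS), data)) / MAX_y
  y_L = y[::2]
  y_R = y[1::2]

  Y_L = np.fft.fft(y_L, nFFT)
  Y_R = np.fft.fft(y_R, nFFT)

  # Left channel on the negative side of the axis, right on the positive side.
  Y = abs(np.hstack((Y_L[-nFFT // 2:-1], Y_R[:nFFT // 2])))

  line.set_ydata(Y)
  return line,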

animate generates a frame. It reads frames from the audio stream in units of nFFT, taking as many whole units as are available, or at least one unit. Because FPS does not match RATE / nFFT, there will be some leftover data; once the leftover amounts to a whole unit, it is grabbed too.
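The read-size rule can be sketched as a small pure function. read_size is a hypothetical helper, and available stands in for the value of stream.get_read_available():

```python
nFFT = 512

def read_size(available, unit=nFFT):
    """Round down to whole units of `unit` frames, but never below one unit."""
    return max(available // unit, 1) * unit

for available in (0, 700, 1024, 1500):
    print(available, read_size(available))
```

When fewer than nFFT frames are available, the function still asks for one unit, so a blocking stream.read() would wait until enough frames have arrived.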

The data, which is a byte string, is then unpacked, normalized, and split into two lists, one per channel, before the FFT is run on each. After that, the results are combined into one list so they can be plotted against x_f as the X-axis.
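The unpack, split, transform, and combine steps can be tried on a tiny buffer without NumPy or an audio device. The following is a sketch under several assumptions: dft is a naive stand-in for np.fft.fft, stereo_spectrum is a made-up name, and the FFT size is 8 instead of 512 to keep it fast. The sketch picks bins -n_fft/2+1 to -1 for the left channel so that they line up one-to-one with the negative half of the axis; the slice bounds in the post's code are slightly different.

```python
import cmath
import math
import struct

def dft(samples, n):
    """Naive O(n^2) DFT of the first n samples, zero-padded if shorter."""
    x = list(samples[:n]) + [0.0] * max(0, n - len(samples))
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def stereo_spectrum(data, n_fft, max_y=32768.0):
    """Return n_fft - 1 magnitudes (n_fft >= 4): left mirrored, then right."""
    count = len(data) // 2                      # 2 bytes per int16 sample
    y = [v / max_y for v in struct.unpack("%dh" % count, data)]
    y_l, y_r = y[0::2], y[1::2]                 # de-interleave L/R
    Y_l, Y_r = dft(y_l, n_fft), dft(y_r, n_fft)
    half = n_fft // 2
    # Bins -half+1 .. -1 of the left channel, then bins 0 .. half-1 of the right.
    return [abs(v) for v in Y_l[-half + 1:] + Y_r[:half]]

# Test signal: the left channel carries a tone in bin 1, the right in bin 2.
n = 8
left = [int(round(16384 * math.cos(2 * math.pi * 1 * t / n))) for t in range(n)]
right = [int(round(16384 * math.cos(2 * math.pi * 2 * t / n))) for t in range(n)]
interleaved = [s for pair in zip(left, right) for s in pair]
data = struct.pack("%dh" % len(interleaved), *interleaved)

spectrum = stereo_spectrum(data, n)
print([round(v, 3) for v in spectrum])
```

The left tone shows up one position to the left of the centre and the right tone two positions to the right of it, which is the mirrored layout the plot relies on.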

3   Performance

The FPS is 25 frames per second and it looks smooth. However, when I use save(), Matplotlib can only generate about 3.25 PNG images per second through the figure save function. Matplotlib saves all the frames and then uses FFmpeg to encode them into one video file. I have modified the code to read a wave file, so it doesn't have to run in real time.

There might be a way to speed up the figure save function, but this is all for now.
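The modified wave-reading code is not shown in this post, so the following is only a guess at one way to do it: hand each video frame exactly RATE / FPS audio frames from the file, no matter how slowly the figure is rendered. audio_blocks is a hypothetical helper and demo.wav is a throwaway file created just for the example.

```python
import os
import tempfile
import wave

RATE = 44100
FPS = 25
CHANNELS = 2
SAMPLE_WIDTH = 2  # bytes, 16-bit audio

def audio_blocks(path, fps=FPS):
    """Yield the raw audio belonging to each video frame of a WAV file."""
    with wave.open(path, 'rb') as wf:
        step = wf.getframerate() // fps   # audio frames per video frame
        while True:
            data = wf.readframes(step)
            if not data:
                break
            yield data

# Demo: one second of silence should split into FPS blocks.
path = os.path.join(tempfile.mkdtemp(), 'demo.wav')
with wave.open(path, 'wb') as wf:
    wf.setnchannels(CHANNELS)
    wf.setsampwidth(SAMPLE_WIDTH)
    wf.setframerate(RATE)
    wf.writeframes(b'\x00' * (RATE * CHANNELS * SAMPLE_WIDTH))

blocks = list(audio_blocks(path))
print(len(blocks), len(blocks[0]))
```

At roughly 3.25 rendered frames per second, each second of 25 FPS video takes about 25 / 3.25, or close to 7.7, seconds to produce, so a live stream would run far ahead of the renderer while a file can simply be read at the renderer's pace.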