Spectrogram plugin

The Spectrogram plugin renders a frequency-over-time heat map below (or alongside) the waveform. It’s useful when the waveform alone isn’t enough — for inspecting pitch, harmonics, noise, or speech formants. Each vertical column represents one FFT window; color encodes intensity so you can see which frequencies are present at each moment in the audio.

Setup

Install wavesurfer.js if you haven’t already, then import and register the plugin:

import WaveSurfer from 'wavesurfer.js'
import SpectrogramPlugin from 'wavesurfer.js/dist/plugins/spectrogram.esm.js'

const ws = WaveSurfer.create({
  container: '#waveform',
  url: '/audio/demo.mp3',
  plugins: [
    SpectrogramPlugin.create({
      container: '#spectrogram',
      labels: true,
      scale: 'mel',
    }),
  ],
})

The container option accepts a CSS selector string or an HTMLElement. If you omit it, the spectrogram is appended directly inside the wavesurfer wrapper element.

See the live Spectrogram example to experiment with the options in the browser.

Options

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| container | string or HTMLElement | wavesurfer wrapper | Where to render the spectrogram. |
| fftSamples | number | 512 | Number of samples per FFT window. Must be a power of 2. Controls frequency resolution: higher values give finer frequency bins but coarser time resolution, and also increase compute time. |
| height | number | 200 | Height of the spectrogram in CSS pixels. |
| frequencyMin | number | 0 | Lowest frequency (Hz) to display. |
| frequencyMax | number | half of sample rate | Highest frequency (Hz) to display. Set to sampleRate / 2 to show the full range. |
| scale | string | 'mel' | Frequency axis scale. See accepted values below. |
| gainDB | number | 20 | Brightness offset in dB. Increase if the display looks too dark; decrease if it looks washed out. |
| rangeDB | number | 80 | Dynamic range in dB. Signals more than this many dB below the gain threshold are rendered as black. |
| colorMap | number[][] or 'gray' or 'igray' or 'roseus' | built-in | Color palette. Pass a named preset ('gray', 'igray', 'roseus') or a 256-entry array of [r, g, b, alpha] floats in the 0–1 range. |
| labels | boolean | false | Show frequency labels on the left edge. |
| labelsBackground | string | transparent | Background fill for the labels overlay canvas. |
| labelsColor | string | '#fff' | Color for frequency number labels. |
| labelsHzColor | string | same as labelsColor | Color for Hz/kHz unit labels. |
| splitChannels | boolean | false | Render a separate spectrogram row for each audio channel. |
| windowFunc | string | 'hann' | FFT window function. Accepted values: 'bartlett', 'bartlettHann', 'blackman', 'cosine', 'gauss', 'hamming', 'hann', 'lanczoz', 'rectangular', 'triangular'. |
| noverlap | number | auto | Overlap between consecutive FFT windows in samples. Must be less than fftSamples. Auto-calculated from the canvas width if omitted. |
| useWebWorker | boolean | false | Offload FFT calculations to a Web Worker to keep the main thread responsive. |
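
The array form of colorMap is easiest to understand with a small generator. The sketch below builds a 256-entry palette by blending linearly between two colors. The helper name buildColorMap and its parameters are hypothetical (they are not part of wavesurfer.js), and the sketch assumes that index 0 corresponds to the quietest level and index 255 to the loudest.

```javascript
// Hypothetical helper: build a 256-entry color map for the colorMap option.
// Each entry is [r, g, b, alpha] with floats in the 0..1 range.
// `from` and `to` are [r, g, b] end colors (illustrative parameter names).
function buildColorMap(from, to) {
  const map = []
  for (let i = 0; i < 256; i++) {
    const t = i / 255 // 0 at the first entry, 1 at the last
    map.push([
      from[0] + (to[0] - from[0]) * t,
      from[1] + (to[1] - from[1]) * t,
      from[2] + (to[2] - from[2]) * t,
      1, // fully opaque
    ])
  }
  return map
}

// Black for the first entry, orange for the last
// (assumption: entry 0 is the quietest level)
const colorMap = buildColorMap([0, 0, 0], [1, 0.6, 0])
```

The resulting array can then be passed as SpectrogramPlugin.create({ colorMap }).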

Scale accepted values

The scale option controls how frequencies are distributed along the vertical axis:

  • 'linear' — equal Hz spacing from bottom to top.
  • 'logarithmic' — logarithmic Hz spacing; mirrors how audio editors like Audacity present frequency.
  • 'mel' — Mel scale, based on pitch perception; the default.
  • 'bark' — Bark psychoacoustical scale.
  • 'erb' — Equivalent Rectangular Bandwidth scale.
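
To get a feel for how a perceptual scale redistributes the axis, the sketch below uses a widely used Hz-to-mel formula to find the frequencies that sit at evenly spaced positions on a mel axis. The function names are illustrative, and the exact formula the plugin uses internally is an assumption here.

```javascript
// A common Hz <-> mel conversion (assumed; the plugin's internals may differ).
function hzToMel(hz) {
  return 2595 * Math.log10(1 + hz / 700)
}

function melToHz(mel) {
  return 700 * (Math.pow(10, mel / 2595) - 1)
}

// Frequencies (Hz) at `steps + 1` evenly spaced positions on a mel axis.
function melAxisTicks(minHz, maxHz, steps) {
  const minMel = hzToMel(minHz)
  const maxMel = hzToMel(maxHz)
  const ticks = []
  for (let i = 0; i <= steps; i++) {
    const mel = minMel + ((maxMel - minMel) * i) / steps
    ticks.push(Math.round(melToHz(mel)))
  }
  return ticks
}

// Five tick positions between 0 Hz and 8 kHz
const ticks = melAxisTicks(0, 8000, 4)
console.log(ticks)
```

The midpoint of the axis lands well below 4 kHz, which is why a mel spectrogram gives more vertical space to low frequencies than a linear one.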

fftSamples and image size

fftSamples controls frequency resolution, not the pixel dimensions of the rendered image. The plugin resamples the computed FFT data to fit the current canvas width, so changing fftSamples will not scale the spectrogram wider or taller — use height for vertical size and the wavesurfer minPxPerSec option for horizontal zoom.

For speech or music, scale: 'mel' with fftSamples: 512 is a good starting point. For detailed pitch analysis, try scale: 'logarithmic' with fftSamples: 2048.
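
The trade-off behind these starting points can be worked out directly: each FFT bin spans sampleRate / fftSamples Hz, and each window covers fftSamples / sampleRate seconds. The helper below is a hypothetical illustration, not part of the plugin, and assumes a 44.1 kHz sample rate in the examples.

```javascript
// Hypothetical helper: frequency and time resolution for a given FFT size.
function fftResolution(sampleRate, fftSamples) {
  // fftSamples must be a power of 2 (checked with a bit trick)
  const isPowerOfTwo =
    Number.isInteger(fftSamples) && fftSamples >= 2 && (fftSamples & (fftSamples - 1)) === 0
  if (!isPowerOfTwo) {
    throw new RangeError('fftSamples must be a power of 2')
  }
  return {
    binWidthHz: sampleRate / fftSamples, // width of one frequency bin
    windowMs: (fftSamples / sampleRate) * 1000, // audio covered by one window
    bins: fftSamples / 2, // bins between 0 Hz and sampleRate / 2
  }
}

// Assumed sample rate of 44.1 kHz
const general = fftResolution(44100, 512)
const pitch = fftResolution(44100, 2048)
console.log(general, pitch)
```

At 44.1 kHz, 512 samples gives bins about 86 Hz wide over windows of about 11.6 ms, while 2048 samples gives bins about 21.5 Hz wide over windows of about 46 ms. Finer pitch detail therefore comes at the cost of blurrier timing.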

Performance

Spectrogram rendering is CPU-intensive. The plugin runs an FFT over every window of audio and then resamples the result to match the current pixel width. For long files this can take a noticeable amount of time, and zooming forces a re-render.

Warning: zooming with large files can freeze the browser tab. Each zoom change triggers a full re-render of the spectrogram. With a 10-minute file and fftSamples: 2048, the FFT pass alone can take several seconds on the main thread, and users have reported the tab becoming unresponsive during zoom.
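
A back-of-the-envelope estimate shows why long files are expensive. The sketch below counts FFT windows for a given step between windows. The function name and the hopSamples parameter are hypothetical, and the 50% overlap is an assumption: the plugin derives its own overlap from the canvas width unless noverlap is set, so real numbers will differ.

```javascript
// Hypothetical estimate of how many FFT windows a file needs.
// hopSamples is the step between window starts (fftSamples minus the overlap).
function estimateFftWindows(durationSec, sampleRate, fftSamples, hopSamples) {
  const totalSamples = Math.floor(durationSec * sampleRate)
  if (totalSamples < fftSamples) return 0 // not enough audio for one window
  return Math.floor((totalSamples - fftSamples) / hopSamples) + 1
}

// 10-minute file at 44.1 kHz, fftSamples 2048, assumed 50% overlap
const windows = estimateFftWindows(600, 44100, 2048, 1024)

// Each FFT costs on the order of N * log2(N) basic operations
const roughOps = windows * 2048 * Math.log2(2048)
console.log(windows, roughOps)
```

Under these assumptions the file needs tens of thousands of windows and hundreds of millions of basic operations per render, and every zoom change repeats that work.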

Practical recommendations:

  • Keep fftSamples reasonable. Values of 256 or 512 are usually sufficient and render much faster than 2048 or 4096.
  • Work with shorter files. Consider splitting long recordings into segments before rendering a spectrogram, or only load the portion the user is viewing.
  • Supply pre-computed waveform peaks. If you pass pre-computed peaks to wavesurfer, it skips the audio decode step on load, so the waveform appears immediately. The spectrogram still needs decoded audio. See Pre-decoded peaks.
  • Enable the Web Worker. Set useWebWorker: true to move FFT computation off the main thread. The spectrogram will appear slightly later, but the page stays responsive.

A configuration that applies the first and last recommendations:

SpectrogramPlugin.create({
  fftSamples: 512,        // keep low for performance
  useWebWorker: true,     // non-blocking FFT
  scale: 'mel',
})

Live microphone spectrogram

You can feed live microphone audio through the spectrogram by combining the Record plugin with wavesurfer’s Web Audio capabilities. The Record plugin captures the microphone stream and can pass decoded audio data to wavesurfer in real time; the Spectrogram plugin then renders it as audio arrives.

The general approach:

  1. Use the Record plugin to start recording from the microphone; it requests access through getUserMedia.
  2. Connect the microphone MediaStream to a wavesurfer instance configured with the Spectrogram plugin.
  3. Because live audio arrives incrementally, the spectrogram re-renders as new decoded data becomes available.

See the Record plugin documentation for the full microphone setup and the Web Audio page for details on connecting custom AudioNode graphs to wavesurfer.