Spectrogram plugin
The Spectrogram plugin renders a frequency-over-time heat map below (or alongside) the waveform. It’s useful when the waveform alone isn’t enough — for inspecting pitch, harmonics, noise, or speech formants. Each vertical column represents one FFT window; color encodes intensity so you can see which frequencies are present at each moment in the audio.
Setup
Install wavesurfer.js if you haven’t already, then import and register the plugin:
import WaveSurfer from 'wavesurfer.js'
import SpectrogramPlugin from 'wavesurfer.js/dist/plugins/spectrogram.esm.js'

const ws = WaveSurfer.create({
  container: '#waveform',
  url: '/audio/demo.mp3',
  plugins: [
    SpectrogramPlugin.create({
      container: '#spectrogram',
      labels: true,
      scale: 'mel',
    }),
  ],
})
The container option accepts a CSS selector string or an HTMLElement. If you omit it, the spectrogram is appended directly inside the wavesurfer wrapper element.
See the live Spectrogram example to experiment with the options in the browser.
Options
| Option | Type | Default | Description |
|---|---|---|---|
| `container` | `string \| HTMLElement` | wavesurfer wrapper | Where to render the spectrogram. |
| `fftSamples` | `number` | `512` | Number of samples per FFT window. Must be a power of 2. Controls frequency resolution: higher values give finer frequency bins but coarser time resolution, and also increase compute time. |
| `height` | `number` | `200` | Height of the spectrogram in CSS pixels. |
| `frequencyMin` | `number` | `0` | Lowest frequency (Hz) to display. |
| `frequencyMax` | `number` | half of sample rate | Highest frequency (Hz) to display. Set to `sampleRate / 2` to show the full range. |
| `scale` | `string` | `'mel'` | Frequency axis scale. See accepted values below. |
| `gainDB` | `number` | `20` | Brightness offset in dB. Increase if the display looks too dark; decrease if it looks washed out. |
| `rangeDB` | `number` | `80` | Dynamic range in dB. Signals more than this many dB below the gain threshold are rendered as black. |
| `colorMap` | `number[][] \| 'gray' \| 'igray' \| 'roseus'` | built-in | Color palette. Pass a named preset (`'gray'`, `'igray'`, `'roseus'`) or a 256-entry array of `[r, g, b, alpha]` floats in the 0–1 range. |
| `labels` | `boolean` | `false` | Show frequency labels on the left edge. |
| `labelsBackground` | `string` | transparent | Background fill for the labels overlay canvas. |
| `labelsColor` | `string` | `'#fff'` | Color for frequency number labels. |
| `labelsHzColor` | `string` | same as `labelsColor` | Color for Hz/kHz unit labels. |
| `splitChannels` | `boolean` | `false` | Render a separate spectrogram row for each audio channel. |
| `windowFunc` | `string` | `'hann'` | FFT window function. Accepted values: `'bartlett'`, `'bartlettHann'`, `'blackman'`, `'cosine'`, `'gauss'`, `'hamming'`, `'hann'`, `'lanczoz'`, `'rectangular'`, `'triangular'`. |
| `noverlap` | `number` | auto | Overlap between consecutive FFT windows in samples. Must be less than `fftSamples`. Auto-calculated from the canvas width if omitted. |
| `useWebWorker` | `boolean` | `false` | Offload FFT calculations to a Web Worker to keep the main thread responsive. |
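The array form of colorMap can be generated rather than written by hand. The following is a minimal sketch, not part of the plugin: `buildColorMap`, `fromColor`, and `toColor` are hypothetical names, and it assumes that index 0 corresponds to the lowest intensity and index 255 to the highest.

```javascript
// Build a 256-entry color map of [r, g, b, alpha] floats in the 0 to 1 range
// by linearly blending between two RGB colors.
// `buildColorMap`, `fromColor`, and `toColor` are hypothetical names for this sketch.
function buildColorMap(fromColor, toColor) {
  const colorMap = []
  for (let i = 0; i < 256; i++) {
    const t = i / 255 // blend factor: 0 at the first entry, 1 at the last
    colorMap.push([
      fromColor[0] + (toColor[0] - fromColor[0]) * t, // red
      fromColor[1] + (toColor[1] - fromColor[1]) * t, // green
      fromColor[2] + (toColor[2] - fromColor[2]) * t, // blue
      1, // alpha: fully opaque
    ])
  }
  return colorMap
}

// Example: a black-to-orange gradient
const heatMap = buildColorMap([0, 0, 0], [1, 0.6, 0])
```

The resulting array could then be passed as the colorMap option, for example `SpectrogramPlugin.create({ colorMap: heatMap })`.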
Scale accepted values
The scale option controls how frequencies are distributed along the vertical axis:
- `'linear'`: equal Hz spacing from bottom to top.
- `'logarithmic'`: logarithmic Hz spacing; mirrors how audio editors like Audacity present frequency.
- `'mel'`: Mel scale, based on pitch perception; the default.
- `'bark'`: Bark psychoacoustical scale.
- `'erb'`: Equivalent Rectangular Bandwidth scale.
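To get a feel for how a perceptual scale redistributes frequencies, here is a small sketch that uses a widely cited Hz-to-mel formula. It illustrates the idea only; the plugin's internal conversion may differ, and `hzToMel`, `melToHz`, and `melAxisFrequencies` are hypothetical helper names.

```javascript
// A common Hz <-> mel conversion. This is an assumption for illustration,
// not necessarily the exact formula the plugin uses.
function hzToMel(hz) {
  return 2595 * Math.log10(1 + hz / 700)
}

function melToHz(mel) {
  return 700 * (Math.pow(10, mel / 2595) - 1)
}

// Return the frequencies (Hz) found at `steps` equally spaced positions
// along a mel axis between frequencyMin and frequencyMax. Requires steps >= 2.
function melAxisFrequencies(frequencyMin, frequencyMax, steps) {
  const melMin = hzToMel(frequencyMin)
  const melMax = hzToMel(frequencyMax)
  const frequencies = []
  for (let i = 0; i < steps; i++) {
    const mel = melMin + ((melMax - melMin) * i) / (steps - 1)
    frequencies.push(melToHz(mel))
  }
  return frequencies
}

// Five evenly spaced axis positions between 0 Hz and 8 kHz
const axis = melAxisFrequencies(0, 8000, 5)
```

On such an axis the midpoint sits well below half of the frequency range, so low frequencies take up more vertical space than they would with `'linear'`.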
fftSamples and image size
fftSamples controls frequency resolution, not the pixel dimensions of the rendered image. The plugin resamples the computed FFT data to fit the current canvas width, so changing fftSamples will not scale the spectrogram wider or taller — use height for vertical size and the wavesurfer minPxPerSec option for horizontal zoom.
For speech or music, scale: 'mel' with fftSamples: 512 is a good starting point. For detailed pitch analysis, try scale: 'logarithmic' with fftSamples: 2048.
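The trade-off between frequency and time resolution follows from standard FFT arithmetic: each bin spans `sampleRate / fftSamples` Hz, and each window covers `fftSamples / sampleRate` seconds. The sketch below works that out for a given setting; `fftResolution` is a hypothetical helper, not a plugin function.

```javascript
// Compute the frequency bin width and window duration for an FFT size.
// `fftResolution` is a hypothetical helper for this sketch.
function fftResolution(sampleRate, fftSamples) {
  // A power of 2 has exactly one bit set, so n & (n - 1) is 0
  const isPowerOfTwo =
    Number.isInteger(fftSamples) && fftSamples >= 2 && (fftSamples & (fftSamples - 1)) === 0
  if (!isPowerOfTwo) {
    throw new RangeError('fftSamples must be a power of 2')
  }
  return {
    binWidthHz: sampleRate / fftSamples, // smaller is finer frequency detail
    windowMs: (fftSamples / sampleRate) * 1000, // smaller is finer time detail
  }
}

// Compare the two starting points suggested above at a 48 kHz sample rate
const coarse = fftResolution(48000, 512)
const fine = fftResolution(48000, 2048)
```

Quadrupling `fftSamples` makes each bin four times narrower and each window four times longer, which is why larger values suit pitch analysis but blur fast transients.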
Performance
Spectrogram rendering is CPU-intensive. The plugin runs an FFT over every window of audio and then resamples the result to match the current pixel width. For long files this can take a noticeable amount of time, and zooming forces a re-render.
Zooming with large files can freeze the browser tab. Each zoom change triggers a full re-render of the spectrogram. With a 10-minute file and fftSamples: 2048, the FFT pass alone can take several seconds on the main thread. Users have reported the tab becoming unresponsive during zoom.
Practical recommendations:
- Keep `fftSamples` reasonable. Values of `256` or `512` are usually sufficient and render much faster than `2048` or `4096`.
- Work with shorter files. Consider splitting long recordings into segments before rendering a spectrogram, or only load the portion the user is viewing.
- Pre-decode waveform peaks. If you supply pre-computed waveform peaks to wavesurfer, it skips the audio decode step on load. The spectrogram still needs to decode, but the waveform appears immediately. See Pre-decoded peaks.
- Enable the Web Worker. Set `useWebWorker: true` to move FFT computation off the main thread. The spectrogram will appear slightly later, but the page stays responsive.
SpectrogramPlugin.create({
  fftSamples: 512, // keep low for performance
  useWebWorker: true, // non-blocking FFT
  scale: 'mel',
})
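To judge in advance how heavy a render might be, you can roughly estimate how many FFT windows a file needs per channel. This is a back-of-the-envelope sketch using the usual sliding-window count; `estimateFftWindows` is a hypothetical helper, and the plugin's auto-calculated `noverlap` may produce a different number in practice.

```javascript
// Estimate how many FFT windows fit into a recording (per channel).
// `estimateFftWindows` is a hypothetical helper for this sketch.
function estimateFftWindows(durationSec, sampleRate, fftSamples, noverlap = 0) {
  if (noverlap >= fftSamples) {
    throw new RangeError('noverlap must be less than fftSamples')
  }
  const totalSamples = Math.floor(durationSec * sampleRate)
  if (totalSamples < fftSamples) return 0 // not enough audio for a single window

  const hop = fftSamples - noverlap // samples advanced between consecutive windows
  return Math.floor((totalSamples - fftSamples) / hop) + 1
}

// A 10-minute file at 44.1 kHz, without and with 50% overlap
const withoutOverlap = estimateFftWindows(600, 44100, 2048)
const withHalfOverlap = estimateFftWindows(600, 44100, 2048, 1024)
```

Larger windows mean fewer FFTs, but each one costs more, and overlap multiplies the count. The estimate is most useful for comparing settings against each other, not for predicting absolute render time.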
Live microphone spectrogram
You can feed live microphone audio through the spectrogram by combining the Record plugin with wavesurfer’s Web Audio capabilities. The Record plugin captures the microphone stream and can pass decoded audio data to wavesurfer in real time; the Spectrogram plugin then renders it as audio arrives.
The general approach:
- Use the Record plugin to start recording from `getUserMedia`.
- Connect the microphone `MediaStream` to a wavesurfer instance configured with the Spectrogram plugin.
- Because live audio arrives incrementally, the spectrogram re-renders as new decoded data is available.
See the Record plugin documentation for the full microphone setup and the Web Audio page for details on connecting custom AudioNode graphs to wavesurfer.