librosa.vqt(y, *, sr=22050, hop_length=512, fmin=None, n_bins=84, gamma=None, bins_per_octave=12, tuning=0.0, filter_scale=1, norm=1, sparsity=0.01, window='hann', scale=True, pad_mode='constant', res_type='soxr_hq', dtype=None)

Compute the variable-Q transform of an audio signal.

This implementation is based on the recursive sub-sampling method described by [1].


[1] Schörkhuber, Christian, Anssi Klapuri, Nicki Holighaus, and Monika Dörfler. “A Matlab toolbox for efficient perfect reconstruction time-frequency transforms with log-frequency resolution.” In Audio Engineering Society Conference: 53rd International Conference: Semantic Audio. Audio Engineering Society, 2014.

y : np.ndarray [shape=(…, n)]

audio time series. Multi-channel is supported.

sr : number > 0 [scalar]

sampling rate of y

hop_length : int > 0 [scalar]

number of samples between successive VQT columns.

fmin : float > 0 [scalar]

Minimum frequency. Defaults to C1 ~= 32.70 Hz

n_bins : int > 0 [scalar]

Number of frequency bins, starting at fmin

gamma : number >= 0 [scalar] or None

Bandwidth offset for determining filter lengths.

If gamma=0, produces the constant-Q transform.

If gamma=None, gamma is calculated such that filter bandwidths are equal to a constant fraction of the equivalent rectangular bandwidths (ERB). This is accomplished by solving for the gamma which gives:

B_k = alpha * f_k + gamma = C * ERB(f_k),

where B_k is the bandwidth of filter k with center frequency f_k, alpha is the inverse of what would be the constant Q-factor, and C = alpha / 0.108 is the constant fraction across all filters.

Here we use ERB(f_k) = 24.7 + 0.108 * f_k, the best-fit curve derived from experimental data in [2].


[2] Glasberg, Brian R., and Brian C. J. Moore. “Derivation of auditory filter shapes from notched-noise data.” Hearing Research 47.1-2 (1990): 103-138.
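Equating the two expressions for B_k above and substituting C = alpha / 0.108 leaves gamma = 24.7 * alpha / 0.108. The following is a minimal sketch of that arithmetic only. The helper names `erb` and `erb_matched_gamma` and the value of `alpha` are assumptions made for illustration; they are not part of the librosa API.

```python
def erb(f):
    """Equivalent rectangular bandwidth in Hz: ERB(f) = 24.7 + 0.108 * f."""
    return 24.7 + 0.108 * f


def erb_matched_gamma(alpha):
    """Solve alpha * f + gamma = (alpha / 0.108) * ERB(f) for gamma."""
    return 24.7 * alpha / 0.108


# Hypothetical alpha, chosen only to have a number to work with.
alpha = 0.06
gamma = erb_matched_gamma(alpha)
C = alpha / 0.108

for f in (32.7, 440.0, 4000.0):
    bandwidth = alpha * f + gamma
    # The bandwidth equals the same fraction C of ERB(f) at every frequency.
    assert abs(bandwidth - C * erb(f)) < 1e-9
    # f / bandwidth is the effective Q of the filter at this frequency.
    print(f, bandwidth, f / bandwidth)
```

With gamma > 0 the ratio f_k / B_k grows with frequency, whereas gamma = 0 keeps it fixed at 1 / alpha, which is the constant-Q case.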

bins_per_octave : int > 0 [scalar]

Number of bins per octave
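To make the interaction between fmin, n_bins, and bins_per_octave concrete, the sketch below lists the bin center frequencies, assuming the usual geometric spacing f_k = fmin * 2**(k / bins_per_octave) used by constant-Q style transforms. The function name `vqt_center_frequencies` is hypothetical; within librosa, `librosa.cqt_frequencies` serves this purpose.

```python
def vqt_center_frequencies(fmin, n_bins, bins_per_octave):
    """Geometrically spaced center frequencies, starting at fmin (assumed spacing)."""
    return [fmin * 2.0 ** (k / bins_per_octave) for k in range(n_bins)]


sr = 22050
freqs = vqt_center_frequencies(fmin=32.70, n_bins=84, bins_per_octave=12)

# Bins that are bins_per_octave apart differ by a factor of two.
print(freqs[0], freqs[12], freqs[-1])

# The highest center frequency should lie below the Nyquist frequency sr / 2.
assert freqs[-1] < sr / 2
```

With the default values, 84 bins at 12 bins per octave span 7 octaves above fmin.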

tuning : None or float

Tuning offset in fractions of a bin.

If None, tuning will be automatically estimated from the signal.

The minimum frequency of the resulting VQT will be modified to fmin * 2**(tuning / bins_per_octave).
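As a worked example of the tuning correction above, the sketch below evaluates fmin * 2**(tuning / bins_per_octave) for a few tuning values. The helper name `tuned_fmin` is an assumption made for illustration.

```python
def tuned_fmin(fmin, tuning, bins_per_octave):
    """Minimum frequency after applying a tuning offset given in fractions of a bin."""
    return fmin * 2.0 ** (tuning / bins_per_octave)


fmin = 32.70  # approximately C1, in Hz

# No offset leaves fmin unchanged.
print(tuned_fmin(fmin, 0.0, 12))

# Half a bin at 12 bins per octave raises fmin by a quarter tone.
print(tuned_fmin(fmin, 0.5, 12))

# A negative offset lowers fmin.
print(tuned_fmin(fmin, -0.25, 12))
```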

filter_scale : float > 0

Filter scale factor. Small values (<1) use shorter windows for improved time resolution.

norm : {inf, -inf, 0, float > 0}

Type of norm to use for basis function normalization. See librosa.util.normalize.

sparsity : float in [0, 1)

Sparsify the VQT basis by discarding up to sparsity fraction of the energy in each basis.

Set sparsity=0 to disable sparsification.
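The idea behind sparsification can be sketched on a single row of basis coefficients: the smallest entries are zeroed as long as the discarded share stays below the sparsity fraction. This is an illustration, not librosa's implementation. The function name `sparsify_row` is hypothetical, and measuring the share by absolute magnitude is an assumption of this sketch.

```python
def sparsify_row(row, sparsity):
    """Zero the smallest entries whose combined magnitude share is below `sparsity`."""
    mags = [abs(v) for v in row]
    total = sum(mags)
    if total == 0:
        return list(row)

    # Walk the magnitudes from smallest to largest and find the first one
    # at which the cumulative share reaches the sparsity fraction.
    sorted_mags = sorted(mags)
    threshold = sorted_mags[-1]
    cumulative = 0.0
    for m in sorted_mags:
        cumulative += m / total
        if cumulative >= sparsity:
            threshold = m
            break

    # Keep entries at or above the threshold; discard the rest.
    return [v if abs(v) >= threshold else 0.0 for v in row]


row = [0.001, 0.002, 1.0, 0.5, 0.003]
print(sparsify_row(row, 0.01))  # the three tiny entries are zeroed
print(sparsify_row(row, 0.0))   # sparsity=0 keeps every entry
```

Discarding near-zero coefficients lets the basis be stored in sparse form at a small, bounded cost in accuracy.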

window : str, tuple, number, or function

Window specification for the basis filters. See filters.get_window for details.


scale : bool

If True, scale the VQT response by the square root of the length of each channel’s filter. This is analogous to norm='ortho' in FFT.

If False, do not scale the VQT. This is analogous to norm=None in FFT.
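For readers unfamiliar with the FFT convention referenced here, the sketch below uses a small hand-written DFT to show what orthonormal scaling does: with a 1/sqrt(N) factor the transform preserves signal energy, and without it the energy is inflated by N. It illustrates only the FFT side of the analogy, not the VQT scaling itself. The helper names `dft` and `energy` are assumptions made for illustration.

```python
import cmath
import math


def dft(x, ortho=False):
    """Naive discrete Fourier transform, optionally with 1/sqrt(N) scaling."""
    n = len(x)
    scale = 1.0 / math.sqrt(n) if ortho else 1.0
    return [
        scale * sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def energy(values):
    """Sum of squared magnitudes."""
    return sum(abs(v) ** 2 for v in values)


x = [1.0, -2.0, 0.5, 3.0, 0.0, -1.5, 2.0, 0.25]

signal_energy = energy(x)
unscaled_energy = energy(dft(x))            # equals N * signal_energy
ortho_energy = energy(dft(x, ortho=True))   # equals signal_energy

print(signal_energy, unscaled_energy, ortho_energy)
```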


pad_mode : string

Padding mode for centered frame analysis.

See also: librosa.stft and numpy.pad.


res_type : string

The resampling mode for recursive downsampling.


dtype : np.dtype, optional

The dtype of the output array. By default, this is inferred to match the numerical precision of the input signal.

VQT : np.ndarray [shape=(…, n_bins, t), dtype=np.complex]

Variable-Q value for each frequency at each time.

See also



This function caches at level 20.


Generate and plot a variable-Q power spectrum

>>> import numpy as np
>>> import librosa
>>> import matplotlib.pyplot as plt
>>> y, sr = librosa.load(librosa.ex('choice'), duration=5)
>>> C = np.abs(librosa.cqt(y, sr=sr))
>>> V = np.abs(librosa.vqt(y, sr=sr))
>>> fig, ax = plt.subplots(nrows=2, sharex=True, sharey=True)
>>> librosa.display.specshow(librosa.amplitude_to_db(C, ref=np.max),
...                          sr=sr, x_axis='time', y_axis='cqt_note', ax=ax[0])
>>> ax[0].set(title='Constant-Q power spectrum', xlabel=None)
>>> ax[0].label_outer()
>>> img = librosa.display.specshow(librosa.amplitude_to_db(V, ref=np.max),
...                                sr=sr, x_axis='time', y_axis='cqt_note', ax=ax[1])
>>> ax[1].set_title('Variable-Q power spectrum')
>>> fig.colorbar(img, ax=ax, format="%+2.0f dB")