librosa.vqt

librosa.vqt(y, *, sr=22050, hop_length=512, fmin=None, n_bins=84, gamma=None, bins_per_octave=12, tuning=0.0, filter_scale=1, norm=1, sparsity=0.01, window='hann', scale=True, pad_mode='constant', res_type=None, dtype=None)

Compute the variable-Q transform of an audio signal.

This implementation is based on the recursive sub-sampling method described by [1].

[1] Schörkhuber, Christian, Anssi Klapuri, Nicki Holighaus, and Monika Dörfler. “A Matlab toolbox for efficient perfect reconstruction time-frequency transforms with log-frequency resolution.” In Audio Engineering Society Conference: 53rd International Conference: Semantic Audio. Audio Engineering Society, 2014.

Parameters

y : np.ndarray [shape=(…, n)]

audio time series. Multi-channel is supported.

sr : number > 0 [scalar]

sampling rate of y

hop_length : int > 0 [scalar]

number of samples between successive VQT columns.

fmin : float > 0 [scalar]

Minimum frequency. Defaults to C1 ~= 32.70 Hz

n_bins : int > 0 [scalar]

Number of frequency bins, starting at fmin

gamma : number >= 0 [scalar] or None

Bandwidth offset for determining filter lengths.

If gamma=0, produces the constant-Q transform.

If gamma=None, gamma is calculated such that the filter bandwidths equal a constant fraction of the equivalent rectangular bandwidths (ERB). This is accomplished by solving for the gamma that gives:

B_k = alpha * f_k + gamma = C * ERB(f_k),

where B_k is the bandwidth of filter k with center frequency f_k, alpha is the inverse of what would be the constant Q-factor, and C = alpha / 0.108 is the constant fraction across all filters.

Here we use ERB(f_k) = 24.7 + 0.108 * f_k, the best-fit curve derived from experimental data in [2].

[2] Glasberg, Brian R., and Brian CJ Moore. “Derivation of auditory filter shapes from notched-noise data.” Hearing research 47.1-2 (1990): 103-138.
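The ERB matching described above can be checked numerically. The following is a minimal sketch, not librosa's implementation: the helper names erb and erb_gamma are hypothetical, and the example value of alpha is an illustrative assumption rather than the value the library derives internally.

```python
import math


def erb(f_hz):
    """ERB fit quoted above: ERB(f) = 24.7 + 0.108 * f, in Hz."""
    return 24.7 + 0.108 * f_hz


def erb_gamma(alpha):
    """Solve alpha * f + gamma = C * ERB(f) for every f.

    Matching the terms proportional to f gives C = alpha / 0.108;
    matching the constant terms gives gamma = 24.7 * C.
    Returns the pair (gamma, C).
    """
    c = alpha / 0.108
    gamma = 24.7 * c
    return gamma, c


# ASSUMPTION: an illustrative alpha, not necessarily what librosa computes.
alpha = 2.0 ** (1.0 / 12) - 1.0
gamma, c = erb_gamma(alpha)

for f_k in (32.70, 440.0, 4186.0):
    bandwidth = alpha * f_k + gamma
    # Both columns should agree up to floating-point rounding.
    print(f_k, bandwidth, c * erb(f_k))
```

Because gamma is a constant offset, it dominates the bandwidth at low frequencies and becomes negligible relative to alpha * f_k at high frequencies.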

bins_per_octave : int > 0 [scalar]

Number of bins per octave
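Together, fmin, n_bins, and bins_per_octave determine the analysis frequencies. As an illustrative sketch, assuming geometrically spaced bins and a tuning of 0, the center frequencies can be computed as below. The helper center_frequencies is a hypothetical name used only for this example; librosa.cqt_frequencies is the library's own function for this purpose.

```python
import math


def center_frequencies(fmin, n_bins, bins_per_octave=12):
    """Geometrically spaced center frequencies, starting at fmin.

    ASSUMPTION: bin k sits at fmin * 2**(k / bins_per_octave),
    with no tuning correction applied.
    """
    return [fmin * 2.0 ** (k / bins_per_octave) for k in range(n_bins)]


# Defaults from the signature: 84 bins at 12 bins per octave span 7 octaves.
freqs = center_frequencies(32.70, n_bins=84, bins_per_octave=12)
print(len(freqs))
print(freqs[0], freqs[12], freqs[-1])
```

Each step of bins_per_octave bins doubles the frequency, so freqs[12] is one octave above freqs[0] in this example.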

tuning : None or float

Tuning offset in fractions of a bin.

If None, tuning will be automatically estimated from the signal.

The minimum frequency of the resulting VQT will be modified to fmin * 2**(tuning / bins_per_octave).
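The effect of tuning on the minimum frequency can be worked through directly from the formula above. This is a small sketch with a hypothetical helper name, tuned_fmin; it only evaluates the stated expression and does not estimate tuning from a signal.

```python
def tuned_fmin(fmin, tuning, bins_per_octave=12):
    """Minimum frequency after the tuning correction.

    Evaluates fmin * 2**(tuning / bins_per_octave), where tuning is
    expressed in fractions of a bin.
    """
    return fmin * 2.0 ** (tuning / bins_per_octave)


# A tuning of 0 leaves fmin unchanged.
print(tuned_fmin(32.70, 0.0))

# A quarter-bin offset in either direction nudges fmin slightly up or down.
print(tuned_fmin(32.70, 0.25))
print(tuned_fmin(32.70, -0.25))
```

With more bins per octave, the same fractional tuning corresponds to a smaller shift in Hz, because each bin covers a narrower frequency ratio.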

filter_scale : float > 0

Filter scale factor. Small values (<1) use shorter windows for improved time resolution.
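To see how filter_scale and gamma interact, the sketch below assumes that a filter's length in samples is inversely proportional to its bandwidth B_k = alpha * f_k + gamma and is scaled linearly by filter_scale. This relationship is an assumption made for illustration, and the helper filter_length is a hypothetical name; the library's exact expression may differ.

```python
def filter_length(f_k, sr, alpha, gamma, filter_scale=1.0):
    """Approximate filter length in samples.

    ASSUMPTION: length = filter_scale * sr / B_k, where
    B_k = alpha * f_k + gamma is the bandwidth of filter k.
    """
    bandwidth = alpha * f_k + gamma
    return filter_scale * sr / bandwidth


sr = 22050
alpha = 0.06  # illustrative value only, not taken from librosa

# gamma = 0 corresponds to the constant-Q case: lengths scale as 1 / f_k.
print(filter_length(32.70, sr, alpha, gamma=0.0))

# A positive gamma shortens the low-frequency filters the most.
print(filter_length(32.70, sr, alpha, gamma=10.0))
print(filter_length(4186.0, sr, alpha, gamma=10.0))

# filter_scale < 1 shortens every filter by the same factor.
print(filter_length(32.70, sr, alpha, gamma=10.0, filter_scale=0.5))
```

Under this assumption, shorter filters give better time resolution at the cost of frequency resolution, which is the trade-off filter_scale controls.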

norm : {inf, -inf, 0, float > 0}

Type of norm to use for basis function normalization. See librosa.util.normalize.

sparsity : float in [0, 1)

Sparsify the VQT basis by discarding up to sparsity fraction of the energy in each basis.

Set sparsity=0 to disable sparsification.

window : str, tuple, number, or function

Window specification for the basis filters. See librosa.filters.get_window for details.

scale : bool

If True, scale the VQT response by the square root of the length of each channel’s filter. This is analogous to norm='ortho' in FFT.

If False, do not scale the VQT. This is analogous to norm=None in FFT.

pad_mode : string

Padding mode for centered frame analysis.

See also: librosa.stft and numpy.pad.

res_type : string [optional]

The resampling mode for recursive downsampling.

By default, vqt will adaptively select a resampling mode which trades off accuracy at high frequencies for efficiency at low frequencies.

You can override this by specifying a resampling mode as supported by librosa.resample. For example, res_type='fft' will use a high-quality, but potentially slow FFT-based down-sampling, while res_type='polyphase' will use a fast, but potentially inaccurate down-sampling.

dtype : np.dtype

The dtype of the output array. By default, this is inferred to match the numerical precision of the input signal.

Returns

VQT : np.ndarray [shape=(…, n_bins, t), dtype=np.complex]

Variable-Q value of each frequency at each time.