Caution
You're reading the documentation for a development version. For the latest released version, please have a look at 0.9.1.
librosa.vqt
- librosa.vqt(y, *, sr=22050, hop_length=512, fmin=None, n_bins=84, gamma=None, bins_per_octave=12, tuning=0.0, filter_scale=1, norm=1, sparsity=0.01, window='hann', scale=True, pad_mode='constant', res_type=None, dtype=None)
Compute the variable-Q transform of an audio signal.
This implementation is based on the recursive sub-sampling method described in [1].
[1] Schörkhuber, Christian, Anssi Klapuri, Nicki Holighaus, and Monika Dörfler. “A Matlab toolbox for efficient perfect reconstruction time-frequency transforms with log-frequency resolution.” In Audio Engineering Society Conference: 53rd International Conference: Semantic Audio. Audio Engineering Society, 2014.
- Parameters
- y : np.ndarray [shape=(…, n)]
Audio time series. Multi-channel input is supported.
- sr : number > 0 [scalar]
Sampling rate of y.
- hop_length : int > 0 [scalar]
Number of samples between successive VQT columns.
- fmin : float > 0 [scalar]
Minimum frequency. Defaults to C1 ~= 32.70 Hz.
- n_bins : int > 0 [scalar]
Number of frequency bins, starting at fmin.
- gamma : number >= 0 [scalar] or None
Bandwidth offset for determining filter lengths.
If gamma=0, this produces the constant-Q transform.
If gamma=None, gamma is calculated so that the filter bandwidths are a constant fraction of the equivalent rectangular bandwidths (ERB). This is done by solving for the gamma that gives
B_k = alpha * f_k + gamma = C * ERB(f_k),
where B_k is the bandwidth of filter k with center frequency f_k, alpha is the inverse of what would be the constant Q-factor, and C = alpha / 0.108 is the constant fraction across all filters.
Here we use ERB(f_k) = 24.7 + 0.108 * f_k, the best-fit curve derived from experimental data in [2]. Because C * 0.108 = alpha, the f_k terms cancel, leaving gamma = 24.7 * C = 24.7 * alpha / 0.108, the same offset for every filter.
[2] Glasberg, Brian R., and Brian C. J. Moore. “Derivation of auditory filter shapes from notched-noise data.” Hearing Research 47.1-2 (1990): 103-138.
- bins_per_octave : int > 0 [scalar]
Number of bins per octave.
- tuning : None or float
Tuning offset in fractions of a bin.
If None, tuning is estimated automatically from the signal.
The minimum frequency of the resulting VQT is modified to fmin * 2**(tuning / bins_per_octave).
- filter_scale : float > 0
Filter scale factor. Small values (<1) use shorter windows for improved time resolution.
- norm : {inf, -inf, 0, float > 0}
Type of norm to use for basis function normalization. See librosa.util.normalize.
- sparsity : float in [0, 1)
Sparsify the VQT basis by discarding up to a fraction sparsity of the energy in each basis filter.
Set sparsity=0 to disable sparsification.
- window : str, tuple, number, or function
Window specification for the basis filters. See filters.get_window for details.
- scale : bool
If True, scale the VQT response by the square root of the length of each channel's filter. This is analogous to norm='ortho' in the FFT.
If False, do not scale the VQT. This is analogous to norm=None in the FFT.
- pad_mode : string
Padding mode for centered frame analysis.
See also: librosa.stft and numpy.pad.
- res_type : string [optional]
The resampling mode for recursive downsampling.
By default, vqt adaptively selects a resampling mode that trades off accuracy at high frequencies for efficiency at low frequencies.
You can override this by specifying any resampling mode supported by librosa.resample. For example, res_type='fft' uses high-quality but potentially slow FFT-based downsampling, while res_type='polyphase' uses fast but potentially inaccurate downsampling.
- dtype : np.dtype
The dtype of the output array. By default, this is inferred to match the numerical precision of the input signal.
- Returns
- VQT : np.ndarray [shape=(…, n_bins, t), dtype=np.complex]
Variable-Q value of each frequency at each time.
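The fmin, n_bins, bins_per_octave, and tuning parameters together fix the center frequency of every bin. The following is a minimal NumPy sketch of that relationship. It assumes geometric spacing f_k = fmin_tuned * 2**(k / bins_per_octave) for k = 0, ..., n_bins - 1, and it assumes C1 is derived from A4 = 440 Hz. vqt_center_frequencies is a hypothetical helper written for this illustration and is not part of librosa.

```python
import numpy as np


def vqt_center_frequencies(fmin: float, n_bins: int, bins_per_octave: int = 12,
                           tuning: float = 0.0) -> np.ndarray:
    """Hypothetical helper (not a librosa API): geometrically spaced bin centers.

    Applies the documented tuning adjustment fmin * 2**(tuning / bins_per_octave),
    then places bins_per_octave bins per octave above the tuned minimum.
    """
    fmin_tuned = fmin * 2.0 ** (tuning / bins_per_octave)
    k = np.arange(n_bins)
    return fmin_tuned * 2.0 ** (k / bins_per_octave)


# Assumption: C1 is MIDI note 24, computed relative to A4 = 440 Hz (MIDI note 69).
C1 = 440.0 * 2.0 ** ((24 - 69) / 12)

# Default layout: 84 bins at 12 bins per octave span 7 octaves above C1.
freqs = vqt_center_frequencies(C1, n_bins=84, bins_per_octave=12, tuning=0.0)
print(f"fmin={freqs[0]:.2f} Hz, highest bin={freqs[-1]:.2f} Hz")

# A tuning offset of half a bin shifts every center up by 2**(0.5 / 12).
shifted = vqt_center_frequencies(C1, n_bins=84, bins_per_octave=12, tuning=0.5)
```

With the defaults, each group of 12 consecutive bins covers exactly one octave. Because of this, freqs[12] is twice freqs[0].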
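The gamma=None rule can be checked numerically. This sketch rests on two assumptions, and neither is a statement about librosa's implementation:

- alpha = 2**(1/bins_per_octave) - 1, the inverse Q-factor of a classic constant-Q filter bank. The exact definition used inside librosa may differ between versions.
- For the length comparison, each filter's length in samples is filter_scale * sr / B_k, so length is inversely proportional to bandwidth.

```python
import numpy as np


def erb(f):
    """ERB approximation quoted in the text: 24.7 + 0.108 * f (f in Hz)."""
    return 24.7 + 0.108 * np.asarray(f, dtype=float)


bins_per_octave = 12
# Assumption: alpha is the inverse Q-factor of a classic constant-Q filter bank.
alpha = 2.0 ** (1.0 / bins_per_octave) - 1.0
C = alpha / 0.108  # constant fraction of the ERB, as defined in the text

# Solving alpha * f + gamma = C * ERB(f): the f terms cancel, leaving a constant.
gamma = 24.7 * C

freqs = 32.70 * 2.0 ** (np.arange(84) / bins_per_octave)  # C1 upward, 84 bins
bandwidths = alpha * freqs + gamma
assert np.allclose(bandwidths, C * erb(freqs))  # B_k == C * ERB(f_k) for every bin

sr, filter_scale = 22050, 1.0
# Assumption: filter length in samples is filter_scale * sr / bandwidth.
lengths_vqt = filter_scale * sr / bandwidths
lengths_cqt = filter_scale * sr / (alpha * freqs)  # gamma = 0: constant-Q case

print(f"alpha={alpha:.4f}, gamma={gamma:.2f} Hz")
print(f"lowest bin: {lengths_cqt[0]:.0f} samples (CQT) vs {lengths_vqt[0]:.0f} (VQT)")
```

Under these assumptions, a positive gamma widens every band by the same number of hertz. This shortens the low-frequency filters the most, where alpha * f_k is small. A filter_scale below 1, by contrast, shortens all filters proportionally.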
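The sparsity parameter discards up to a given fraction of each basis filter's energy. That rule can be illustrated with a short NumPy sketch. sparsify_basis is a hypothetical helper written for this illustration. It is not librosa's internal routine, and librosa's own sparsification may differ in details.

```python
import numpy as np


def sparsify_basis(basis: np.ndarray, sparsity: float = 0.01) -> np.ndarray:
    """Hypothetical helper (not librosa's routine).

    For each row (one basis filter) of a 2-D array, zero the smallest-energy
    coefficients while the discarded energy stays at most `sparsity` of that
    row's total energy.
    """
    if not 0.0 <= sparsity < 1.0:
        raise ValueError("sparsity must be in [0, 1)")
    out = np.array(basis, copy=True)
    for row in out:  # iterating a 2-D array yields writable row views
        energy = np.abs(row) ** 2
        total = energy.sum()
        if total == 0:
            continue  # an all-zero filter has nothing to discard
        order = np.argsort(energy)  # indices from smallest to largest energy
        # cum[j]: fraction of energy removed if the j + 1 smallest entries are dropped
        cum = np.cumsum(energy[order]) / total
        n_drop = np.searchsorted(cum, sparsity, side="right")
        row[order[:n_drop]] = 0
    return out


basis = np.array([[1.0, 0.05, 2.0, 0.01],
                  [0.5, 0.5, 0.5, 0.5]])
sparse = sparsify_basis(basis, sparsity=0.01)
print(sparse)
```

In the first row, the two tiny coefficients together hold only about 0.05% of the energy, so both are dropped. In the second row, every coefficient holds 25% of the energy, so nothing can be removed within the 1% budget.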
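The FFT analogy in the scale description can be checked with NumPy's own norm argument. This covers only the FFT side of the analogy. norm='ortho' divides the unscaled transform by the square root of the transform length, which makes it energy-preserving. In the VQT, the corresponding length differs from channel to channel, because each filter has its own length.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(256)

X_none = np.fft.fft(x, norm=None)      # unscaled, the analogue of scale=False
X_ortho = np.fft.fft(x, norm="ortho")  # the analogue of scale=True

# 'ortho' is the unscaled transform divided by sqrt(n) ...
assert np.allclose(X_ortho, X_none / np.sqrt(len(x)))
# ... which preserves energy between the signal and its spectrum (Parseval).
assert np.isclose(np.sum(np.abs(X_ortho) ** 2), np.sum(x ** 2))
```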
Notes
This function caches at level 20.
Examples
Generate and plot a variable-Q power spectrum
>>> import numpy as np
>>> import matplotlib.pyplot as plt
>>> import librosa
>>> import librosa.display
>>> y, sr = librosa.load(librosa.ex('choice'), duration=5)
>>> C = np.abs(librosa.cqt(y, sr=sr))
>>> V = np.abs(librosa.vqt(y, sr=sr))
>>> fig, ax = plt.subplots(nrows=2, sharex=True, sharey=True)
>>> librosa.display.specshow(librosa.amplitude_to_db(C, ref=np.max),
...                          sr=sr, x_axis='time', y_axis='cqt_note', ax=ax[0])
>>> ax[0].set(title='Constant-Q power spectrum', xlabel=None)
>>> ax[0].label_outer()
>>> img = librosa.display.specshow(librosa.amplitude_to_db(V, ref=np.max),
...                                sr=sr, x_axis='time', y_axis='cqt_note', ax=ax[1])
>>> ax[1].set_title('Variable-Q power spectrum')
>>> fig.colorbar(img, ax=ax, format="%+2.0f dB")