librosa.feature.inverse.mfcc_to_audio
librosa.feature.inverse.mfcc_to_audio(mfcc, *, n_mels=128, dct_type=2, norm='ortho', ref=1.0, lifter=0, **kwargs)
Convert Mel-frequency cepstral coefficients to a time-domain audio signal.
This function is primarily a convenience wrapper for the following steps:
1. Convert mfcc to Mel power spectrum (mfcc_to_mel)
2. Convert Mel power spectrum to time-domain audio (mel_to_audio)
- Parameters:
- mfcc : np.ndarray [shape=(…, n_mfcc, n)]
The Mel-frequency cepstral coefficients
- n_mels : int > 0
The number of Mel frequencies
- dct_type : {1, 2, 3}
Discrete cosine transform (DCT) type. By default, DCT type-2 is used.
- norm : None or 'ortho'
If dct_type is 2 or 3, setting norm='ortho' uses an orthonormal DCT basis. Normalization is not supported for dct_type=1.
- ref : float
Reference power for (inverse) decibel calculation
- lifter : number >= 0
If lifter>0, apply inverse liftering (inverse cepstral filtering):
M[n, :] <- M[n, :] / (1 + sin(pi * (n + 1) / lifter) * lifter / 2)
- **kwargs : additional keyword arguments to pass through to mel_to_audio
- M : np.ndarray [shape=(…, n_mels, n), non-negative]
The spectrogram as produced by feature.melspectrogram
- sr : number > 0 [scalar]
sampling rate of the underlying signal
- n_fft : int > 0 [scalar]
number of FFT components in the resulting STFT
- hop_length : None or int > 0
The hop length of the STFT. If not provided, it will default to n_fft // 4
- win_length : None or int > 0
The window length of the STFT. By default, it will equal n_fft
- window : string, tuple, number, function, or np.ndarray [shape=(n_fft,)]
A window specification as supported by stft or istft
- center : boolean
If True, the STFT is assumed to use centered frames. If False, the STFT is assumed to use left-aligned frames.
- pad_mode : string
If center=True, the padding mode to use at the edges of the signal. By default, STFT uses zero padding.
- power : float > 0 [scalar]
Exponent for the magnitude melspectrogram
- n_iter : int > 0
The number of iterations for Griffin-Lim
- length : None or int > 0
If provided, the output y is zero-padded or clipped to exactly length samples.
- dtype : np.dtype
Real numeric type for the time-domain signal. Default is 32-bit float.
- **kwargs : additional keyword arguments for Mel filter bank parameters
- fmin : float >= 0 [scalar]
lowest frequency (in Hz)
- fmax : float >= 0 [scalar]
highest frequency (in Hz). If None, use fmax = sr / 2.0
- htk : bool [scalar]
use HTK formula instead of Slaney
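As a rough illustration of how lifter, dct_type, norm, and ref act in the MFCC-to-Mel step, the sketch below re-creates the documented operations with NumPy and SciPy. The helper name mfcc_to_mel_sketch is hypothetical and the code is not librosa's implementation. It assumes that inverse liftering divides row n by the weight 1 + sin(pi * (n + 1) / lifter) * lifter / 2, and that decibels map back to power as ref * 10**(dB / 10).

```python
import numpy as np
from scipy.fft import dct, idct


def mfcc_to_mel_sketch(mfcc, n_mels=128, dct_type=2, norm="ortho", ref=1.0, lifter=0):
    """Hypothetical helper: undo liftering, invert the DCT, convert dB to power."""
    mfcc = np.asarray(mfcc, dtype=float)
    if lifter > 0:
        # Assumed inverse-liftering weights, one per cepstral coefficient
        n = np.arange(mfcc.shape[-2])
        weights = 1 + np.sin(np.pi * (n + 1) / lifter) * lifter / 2
        mfcc = mfcc / weights[:, np.newaxis]
    # Inverse DCT along the coefficient axis, zero-padded up to n_mels bands
    log_mel = idct(mfcc, type=dct_type, n=n_mels, axis=-2, norm=norm)
    # Inverse decibel scaling relative to the reference power
    return ref * 10.0 ** (log_mel / 10.0)


# Round trip on synthetic data: forward DCT and liftering, then the sketch
rng = np.random.default_rng(0)
n_mels, n_frames, lifter = 16, 5, 22
log_mel_db = rng.uniform(-80.0, 0.0, size=(n_mels, n_frames))

n = np.arange(n_mels)
weights = 1 + np.sin(np.pi * (n + 1) / lifter) * lifter / 2
mfcc = dct(log_mel_db, type=2, axis=-2, norm="ortho") * weights[:, np.newaxis]

mel_power = mfcc_to_mel_sketch(mfcc, n_mels=n_mels, lifter=lifter)
expected_power = 10.0 ** (log_mel_db / 10.0)
print(np.allclose(mel_power, expected_power))
```

The round trip is exact here only because all n_mels coefficients are kept. With fewer coefficients than Mel bands, the inverse DCT yields a smoothed approximation of the log-Mel spectrum.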
- Returns:
- y : np.ndarray [shape=(…, n)]
A time-domain signal reconstructed from mfcc