librosa.feature.inverse.mfcc_to_audio

librosa.feature.inverse.mfcc_to_audio(mfcc, *, n_mels=128, dct_type=2, norm='ortho', ref=1.0, lifter=0, **kwargs)

Convert Mel-frequency cepstral coefficients to a time-domain audio signal

This function is primarily a convenience wrapper for the following steps:

  1. Convert mfcc to Mel power spectrum (mfcc_to_mel)

  2. Convert Mel power spectrum to time-domain audio (mel_to_audio)

Parameters:
mfcc : np.ndarray [shape=(…, n_mfcc, n)]

The Mel-frequency cepstral coefficients

n_mels : int > 0

The number of Mel frequencies

dct_type : {1, 2, 3}

Discrete cosine transform (DCT) type. By default, DCT type-2 is used.

norm : None or 'ortho'

If dct_type is 2 or 3, setting norm='ortho' uses an orthonormal DCT basis. Normalization is not supported for dct_type=1.

ref : float

Reference power for (inverse) decibel calculation

lifter : number >= 0

If lifter>0, apply inverse liftering (inverse cepstral filtering):

M[n, :] <- M[n, :] / (1 + sin(pi * (n + 1) / lifter) * lifter / 2)

**kwargs : additional keyword arguments to pass through to mel_to_audio

M : np.ndarray [shape=(…, n_mels, n), non-negative]

The spectrogram as produced by feature.melspectrogram

sr : number > 0 [scalar]

sampling rate of the underlying signal

n_fft : int > 0 [scalar]

number of FFT components in the resulting STFT

hop_length : None or int > 0

The hop length of the STFT. If not provided, it will default to n_fft // 4

win_length : None or int > 0

The window length of the STFT. By default, it will equal n_fft

window : string, tuple, number, function, or np.ndarray [shape=(n_fft,)]

A window specification as supported by stft or istft

center : boolean

If True, the STFT is assumed to use centered frames. If False, the STFT is assumed to use left-aligned frames.

pad_mode : string

If center=True, the padding mode to use at the edges of the signal. By default, STFT uses zero padding.

power : float > 0 [scalar]

Exponent for the magnitude melspectrogram

n_iter : int > 0

The number of iterations for Griffin-Lim

length : None or int > 0

If provided, the output y is zero-padded or clipped to exactly length samples.

dtype : np.dtype

Real numeric type for the time-domain signal. Default is 32-bit float.

**kwargs : additional keyword arguments for Mel filter bank parameters

fmin : float >= 0 [scalar]

lowest frequency (in Hz)

fmax : float >= 0 [scalar]

highest frequency (in Hz). If None, use fmax = sr / 2.0

htk : bool [scalar]

use HTK formula instead of Slaney
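
The inverse liftering controlled by the lifter parameter can be sketched in plain NumPy. This is an illustration under stated assumptions, not librosa's implementation: `lifter_weights` is a hypothetical helper, the random matrix stands in for real MFCCs, and the weights are taken to be 1 + sin(pi * (n + 1) / lifter) * lifter / 2, which the forward transform multiplies by and the inverse divides by.

```python
import numpy as np

def lifter_weights(n_mfcc, lifter):
    """Hypothetical helper: per-coefficient liftering weights for n = 0..n_mfcc-1."""
    n = np.arange(n_mfcc)
    return 1.0 + np.sin(np.pi * (n + 1) / lifter) * lifter / 2.0

rng = np.random.default_rng(0)
M = rng.standard_normal((13, 5))  # stand-in for MFCCs, shape (n_mfcc, n)

# Column vector so the weights broadcast across frames
w = lifter_weights(13, lifter=22)[:, np.newaxis]

M_liftered = M * w           # forward liftering (multiply)
M_restored = M_liftered / w  # inverse liftering (divide), as in the formula
```

The round trip only recovers the input when the same lifter value is used in both directions and no weight is close to zero.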

Returns:
y : np.ndarray [shape=(…, n)]

A time-domain signal reconstructed from mfcc