librosa.istft

librosa.istft(stft_matrix, *, hop_length=None, win_length=None, n_fft=None, window='hann', center=True, dtype=None, length=None, out=None)

Inverse short-time Fourier transform (ISTFT).

Converts a complex-valued spectrogram stft_matrix to a time series y by minimizing the mean squared error between stft_matrix and the STFT of y, as described in [1], up to Section 2 (reconstruction from MSTFT).

In general, the window function, hop length, and other parameters should be the same as those used in stft. With matching parameters, an unmodified stft_matrix in most cases gives perfect reconstruction of the signal.
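
The least-squares reconstruction described above can be sketched with NumPy alone: invert each column, apply the window again, overlap-add the frames, and divide by the running sum of squared windows. This is an illustrative sketch, not librosa's implementation; the helper names stft_sketch and istft_sketch, the zero padding, and the periodic Hann window are all assumptions made for the example.

```python
import numpy as np

def stft_sketch(y, n_fft=512, hop=128):
    """Centered STFT with a periodic Hann window (illustrative only)."""
    window = 0.5 - 0.5 * np.cos(2 * np.pi * np.arange(n_fft) / n_fft)
    y_pad = np.pad(y, n_fft // 2)  # zero-pad both edges so frames are centered
    n_frames = 1 + (len(y_pad) - n_fft) // hop
    frames = np.stack(
        [y_pad[i * hop : i * hop + n_fft] for i in range(n_frames)], axis=1
    )
    return np.fft.rfft(frames * window[:, None], axis=0), window

def istft_sketch(D, window, hop):
    """Overlap-add inverse, normalized by the sum of squared windows."""
    n_fft = len(window)
    n_frames = D.shape[1]
    # Invert each column, then apply the same window a second time.
    frames = np.fft.irfft(D, n=n_fft, axis=0) * window[:, None]
    y = np.zeros(n_fft + hop * (n_frames - 1))
    wss = np.zeros_like(y)  # running sum of squared windows
    for i in range(n_frames):
        y[i * hop : i * hop + n_fft] += frames[:, i]
        wss[i * hop : i * hop + n_fft] += window ** 2
    # Normalize only where the window sum is non-negligible.
    nonzero = wss > np.finfo(wss.dtype).tiny
    y[nonzero] /= wss[nonzero]
    return y[n_fft // 2 : len(y) - n_fft // 2]  # undo the centering pad

rng = np.random.default_rng(0)
y = rng.standard_normal(4096)
D, window = stft_sketch(y)
y_hat = istft_sketch(D, window, hop=128)
max_error = np.max(np.abs(y - y_hat))
```

Because the forward and inverse steps share the same window and hop, max_error should be at the level of floating-point rounding. Changing the hop or window on only one side breaks that agreement.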

Parameters:

stft_matrix : np.ndarray [shape=(…, 1 + n_fft//2, t)]

STFT matrix from stft

hop_length : int > 0 [scalar]

Number of audio samples between adjacent STFT columns. If unspecified, defaults to win_length // 4.

win_length : int <= n_fft = 2 * (stft_matrix.shape[-2] - 1)

When reconstructing the time series, each frame is windowed by the window function (see below), and each sample is normalized by the sum of the squared windows that overlap it.

If unspecified, defaults to n_fft.

n_fft : int > 0 or None

The number of samples per frame in the input spectrogram. By default, this will be inferred from the shape of stft_matrix. However, if an odd frame length was used, you can specify the correct length by setting n_fft.
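
The defaults described for n_fft, win_length, and hop_length can be traced with a little arithmetic. The helper below is hypothetical (resolve_istft_sizes is not a librosa function) and only mirrors the rules as stated in this parameter list. It also shows why an odd frame length cannot be inferred from the number of frequency bins.

```python
def resolve_istft_sizes(n_bins, n_fft=None, win_length=None, hop_length=None):
    """Hypothetical helper mirroring the documented defaults."""
    if n_fft is None:
        n_fft = 2 * (n_bins - 1)      # inferred from the frequency axis
    if win_length is None:
        win_length = n_fft            # defaults to n_fft
    if hop_length is None:
        hop_length = win_length // 4  # defaults to win_length // 4
    return n_fft, win_length, hop_length

# An even and an odd frame length give the same number of bins, 1 + n_fft // 2.
bins_even = 1 + 2046 // 2
bins_odd = 1 + 2047 // 2

# Inference can therefore only recover the even length.
inferred = resolve_istft_sizes(bins_odd)
# Passing n_fft explicitly restores the odd length.
explicit = resolve_istft_sizes(bins_odd, n_fft=2047)
```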

window : string, tuple, number, function, np.ndarray [shape=(n_fft,)]

a window specification (string, tuple, or number); see scipy.signal.get_window
a window function, such as scipy.signal.windows.hann
a user-specified window vector of length n_fft
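
As a rough illustration of the specification forms listed above, the following sketch builds windows directly with scipy.signal.get_window. It shows what each kind of specification means in SciPy; how librosa evaluates them internally is not shown here, and the variable names are arbitrary.

```python
import numpy as np
from scipy.signal import get_window

n_fft = 2048

w_name = get_window("hann", n_fft)            # string: window name
w_tuple = get_window(("kaiser", 8.0), n_fft)  # tuple: name plus parameter
w_number = get_window(8.0, n_fft)             # number: interpreted as Kaiser beta
w_vector = np.ones(n_fft)                     # user-specified vector (rectangular)

# Every form yields one value per sample of the frame.
lengths = {len(w) for w in (w_name, w_tuple, w_number, w_vector)}
```

Note that get_window returns a periodic window by default (fftbins=True), which is the usual choice for spectral analysis.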

center : boolean

If True, stft_matrix is assumed to have centered frames.
If False, stft_matrix is assumed to have left-aligned frames.

dtype : numeric type

Real numeric type for y. Default is to match the numerical precision of the input spectrogram.
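
One plausible reading of "matching the numerical precision" is that each complex type pairs with the real type of its components. The NumPy sketch below illustrates that pairing under this assumption; the array is only a stand-in for an STFT matrix.

```python
import numpy as np

matching = {}
for complex_type in (np.complex64, np.complex128):
    D = np.zeros((1025, 4), dtype=complex_type)  # stand-in for an STFT matrix
    # The real part of a complex array has the matching real precision.
    matching[D.dtype.name] = D.real.dtype.name
```

Under this reading, a complex64 spectrogram would give a float32 signal, and a complex128 spectrogram a float64 signal.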

length : int > 0, optional

If provided, the output y is zero-padded or clipped to exactly length samples.
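
The pad-or-clip behavior can be pictured with a small NumPy helper. fit_length is a hypothetical name used only for this sketch, and it assumes that padding is appended at the end of the last axis.

```python
import numpy as np

def fit_length(y, length):
    """Hypothetical helper: clip or zero-pad the last axis to `length` samples."""
    n = y.shape[-1]
    if n >= length:
        return y[..., :length]                        # clip trailing samples
    pad = [(0, 0)] * (y.ndim - 1) + [(0, length - n)]
    return np.pad(y, pad)                             # append zeros at the end

y = np.arange(1.0, 6.0)      # 5 samples: 1, 2, 3, 4, 5
clipped = fit_length(y, 3)   # first 3 samples kept
padded = fit_length(y, 7)    # 2 zeros appended
```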

out : np.ndarray or None

A pre-allocated, real-valued array to store the reconstructed signal y. This must be of the correct shape for the given input parameters.

If not provided, a new array is allocated and returned.

Returns:

y : np.ndarray [shape=(…, n)]

Time-domain signal reconstructed from stft_matrix. If stft_matrix has more than two axes (e.g., from a stereo input signal), the leading dimensions of y match those of stft_matrix.

See also

stft: Short-time Fourier Transform

Notes

This function caches at level 30.

Examples

>>> import numpy as np
>>> import librosa
>>> y, sr = librosa.load(librosa.ex('trumpet'))
>>> D = librosa.stft(y)
>>> y_hat = librosa.istft(D)
>>> y_hat
array([-1.407e-03, -4.461e-04, ...,  5.131e-06, -1.417e-05],
      dtype=float32)

Exactly preserving the length of the input signal requires explicit padding. Otherwise, a partial frame at the end of y will not be represented.

>>> n = len(y)
>>> n_fft = 2048
>>> y_pad = librosa.util.fix_length(y, size=n + n_fft // 2)
>>> D = librosa.stft(y_pad, n_fft=n_fft)
>>> y_out = librosa.istft(D, length=n)
>>> np.max(np.abs(y - y_out))
8.940697e-08
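
The need for padding follows from frame arithmetic. The sketch below assumes a centered transform with an even n_fft, where a signal of n samples produces 1 + n // hop_length frames and the inverse covers hop_length * (frames - 1) samples. The helper name and the signal length of 22050 samples are made up for illustration.

```python
def centered_roundtrip_length(n, hop_length=512):
    """Assumed frame arithmetic for a centered STFT with even n_fft."""
    n_frames = 1 + n // hop_length       # frames from the forward transform
    return hop_length * (n_frames - 1)   # samples covered by the inverse

n = 22050
n_fft = 2048

# Without padding, the trailing partial hop is not covered.
short = centered_roundtrip_length(n)
lost = n - short

# Padding by n_fft // 2 first gives enough frames to cover all n samples.
covered = centered_roundtrip_length(n + n_fft // 2)
```

Here short is 22016, so 34 trailing samples would be missing, while covered is 23040, which exceeds n and can be trimmed back with length=n.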