librosa.istft
- librosa.istft(stft_matrix, *, hop_length=None, win_length=None, n_fft=None, window='hann', center=True, dtype=None, length=None)
Inverse short-time Fourier transform (ISTFT).
Converts a complex-valued spectrogram stft_matrix to a time series y by minimizing the mean squared error between stft_matrix and the STFT of y, as described in [1] up to Section 2 (reconstruction from MSTFT).
In general, the window function, hop length, and other parameters should be the same as in stft, which in most cases yields perfect reconstruction of a signal from an unmodified stft_matrix.
[1] D. W. Griffin and J. S. Lim, “Signal estimation from modified short-time Fourier transform,” IEEE Trans. ASSP, vol. 32, no. 2, pp. 236–243, Apr. 1984.
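The least-squares estimate described above amounts to a windowed overlap-add in which each output sample is divided by the sum of squared window values that covered it. The following is a minimal NumPy sketch of that idea, not librosa's implementation: the helper names naive_stft and naive_istft are hypothetical, the framing is left-aligned (as with center=False), and only a 1-D signal with win_length equal to n_fft is handled.

```python
import numpy as np


def naive_stft(y, n_fft, hop_length, window):
    """Hypothetical helper: left-aligned STFT of a 1-D real signal."""
    n_frames = 1 + (len(y) - n_fft) // hop_length
    frames = np.stack(
        [y[i * hop_length : i * hop_length + n_fft] * window for i in range(n_frames)],
        axis=1,
    )
    # One column per frame, 1 + n_fft // 2 frequency bins per column.
    return np.fft.rfft(frames, axis=0)


def naive_istft(stft_matrix, n_fft, hop_length, window):
    """Hypothetical helper: least-squares overlap-add inverse of naive_stft."""
    n_frames = stft_matrix.shape[1]
    frames = np.fft.irfft(stft_matrix, n=n_fft, axis=0)
    y = np.zeros(n_fft + hop_length * (n_frames - 1))
    window_sumsq = np.zeros_like(y)
    for i in range(n_frames):
        start = i * hop_length
        # Window each inverted frame again and accumulate it in place.
        y[start : start + n_fft] += window * frames[:, i]
        window_sumsq[start : start + n_fft] += window**2
    # Normalize only where the squared-window envelope is not vanishingly small.
    nonzero = window_sumsq > np.finfo(y.dtype).tiny
    y[nonzero] /= window_sumsq[nonzero]
    return y


rng = np.random.default_rng(0)
n_fft, hop_length = 64, 16
window = 0.5 - 0.5 * np.cos(2 * np.pi * np.arange(n_fft) / n_fft)  # periodic Hann
y = rng.standard_normal(n_fft + hop_length * 10)

D = naive_stft(y, n_fft, hop_length, window)
y_hat = naive_istft(D, n_fft, hop_length, window)

# Compare away from the edges, where every sample is covered by several frames.
print(np.max(np.abs(y[n_fft:-n_fft] - y_hat[n_fft:-n_fft])))
```

In this sketch the very first sample cannot be recovered, because the periodic Hann window is zero there and no other frame covers it. Centered framing avoids that situation by padding the signal before analysis.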
- Parameters
- stft_matrix : np.ndarray [shape=(…, 1 + n_fft//2, t)]
STFT matrix from stft
- hop_length : int > 0 [scalar]
Number of audio samples between adjacent STFT columns. If unspecified, defaults to win_length // 4.
- win_length : int <= n_fft = 2 * (stft_matrix.shape[-2] - 1)
When reconstructing the time series, each frame is windowed and each sample is normalized by the sum of the squared window, according to the window function (see below). If unspecified, defaults to n_fft.
- n_fft : int > 0 or None
The number of samples per frame in the input spectrogram. By default, this is inferred from the shape of stft_matrix. However, if an odd frame length was used, you can specify the correct length by setting n_fft.
- window : string, tuple, number, function, np.ndarray [shape=(n_fft,)]
  - a window specification (string, tuple, or number); see scipy.signal.get_window
  - a window function, such as scipy.signal.windows.hann
  - a user-specified window vector of length n_fft
- center : boolean
If True, stft_matrix is assumed to have centered frames.
If False, stft_matrix is assumed to have left-aligned frames.
- dtype : numeric type
Real numeric type for y. Default is to match the numerical precision of the input spectrogram.
- length : int > 0, optional
If provided, the output y is zero-padded or clipped to exactly length samples.
- Returns
- y : np.ndarray [shape=(…, n)]
Time-domain signal reconstructed from stft_matrix. If stft_matrix contains more than two axes (e.g., from a stereo input signal), then y will match shape on the leading dimensions.
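The defaults documented above chain together: n_fft is inferred from the number of frequency bins, win_length falls back to n_fft, and hop_length falls back to win_length // 4. The sketch below only mirrors that documented arithmetic. The helper name resolve_istft_params is hypothetical and is not part of librosa.

```python
def resolve_istft_params(n_bins, hop_length=None, win_length=None, n_fft=None):
    """Hypothetical helper mirroring the documented defaults (not librosa code).

    n_bins is the size of the frequency axis of stft_matrix, i.e. 1 + n_fft // 2.
    """
    if n_fft is None:
        # Inference from the shape always yields an even frame length.
        n_fft = 2 * (n_bins - 1)
    if win_length is None:
        win_length = n_fft
    if hop_length is None:
        hop_length = win_length // 4
    return n_fft, win_length, hop_length


# Frame lengths 2048 and 2049 both produce 1025 frequency bins,
# so an odd frame length has to be passed explicitly.
print(resolve_istft_params(1025))
print(resolve_istft_params(1025, n_fft=2049))
```

This is why n_fft is exposed as a parameter: the shape of stft_matrix alone cannot distinguish an odd frame length from the even one just below it.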
See also
stft : Short-time Fourier transform
Notes
This function caches at level 30.
Examples
>>> import librosa
>>> y, sr = librosa.load(librosa.ex('trumpet'))
>>> D = librosa.stft(y)
>>> y_hat = librosa.istft(D)
>>> y_hat
array([-1.407e-03, -4.461e-04, ..., 5.131e-06, -1.417e-05], dtype=float32)
Exactly preserving the length of the input signal requires explicit padding. Otherwise, a partial frame at the end of y will not be represented.
>>> import numpy as np
>>> n = len(y)
>>> n_fft = 2048
>>> y_pad = librosa.util.fix_length(y, size=n + n_fft // 2)
>>> D = librosa.stft(y_pad, n_fft=n_fft)
>>> y_out = librosa.istft(D, length=n)
>>> np.max(np.abs(y - y_out))
8.940697e-08