librosa.stft

librosa.stft(y, n_fft=2048, hop_length=None, win_length=None, window='hann', center=True, dtype=None, pad_mode='reflect')
Short-time Fourier transform (STFT).
The STFT represents a signal in the time-frequency domain by computing discrete Fourier transforms (DFT) over short overlapping windows.
This function returns a complex-valued matrix D such that np.abs(D[f, t]) is the magnitude of frequency bin f at frame t, and np.angle(D[f, t]) is the phase of frequency bin f at frame t.
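As a rough illustration of this definition, the following NumPy-only sketch frames a signal, applies a periodic Hann window, and takes a real FFT of each frame. It is not librosa's implementation: the helper name `naive_stft` and the test tone are assumptions made for this example, and padding (`center`, `pad_mode`) is deliberately left out.

```python
import numpy as np

def naive_stft(y, n_fft=2048, hop_length=512):
    """Hypothetical helper: left-aligned STFT with a periodic Hann window."""
    # Periodic Hann window of length n_fft (no zero-padding, win_length == n_fft)
    window = 0.5 - 0.5 * np.cos(2.0 * np.pi * np.arange(n_fft) / n_fft)
    n_frames = 1 + (len(y) - n_fft) // hop_length
    D = np.empty((1 + n_fft // 2, n_frames), dtype=np.complex128)
    for t in range(n_frames):
        frame = y[t * hop_length : t * hop_length + n_fft]
        D[:, t] = np.fft.rfft(window * frame)  # DFT of one windowed frame
    return D

sr = 22050
# One second of a pure tone placed exactly on bin 100 (100 * sr / 2048 Hz)
freq = 100 * sr / 2048
y = np.sin(2.0 * np.pi * freq * np.arange(sr) / sr)

D = naive_stft(y)
magnitude = np.abs(D)   # magnitude of bin f at frame t
phase = np.angle(D)     # phase of bin f at frame t, in radians
peak_bin = int(np.argmax(magnitude[:, 0]))
print(D.shape, peak_bin)
```

Because the tone sits exactly on a bin centre, the largest magnitude in every frame falls on that bin; for arbitrary frequencies the energy spreads over neighbouring bins.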
The integers t and f can be converted to physical units by means of the utility functions frames_to_samples and fft_frequencies.

Parameters
y : np.ndarray [shape=(n,)], real-valued
    input signal
n_fft : int > 0 [scalar]
    length of the windowed signal after padding with zeros. The number of rows in the STFT matrix D is (1 + n_fft/2). The default value, n_fft=2048 samples, corresponds to a physical duration of 93 milliseconds at a sample rate of 22050 Hz, i.e. the default sample rate in librosa. This value is well adapted for music signals. However, in speech processing, the recommended value is 512, corresponding to 23 milliseconds at a sample rate of 22050 Hz. In any case, we recommend setting n_fft to a power of two for optimizing the speed of the fast Fourier transform (FFT) algorithm.
hop_length : int > 0 [scalar]
    number of audio samples between adjacent STFT columns.
    Smaller values increase the number of columns in D without affecting the frequency resolution of the STFT.
    If unspecified, defaults to win_length // 4 (see below).
win_length : int <= n_fft [scalar]
    Each frame of audio is windowed by window of length win_length and then padded with zeros to match n_fft.
    Smaller values improve the temporal resolution of the STFT (i.e. the ability to discriminate impulses that are closely spaced in time) at the expense of frequency resolution (i.e. the ability to discriminate pure tones that are closely spaced in frequency). This effect is known as the time-frequency localization trade-off and needs to be adjusted according to the properties of the input signal y.
    If unspecified, defaults to win_length = n_fft.
window : string, tuple, number, function, or np.ndarray [shape=(n_fft,)]
    Either:
    - a window specification (string, tuple, or number); see scipy.signal.get_window
    - a window function, such as scipy.signal.windows.hann
    - a vector or array of length n_fft
    Defaults to a raised cosine window ('hann'), which is adequate for most applications in audio signal processing.
center : boolean
    If True, the signal y is padded so that frame D[:, t] is centered at y[t * hop_length].
    If False, then D[:, t] begins at y[t * hop_length].
    Defaults to True, which simplifies the alignment of D onto a time grid by means of librosa.frames_to_samples. Note, however, that center must be set to False when analyzing signals with librosa.stream.
dtype : np.dtype, optional
    Complex numeric type for D. Default is inferred to match the precision of the input signal.
pad_mode : string or function
    If center=True, this argument is passed to np.pad for padding the edges of the signal y. By default (pad_mode="reflect"), y is padded on both sides with its own reflection, mirrored around its first and last sample respectively. If center=False, this argument is ignored.
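The relationships between these parameters can be checked with plain arithmetic. The sketch below only restates the defaults and sizes documented above; the helper name `stft_layout` is hypothetical and nothing here calls librosa.

```python
def stft_layout(sr=22050, n_fft=2048, win_length=None, hop_length=None):
    """Hypothetical helper restating the documented defaults and sizes."""
    if win_length is None:
        win_length = n_fft            # documented default
    if hop_length is None:
        hop_length = win_length // 4  # documented default
    return {
        "win_length": win_length,
        "hop_length": hop_length,
        "n_rows": 1 + n_fft // 2,         # rows of the STFT matrix D
        "frame_ms": 1000.0 * n_fft / sr,  # physical duration of one frame
    }

music = stft_layout(n_fft=2048)   # default, suggested for music
speech = stft_layout(n_fft=512)   # value suggested for speech
print(music)
print(speech)
```

Rounding `frame_ms` reproduces the 93 ms and 23 ms figures quoted for `n_fft=2048` and `n_fft=512` at 22050 Hz.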
Returns
D : np.ndarray [shape=(1 + n_fft/2, n_frames), dtype=dtype]
    Complex-valued matrix of short-term Fourier transform coefficients.
See also
istft : Inverse STFT
reassigned_spectrogram : Time-frequency reassigned spectrogram
Notes
This function caches at level 20.
Examples
>>> y, sr = librosa.load(librosa.ex('trumpet'))
>>> S = np.abs(librosa.stft(y))
>>> S
array([[5.395e-03, 3.332e-03, ..., 9.862e-07, 1.201e-05],
       [3.244e-03, 2.690e-03, ..., 9.536e-07, 1.201e-05],
       ...,
       [7.523e-05, 3.722e-05, ..., 1.188e-04, 1.031e-03],
       [7.640e-05, 3.944e-05, ..., 5.180e-04, 1.346e-03]], dtype=float32)
Use left-aligned frames, instead of centered frames
>>> S_left = librosa.stft(y, center=False)
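To see what centering does to frame positions, the NumPy-only sketch below reflect-pads a short signal with `np.pad` and compares one frame with and without padding. The pad width of `n_fft // 2` samples per side and the tiny sizes are assumptions chosen for illustration; they are not stated on this page.

```python
import numpy as np

n_fft, hop_length = 8, 2
y = np.arange(20, dtype=float)  # sample values equal their own indices

# Assumed center=True behaviour: reflect-pad n_fft // 2 samples on both sides
y_padded = np.pad(y, n_fft // 2, mode="reflect")

t = 3
centered = y_padded[t * hop_length : t * hop_length + n_fft]
left_aligned = y[t * hop_length : t * hop_length + n_fft]

print(centered)      # the middle of this frame is y[t * hop_length]
print(left_aligned)  # this frame starts at y[t * hop_length]
```

Because the signal values equal their indices, the printed frames show directly which samples of `y` each frame covers.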
Use a shorter hop length
>>> D_short = librosa.stft(y, hop_length=64)
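The effect of the hop length on the size of the output can be estimated without running the transform. The sketch below assumes the usual framing arithmetic (one frame every `hop_length` samples, with `n_fft // 2` samples of padding on each side when `center=True`). The function name `expected_shape` is hypothetical, and the frame-count formula is an assumption rather than something quoted from this page.

```python
def expected_shape(n_samples, n_fft=2048, hop_length=512, center=True):
    """Hypothetical helper: predicted (rows, columns) of the STFT matrix."""
    if center:
        n_samples = n_samples + 2 * (n_fft // 2)  # padding on both sides
    n_frames = 1 + (n_samples - n_fft) // hop_length
    return (1 + n_fft // 2, n_frames)

n = 22050  # one second of audio at 22050 Hz
default_shape = expected_shape(n, hop_length=512)
short_hop_shape = expected_shape(n, hop_length=64)
left_shape = expected_shape(n, hop_length=512, center=False)
print(default_shape, short_hop_shape, left_shape)
```

A shorter hop adds columns but leaves the number of rows unchanged, consistent with the statement that `hop_length` does not affect frequency resolution.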
Display a spectrogram
>>> import matplotlib.pyplot as plt
>>> fig, ax = plt.subplots()
>>> img = librosa.display.specshow(librosa.amplitude_to_db(S,
...                                                        ref=np.max),
...                                y_axis='log', x_axis='time', ax=ax)
>>> ax.set_title('Power spectrogram')
>>> fig.colorbar(img, ax=ax, format="%+2.0f dB")
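The frame and bin indices of the matrices above can be mapped to seconds and Hz with simple arithmetic. The sketch below is a NumPy-only approximation of the conversion that `frames_to_samples` and `fft_frequencies` are meant for; it assumes `center=True`, the default `n_fft=2048` and `hop_length=512`, and evenly spaced bins from 0 to `sr / 2`.

```python
import numpy as np

sr, n_fft, hop_length = 22050, 2048, 512

frames = np.arange(5)              # frame indices t
samples = frames * hop_length      # sample index at the centre of each frame
times = samples / sr               # the same positions in seconds

bins = np.arange(1 + n_fft // 2)   # bin indices f
freqs = bins * sr / n_fft          # bin centre frequencies in Hz

print(times)
print(freqs[:3], freqs[-1])
```

With `center=False`, the same sample indices would mark the start of each frame rather than its centre.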