librosa.feature.chroma_stft

librosa.feature.chroma_stft(*, y=None, sr=22050, S=None, norm=inf, n_fft=2048, hop_length=512, win_length=None, window='hann', center=True, pad_mode='constant', tuning=None, n_chroma=12, **kwargs)

Compute a chromagram from a waveform or power spectrogram.

This implementation is derived from chromagram_E [1].

Parameters:
y : np.ndarray [shape=(…, n)] or None

audio time series. Multi-channel is supported.

sr : number > 0 [scalar]

sampling rate of y

S : np.ndarray [shape=(…, d, t)] or None

power spectrogram

norm : float or None

Column-wise normalization. See librosa.util.normalize for details. If None, no normalization is performed.
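As a rough illustration of what column-wise normalization with the default norm=inf means, here is a minimal NumPy sketch. It is not librosa's implementation (librosa.util.normalize covers other norms and more edge cases), and the array `raw` is made-up data.

```python
import numpy as np

# Hypothetical un-normalized chroma energies: 3 chroma bins x 2 frames.
raw = np.array([[2.0, 1.0],
                [4.0, 0.5],
                [1.0, 0.25]])

# Infinity norm of each column (frame): its largest absolute value.
col_max = np.max(np.abs(raw), axis=0, keepdims=True)

# Divide each frame by its own maximum, so the strongest bin in every frame is 1.
normalized = raw / col_max
```

Because each frame is scaled independently, loud and quiet frames end up on the same scale, which is why the example outputs on this page peak at 1.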

n_fft : int > 0 [scalar]

FFT window size. Used only when y and sr are provided instead of S.

hop_length : int > 0 [scalar]

Hop length. Used only when y and sr are provided instead of S.

win_length : int <= n_fft [scalar]

Each frame of audio is windowed by window(). The window will be of length win_length and then padded with zeros to match n_fft. If unspecified, defaults to win_length = n_fft.
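The zero-padding described above can be sketched as follows. This assumes the shorter window is centered inside the n_fft-long frame, and it uses np.hanning (a symmetric Hann window) purely for illustration, so the exact window values may differ from what librosa computes. The sizes are deliberately tiny.

```python
import numpy as np

n_fft = 16       # illustrative sizes, much smaller than the defaults
win_length = 8

# A window of length win_length (symmetric Hann, chosen for illustration only).
window = np.hanning(win_length)

# Zero-pad on both sides so the window sits in the middle of an n_fft-long frame.
left = (n_fft - win_length) // 2
right = n_fft - win_length - left
padded = np.pad(window, (left, right), mode="constant")
```

A shorter win_length therefore narrows the part of each frame that contributes to the spectrum, while the number of frequency bins still follows n_fft.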

window : string, tuple, number, function, or np.ndarray [shape=(n_fft,)]

center : boolean
  • If True, the signal y is padded so that frame t is centered at y[t * hop_length].

  • If False, then frame t begins at y[t * hop_length].
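One practical consequence of center is the number of frames t in the output. The helper below is a hypothetical sketch, not a librosa function. It assumes an even n_fft, padding of n_fft // 2 samples on each side when center=True, and a signal at least n_fft samples long when center=False.

```python
def frame_count(n_samples, n_fft=2048, hop_length=512, center=True):
    """Estimate the number of STFT frames for a signal of n_samples samples."""
    if center:
        # With padding on both sides, a frame can be centered on every
        # multiple of hop_length inside the signal, starting at sample 0.
        return 1 + n_samples // hop_length
    # Without padding, only frames that fit entirely inside the signal count.
    return 1 + (n_samples - n_fft) // hop_length


one_second = 22050  # one second of audio at the default sampling rate
frames_centered = frame_count(one_second, center=True)
frames_uncentered = frame_count(one_second, center=False)
```

Under these assumptions, one second of audio at the defaults gives 44 frames with center=True and 40 with center=False.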

pad_mode : string

If center=True, the padding mode to use at the edges of the signal. By default, STFT uses zero padding.

tuning : float [scalar] or None

Deviation from A440 tuning in fractional chroma bins. If None, it is automatically estimated.
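To make "fractional chroma bins" concrete, the sketch below converts a tuning value into the reference frequency it implies for A4. It assumes the convention that one full bin corresponds to 1 / n_chroma of an octave, so the reference shifts by a factor of 2 ** (tuning / n_chroma). The helper name `reference_a4` is hypothetical.

```python
def reference_a4(tuning, n_chroma=12):
    """Reference frequency (Hz) for A4 implied by a tuning deviation.

    tuning is measured in fractions of a chroma bin; 0.0 means standard A440.
    """
    return 440.0 * 2.0 ** (tuning / n_chroma)


standard = reference_a4(0.0)        # no deviation from A440
slightly_flat = reference_a4(-0.25) # a quarter of a bin below A440
one_octave_up = reference_a4(12.0)  # 12 bins of 12 span a full octave
```

With 12 chroma bins, a tuning of 1.0 is a full semitone, so typical estimated values are small fractions.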

n_chroma : int > 0 [scalar]

Number of chroma bins to produce (12 by default).

**kwargs : additional keyword arguments to parameterize chroma filters.

ctroct : float > 0 [scalar]

octwidth : float > 0 or None [scalar]

ctroct and octwidth specify a dominance window: a Gaussian weighting centered on ctroct (in octs, where A0 = 27.5 Hz) with a Gaussian half-width of octwidth. Set octwidth to None to use a flat weighting.
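The dominance window can be sketched for a single frequency as below. This is an illustrative reading of the description above, not the library's code: the function name `dominance_weight` is hypothetical, the default values are chosen for the example, and the real filter bank applies the weighting per FFT bin.

```python
import math


def dominance_weight(freq_hz, ctroct=5.0, octwidth=2.0):
    """Gaussian weight for one frequency, measured in octaves above A0 = 27.5 Hz."""
    if octwidth is None:
        # Flat weighting: every frequency contributes equally.
        return 1.0
    octs = math.log2(freq_hz / 27.5)
    return math.exp(-0.5 * ((octs - ctroct) / octwidth) ** 2)


at_center = dominance_weight(880.0)     # 880 Hz is exactly 5 octaves above A0
two_below = dominance_weight(220.0)     # 2 octaves below the center
flat = dominance_weight(220.0, octwidth=None)
```

Frequencies far from ctroct are attenuated smoothly, which reduces the influence of very low and very high partials on the chroma bins.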

norm : float > 0 or np.inf

Normalization factor for each filter

base_c : bool

If True, the filter bank will start at ‘C’. If False, the filter bank will start at ‘A’.
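The effect of base_c on row order can be pictured with pitch-class labels. This sketch assumes 12 chroma bins and that the two layouts differ only by a rotation of the rows; the label lists are illustrative and not returned by librosa.

```python
# Pitch classes in the order used when the filter bank starts at 'A'.
labels_from_a = ["A", "A#", "B", "C", "C#", "D", "D#", "E", "F", "F#", "G", "G#"]

# 'C' sits three semitones above 'A', so rotate the rows by three positions.
shift = 3 * (len(labels_from_a) // 12)
labels_from_c = labels_from_a[shift:] + labels_from_a[:shift]
```

When reading a chromagram row by row, row 0 is therefore 'C' with base_c=True and 'A' with base_c=False.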

dtype : np.dtype

The data type of the output basis. By default, uses 32-bit (single-precision) floating point.

Returns:
chromagram : np.ndarray [shape=(…, n_chroma, t)]

Normalized energy for each chroma bin at each frame.

See also

librosa.filters.chroma

Chroma filter bank construction

librosa.util.normalize

Vector normalization

Examples

>>> import numpy as np
>>> import librosa
>>> y, sr = librosa.load(librosa.ex('nutcracker'), duration=15)
>>> librosa.feature.chroma_stft(y=y, sr=sr)
array([[1.   , 0.962, ..., 0.143, 0.278],
       [0.688, 0.745, ..., 0.103, 0.162],
       ...,
       [0.468, 0.598, ..., 0.18 , 0.342],
       [0.681, 0.702, ..., 0.553, 1.   ]], dtype=float32)

Use an energy (magnitude) spectrum instead of a power spectrogram

>>> S = np.abs(librosa.stft(y))
>>> chroma = librosa.feature.chroma_stft(S=S, sr=sr)
>>> chroma
array([[1.   , 0.973, ..., 0.527, 0.569],
       [0.774, 0.81 , ..., 0.518, 0.506],
       ...,
       [0.624, 0.73 , ..., 0.611, 0.644],
       [0.766, 0.822, ..., 0.92 , 1.   ]], dtype=float32)

Use a pre-computed power spectrogram with a larger frame

>>> S = np.abs(librosa.stft(y, n_fft=4096))**2
>>> chroma = librosa.feature.chroma_stft(S=S, sr=sr)
>>> chroma
array([[0.994, 0.873, ..., 0.169, 0.227],
       [0.735, 0.64 , ..., 0.141, 0.135],
       ...,
       [0.6  , 0.937, ..., 0.214, 0.257],
       [0.743, 0.937, ..., 0.684, 0.815]], dtype=float32)
>>> import matplotlib.pyplot as plt
>>> fig, ax = plt.subplots(nrows=2, sharex=True)
>>> # S holds power values here, so convert to dB with power_to_db
>>> img = librosa.display.specshow(librosa.power_to_db(S, ref=np.max),
...                                y_axis='log', x_axis='time', ax=ax[0])
>>> fig.colorbar(img, ax=[ax[0]])
>>> ax[0].label_outer()
>>> img = librosa.display.specshow(chroma, y_axis='chroma', x_axis='time', ax=ax[1])
>>> fig.colorbar(img, ax=[ax[1]])