librosa.piptrack(*, y=None, sr=22050, S=None, n_fft=2048, hop_length=None, fmin=150.0, fmax=4000.0, threshold=0.1, win_length=None, window='hann', center=True, pad_mode='constant', ref=None)

Pitch tracking on thresholded parabolically-interpolated STFT.

This implementation uses the parabolic interpolation method described in [1].
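As a rough illustration of the parabolic interpolation step, the sketch below fits a parabola through the magnitudes of three neighboring bins (k - 1, k, k + 1) and returns the vertex offset and height. A fractional bin index k + offset then maps to (k + offset) * sr / n_fft Hz. This is the standard three-point formula written in plain Python for a single peak; the helper names parabolic_peak and bin_to_hz are hypothetical, and librosa's own vectorized code may differ in details such as edge handling.

```python
def parabolic_peak(a, b, c):
    """Fit a parabola through (-1, a), (0, b), (1, c).

    Returns (offset, height): the vertex position relative to the middle
    bin, in bins, and the interpolated peak magnitude.
    """
    denom = a - 2.0 * b + c
    if denom == 0.0:
        # No curvature (three collinear points): keep the bin center.
        return 0.0, b
    offset = 0.5 * (a - c) / denom
    height = b - 0.25 * (a - c) * offset
    return offset, height


def bin_to_hz(k, offset, sr, n_fft):
    """Convert a fractional FFT bin index k + offset to a frequency in Hz."""
    return (k + offset) * sr / n_fft


# Magnitudes of bins 9, 10 and 11 around a local maximum at bin 10.
offset, height = parabolic_peak(1.0, 4.0, 3.0)
freq = bin_to_hz(10, offset, sr=22050, n_fft=2048)
print(offset, height, round(freq, 3))  # 0.25 4.125 110.358
```

Here the vertex sits a quarter bin above bin 10, so the estimate (about 110.36 Hz) falls between the bin centers at roughly 107.67 Hz and 118.43 Hz instead of snapping to one of them.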

Parameters

y : np.ndarray [shape=(…, n)] or None

audio signal. Multi-channel is supported.

sr : number > 0 [scalar]

audio sampling rate of y

S : np.ndarray [shape=(…, d, t)] or None

magnitude or power spectrogram

n_fft : int > 0 [scalar] or None

number of FFT bins to use, if y is provided.

hop_length : int > 0 [scalar] or None

number of audio samples between adjacent STFT frames

threshold : float in (0, 1)

A bin in spectrum S is considered a pitch when it is greater than threshold * ref(S).

By default, ref(S) is taken to be max(S, axis=-2), the maximum value in each column (that is, over frequency, separately for each frame).
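The default thresholding rule can be sketched for a single frame as follows. The hypothetical helper above_threshold treats one spectrogram column as a list of magnitudes ordered by frequency and compares each bin with threshold times that column's maximum. It is a plain-Python illustration of the rule stated above, not librosa's implementation.

```python
def above_threshold(column, threshold=0.1):
    """Flag bins whose magnitude exceeds threshold * max(column).

    `column` is one STFT frame: a list of magnitudes ordered by frequency,
    so the maximum is taken over frequency, separately for each frame.
    """
    ref_value = threshold * max(column)
    return [m > ref_value for m in column]


frame = [0.05, 0.3, 2.0, 0.15, 0.8]
print(above_threshold(frame))                 # [False, True, True, False, True]
print(above_threshold(frame, threshold=0.5))  # [False, False, True, False, False]
```

Raising threshold raises the per-frame cut-off (0.2 and 1.0 here), so fewer bins remain candidates for pitch peaks.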

fmin : float > 0 [scalar]

lower frequency cutoff, in Hz.

fmax : float > 0 [scalar]

upper frequency cutoff, in Hz.
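To see which bins the cutoffs keep, the sketch below computes each STFT bin's center frequency, k * sr / n_fft for k = 0 to n_fft // 2 (the same grid that librosa.fft_frequencies returns), and masks it to the band. Treating the band as the half-open interval [fmin, fmax) is an assumption of this sketch, and the helper names bin_frequencies and in_band are hypothetical.

```python
def bin_frequencies(sr, n_fft):
    """Center frequency in Hz of each of the 1 + n_fft // 2 STFT bins."""
    return [k * sr / n_fft for k in range(1 + n_fft // 2)]


def in_band(freqs, fmin=150.0, fmax=4000.0):
    """Mask of bins inside [fmin, fmax) (half-open: an assumption here)."""
    return [fmin <= f < fmax for f in freqs]


freqs = bin_frequencies(sr=22050, n_fft=2048)
mask = in_band(freqs)
kept = [k for k, ok in enumerate(mask) if ok]
print(len(freqs), kept[0], kept[-1])  # 1025 14 371
```

With the default sr and n_fft, bins are about 10.77 Hz apart, so the default cutoffs leave bins 14 through 371 as possible pitch locations.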

win_length : int <= n_fft [scalar]

Each frame of audio is windowed by window. The window will be of length win_length and then padded with zeros to match n_fft.

If unspecified, defaults to win_length = n_fft.
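The zero-padding step can be sketched as follows: a window of win_length samples is placed in the middle of an n_fft-sample frame, with zeros on both sides. Centering the window this way matches what librosa.stft does with librosa.util.pad_center, but the code below is a plain-Python illustration with hypothetical helper names, and it builds only the periodic Hann window (the form scipy.signal.get_window('hann', N) returns by default).

```python
import math


def hann(win_length):
    """Periodic Hann window of win_length samples."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * n / win_length)
            for n in range(win_length)]


def pad_center(window, n_fft):
    """Zero-pad `window` on both sides so it sits in the middle of n_fft samples."""
    if len(window) > n_fft:
        raise ValueError("win_length must be <= n_fft")
    lpad = (n_fft - len(window)) // 2
    rpad = n_fft - len(window) - lpad
    return [0.0] * lpad + list(window) + [0.0] * rpad


# A 4-sample window embedded in an 8-sample frame: two zeros on each side.
w = pad_center(hann(4), n_fft=8)
print([round(v, 3) for v in w])
```

When win_length equals n_fft, lpad and rpad are both zero and the window is used unchanged, which is the documented default.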

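window : string, tuple, number, function, or np.ndarray [shape=(n_fft,)]

  • a window specification (string, tuple, or number); see scipy.signal.get_window

  • a window function, such as scipy.signal.windows.hann

  • a vector or array of length n_fft

center : boolean

  • If True, the signal y is padded so that frame t is centered at y[t * hop_length].

  • If False, then frame t begins at y[t * hop_length].

pad_mode : string

If center=True, the padding mode to use at the edges of the signal. By default, STFT uses zero-padding.

See also: np.pad.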
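The two framing conventions can be illustrated on a toy signal. This sketch assumes zero-padding by n_fft // 2 samples on each side when center=True (the pad_mode='constant' case), then slices frames of n_fft samples every hop_length samples. The helper frames is hypothetical and ignores the other padding modes that np.pad supports.

```python
def frames(y, n_fft, hop_length, center=True):
    """Slice y into frames of n_fft samples, spaced hop_length samples apart.

    center=True: zero-pad n_fft // 2 samples on each side first, so sample
    n_fft // 2 of frame t is y[t * hop_length].
    center=False: frame t starts at y[t * hop_length].
    """
    y = list(y)
    if center:
        pad = [0.0] * (n_fft // 2)
        y = pad + y + pad
    if len(y) < n_fft:
        raise ValueError("signal is shorter than n_fft")
    n_frames = 1 + (len(y) - n_fft) // hop_length
    return [y[t * hop_length : t * hop_length + n_fft] for t in range(n_frames)]


y = [float(i) for i in range(1, 9)]  # 1.0, 2.0, ..., 8.0
centered = frames(y, n_fft=4, hop_length=2)
plain = frames(y, n_fft=4, hop_length=2, center=False)
print(centered[1])  # [1.0, 2.0, 3.0, 4.0]: middle sample (index 2) is y[2]
print(plain[1])     # [3.0, 4.0, 5.0, 6.0]: first sample is y[2]
```

Centering also yields more frames (5 versus 3 here), because the padding lets frames overhang both ends of the signal.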

ref : scalar or callable [default=np.max]

If scalar, the reference value against which S is compared for determining pitches.

If callable, the reference value is computed as ref(S, axis=-2), i.e., over frequency for each frame.
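The ref rules can be summarized in a small helper that returns the level each bin of a frame must exceed. This is a hypothetical plain-Python sketch for one spectrum column: a callable ref is applied to the column and scaled by threshold, while a scalar ref is used directly as the comparison level, as described above. Here statistics.mean stands in for np.mean from the ref=np.mean example.

```python
import statistics


def reference_level(column, threshold=0.1, ref=None):
    """Return the value that each bin of `column` must exceed.

    column: one STFT frame as a list of magnitudes over frequency.
    ref: None (use the column maximum), a callable, or a scalar.
    """
    if ref is None:
        ref = max  # stands in for the documented default, np.max over frequency
    if callable(ref):
        return threshold * ref(column)
    return float(ref)  # scalar: used as the comparison level itself


frame = [1.0, 2.0, 3.0, 6.0]
levels = {
    "default": reference_level(frame),                            # 0.1 * 6.0
    "mean": reference_level(frame, threshold=1, ref=statistics.mean),  # 3.0
    "scalar": reference_level(frame, ref=2.5),                    # 2.5
}
for name, level in levels.items():
    print(name, [m for m in frame if m > level])
```

With threshold=1 and a mean reference, only bins louder than the frame's average magnitude survive, which here leaves just 6.0.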

Returns

pitches, magnitudes : np.ndarray [shape=(…, d, t)]

Where d is the subset of FFT bins within fmin and fmax.

pitches[..., f, t] contains the instantaneous frequency at bin f, time t.

magnitudes[..., f, t] contains the corresponding magnitudes.

Both pitches and magnitudes take value 0 at bins of non-maximal magnitude.
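The zeroing rule can be illustrated for one frame: after thresholding, only bins that are local maxima along frequency keep a value, and every other bin is set to 0. The local-maximum test used here (strictly greater than the lower neighbor, at least as large as the upper one, so a flat two-bin peak is counted once) and the helper name peak_mask are assumptions of this sketch rather than a statement of librosa's exact rule. For simplicity it keeps raw magnitudes, whereas piptrack reports interpolated values.

```python
def peak_mask(column, ref_value):
    """Mark bins that exceed ref_value and are local maxima over frequency."""
    # Zero out everything at or below the reference level first.
    x = [m if m > ref_value else 0.0 for m in column]
    mask = []
    for k, m in enumerate(x):
        lower = x[k - 1] if k > 0 else m  # edge bins compare with themselves
        upper = x[k + 1] if k + 1 < len(x) else m
        mask.append(m > lower and m >= upper)
    return mask


frame = [0.1, 0.9, 0.4, 0.2, 0.7, 0.7, 0.1]
mask = peak_mask(frame, ref_value=0.3)
kept = [m if ok else 0.0 for m, ok in zip(frame, mask)]
print(kept)  # [0.0, 0.9, 0.0, 0.0, 0.7, 0.0, 0.0]
```

The plateau at bins 4 and 5 produces a single nonzero entry, so each spectral peak contributes at most one pitch per frame.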

Notes

This function caches at level 30.

One of S or y must be provided. If S is not given, it is computed from y with librosa.stft, using the n_fft, hop_length, win_length, window, center, and pad_mode values given above.

Examples

Computing pitches from a waveform input:

>>> import numpy as np
>>> import librosa
>>> y, sr = librosa.load(librosa.ex('trumpet'))
>>> pitches, magnitudes = librosa.piptrack(y=y, sr=sr)

Or from a spectrogram input:

>>> S = np.abs(librosa.stft(y))
>>> pitches, magnitudes = librosa.piptrack(S=S, sr=sr)

Or with an alternate reference value for pitch detection, where bins whose magnitude exceeds the mean magnitude of their frame are counted as pitches:

>>> pitches, magnitudes = librosa.piptrack(S=S, sr=sr, threshold=1,
...                                        ref=np.mean)
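Because both outputs are mostly zeros, one way to reduce them to a single pitch per frame is to take, in each column, the bin with the largest magnitude. The plain-Python sketch below does this on a tiny hand-made d × t example laid out like piptrack's output (rows are frequency bins, columns are frames); the helper dominant_pitch and the toy values are hypothetical. With NumPy arrays, magnitudes.argmax(axis=-2) gives the same per-frame bin index.

```python
def dominant_pitch(pitches, magnitudes):
    """Return, for each frame, the pitch at the bin of largest magnitude.

    pitches and magnitudes are d x t nested lists (rows = frequency bins,
    columns = frames). Frames where every magnitude is 0 yield 0.0.
    """
    n_bins, n_frames = len(magnitudes), len(magnitudes[0])
    result = []
    for t in range(n_frames):
        best = max(range(n_bins), key=lambda f: magnitudes[f][t])
        result.append(pitches[best][t] if magnitudes[best][t] > 0 else 0.0)
    return result


pitches = [[0.0, 220.5, 0.0],
           [441.2, 0.0, 0.0],
           [0.0, 660.0, 0.0]]
magnitudes = [[0.0, 0.3, 0.0],
              [0.8, 0.0, 0.0],
              [0.0, 0.9, 0.0]]
print(dominant_pitch(pitches, magnitudes))  # [441.2, 660.0, 0.0]
```

The last frame has no detected peak, so it is reported as 0.0 rather than as an arbitrary bin's value.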