Caution
You're reading an old version of this documentation. If you want up-to-date information, please have a look at 0.10.0.
librosa.piptrack
- librosa.piptrack(y=None, sr=22050, S=None, n_fft=2048, hop_length=None, fmin=150.0, fmax=4000.0, threshold=0.1, win_length=None, window='hann', center=True, pad_mode='reflect', ref=None)
Pitch tracking on thresholded parabolically-interpolated STFT.
This implementation uses the parabolic interpolation method described by [1].
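To make the interpolation step concrete, here is a minimal sketch of generic three-point parabolic peak interpolation in plain Python. It is an illustration under assumptions, not librosa's actual code: the helper name `parabolic_peak` and the sample magnitudes are hypothetical, and the conversion from bin index to Hz assumes the usual `bin * sr / n_fft` mapping.

```python
def parabolic_peak(a, b, c):
    """Fit a parabola through three equally spaced samples a, b, c,
    where b is the local maximum, and return (offset, height).

    offset is the vertex position in bins relative to the centre sample;
    height is the interpolated value at the vertex.
    """
    denom = a - 2.0 * b + c
    if denom == 0.0:
        # The three points are collinear: no curvature, so no refinement.
        return 0.0, b
    offset = 0.5 * (a - c) / denom
    height = b - 0.25 * (a - c) * offset
    return offset, height


# Hypothetical magnitudes at bins k-1, k, k+1 around a local maximum at k = 20.
sr, n_fft, k = 22050, 2048, 20
offset, height = parabolic_peak(0.5, 1.0, 0.8)

# Assumed bin-to-frequency mapping: refined bin index times the bin width.
freq_hz = (k + offset) * sr / n_fft
```

Because the right-hand neighbour (0.8) is larger than the left-hand one (0.5), the refined peak lies slightly above bin 20 and its interpolated height is slightly above 1.0.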
- Parameters:
- y: np.ndarray [shape=(n,)] or None
audio signal
- sr: number > 0 [scalar]
audio sampling rate of y
- S: np.ndarray [shape=(d, t)] or None
magnitude or power spectrogram
- n_fft: int > 0 [scalar] or None
number of FFT bins to use, if y is provided.
- hop_length: int > 0 [scalar] or None
number of samples to hop
- threshold: float in (0, 1)
A bin in spectrum S is considered a pitch when it is greater than threshold * ref(S).
By default, ref(S) is taken to be max(S, axis=0) (the maximum value in each column).
- fmin: float > 0 [scalar]
lower frequency cutoff.
- fmax: float > 0 [scalar]
upper frequency cutoff.
- win_length: int <= n_fft [scalar]
Each frame of audio is windowed by window. The window will be of length win_length and then padded with zeros to match n_fft.
If unspecified, defaults to win_length = n_fft.
- window: string, tuple, number, function, or np.ndarray [shape=(n_fft,)]
One of: a window specification (string, tuple, or number; see scipy.signal.get_window), a window function such as scipy.signal.windows.hann, or a vector or array of length n_fft.
- center: boolean
If True, the signal y is padded so that frame t is centered at y[t * hop_length].
If False, then frame t begins at y[t * hop_length].
- pad_mode: string
If center=True, the padding mode to use at the edges of the signal. By default, STFT uses reflection padding.
- ref: scalar or callable [default=np.max]
If scalar, the reference value against which S is compared for determining pitches.
If callable, the reference value is computed as ref(S, axis=0).
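The threshold rule described for threshold and ref can be illustrated with a small NumPy sketch. The array values below are hypothetical, and the snippet reproduces only the comparison S > threshold * ref(S) with the default column-wise maximum; it does not perform the peak picking or interpolation that the function also applies.

```python
import numpy as np

# Hypothetical magnitude spectrogram: 4 frequency bins x 2 frames.
S = np.array([[0.08, 2.0],
              [1.0, 0.1],
              [0.05, 0.5],
              [0.4, 0.19]])
threshold = 0.1

# Default reference: the maximum of each column (one value per frame).
ref_value = np.max(S, axis=0, keepdims=True)  # shape (1, 2)

# A bin passes the test when it exceeds threshold * ref(S) in its own frame.
candidates = S > threshold * ref_value
```

In the first frame the reference is 1.0, so only bins above 0.1 pass; in the second frame the reference is 2.0, so the cutoff rises to 0.2. Passing a scalar for ref would apply one fixed cutoff to every frame instead.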
- Returns:
- pitches, magnitudes: np.ndarray [shape=(d, t)]
Where d is the subset of FFT bins within fmin and fmax.
pitches[f, t] contains instantaneous frequency at bin f, time t.
magnitudes[f, t] contains the corresponding magnitudes.
Both pitches and magnitudes take value 0 at bins of non-maximal magnitude.
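Since both returned arrays are zero everywhere except at detected peaks, one common way to reduce them to a single pitch estimate per frame is to pick the bin with the largest magnitude in each column. The sketch below uses small hypothetical arrays shaped like the return values rather than real output, and this reduction is only one possible post-processing choice, not something the function does itself.

```python
import numpy as np

# Hypothetical outputs with shape (d bins, t frames); zeros mark non-peak bins.
pitches = np.array([[0.0, 0.0, 0.0],
                    [220.5, 0.0, 0.0],
                    [0.0, 441.2, 0.0],
                    [660.1, 0.0, 0.0]])
magnitudes = np.array([[0.0, 0.0, 0.0],
                       [0.9, 0.0, 0.0],
                       [0.0, 0.7, 0.0],
                       [0.3, 0.0, 0.0]])

# Index of the strongest bin in every frame.
best_bin = magnitudes.argmax(axis=0)

# Look up the interpolated frequency at that bin, frame by frame.
frames = np.arange(pitches.shape[1])
f0 = pitches[best_bin, frames]
```

Frames with no detected peak (the last column here) yield 0.0, which can be treated as unvoiced or missing.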
Notes
This function caches at level 30.
One of S or y must be provided. If S is not given, it is computed from y using the default parameters of librosa.stft.
Examples
Computing pitches from a waveform input
>>> y, sr = librosa.load(librosa.ex('trumpet'))
>>> pitches, magnitudes = librosa.piptrack(y=y, sr=sr)
Or from a spectrogram input
>>> S = np.abs(librosa.stft(y))
>>> pitches, magnitudes = librosa.piptrack(S=S, sr=sr)
Or with an alternate reference value for pitch detection, where values above the mean spectral energy in each frame are counted as pitches
>>> pitches, magnitudes = librosa.piptrack(S=S, sr=sr, threshold=1,
...                                        ref=np.mean)