Caution
You're reading an old version of this documentation. If you want up-to-date information, please have a look at 0.9.1.
librosa.core.piptrack
librosa.core.piptrack(y=None, sr=22050, S=None, n_fft=2048, hop_length=None, fmin=150.0, fmax=4000.0, threshold=0.1, win_length=None, window='hann', center=True, pad_mode='reflect', ref=None)
Pitch tracking on thresholded parabolically-interpolated STFT.
This implementation uses the parabolic interpolation method described by [1].
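To illustrate the interpolation idea, the sketch below fits a parabola through a spectral local maximum and its two neighbours, then converts the refined (fractional) bin position to Hz as (k + offset) * sr / n_fft. It is a generic quadratic-interpolation sketch in plain NumPy and is not claimed to reproduce [1] or librosa's code line for line. The helper `parabolic_peak`, the 440 Hz test tone, and the periodic Hann window are assumptions chosen for the demonstration.

```python
import numpy as np

def parabolic_peak(spectrum, k):
    """Refine the local maximum at bin k of a 1-D magnitude spectrum.

    Fits a parabola through bins k-1, k, k+1 and returns
    (fractional bin offset of the vertex, interpolated peak height).
    """
    alpha, beta, gamma = spectrum[k - 1], spectrum[k], spectrum[k + 1]
    denom = alpha - 2.0 * beta + gamma
    if denom == 0:
        # No curvature: keep the bin center and its value.
        return 0.0, float(beta)
    offset = 0.5 * (alpha - gamma) / denom
    height = beta - 0.25 * (alpha - gamma) * offset
    return float(offset), float(height)

# Exact case: samples of 1 - (x - 0.3)**2 at x = -1, 0, 1 (vertex at +0.3, height 1.0).
x = np.array([-1.0, 0.0, 1.0])
offset, height = parabolic_peak(1.0 - (x - 0.3) ** 2, 1)

# Spectral case: one Hann-windowed frame of a 440 Hz sine (assumed test signal).
sr, n_fft = 22050, 2048
n = np.arange(n_fft)
frame = np.sin(2 * np.pi * 440.0 * n / sr)
window = 0.5 - 0.5 * np.cos(2 * np.pi * n / n_fft)   # periodic Hann
mag = np.abs(np.fft.rfft(frame * window))
k = int(np.argmax(mag))                               # coarse peak bin
shift, _ = parabolic_peak(mag, k)
f_est = (k + shift) * sr / n_fft                      # refined frequency in Hz
```

With a 2048-point FFT at 22050 Hz the bin spacing is about 10.8 Hz, so the coarse peak bin alone can be off by up to half a bin; the interpolated estimate lands noticeably closer to the true 440 Hz.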
- Parameters
- y: np.ndarray [shape=(n,)] or None
audio signal
- sr: number > 0 [scalar]
audio sampling rate of y
- S: np.ndarray [shape=(d, t)] or None
magnitude or power spectrogram
- n_fft: int > 0 [scalar] or None
length of the FFT window, in samples; used only if y is provided.
- hop_length: int > 0 [scalar] or None
number of samples between successive frames
- threshold: float in (0, 1)
A bin in spectrum S is considered a pitch when it is greater than threshold*ref(S).
By default, ref(S) is taken to be max(S, axis=0) (the maximum value in each column).
- fmin: float > 0 [scalar]
Lower frequency cutoff, in Hz.
- fmax: float > 0 [scalar]
Upper frequency cutoff, in Hz.
- win_length: int <= n_fft [scalar]
Each frame of audio is windowed by window. The window will be of length win_length and then padded with zeros to match n_fft.
If unspecified, defaults to win_length = n_fft.
- window: string, tuple, number, function, or np.ndarray [shape=(n_fft,)]
  - a window specification (string, tuple, or number); see scipy.signal.get_window
  - a window function, such as scipy.signal.hanning
  - a vector or array of length n_fft
- center: boolean
If True, the signal y is padded so that frame t is centered at y[t * hop_length].
If False, then frame t begins at y[t * hop_length].
- pad_mode: string
If center=True, the padding mode to use at the edges of the signal. By default, STFT uses reflection padding.
- ref: scalar or callable [default=np.max]
If scalar, the reference value against which S is compared for determining pitches.
If callable, the reference value is computed as ref(S, axis=0).
Note: One of S or y must be provided. If S is not given, it is computed from y with librosa.core.stft, using the n_fft, hop_length, win_length, window, center, and pad_mode values given here.
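The detection level set by threshold and ref can be illustrated on a toy array. In this minimal NumPy sketch, a callable ref is applied as threshold * ref(S, axis=0), giving one level per frame; treating a scalar ref as the level directly, without scaling by threshold, is this sketch's reading of the ref description and is labeled as an assumption. The helper `candidate_mask` and the array values are hypothetical, and the local-maximum and fmin/fmax restrictions that piptrack also applies are not shown.

```python
import numpy as np

def candidate_mask(S, threshold=0.1, ref=None):
    """Boolean mask of bins of S that exceed the detection level (sketch).

    Only the threshold test is shown; piptrack also keeps just the
    local maxima inside [fmin, fmax), which is omitted here.
    """
    if ref is None:
        ref = np.max                           # documented default
    if callable(ref):
        level = threshold * ref(S, axis=0)     # one level per frame (column)
    else:
        level = abs(ref)                       # assumption: scalar used as-is
    # Broadcasting compares each row of a column with that column's level.
    return S > level

# Toy magnitude spectrogram: 4 frequency bins (rows) x 2 frames (columns).
S = np.array([[0.05, 1.0],
              [2.00, 0.2],
              [0.10, 0.5],
              [0.30, 0.0]])
by_max = candidate_mask(S)                              # level = 0.1 * column max = [0.2, 0.1]
by_mean = candidate_mask(S, threshold=1, ref=np.mean)   # level = column mean = [0.6125, 0.425]
by_scalar = candidate_mask(S, ref=0.5)                  # level = 0.5 in every frame
```

The ref=np.mean, threshold=1 case keeps only bins above the average magnitude of their frame, which is the setting used in the last example on this page.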
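The fmin and fmax cutoffs can be pictured as a mask over STFT bin center frequencies, k * sr / n_fft. This sketch uses the default sr, n_fft, fmin, and fmax from the signature and assumes a half-open interval fmin <= f < fmax; whether the upper edge is inclusive is an assumption, not something this page states.

```python
import numpy as np

sr, n_fft = 22050, 2048
fmin, fmax = 150.0, 4000.0

# Center frequency (Hz) of each of the 1 + n_fft // 2 non-negative STFT bins.
freqs = np.fft.rfftfreq(n_fft, d=1.0 / sr)

# Bins outside [fmin, fmax) cannot be reported as pitches (assumed half-open).
in_range = (fmin <= freqs) & (freqs < fmax)
first_bin = int(np.argmax(in_range))
n_bins = int(in_range.sum())
```

With these defaults the bins from index 14 (about 150.7 Hz) through index 371 (about 3994 Hz) pass, 358 bins in total.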
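The relationship between win_length, window, and n_fft can be sketched in NumPy: build a window of length win_length, then zero-pad it to n_fft samples. The periodic Hann formula below is intended to match what scipy.signal.get_window('hann', win_length) returns by default; placing the shorter window in the middle of the frame is an assumption of this sketch.

```python
import numpy as np

n_fft, win_length = 2048, 1024

# Periodic Hann window of length win_length (the form get_window('hann', N) returns).
n = np.arange(win_length)
win = 0.5 - 0.5 * np.cos(2 * np.pi * n / win_length)

# Zero-pad to n_fft, splitting the padding between both ends
# (assumption: the shorter window is centered inside the frame).
lpad = (n_fft - win_length) // 2
padded = np.pad(win, (lpad, n_fft - win_length - lpad), mode='constant')
```

Samples outside the middle win_length positions of each frame are multiplied by zero, so a shorter win_length narrows the analysis window without changing the number of FFT bins (1 + n_fft // 2).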
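The center and pad_mode conventions, and how S is derived from y when only y is given, can be checked with a small NumPy sketch. The helper `frame_signal` is hypothetical; it pads n_fft // 2 samples on each side when center=True, which is what places the center of frame t at y[t * hop_length]. The tiny n_fft and hop_length values are chosen only to keep the arrays readable.

```python
import numpy as np

def frame_signal(y, n_fft, hop_length, center=True, pad_mode='reflect'):
    """Slice y into overlapping frames, one frame per column (hypothetical helper)."""
    if center:
        # Pad n_fft // 2 samples per side so frame t is centered at y[t * hop_length].
        y = np.pad(y, n_fft // 2, mode=pad_mode)
    n_frames = 1 + (len(y) - n_fft) // hop_length
    # Column t holds y[t * hop_length : t * hop_length + n_fft] of the (padded) signal.
    return np.stack([y[t * hop_length:t * hop_length + n_fft]
                     for t in range(n_frames)], axis=1)

n_fft, hop_length = 8, 4
y = np.arange(16, dtype=float)   # each sample's value equals its index
centered = frame_signal(y, n_fft, hop_length, center=True)
uncentered = frame_signal(y, n_fft, hop_length, center=False)

# A windowed FFT of each column yields a (1 + n_fft // 2, n_frames) magnitude
# array, the (d, t) layout that S is expected to have.
hann = 0.5 - 0.5 * np.cos(2 * np.pi * np.arange(n_fft) / n_fft)
S = np.abs(np.fft.rfft(centered * hann[:, None], axis=0))
```

Because each sample's value equals its index, row n_fft // 2 of centered reads 0, 4, 8, 12 for the first four frames (each frame's center sample), while row 0 of uncentered reads 0, 4, 8 (each frame's first sample). With reflection padding, the first centered frame starts with the mirrored samples 4, 3, 2, 1.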
- Returns
- pitches: np.ndarray [shape=(d, t)]
- magnitudes: np.ndarray [shape=(d, t)]
Here d is the number of frequency bins in S; only bins whose frequencies lie within fmin and fmax can take nonzero values.
pitches[f, t] contains the instantaneous frequency (in Hz) at bin f, frame t.
magnitudes[f, t] contains the corresponding magnitude.
Both pitches and magnitudes take value 0 at bins of non-maximal magnitude.
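A common way to reduce these sparse outputs to one pitch value per frame is to take, in each column, the bin with the largest magnitude. This is a usage sketch, not part of piptrack itself; the helper `strongest_pitch_per_frame` and the toy arrays are hypothetical, laid out in the documented (d, t) shape.

```python
import numpy as np

def strongest_pitch_per_frame(pitches, magnitudes):
    """Pitch (Hz) of the highest-magnitude bin in each frame; 0 if none was detected."""
    idx = np.argmax(magnitudes, axis=0)        # strongest bin index for each frame
    cols = np.arange(magnitudes.shape[1])
    best = pitches[idx, cols]                  # fancy indexing returns a copy
    best[magnitudes[idx, cols] <= 0] = 0.0     # frames with no surviving peak
    return best

# Toy outputs in the documented (d, t) layout: 4 bins x 3 frames.
pitches = np.array([[0.0,   0.0,   0.0],
                    [220.5, 0.0,   0.0],
                    [0.0,   441.2, 0.0],
                    [880.1, 0.0,   0.0]])
magnitudes = np.array([[0.0, 0.0, 0.0],
                       [0.7, 0.0, 0.0],
                       [0.0, 0.9, 0.0],
                       [0.3, 0.0, 0.0]])
f0 = strongest_pitch_per_frame(pitches, magnitudes)
```

Frames in which no bin survived have all-zero magnitudes, so the sketch reports 0 for them, consistent with pitches being 0 wherever nothing was detected.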
Notes
This function caches at level 30.
Examples
Computing pitches from a waveform input
>>> import numpy as np
>>> import librosa
>>> y, sr = librosa.load(librosa.util.example_audio_file())
>>> pitches, magnitudes = librosa.piptrack(y=y, sr=sr)
Or from a spectrogram input
>>> S = np.abs(librosa.stft(y))
>>> pitches, magnitudes = librosa.piptrack(S=S, sr=sr)
Or with an alternate reference value for pitch detection, where values above the mean spectral energy in each frame are counted as pitches
>>> pitches, magnitudes = librosa.piptrack(S=S, sr=sr, threshold=1,
...                                        ref=np.mean)