Core IO and DSP

Audio processing

load(path[, sr, mono, offset, duration, ...])

Load an audio file as a floating point time series.

stream(path, block_length, frame_length, ...)

Stream audio in fixed-length buffers.

to_mono(y)

Force an audio signal down to mono by averaging samples across channels.

resample(y, orig_sr, target_sr[, res_type, ...])

Resample a time series from orig_sr to target_sr

get_duration([y, sr, S, n_fft, hop_length, ...])

Compute the duration (in seconds) of an audio time series, feature matrix, or filename.

get_samplerate(path)

Get the sampling rate for a given file.

autocorrelate(y[, max_size, axis])

Bounded auto-correlation
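
To illustrate what "bounded" means here, the following is a minimal pure-Python sketch that computes the autocorrelation by direct summation and truncates it to `max_size` lags. The helper name `autocorrelate_sketch` is hypothetical, and the direct sum is chosen for readability; it is not presented as the library's implementation.

```python
def autocorrelate_sketch(y, max_size=None):
    """Autocorrelation of y for lags 0 .. max_size-1 (hypothetical helper)."""
    n = len(y)
    if max_size is None:
        max_size = n
    max_size = min(max_size, n)  # never return more lags than samples
    return [
        sum(y[i] * y[i + lag] for i in range(n - lag))
        for lag in range(max_size)
    ]


ac = autocorrelate_sketch([1.0, 2.0, 3.0], max_size=2)
print(ac)  # [14.0, 8.0]
```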

lpc(y, order)

Linear Prediction Coefficients via Burg's method

zero_crossings(y[, threshold, ...])

Find the zero-crossings of a signal y: indices i such that sign(y[i]) != sign(y[i-1]).
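
The sign-change test can be sketched in a few lines of plain Python. The helper `zero_crossing_indices` is hypothetical and returns a list of indices; it ignores options such as `threshold`, and the library function may return its result in a different form (for example a boolean mask).

```python
def zero_crossing_indices(y):
    """Indices i where the sign of y[i] differs from the sign of y[i-1]."""
    def sign(v):
        return (v > 0) - (v < 0)

    return [i for i in range(1, len(y)) if sign(y[i]) != sign(y[i - 1])]


signal = [1.0, 0.5, -0.2, -0.7, 0.3, 0.9]
crossings = zero_crossing_indices(signal)
print(crossings)  # [2, 4]
```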

clicks([times, frames, sr, hop_length, ...])

Returns a signal with a click sound placed at each specified time.

tone(frequency[, sr, length, duration, phi])

Returns a pure tone signal.
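
As a rough illustration, a pure tone can be generated as a sampled cosine with a phase offset. The sketch below assumes the form cos(2*pi*frequency*n/sr + phi) and a default `phi` of -pi/2 (which makes the output a sine wave); both are assumptions for this example, and `tone_sketch` is a hypothetical helper rather than the library function.

```python
import math


def tone_sketch(frequency, sr=22050, length=22050, phi=-math.pi / 2):
    """Sampled cosine at `frequency` Hz with phase offset `phi` (radians)."""
    return [
        math.cos(2 * math.pi * frequency * n / sr + phi)
        for n in range(length)
    ]


# One cycle of a 1 Hz tone sampled at 4 Hz: approximately [0, 1, 0, -1]
y = tone_sketch(1.0, sr=4, length=4)
```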

chirp(fmin, fmax[, sr, length, duration, ...])

Returns a chirp signal that goes from frequency fmin to frequency fmax

mu_compress(x[, mu, quantize])

mu-law compression

mu_expand(x[, mu, quantize])

mu-law expansion
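
The two mu-law functions are inverses of each other. A minimal sketch of the continuous (unquantized) form, for scalar inputs in [-1, 1], is shown below. The helper names are hypothetical, and the quantization step controlled by `quantize` is deliberately left out.

```python
import math


def mu_compress_sketch(x, mu=255.0):
    """Continuous mu-law compression: sign(x) * ln(1 + mu|x|) / ln(1 + mu)."""
    return math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)


def mu_expand_sketch(y, mu=255.0):
    """Continuous mu-law expansion: sign(y) * ((1 + mu)**|y| - 1) / mu."""
    return math.copysign(math.expm1(abs(y) * math.log1p(mu)) / mu, y)


x = 0.3
round_trip = mu_expand_sketch(mu_compress_sketch(x))
```

Compression boosts small amplitudes relative to large ones, so `mu_compress_sketch(0.3)` is noticeably larger than 0.3, while the round trip recovers the input up to floating-point error.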

Spectral representations

stft(y[, n_fft, hop_length, win_length, ...])

Short-time Fourier transform (STFT).

istft(stft_matrix[, hop_length, win_length, ...])

Inverse short-time Fourier transform (ISTFT).

reassigned_spectrogram(y[, sr, S, n_fft, ...])

Time-frequency reassigned spectrogram.

cqt(y[, sr, hop_length, fmin, n_bins, ...])

Compute the constant-Q transform of an audio signal.

icqt(C[, sr, hop_length, fmin, ...])

Compute the inverse constant-Q transform.

hybrid_cqt(y[, sr, hop_length, fmin, ...])

Compute the hybrid constant-Q transform of an audio signal.

pseudo_cqt(y[, sr, hop_length, fmin, ...])

Compute the pseudo constant-Q transform of an audio signal.

iirt(y[, sr, win_length, hop_length, ...])

Time-frequency representation using IIR filters [1].

fmt(y[, t_min, n_fmt, kind, beta, ...])

The fast Mellin transform (FMT) [1] of a uniformly sampled signal y.

griffinlim(S[, n_iter, hop_length, ...])

Approximate magnitude spectrogram inversion using the "fast" Griffin-Lim algorithm [1] [2].

griffinlim_cqt(C[, n_iter, sr, hop_length, ...])

Approximate constant-Q magnitude spectrogram inversion using the "fast" Griffin-Lim algorithm [1] [2].

interp_harmonics(x, freqs, h_range[, kind, ...])

Compute the energy at harmonics of a time-frequency representation.

salience(S, freqs, h_range[, weights, ...])

Harmonic salience function.

phase_vocoder(D, rate[, hop_length])

Phase vocoder.

magphase(D[, power])

Separate a complex-valued spectrogram D into its magnitude (S) and phase (P) components, so that D = S * P.
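
For a single complex bin, the decomposition D = S * P can be sketched with the standard library. `magphase_sketch` is a hypothetical scalar helper; the library function operates on whole arrays and also accepts a `power` exponent for the magnitude, which is omitted here.

```python
import cmath


def magphase_sketch(d):
    """Split a complex value into magnitude S and unit-modulus phase P."""
    magnitude = abs(d)
    phase = cmath.exp(1j * cmath.phase(d))  # unit-length complex number
    return magnitude, phase


D = 3 + 4j
S, P = magphase_sketch(D)
print(S)  # 5.0
```

Multiplying the two parts back together recovers the original value up to floating-point error.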

get_fftlib()

Get the FFT library currently used by librosa

set_fftlib([lib])

Set the FFT library used by librosa.

Magnitude scaling

amplitude_to_db(S[, ref, amin, top_db])

Convert an amplitude spectrogram to dB-scaled spectrogram.

db_to_amplitude(S_db[, ref])

Convert a dB-scaled spectrogram to an amplitude spectrogram.

power_to_db(S[, ref, amin, top_db])

Convert a power spectrogram (amplitude squared) to decibel (dB) units

db_to_power(S_db[, ref])

Convert a dB-scale spectrogram to a power spectrogram.
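
The four dB conversions differ only in the factor applied to the logarithm: 20 for amplitude and 10 for power. The scalar sketch below assumes a floor `amin` to avoid taking the log of zero and leaves out the `top_db` clipping step; the helper names and the default `amin` values are assumptions for this example.

```python
import math


def amplitude_to_db_sketch(s, ref=1.0, amin=1e-5):
    """20 * log10(s / ref), with both values floored at amin."""
    return 20.0 * math.log10(max(amin, s)) - 20.0 * math.log10(max(amin, ref))


def power_to_db_sketch(s, ref=1.0, amin=1e-10):
    """10 * log10(s / ref), with both values floored at amin."""
    return 10.0 * math.log10(max(amin, s)) - 10.0 * math.log10(max(amin, ref))


def db_to_amplitude_sketch(s_db, ref=1.0):
    return ref * 10.0 ** (s_db / 20.0)


def db_to_power_sketch(s_db, ref=1.0):
    return ref * 10.0 ** (s_db / 10.0)


amp_db = amplitude_to_db_sketch(10.0)   # about 20 dB
pow_db = power_to_db_sketch(100.0)      # about 20 dB (100 == 10 ** 2)
```

An amplitude of 10 and a power of 100 describe the same signal level, which is why both map to roughly 20 dB.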

perceptual_weighting(S, frequencies, **kwargs)

Perceptual weighting of a power spectrogram.

A_weighting(frequencies[, min_db])

Compute the A-weighting of a set of frequencies.

pcen(S[, sr, hop_length, gain, bias, power, ...])

Per-channel energy normalization (PCEN) [1]

Time and frequency conversion

frames_to_samples(frames[, hop_length, n_fft])

Converts frame indices to audio sample indices.

frames_to_time(frames[, sr, hop_length, n_fft])

Converts frame counts to time (seconds).

samples_to_frames(samples[, hop_length, n_fft])

Converts sample indices into STFT frames.

samples_to_time(samples[, sr])

Convert sample indices to time (in seconds).

time_to_frames(times[, sr, hop_length, n_fft])

Converts time stamps into STFT frames.

time_to_samples(times[, sr])

Convert timestamps (in seconds) to sample indices.

blocks_to_frames(blocks, block_length)

Convert block indices to frame indices

blocks_to_samples(blocks, block_length, ...)

Convert block indices to sample indices

blocks_to_time(blocks, block_length, ...)

Convert block indices to time (in seconds)
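
The frame, sample, time, and block conversions above are all simple arithmetic on `sr`, `hop_length`, and `block_length`. The sketch below shows the relationships for scalar inputs, assuming defaults of sr=22050 and hop_length=512, an optional offset of n_fft // 2, and truncation when converting time to samples. The `_sketch` helpers are hypothetical stand-ins, not the library functions.

```python
def frames_to_samples_sketch(frame, hop_length=512, n_fft=None):
    offset = n_fft // 2 if n_fft else 0  # optional centering offset
    return frame * hop_length + offset


def samples_to_frames_sketch(sample, hop_length=512, n_fft=None):
    offset = n_fft // 2 if n_fft else 0
    return (sample - offset) // hop_length


def samples_to_time_sketch(sample, sr=22050):
    return sample / sr


def time_to_samples_sketch(time, sr=22050):
    return int(time * sr)  # truncates toward zero


def frames_to_time_sketch(frame, sr=22050, hop_length=512):
    return samples_to_time_sketch(frames_to_samples_sketch(frame, hop_length), sr)


def time_to_frames_sketch(time, sr=22050, hop_length=512):
    return samples_to_frames_sketch(time_to_samples_sketch(time, sr), hop_length)


def blocks_to_samples_sketch(block, block_length, hop_length=512):
    # A block is block_length frames; each frame advances hop_length samples.
    return frames_to_samples_sketch(block * block_length, hop_length)


print(frames_to_samples_sketch(10))   # 5120
print(time_to_frames_sketch(1.0))     # 43
print(blocks_to_samples_sketch(2, block_length=16))  # 16384
```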

hz_to_note(frequencies, **kwargs)

Convert one or more frequencies (in Hz) to the nearest note names.

hz_to_midi(frequencies)

Get MIDI note number(s) for given frequencies

midi_to_hz(notes)

Get the frequency (Hz) of MIDI note(s)
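
The Hz/MIDI mapping follows the standard equal-temperament relation with A4 = 440 Hz = MIDI note 69 and 12 semitones per octave. A scalar sketch, using hypothetical helper names:

```python
import math


def hz_to_midi_sketch(frequency):
    """Fractional MIDI note number for a frequency in Hz."""
    return 12.0 * (math.log2(frequency) - math.log2(440.0)) + 69.0


def midi_to_hz_sketch(note):
    """Frequency in Hz for a (possibly fractional) MIDI note number."""
    return 440.0 * 2.0 ** ((note - 69.0) / 12.0)


print(midi_to_hz_sketch(69))   # 440.0
print(midi_to_hz_sketch(57))   # 220.0, one octave below A4
```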

midi_to_note(midi[, octave, cents])

Convert one or more MIDI numbers to note strings.
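
A minimal sketch of the MIDI-to-note-name mapping, assuming sharps only, ASCII note names, and the convention that MIDI 60 is C4. The helper is hypothetical and ignores the `octave` and `cents` options.

```python
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def midi_to_note_sketch(midi):
    """Name of the nearest note, e.g. 60 -> 'C4'."""
    n = int(round(midi))
    return f"{NOTE_NAMES[n % 12]}{n // 12 - 1}"


print(midi_to_note_sketch(60))  # C4
print(midi_to_note_sketch(69))  # A4
```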

note_to_hz(note, **kwargs)

Convert one or more note names to frequency (Hz)

note_to_midi(note[, round_midi])

Convert one or more spelled notes to MIDI number(s).

hz_to_mel(frequencies[, htk])

Convert Hz to Mels

hz_to_octs(frequencies[, tuning, ...])

Convert frequencies (Hz) to (fractional) octave numbers.

mel_to_hz(mels[, htk])

Convert mel bin numbers to frequencies
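
For reference, the HTK variant of the mel scale (which the `htk` flag presumably selects) has a simple closed form. The sketch below shows only that variant; the non-HTK scale uses a different, piecewise definition that is not reproduced here. Helper names are hypothetical.

```python
import math


def hz_to_mel_htk_sketch(frequency):
    """HTK mel formula: 2595 * log10(1 + f / 700)."""
    return 2595.0 * math.log10(1.0 + frequency / 700.0)


def mel_to_hz_htk_sketch(mel):
    """Inverse of the HTK mel formula."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


mel_700 = hz_to_mel_htk_sketch(700.0)  # 2595 * log10(2), about 781.17
round_trip_hz = mel_to_hz_htk_sketch(hz_to_mel_htk_sketch(1000.0))
```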

octs_to_hz(octs[, tuning, bins_per_octave, A440])

Convert octave numbers to frequencies.
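
Octave numbers here are a base-2 logarithm of frequency. The sketch below assumes zero tuning deviation and a reference of A440 / 16 = 27.5 Hz for octave 0, so that 440 Hz maps to octave 4.0. The helper names are hypothetical, and the `tuning` and `bins_per_octave` adjustments are omitted.

```python
import math


def hz_to_octs_sketch(frequency, a440=440.0):
    """Fractional octave number relative to a440 / 16."""
    return math.log2(frequency / (a440 / 16.0))


def octs_to_hz_sketch(octs, a440=440.0):
    """Inverse mapping from octave number back to Hz."""
    return (a440 / 16.0) * 2.0 ** octs


print(hz_to_octs_sketch(440.0))  # 4.0
print(octs_to_hz_sketch(4.0))    # 440.0
```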

fft_frequencies([sr, n_fft])

Alternative implementation of np.fft.fftfreq
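
The bin center frequencies of a real-input FFT are evenly spaced from 0 Hz to the Nyquist frequency sr / 2. A pure-Python sketch, assuming only the non-negative half of the spectrum (1 + n_fft // 2 bins) is wanted; the helper name is hypothetical.

```python
def fft_frequencies_sketch(sr=22050, n_fft=2048):
    """Center frequency in Hz of each non-negative FFT bin."""
    return [k * sr / n_fft for k in range(n_fft // 2 + 1)]


freqs = fft_frequencies_sketch(sr=22050, n_fft=16)
print(len(freqs))   # 9
print(freqs[1])     # 1378.125
print(freqs[-1])    # 11025.0
```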

cqt_frequencies(n_bins, fmin[, ...])

Compute the center frequencies of Constant-Q bins.
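
Constant-Q bins are geometrically spaced: each bin is a fixed ratio above the previous one. The sketch below assumes 12 bins per octave by default and no tuning offset; `cqt_frequencies_sketch` is a hypothetical helper.

```python
def cqt_frequencies_sketch(n_bins, fmin, bins_per_octave=12):
    """Geometrically spaced center frequencies starting at fmin."""
    return [fmin * 2.0 ** (k / bins_per_octave) for k in range(n_bins)]


# 13 bins at 12 bins per octave span exactly one octave: 220 Hz up to 440 Hz
cq_freqs = cqt_frequencies_sketch(13, fmin=220.0)
```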

mel_frequencies([n_mels, fmin, fmax, htk])

Compute an array of acoustic frequencies tuned to the mel scale.

tempo_frequencies(n_bins[, hop_length, sr])

Compute the frequencies (in beats per minute) corresponding to an onset auto-correlation or tempogram matrix.
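
Each autocorrelation lag corresponds to a period of lag * hop_length / sr seconds, so the tempo in beats per minute is 60 divided by that period. The sketch below assumes lag 0 is reported as infinity; the helper name is hypothetical.

```python
def tempo_frequencies_sketch(n_bins, hop_length=512, sr=22050):
    """Tempo in BPM for each autocorrelation lag; lag 0 maps to infinity."""
    bpm = [float("inf")]
    bpm.extend(60.0 * sr / (hop_length * lag) for lag in range(1, n_bins))
    return bpm


bpms = tempo_frequencies_sketch(3)
print(bpms)  # [inf, 2583.984375, 1291.9921875]
```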

fourier_tempo_frequencies([sr, win_length, ...])

Compute the frequencies (in beats per minute) corresponding to a Fourier tempogram matrix.

samples_like(X[, hop_length, n_fft, axis])

Return an array of sample indices to match the time axis from a feature matrix.

times_like(X[, sr, hop_length, n_fft, axis])

Return an array of time values to match the time axis from a feature matrix.

Pitch and tuning

estimate_tuning([y, sr, S, n_fft, ...])

Estimate the tuning of an audio time series or spectrogram input.

pitch_tuning(frequencies[, resolution, ...])

Given a collection of pitches, estimate its tuning offset (in fractions of a bin) relative to A440=440.0Hz.

piptrack([y, sr, S, n_fft, hop_length, ...])

Pitch tracking on thresholded parabolically-interpolated STFT.


ifgram(y[, sr, n_fft, hop_length, ...])

Compute the instantaneous frequency (as a proportion of the sampling rate) obtained as the time-derivative of the phase of the complex spectrum as described by [1].