librosa.effects.trim(y, *, top_db=60, ref=np.max, frame_length=2048, hop_length=512, aggregate=np.max)

Trim leading and trailing silence from an audio signal.

Silence is defined as segments of the audio signal that are top_db decibels (or more) quieter than a reference level, ref. By default, ref is set to the signal’s maximum RMS value. It’s important to note that if the entire signal maintains a uniform RMS value, there will be no segments considered quieter than the maximum, leading to no trimming. This implies that a completely silent signal will remain untrimmed with the default ref setting. In these situations, an explicit value for ref (in decibels) should be used instead.
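The rule described above can be sketched with NumPy alone. This is an illustration under stated assumptions, not librosa's implementation: `nonsilent_frames` is a hypothetical helper, frames are neither centred nor padded, the floor `amin` is an arbitrary choice, and a numeric `ref` is treated as a linear amplitude.

```python
import numpy as np


def nonsilent_frames(y, top_db=60, ref=np.max, frame_length=2048, hop_length=512):
    """Return a boolean mask marking frames louder than `ref` minus `top_db` dB.

    Hypothetical helper for illustration only; mono input, no centring or padding.
    """
    y = np.asarray(y, dtype=float)
    # Slice the signal into overlapping frames and take the RMS of each frame.
    starts = range(0, max(len(y) - frame_length, 0) + 1, hop_length)
    rms = np.array([np.sqrt(np.mean(y[s:s + frame_length] ** 2)) for s in starts])

    # A callable reference is evaluated on the RMS values; a number is used as-is
    # (assumed here to be a linear amplitude).
    ref_value = ref(rms) if callable(ref) else ref

    # Level of each frame in dB relative to the reference; `amin` avoids log(0).
    amin = 1e-5
    db = 20.0 * np.log10(np.maximum(rms, amin)) - 20.0 * np.log10(max(ref_value, amin))
    return db > -top_db


silent = np.zeros(22050)
# Default ref: no frame is quieter than the maximum, so every frame counts as non-silent.
all_kept = nonsilent_frames(silent).all()
# Explicit ref: every frame is far below the reference, so every frame is silent.
any_kept = nonsilent_frames(silent, ref=1.0).any()
```

With the default reference, a uniformly quiet signal is compared only against itself, which is why nothing is trimmed; fixing `ref` to a known level gives the threshold an absolute meaning.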

y : np.ndarray, shape=(…, n)

Audio signal. Multi-channel is supported.

top_db : number > 0

The threshold (in decibels) below the reference level at which a frame is considered silent.

ref : number or callable

The reference amplitude. By default, np.max is applied to the frame-wise RMS values, so each frame is compared to the loudest frame in the signal.

frame_length : int > 0

The number of samples per analysis frame

hop_length : int > 0

The number of samples between analysis frames

aggregate : callable [default: np.max]

Function to aggregate across channels (if y.ndim > 1)
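To illustrate how the choice of aggregate changes the result for multi-channel input, the following sketch reduces made-up per-channel frame levels across the channel axis before thresholding. That the reduction happens on decibel values before the comparison with top_db is an assumption of this sketch, and the numbers are invented for the example.

```python
import numpy as np

# Invented frame levels in dB relative to the reference: 2 channels x 3 frames.
db = np.array([[-80.0, -20.0, -70.0],
               [-75.0, -65.0, -10.0]])
top_db = 60

# np.max: a frame is kept if it is loud enough in at least one channel.
mask_max = np.max(db, axis=0) > -top_db

# np.min: a frame is kept only if it is loud enough in every channel.
mask_min = np.min(db, axis=0) > -top_db
```

Here `mask_max` keeps the last two frames, while `mask_min` keeps none, so np.max is the more conservative choice when trimming.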

y_trimmed : np.ndarray, shape=(…, m)

The trimmed signal

index : np.ndarray, shape=(2,)

The interval of y corresponding to the non-silent region: y_trimmed = y[index[0]:index[1]] (for mono) or y_trimmed = y[:, index[0]:index[1]] (for stereo).
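As a sketch of how a per-frame silence decision could map to the sample interval in index, the example below converts the first and last non-silent frames to sample positions using hop_length and then slices the last axis. `interval_from_mask` is a hypothetical helper, and both the frame-to-sample conversion and the (0, 0) result for an all-silent mask are assumptions of this sketch.

```python
import numpy as np


def interval_from_mask(non_silent, n_samples, hop_length=512):
    """Map a boolean per-frame mask to a [start, end) sample interval.

    Hypothetical helper for illustration; returns (0, 0) if every frame is silent.
    """
    frames = np.flatnonzero(non_silent)
    if frames.size == 0:
        return np.array([0, 0])
    start = int(frames[0]) * hop_length
    # One hop past the last non-silent frame, clipped to the signal length.
    end = min(n_samples, (int(frames[-1]) + 1) * hop_length)
    return np.array([start, end])


# Stereo noise spanning 10 hops; frames 2 through 6 are marked non-silent.
y = np.random.default_rng(0).standard_normal((2, 5120))
mask = np.zeros(10, dtype=bool)
mask[2:7] = True

index = interval_from_mask(mask, y.shape[-1])
y_trimmed = y[:, index[0]:index[1]]      # stereo: slice the last axis
left_trimmed = y[0, index[0]:index[1]]   # mono: slice the signal directly
```

Slicing with `y[..., index[0]:index[1]]` covers both the mono and the multi-channel case with a single expression.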


>>> import librosa
>>> # Load some audio
>>> y, sr = librosa.load(librosa.ex('choice'))
>>> # Trim the beginning and ending silence
>>> yt, index = librosa.effects.trim(y)
>>> # Print the durations
>>> print(librosa.get_duration(y=y, sr=sr), librosa.get_duration(y=yt, sr=sr))
25.025986394557822 25.007891156462584