librosa.effects.trim

librosa.effects.trim(y, *, top_db=60, ref=<function max>, frame_length=2048, hop_length=512, aggregate=<function max>)

Trim leading and trailing silence from an audio signal.

Silence is defined as segments of the audio signal that are top_db decibels (or more) quieter than a reference level, ref. By default, ref is the signal’s maximum RMS value. Note that if the entire signal has a uniform RMS value, no segment is quieter than the maximum, so nothing is trimmed. In particular, a completely silent signal is left untrimmed under the default ref. In these situations, an explicit value for ref (in decibels) should be used instead.
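The thresholding rule described above can be sketched in plain NumPy. This is a minimal illustration, not librosa's implementation: the helper name non_silent_frames, the unpadded framing, and the 1e-10 amplitude floor are assumptions made for this sketch, and ref is treated here as a linear RMS value when it is not callable.

```python
import numpy as np

def non_silent_frames(y, top_db=60, ref=np.max, frame_length=2048, hop_length=512):
    """Boolean mask of frames that are less than top_db dB below the reference.

    Hypothetical helper for illustration; assumes len(y) >= frame_length.
    """
    n_frames = 1 + (len(y) - frame_length) // hop_length
    # Frame-wise RMS, computed without padding (a simplification)
    rms = np.array([
        np.sqrt(np.mean(y[i * hop_length : i * hop_length + frame_length] ** 2))
        for i in range(n_frames)
    ])
    # A callable ref is evaluated on the RMS curve; a number is used as-is
    ref_value = ref(rms) if callable(ref) else ref
    amin = 1e-10  # floor to avoid log(0); assumed value
    db = 20.0 * np.log10(np.maximum(rms, amin) / np.maximum(ref_value, amin))
    return db > -top_db

sr = 22050
tone = 0.5 * np.sin(2 * np.pi * 440 * np.arange(sr) / sr)
padded = np.concatenate([np.zeros(4096), tone, np.zeros(4096)])

mask_tone = non_silent_frames(padded)                          # silent edges, loud middle
mask_zeros_default = non_silent_frames(np.zeros(sr))           # ref = max RMS = 0
mask_zeros_explicit = non_silent_frames(np.zeros(sr), ref=1.0) # fixed reference
```

With the default ref, every frame of the all-zero signal sits exactly at the reference level, so the whole signal counts as non-silent and nothing would be trimmed. With a fixed reference, the same frames fall far below the threshold.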

Parameters:

y (np.ndarray, shape=(…, n)): Audio signal. Multi-channel is supported.
top_db (number): The threshold (in decibels) below reference to consider as silence. You can also use a negative value for top_db to treat any value below ref + |top_db| as silent. This will only make sense if ref is not np.max.
ref (number or callable): The reference amplitude. By default, it uses np.max, which compares each frame to the maximum frame-wise RMS value in the signal.
frame_length (int > 0): The number of samples per analysis frame.
hop_length (int > 0): The number of samples between analysis frames.
aggregate (callable [default: np.max]): Function to aggregate across channels (if y.ndim > 1).
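To illustrate how the choice of aggregate can change which frames count as silent for multi-channel input, here is a small NumPy sketch. It assumes the aggregation is applied to per-channel frame levels (in dB relative to ref) before thresholding; the dB values are made up for the example.

```python
import numpy as np

top_db = 60

# Hypothetical per-channel frame levels in dB relative to the reference:
# rows are channels, columns are analysis frames
db = np.array([
    [-80.0, -10.0, -5.0, -70.0],
    [-75.0, -65.0, -3.0, -20.0],
])

# np.max keeps the loudest channel per frame: a frame is non-silent
# if at least one channel is above the threshold
non_silent_max = np.max(db, axis=0) > -top_db

# np.min keeps the quietest channel per frame: a frame is non-silent
# only if every channel is above the threshold
non_silent_min = np.min(db, axis=0) > -top_db
```

Under these assumptions, the default np.max is the more conservative choice: it trims only where all channels are quiet.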

Returns:

y_trimmed (np.ndarray, shape=(…, m)): The trimmed signal.
index (np.ndarray, shape=(2,)): The interval of y corresponding to the non-silent region: y_trimmed = y[index[0]:index[1]] (for mono) or y_trimmed = y[:, index[0]:index[1]] (for stereo).
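As a sketch of how the returned index can be used, the snippet below slices along the last axis with an ellipsis, which covers both the mono and multi-channel cases, and converts the sample interval to seconds. The signal and the index values are placeholders chosen for illustration, not output from trim.

```python
import numpy as np

sr = 22050
y = np.zeros((2, 5 * sr))        # placeholder stereo signal, 5 seconds
index = np.array([sr, 3 * sr])   # hypothetical interval, in samples

# Slicing the last axis works for any number of leading channel axes
y_trimmed = y[..., index[0]:index[1]]
y_mono_trimmed = y[0][..., index[0]:index[1]]

# Convert the sample interval to seconds
start_s, end_s = index / sr
```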

Examples

>>> # Load some audio
>>> y, sr = librosa.load(librosa.ex('choice'))
>>> # Trim the beginning and ending silence
>>> yt, index = librosa.effects.trim(y)
>>> # Print the durations
>>> print(librosa.get_duration(y=y, sr=sr), librosa.get_duration(y=yt, sr=sr))
25.025986394557822 25.007891156462584