Caution

You're reading an old version of this documentation. If you want up-to-date information, please have a look at 0.9.1.

librosa.decompose.hpss¶

librosa.decompose.hpss(S, kernel_size=31, power=2.0, mask=False, margin=1.0)[source]¶

Median-filtering harmonic percussive source separation (HPSS).

If margin = 1.0, decomposes an input spectrogram S = H + P where H contains the harmonic components, and P contains the percussive components.

If margin > 1.0, decomposes an input spectrogram S = H + P + R where R contains residual components not included in H or P.

This implementation is based upon the algorithm described by [1] and [2].

1: Fitzgerald, Derry. “Harmonic/percussive separation using median filtering.” 13th International Conference on Digital Audio Effects (DAFX10), Graz, Austria, 2010.
2(1,2): Driedger, Müller, Disch. “Extending harmonic-percussive separation of audio.” 15th International Society for Music Information Retrieval Conference (ISMIR 2014), Taipei, Taiwan, 2014.

Parameters

Snp.ndarray [shape=(d, n)]

input spectrogram. May be real (magnitude) or complex.

kernel_sizeint or tuple (kernel_harmonic, kernel_percussive)

kernel size(s) for the median filters.

If scalar, the same size is used for both harmonic and percussive.
If tuple, the first value specifies the width of the harmonic filter, and the second value specifies the width of the percussive filter.

powerfloat > 0 [scalar]

Exponent for the Wiener filter when constructing soft mask matrices.

maskbool

Return the masking matrices instead of components.

Masking matrices contain non-negative real values that can be used to measure the assignment of energy from S into harmonic or percussive components.

Components can be recovered by multiplying S * mask_H or S * mask_P.

marginfloat or tuple (margin_harmonic, margin_percussive)

margin size(s) for the masks (as described in [2])

If scalar, the same size is used for both harmonic and percussive.
If tuple, the first value specifies the margin of the harmonic mask, and the second value specifies the margin of the percussive mask.

Returns

harmonicnp.ndarray [shape=(d, n)]: harmonic component (or mask)
percussivenp.ndarray [shape=(d, n)]: percussive component (or mask)

See also

util.softmask

Notes

This function caches at level 30.

Examples

Separate into harmonic and percussive

>>> y, sr = librosa.load(librosa.util.example_audio_file(), duration=15)
>>> D = librosa.stft(y)
>>> H, P = librosa.decompose.hpss(D)

>>> import matplotlib.pyplot as plt
>>> plt.figure()
>>> plt.subplot(3, 1, 1)
>>> librosa.display.specshow(librosa.amplitude_to_db(np.abs(D),
...                                                  ref=np.max),
...                          y_axis='log')
>>> plt.colorbar(format='%+2.0f dB')
>>> plt.title('Full power spectrogram')
>>> plt.subplot(3, 1, 2)
>>> librosa.display.specshow(librosa.amplitude_to_db(np.abs(H),
...                                                  ref=np.max),
...                          y_axis='log')
>>> plt.colorbar(format='%+2.0f dB')
>>> plt.title('Harmonic power spectrogram')
>>> plt.subplot(3, 1, 3)
>>> librosa.display.specshow(librosa.amplitude_to_db(np.abs(P),
...                                                  ref=np.max),
...                          y_axis='log')
>>> plt.colorbar(format='%+2.0f dB')
>>> plt.title('Percussive power spectrogram')
>>> plt.tight_layout()
>>> plt.show()

../_images/librosa-decompose-hpss-1_00_00.png

Or with a narrower horizontal filter

>>> H, P = librosa.decompose.hpss(D, kernel_size=(13, 31))

Just get harmonic/percussive masks, not the spectra

>>> mask_H, mask_P = librosa.decompose.hpss(D, mask=True)
>>> mask_H
array([[  1.000e+00,   1.469e-01, ...,   2.648e-03,   2.164e-03],
       [  1.000e+00,   2.368e-01, ...,   9.413e-03,   7.703e-03],
       ...,
       [  8.869e-01,   5.673e-02, ...,   4.603e-02,   1.247e-05],
       [  7.068e-01,   2.194e-02, ...,   4.453e-02,   1.205e-05]], dtype=float32)
>>> mask_P
array([[  2.858e-05,   8.531e-01, ...,   9.974e-01,   9.978e-01],
       [  1.586e-05,   7.632e-01, ...,   9.906e-01,   9.923e-01],
       ...,
       [  1.131e-01,   9.433e-01, ...,   9.540e-01,   1.000e+00],
       [  2.932e-01,   9.781e-01, ...,   9.555e-01,   1.000e+00]], dtype=float32)

Separate into harmonic/percussive/residual components by using a margin > 1.0

>>> H, P = librosa.decompose.hpss(D, margin=3.0)
>>> R = D - (H+P)
>>> y_harm = librosa.core.istft(H)
>>> y_perc = librosa.core.istft(P)
>>> y_resi = librosa.core.istft(R)

Get a more isolated percussive component by widening its margin

>>> H, P = librosa.decompose.hpss(D, margin=(1.0,5.0))