.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "auto_examples/plot_vocal_separation.py"

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_auto_examples_plot_vocal_separation.py:


================
Vocal separation
================

This notebook demonstrates a simple technique for separating vocals (and
other sporadic foreground signals) from accompanying instrumentation.

.. warning::
    This example is primarily of historical interest, and we do not
    recommend it as a competitive method for vocal source separation.
    For a more recent treatment of vocal and music source separation,
    please refer to *Open Source Tools & Data for Music Source
    Separation* [1]_.

This is based on the "REPET-SIM" method of Rafii and Pardo, 2012 [2]_,
but includes a couple of modifications and extensions:

- The STFT hop is 1/4 of the window length (so consecutive FFT windows
  overlap by 3/4), instead of 1/2.
- The output of non-local filtering is converted into a soft mask by
  Wiener filtering. This is similar in spirit to the soft-masking method
  used by FitzGerald, 2012 [3]_, but is somewhat more numerically stable
  in practice.

.. [1] Manilow, Ethan, Prem Seetharaman, and Justin Salamon.
   "Open Source Tools & Data for Music Source Separation." 2020.

.. [2] Rafii, Zafar, and Bryan Pardo.
   "Music/Voice Separation Using the Similarity Matrix."
   In ISMIR, pp. 583-588. 2012.

.. [3] FitzGerald, Derry.
   "Vocal separation using nearest neighbours and median filtering."
   23rd IET Irish Signals and Systems Conference, Maynooth, 2012, p. 98.

.. code-block:: Python


    # Code source: Brian McFee
    # License: ISC

    ##################
    # Standard imports
    import numpy as np
    import matplotlib.pyplot as plt
    from IPython.display import Audio

    import librosa
    import librosa.display  # explicit import for specshow on older librosa versions

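
To make the window-overlap remark in the introduction concrete, the short computation below works through the frame arithmetic. It assumes librosa's default settings (``sr=22050`` from ``librosa.load``; ``n_fft=2048`` and ``hop_length=512``, a quarter window, for ``librosa.stft`` and ``librosa.time_to_frames``). The helper ``seconds_to_frames`` is hypothetical: it mirrors the round-down behavior of ``librosa.time_to_frames`` and is not part of librosa. Under these assumptions, the 2-second minimum separation used later corresponds to 86 frames, and the 10 to 15 second excerpt is ``slice(430, 645)``.

.. code-block:: Python

    # Frame arithmetic under assumed librosa defaults:
    # sr=22050 (librosa.load), n_fft=2048, hop_length=n_fft // 4.
    sr = 22050
    n_fft = 2048
    hop_length = n_fft // 4

    # Consecutive windows start hop_length samples apart, so each pair
    # of neighboring windows shares n_fft - hop_length samples.
    overlap = (n_fft - hop_length) / n_fft
    print(f"hop = {hop_length} samples, overlap = {overlap:.0%}")


    def seconds_to_frames(t, sr=sr, hop_length=hop_length):
        """Hypothetical helper: truncate t * sr to whole samples, then
        floor-divide by the hop length (as librosa.time_to_frames does
        when no n_fft offset is given)."""
        return int(t * sr) // hop_length


    print("2 s ->", seconds_to_frames(2), "frames")
    print("10-15 s ->", seconds_to_frames(10), "to", seconds_to_frames(15))
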

Load an example with vocals.

.. code-block:: Python


    y, sr = librosa.load(librosa.ex('fishin'), duration=120)


    # And compute the spectrogram magnitude and phase
    S_full, phase = librosa.magphase(librosa.stft(y))

    # Play back a 5-second excerpt with vocals
    Audio(data=y[10*sr:15*sr], rate=sr)

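
``librosa.magphase`` factors the complex STFT ``D`` into a magnitude ``S`` and a unit-modulus phase so that ``D == S * phase``; this factorization is what later lets us rebuild audio from a masked magnitude. The NumPy sketch below illustrates it on a small random matrix rather than real audio. The function ``magphase_sketch`` is hypothetical, and its treatment of zero entries (magnitude 0, phase ``1+0j``) reflects our reading of librosa's behavior rather than a documented guarantee.

.. code-block:: Python

    import numpy as np


    def magphase_sketch(D):
        """Hypothetical helper: split complex D into magnitude and phase.

        Where D == 0, the magnitude is 0 and the phase is set to 1+0j,
        so that D == magnitude * phase holds everywhere.
        """
        mag = np.abs(D)
        zeros = mag == 0
        # Divide by 1 instead of 0 where the magnitude vanishes,
        # then add 1 to the phase at exactly those entries.
        phase = D / (mag + zeros) + zeros
        return mag, phase


    rng = np.random.default_rng(0)
    D = rng.standard_normal((4, 6)) + 1j * rng.standard_normal((4, 6))
    D[0, 0] = 0  # include an exact zero to exercise the special case

    S, phase = magphase_sketch(D)
    print(np.allclose(S * phase, D))        # magnitude times phase recovers D
    print(np.allclose(np.abs(phase), 1.0))  # every phase entry has modulus 1
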


Plot a 5-second slice of the spectrum.

.. code-block:: Python


    idx = slice(*librosa.time_to_frames([10, 15], sr=sr))
    fig, ax = plt.subplots()
    img = librosa.display.specshow(librosa.amplitude_to_db(S_full[:, idx], ref=np.max),
                                   y_axis='log', x_axis='time', sr=sr, ax=ax)
    fig.colorbar(img, ax=ax)

.. image-sg:: /auto_examples/images/sphx_glr_plot_vocal_separation_001.png
   :alt: plot vocal separation
   :srcset: /auto_examples/images/sphx_glr_plot_vocal_separation_001.png
   :class: sphx-glr-single-img

The wiggly lines above are due to the vocal component.
Our goal is to separate them from the accompanying
instrumentation.

.. code-block:: Python


    # We'll compare frames using cosine similarity, and aggregate similar frames
    # by taking their (per-frequency) median value.
    #
    # To avoid being biased by local continuity, we constrain similar frames to be
    # separated by at least 2 seconds.
    #
    # This suppresses sparse/non-repetitive deviations from the average spectrum,
    # and works well to discard vocal elements.

    S_filter = librosa.decompose.nn_filter(S_full,
                                           aggregate=np.median,
                                           metric='cosine',
                                           width=int(librosa.time_to_frames(2, sr=sr)))

    # The output of the filter shouldn't be greater than the input
    # if we assume signals are additive. Taking the pointwise minimum
    # with the input spectrum forces this.
    S_filter = np.minimum(S_full, S_filter)

The raw filter output can be used as a mask, but it sounds better if we use
soft masking. We can also use a margin to reduce bleed between the vocal and
instrumental masks.

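
The soft masks used in this step have the Wiener-filter form ``X**p / (X**p + X_ref**p)``. The NumPy sketch below is a simplified stand-in for ``librosa.util.softmask``, run on random toy magnitudes rather than real audio: the hypothetical ``softmask_sketch`` only handles finite ``power`` and skips the rescaling that, as we understand it, librosa applies for numerical stability. It shows that without margins the two masks sum to one in every bin, and that margins larger than one shrink both masks, so bins that neither component clearly dominates are attenuated in both outputs.

.. code-block:: Python

    import numpy as np


    def softmask_sketch(X, X_ref, power=2.0):
        """Hypothetical Wiener-style soft mask X**p / (X**p + X_ref**p).

        Bins where both inputs are zero get a mask value of 0, matching the
        default (split_zeros=False) behavior of librosa.util.softmask.
        """
        num = X ** power
        den = num + X_ref ** power
        mask = np.zeros_like(num)
        np.divide(num, den, out=mask, where=den > 0)
        return mask


    # Toy magnitudes standing in for S_full and the filtered estimate S_filter
    rng = np.random.default_rng(0)
    S_full = rng.uniform(0.1, 1.0, size=(5, 8))
    S_filter = np.minimum(S_full, rng.uniform(0.0, 1.0, size=(5, 8)))
    S_rest = S_full - S_filter            # the non-repeating residual

    # No margins: the two masks split each bin between the components
    mask_i = softmask_sketch(S_filter, S_rest)
    mask_v = softmask_sketch(S_rest, S_filter)
    print(np.allclose(mask_i + mask_v, 1.0))

    # Margins as in the example: a component must dominate by a larger factor
    margin_i, margin_v = 2, 10
    mask_i_m = softmask_sketch(S_filter, margin_i * S_rest)
    mask_v_m = softmask_sketch(S_rest, margin_v * S_filter)
    print(np.all(mask_i_m + mask_v_m <= 1.0 + 1e-12))
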
Note that the margins need not be equal for foreground and background
separation.

.. code-block:: Python


    margin_i, margin_v = 2, 10
    power = 2

    mask_i = librosa.util.softmask(S_filter,
                                   margin_i * (S_full - S_filter),
                                   power=power)

    mask_v = librosa.util.softmask(S_full - S_filter,
                                   margin_v * S_filter,
                                   power=power)

    # Once we have the masks, simply multiply them with the input spectrum
    # to separate the components

    S_foreground = mask_v * S_full
    S_background = mask_i * S_full

Plot the same slice, but separated into its foreground and background.

.. code-block:: Python


    # sphinx_gallery_thumbnail_number = 2

    fig, ax = plt.subplots(nrows=3, sharex=True, sharey=True)
    img = librosa.display.specshow(librosa.amplitude_to_db(S_full[:, idx], ref=np.max),
                                   y_axis='log', x_axis='time', sr=sr, ax=ax[0])
    ax[0].set(title='Full spectrum')
    ax[0].label_outer()

    librosa.display.specshow(librosa.amplitude_to_db(S_background[:, idx], ref=np.max),
                             y_axis='log', x_axis='time', sr=sr, ax=ax[1])
    ax[1].set(title='Background')
    ax[1].label_outer()

    librosa.display.specshow(librosa.amplitude_to_db(S_foreground[:, idx], ref=np.max),
                             y_axis='log', x_axis='time', sr=sr, ax=ax[2])
    ax[2].set(title='Foreground')
    fig.colorbar(img, ax=ax)

.. image-sg:: /auto_examples/images/sphx_glr_plot_vocal_separation_002.png
   :alt: Full spectrum, Background, Foreground
   :srcset: /auto_examples/images/sphx_glr_plot_vocal_separation_002.png
   :class: sphx-glr-single-img

Recover the foreground audio from the masked spectrogram.
To do this, we'll need to re-introduce the phase information
that we had previously set aside.

.. code-block:: Python


    y_foreground = librosa.istft(S_foreground * phase)
    # Play back the same 5-second excerpt, now with the separated foreground
    Audio(data=y_foreground[10*sr:15*sr], rate=sr)

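
For reference, the nearest-neighbor filtering performed by ``librosa.decompose.nn_filter`` can be approximated in a few lines of NumPy. This is a simplified illustration, not librosa's implementation: the function ``nn_median_filter_sketch`` and its fixed ``k`` are our own assumptions (librosa builds a recurrence matrix via ``librosa.segment.recurrence_matrix`` and chooses the number of neighbors automatically when it is not given). Each frame is replaced by the per-frequency median of its ``k`` most cosine-similar frames, skipping frames fewer than ``width`` frames away. The toy input is a tiled random pattern with one injected burst standing in for a vocal note; because no neighbor contains the burst, the median pulls that bin back down, while the repeated frames pass through unchanged.

.. code-block:: Python

    import numpy as np


    def nn_median_filter_sketch(S, k=3, width=2):
        """Hypothetical simplified nearest-neighbor median filter.

        Each column (frame) of S is replaced by the per-row (per-frequency)
        median of its k most cosine-similar frames. Frames fewer than `width`
        frames away from the query frame, including the frame itself, are
        excluded so that matches must come from elsewhere in the signal.
        """
        n_frames = S.shape[1]

        # Cosine similarity between all pairs of frames
        norms = np.linalg.norm(S, axis=0, keepdims=True)
        unit = S / np.maximum(norms, np.finfo(float).tiny)
        sim = unit.T @ unit

        # Forbid matches that are too close in time
        frames = np.arange(n_frames)
        lags = np.abs(frames[:, None] - frames[None, :])
        sim[lags < width] = -np.inf

        out = np.empty_like(S)
        for t in range(n_frames):
            nearest = np.argsort(sim[t])[::-1][:k]        # most similar first
            nearest = nearest[np.isfinite(sim[t, nearest])]
            if nearest.size == 0:
                out[:, t] = S[:, t]                       # no admissible neighbor
            else:
                out[:, t] = np.median(S[:, nearest], axis=1)
        return out


    # Toy "spectrogram": a 4-frame accompaniment pattern repeated 5 times,
    # plus one sparse burst standing in for a vocal note.
    rng = np.random.default_rng(0)
    pattern = rng.uniform(0.5, 1.0, size=(6, 4))
    S = np.tile(pattern, (1, 5))          # shape (6, 20)
    S[2, 9] += 5.0

    # As in the example, clip the filter output so it never exceeds the input
    S_filter = np.minimum(S, nn_median_filter_sketch(S, k=3, width=2))
    print(S[2, 9], S_filter[2, 9])        # burst level vs. filtered estimate
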


.. rst-class:: sphx-glr-timing

**Total running time of the script:** (0 minutes 19.445 seconds)