.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "auto_examples/plot_vocal_separation.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        Click :ref:`here <sphx_glr_download_auto_examples_plot_vocal_separation.py>`
        to download the full example code

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_auto_examples_plot_vocal_separation.py:


================
Vocal separation
================

This notebook demonstrates a simple technique for separating vocals (and
other sporadic foreground signals) from accompanying instrumentation.

This is based on the "REPET-SIM" method of `Rafii and Pardo, 2012
<http://www.cs.northwestern.edu/~zra446/doc/Rafii-Pardo%20-%20Music-Voice%20Separation%20using%20the%20Similarity%20Matrix%20-%20ISMIR%202012.pdf>`_, but includes a couple of modifications and extensions:

    - FFT windows overlap by 1/4, instead of 1/2
    - Non-local filtering is converted into a soft mask by Wiener filtering.
      This is similar in spirit to the soft-masking method used by `Fitzgerald, 2012
      <http://arrow.dit.ie/cgi/viewcontent.cgi?article=1086&context=argcon>`_,
      but is a bit more numerically stable in practice.

.. GENERATED FROM PYTHON SOURCE LINES 19-32

.. code-block:: default


    # Code source: Brian McFee
    # License: ISC

    ##################
    # Standard imports
    from __future__ import print_function
    import numpy as np
    import matplotlib.pyplot as plt
    import librosa

    import librosa.display


.. GENERATED FROM PYTHON SOURCE LINES 33-34

Load an example with vocals.

.. GENERATED FROM PYTHON SOURCE LINES 34-41

.. code-block:: default

    y, sr = librosa.load('audio/Cheese_N_Pot-C_-_16_-_The_Raps_Well_Clean_Album_Version.mp3', duration=120)


    # And compute the spectrogram magnitude and phase
    S_full, phase = librosa.magphase(librosa.stft(y))


.. rst-class:: sphx-glr-script-out

 Out:

 .. code-block:: none

    /tmp/tmpfl4ra6qp/b0064fe7dbe8048b1d4148e61a568b6fe3fca91b/librosa/core/audio.py:161: UserWarning: PySoundFile failed. Trying audioread instead.
      warnings.warn('PySoundFile failed. Trying audioread instead.')


.. GENERATED FROM PYTHON SOURCE LINES 42-43

Plot a 5-second slice of the spectrum

.. GENERATED FROM PYTHON SOURCE LINES 43-50

.. code-block:: default

    idx = slice(*librosa.time_to_frames([30, 35], sr=sr))
    plt.figure(figsize=(12, 4))
    librosa.display.specshow(librosa.amplitude_to_db(S_full[:, idx], ref=np.max),
                             y_axis='log', x_axis='time', sr=sr)
    plt.colorbar()
    plt.tight_layout()


.. image-sg:: /auto_examples/images/sphx_glr_plot_vocal_separation_001.png
   :alt: plot vocal separation
   :srcset: /auto_examples/images/sphx_glr_plot_vocal_separation_001.png
   :class: sphx-glr-single-img


.. rst-class:: sphx-glr-script-out

 Out:

 .. code-block:: none

    /tmp/tmpfl4ra6qp/b0064fe7dbe8048b1d4148e61a568b6fe3fca91b/librosa/display.py:862: MatplotlibDeprecationWarning: The 'basey' parameter of __init__() has been renamed 'base' since Matplotlib 3.3; support for the old name will be dropped two minor releases later.
      scaler(mode, **kwargs)
    /tmp/tmpfl4ra6qp/b0064fe7dbe8048b1d4148e61a568b6fe3fca91b/librosa/display.py:862: MatplotlibDeprecationWarning: The 'linthreshy' parameter of __init__() has been renamed 'linthresh' since Matplotlib 3.3; support for the old name will be dropped two minor releases later.
      scaler(mode, **kwargs)
    /tmp/tmpfl4ra6qp/b0064fe7dbe8048b1d4148e61a568b6fe3fca91b/librosa/display.py:862: MatplotlibDeprecationWarning: The 'linscaley' parameter of __init__() has been renamed 'linscale' since Matplotlib 3.3; support for the old name will be dropped two minor releases later.
      scaler(mode, **kwargs)


.. GENERATED FROM PYTHON SOURCE LINES 51-55

The wiggly lines above are due to the vocal component.
Our goal is to separate them from the accompanying
instrumentation.


.. GENERATED FROM PYTHON SOURCE LINES 55-76

.. code-block:: default


    # We'll compare frames using cosine similarity, and aggregate similar frames
    # by taking their (per-frequency) median value.
    #
    # To avoid being biased by local continuity, we constrain similar frames to be
    # separated by at least 2 seconds.
    #
    # This suppresses sparse/non-repetetitive deviations from the average spectrum,
    # and works well to discard vocal elements.

    S_filter = librosa.decompose.nn_filter(S_full,
                                           aggregate=np.median,
                                           metric='cosine',
                                           width=int(librosa.time_to_frames(2, sr=sr)))

    # The output of the filter shouldn't be greater than the input
    # if we assume signals are additive.  Taking the pointwise minimium
    # with the input spectrum forces this.
    S_filter = np.minimum(S_full, S_filter)


.. GENERATED FROM PYTHON SOURCE LINES 77-79

The raw filter output can be used as a mask,
but it sounds better if we use soft-masking.

.. GENERATED FROM PYTHON SOURCE LINES 79-100

.. code-block:: default


    # We can also use a margin to reduce bleed between the vocals and instrumentation masks.
    # Note: the margins need not be equal for foreground and background separation
    margin_i, margin_v = 2, 10
    power = 2

    mask_i = librosa.util.softmask(S_filter,
                                   margin_i * (S_full - S_filter),
                                   power=power)

    mask_v = librosa.util.softmask(S_full - S_filter,
                                   margin_v * S_filter,
                                   power=power)

    # Once we have the masks, simply multiply them with the input spectrum
    # to separate the components

    S_foreground = mask_v * S_full
    S_background = mask_i * S_full


.. GENERATED FROM PYTHON SOURCE LINES 101-102

Plot the same slice, but separated into its foreground and background

.. GENERATED FROM PYTHON SOURCE LINES 102-124

.. code-block:: default


    # sphinx_gallery_thumbnail_number = 2

    plt.figure(figsize=(12, 8))
    plt.subplot(3, 1, 1)
    librosa.display.specshow(librosa.amplitude_to_db(S_full[:, idx], ref=np.max),
                             y_axis='log', sr=sr)
    plt.title('Full spectrum')
    plt.colorbar()

    plt.subplot(3, 1, 2)
    librosa.display.specshow(librosa.amplitude_to_db(S_background[:, idx], ref=np.max),
                             y_axis='log', sr=sr)
    plt.title('Background')
    plt.colorbar()
    plt.subplot(3, 1, 3)
    librosa.display.specshow(librosa.amplitude_to_db(S_foreground[:, idx], ref=np.max),
                             y_axis='log', x_axis='time', sr=sr)
    plt.title('Foreground')
    plt.colorbar()
    plt.tight_layout()
    plt.show()


.. image-sg:: /auto_examples/images/sphx_glr_plot_vocal_separation_002.png
   :alt: Full spectrum, Background, Foreground
   :srcset: /auto_examples/images/sphx_glr_plot_vocal_separation_002.png
   :class: sphx-glr-single-img


.. rst-class:: sphx-glr-script-out

 Out:

 .. code-block:: none

    /tmp/tmpfl4ra6qp/b0064fe7dbe8048b1d4148e61a568b6fe3fca91b/librosa/display.py:862: MatplotlibDeprecationWarning: The 'basey' parameter of __init__() has been renamed 'base' since Matplotlib 3.3; support for the old name will be dropped two minor releases later.
      scaler(mode, **kwargs)
    /tmp/tmpfl4ra6qp/b0064fe7dbe8048b1d4148e61a568b6fe3fca91b/librosa/display.py:862: MatplotlibDeprecationWarning: The 'linthreshy' parameter of __init__() has been renamed 'linthresh' since Matplotlib 3.3; support for the old name will be dropped two minor releases later.
      scaler(mode, **kwargs)
    /tmp/tmpfl4ra6qp/b0064fe7dbe8048b1d4148e61a568b6fe3fca91b/librosa/display.py:862: MatplotlibDeprecationWarning: The 'linscaley' parameter of __init__() has been renamed 'linscale' since Matplotlib 3.3; support for the old name will be dropped two minor releases later.
      scaler(mode, **kwargs)


.. rst-class:: sphx-glr-timing

   **Total running time of the script:** ( 0 minutes  28.369 seconds)


.. _sphx_glr_download_auto_examples_plot_vocal_separation.py:


.. only :: html

 .. container:: sphx-glr-footer
    :class: sphx-glr-footer-example


  .. container:: sphx-glr-download sphx-glr-download-python

     :download:`Download Python source code: plot_vocal_separation.py <plot_vocal_separation.py>`


  .. container:: sphx-glr-download sphx-glr-download-jupyter

     :download:`Download Jupyter notebook: plot_vocal_separation.ipynb <plot_vocal_separation.ipynb>`


.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_