.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "auto_examples/plot_viterbi.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        :ref:`Go to the end <sphx_glr_download_auto_examples_plot_viterbi.py>`
        to download the full example code.

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_auto_examples_plot_viterbi.py:


================
Viterbi decoding
================

This notebook demonstrates how to use Viterbi decoding to impose temporal
smoothing on frame-wise state predictions.  Our working example will be the
problem of silence/non-silence detection.

.. GENERATED FROM PYTHON SOURCE LINES 12-24

.. code-block:: Python


    # Code source: Brian McFee
    # License: ISC

    ##################
    # Standard imports
    import numpy as np
    import matplotlib.pyplot as plt

    import librosa
    import librosa.display

.. GENERATED FROM PYTHON SOURCE LINES 25-26

Load an example signal

.. GENERATED FROM PYTHON SOURCE LINES 26-40

.. code-block:: Python


    y, sr = librosa.load(librosa.ex('trumpet'))

    # And compute the spectrogram magnitude and phase
    S_full, phase = librosa.magphase(librosa.stft(y))

    ###################
    # Plot the spectrum
    fig, ax = plt.subplots()
    img = librosa.display.specshow(librosa.amplitude_to_db(S_full, ref=np.max),
                                   y_axis='log', x_axis='time', sr=sr, ax=ax)
    fig.colorbar(img, ax=ax);


.. image-sg:: /auto_examples/images/sphx_glr_plot_viterbi_001.png
   :alt: plot viterbi
   :srcset: /auto_examples/images/sphx_glr_plot_viterbi_001.png
   :class: sphx-glr-single-img


.. GENERATED FROM PYTHON SOURCE LINES 41-44

As you can see, there are periods of silence and non-silence throughout this
recording.

.. GENERATED FROM PYTHON SOURCE LINES 44-55

.. code-block:: Python


    # As a first step, we can plot the root-mean-square (RMS) curve
    rms = librosa.feature.rms(y=y)[0]

    times = librosa.frames_to_time(np.arange(len(rms)))

    fig, ax = plt.subplots()
    ax.plot(times, rms)
    ax.axhline(0.02, color='r', alpha=0.5)
    ax.set(xlabel='Time', ylabel='RMS');


.. image-sg:: /auto_examples/images/sphx_glr_plot_viterbi_002.png
   :alt: plot viterbi
   :srcset: /auto_examples/images/sphx_glr_plot_viterbi_002.png
   :class: sphx-glr-single-img


.. GENERATED FROM PYTHON SOURCE LINES 56-60

The red line at 0.02 indicates a reasonable threshold for silence detection.
However, the RMS curve occasionally dips below the threshold momentarily, and
we would prefer the detector not to count these brief dips as silence.
This is where the Viterbi algorithm comes in handy!

.. GENERATED FROM PYTHON SOURCE LINES 62-72

As a first step, we will convert the raw RMS score into a likelihood
(probability) by the logistic mapping
:math:`P[V=1 | x] = \frac{\exp(x - \tau)}{1 + \exp(x - \tau)}`,
where :math:`x` denotes the RMS value and :math:`\tau=0.02` is our threshold.
The variable :math:`V` indicates whether the signal is non-silent (1) or
silent (0).

We'll normalize the RMS by its standard deviation to expand the range of the
probability vector.

.. GENERATED FROM PYTHON SOURCE LINES 72-76

.. code-block:: Python


    r_normalized = (rms - 0.02) / np.std(rms)
    p = np.exp(r_normalized) / (1 + np.exp(r_normalized))
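As an aside, the same mapping can be computed with `scipy.special.expit` (the
logistic sigmoid), which is more robust to overflow for large inputs.  The
short sketch below (the variable name `p_alt` is ours) simply verifies the
equivalence; it assumes `scipy` is available, which it is whenever librosa is
installed:

.. code-block:: Python


    import scipy.special

    # expit(z) = 1 / (1 + exp(-z)) = exp(z) / (1 + exp(z)),
    # so this reproduces the probability vector computed above.
    p_alt = scipy.special.expit(r_normalized)
    print(np.allclose(p_alt, p))  # True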
.. GENERATED FROM PYTHON SOURCE LINES 77-78

We can plot the probability curve over time:

.. GENERATED FROM PYTHON SOURCE LINES 78-85

.. code-block:: Python


    fig, ax = plt.subplots()
    ax.plot(times, p, label='P[V=1|x]')
    ax.axhline(0.5, color='r', alpha=0.5, label='Decision threshold')
    ax.set(xlabel='Time')
    ax.legend();


.. image-sg:: /auto_examples/images/sphx_glr_plot_viterbi_003.png
   :alt: plot viterbi
   :srcset: /auto_examples/images/sphx_glr_plot_viterbi_003.png
   :class: sphx-glr-single-img


.. GENERATED FROM PYTHON SOURCE LINES 86-89

which looks much like the first plot, but with the decision threshold shifted
to 0.5.  A simple silence detector would classify each frame independently of
its neighbors, which would result in the following plot:

.. GENERATED FROM PYTHON SOURCE LINES 89-100

.. code-block:: Python


    fig, ax = plt.subplots(nrows=2, sharex=True, figsize=(12, 6))
    librosa.display.specshow(librosa.amplitude_to_db(S_full, ref=np.max),
                             y_axis='log', x_axis='time', sr=sr, ax=ax[0])
    ax[0].label_outer()

    ax[1].step(times, p >= 0.5, label='Non-silent')
    ax[1].set(ylim=[0, 1.05])
    ax[1].legend()


.. rst-class:: sphx-glr-horizontal


    *

      .. image-sg:: /auto_examples/images/sphx_glr_plot_viterbi_004.png
         :alt: plot viterbi
         :srcset: /auto_examples/images/sphx_glr_plot_viterbi_004.png
         :class: sphx-glr-multi-img

    *

      .. image-sg:: /auto_examples/images/sphx_glr_plot_viterbi_005.png
         :alt: plot viterbi
         :srcset: /auto_examples/images/sphx_glr_plot_viterbi_005.png
         :class: sphx-glr-multi-img


.. GENERATED FROM PYTHON SOURCE LINES 101-109

We can do better using the Viterbi algorithm.
We'll use state 0 to indicate silent, and 1 to indicate non-silent.
We'll assume that a silent frame is equally likely to be followed by silence
or non-silence, but that non-silence is slightly more likely to be followed
by non-silence.
This is accomplished by building a self-loop transition matrix, where
`transition[i, j]` is the probability of moving from state `i` to state `j`
in the next frame.

.. GENERATED FROM PYTHON SOURCE LINES 109-113

.. code-block:: Python


    transition = librosa.sequence.transition_loop(2, [0.5, 0.6])
    print(transition)


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    [[0.5 0.5]
     [0.4 0.6]]
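To make the structure of `transition` concrete, here is a minimal sketch (the
variable names `self_loop` and `manual` are ours) that rebuilds the same
matrix by hand: each self-loop probability sits on the diagonal, and the
remaining mass goes to the other state, so every row is a valid probability
distribution.

.. code-block:: Python


    self_loop = np.array([0.5, 0.6])   # P(stay silent), P(stay non-silent)

    # Self-loop probabilities on the diagonal; the leftover mass goes to the
    # other state, so each row sums to 1.
    manual = np.eye(2) * self_loop + (1 - np.eye(2)) * (1 - self_loop[:, None])

    print(manual)                            # [[0.5 0.5]
                                             #  [0.4 0.6]]
    print(np.allclose(manual, transition))   # True

Raising the self-loop probabilities (for example,
`librosa.sequence.transition_loop(2, [0.9, 0.9])`) makes both states
"stickier"; this is exactly the knob discussed at the end of the example.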
.. GENERATED FROM PYTHON SOURCE LINES 114-116

Our `p` variable only indicates the probability of non-silence, so we need to
also compute the probability of silence as its complement.

.. GENERATED FROM PYTHON SOURCE LINES 116-120

.. code-block:: Python


    full_p = np.vstack([1 - p, p])
    print(full_p)


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    [[0.13145119 0.12914789 0.10023135 ... 0.5890698  0.5890719 ]
     [0.8685488  0.8708521  0.89976865 ... 0.41093022 0.41092807]]
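As a quick sanity check (our own addition), each column of `full_p` should
form a probability distribution over the two states, since
`viterbi_discriminative` expects its input probabilities to sum to 1 over
states at every frame:

.. code-block:: Python


    print(full_p.shape)                           # (2, number of frames)
    print(np.allclose(full_p.sum(axis=0), 1.0))   # True: silence + non-silence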
.. GENERATED FROM PYTHON SOURCE LINES 121-124

Now, we're ready to decode!
We'll use `viterbi_discriminative` here, since the inputs are state
likelihoods conditional on the data (in our case, the data is the RMS value).

.. GENERATED FROM PYTHON SOURCE LINES 124-138

.. code-block:: Python


    states = librosa.sequence.viterbi_discriminative(full_p, transition)

    # sphinx_gallery_thumbnail_number = 5
    fig, ax = plt.subplots(nrows=2, sharex=True)
    librosa.display.specshow(librosa.amplitude_to_db(S_full, ref=np.max),
                             y_axis='log', x_axis='time', sr=sr, ax=ax[0])
    ax[0].label_outer()
    ax[1].step(times, p >= 0.5, label='Frame-wise')
    ax[1].step(times, states, linestyle='--', color='orange', label='Viterbi')
    ax[1].set(ylim=[0, 1.05])
    ax[1].legend()


.. image-sg:: /auto_examples/images/sphx_glr_plot_viterbi_006.png
   :alt: plot viterbi
   :srcset: /auto_examples/images/sphx_glr_plot_viterbi_006.png
   :class: sphx-glr-single-img


.. GENERATED FROM PYTHON SOURCE LINES 139-144

Note how the Viterbi output has fewer state changes than the frame-wise
predictor, and it is less sensitive to momentary dips in energy.
This is controlled directly by the transition matrix.
A higher self-transition probability means that the decoder is less likely
to change states.


.. rst-class:: sphx-glr-timing

   **Total running time of the script:** (0 minutes 2.172 seconds)


.. _sphx_glr_download_auto_examples_plot_viterbi.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: plot_viterbi.ipynb <plot_viterbi.ipynb>`

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: plot_viterbi.py <plot_viterbi.py>`

.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_