Advanced I/O Use Cases

This section covers advanced use cases for input and output which go beyond the I/O functionality currently provided by librosa.

Read specific formats

librosa uses soundfile and audioread for reading audio. As of v0.7, librosa uses soundfile by default, and falls back on audioread only when dealing with codecs unsupported by soundfile. For a list of codecs supported by soundfile, see the libsndfile documentation.

Warning

audioread support is deprecated as of librosa 0.10.0, and will be removed completely in version 1.0.

Note

See the PySoundFile documentation for installation instructions.

Librosa's load function is meant for the common case where you want to load an entire recording (or a fragment of one) into memory, but some applications require more flexibility. In these cases, we recommend using soundfile directly. Reading audio files with soundfile is similar to reading them with librosa. One important difference is that soundfile returns data of shape (nb_samples, nb_channels), whereas librosa.core.load returns (nb_channels, nb_samples). Also, the signal is not resampled to 22050 Hz by default, so it needs to be transposed and resampled for further processing in librosa. The following example is equivalent to librosa.load(librosa.util.ex('trumpet')):

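import librosa
import soundfile as sf

# Get example audio file
filename = librosa.ex('trumpet')

# Read at the file's native sampling rate
data, samplerate = sf.read(filename, dtype='float32')
# Transpose to librosa's (nb_channels, nb_samples) layout
data = data.T
# Resample to librosa's default rate of 22050 Hz
data_22k = librosa.resample(data, orig_sr=samplerate, target_sr=22050)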
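The trumpet example is a mono recording, so the transpose above is enough. For a multi-channel file, note that librosa.load also defaults to mono=True, which averages the channels. The following is a minimal sketch of that layout conversion using only NumPy; the array is a synthetic stand-in for what sf.read would return for a stereo file, and the variable names are hypothetical.

```python
import numpy as np

# Synthetic stand-in for stereo data in soundfile's layout:
# shape (nb_samples, nb_channels)
nb_samples, nb_channels = 1000, 2
data = np.zeros((nb_samples, nb_channels), dtype=np.float32)
data[:, 0] = 0.5    # left channel
data[:, 1] = -0.5   # right channel

# Transpose to librosa's layout: shape (nb_channels, nb_samples)
y = data.T

# Average across channels to get a mono signal of shape (nb_samples,)
y_mono = y.mean(axis=0)

print(data.shape, y.shape, y_mono.shape)
```

With real data, librosa.to_mono(y) performs the same channel averaging.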

Blockwise Reading

For large audio files, it can be beneficial to avoid loading the whole file into memory. Librosa 0.7 introduced a streaming interface, which can be used to work on short fragments of audio sequentially. librosa.stream cuts an input file into blocks of audio, each corresponding to a given number of frames, which can be iterated over as in the following example:

import librosa

sr = librosa.get_samplerate('/path/to/file.wav')

# Set the frame parameters to be equivalent to the librosa defaults
# in the file's native sampling rate
frame_length = (2048 * sr) // 22050
hop_length = (512 * sr) // 22050

# Stream the data, working on 128 frames at a time
stream = librosa.stream('/path/to/file.wav',
                        block_length=128,
                        frame_length=frame_length,
                        hop_length=hop_length)

chromas = []
for y in stream:
    # Disable centering so that block boundaries line up with frame boundaries
    chroma_block = librosa.feature.chroma_stft(y=y, sr=sr,
                                               n_fft=frame_length,
                                               hop_length=hop_length,
                                               center=False)
    # Collect the chroma features computed for this block
    chromas.append(chroma_block)

In this example, each audio fragment y will consist of 128 frames' worth of samples, or more specifically, len(y) == frame_length + (block_length - 1) * hop_length. Each fragment y will overlap with the subsequent fragment by frame_length - hop_length samples, which ensures that stream processing gives results equivalent to processing the entire sequence in one step (assuming padding / centering is disabled).
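The length and overlap formulas above can be checked with plain integer arithmetic. The following is a minimal sketch, not librosa's implementation: the helper names frame_starts and block_bounds and the 10-second signal length are hypothetical, and only full frames (no padding or centering) are counted.

```python
def frame_starts(n_samples, frame_length, hop_length):
    """Start indices of all full frames in a signal (no padding/centering)."""
    return list(range(0, n_samples - frame_length + 1, hop_length))


def block_bounds(n_samples, block_length, frame_length, hop_length):
    """(start, end) sample ranges of consecutive blocks, per the formula above."""
    block_samples = frame_length + (block_length - 1) * hop_length
    step = block_length * hop_length  # samples advanced between blocks
    bounds = []
    start = 0
    while start + frame_length <= n_samples:
        bounds.append((start, min(start + block_samples, n_samples)))
        start += step
    return bounds


sr = 44100                           # assumed native sampling rate
frame_length = (2048 * sr) // 22050
hop_length = (512 * sr) // 22050
block_length = 128
n_samples = 10 * sr                  # assumed 10-second signal

bounds = block_bounds(n_samples, block_length, frame_length, hop_length)

# Frames seen block by block, expressed as absolute start indices
streamed = [
    start + s
    for start, end in bounds
    for s in frame_starts(end - start, frame_length, hop_length)
]

# Frames seen when the whole signal is processed in one step
full = frame_starts(n_samples, frame_length, hop_length)

print(streamed == full)
print(bounds[0][1] - bounds[1][0] == frame_length - hop_length)
```

Both comparisons hold: the blocks cover exactly the same frames as a single pass, and consecutive blocks share frame_length - hop_length samples.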

For more details about the streaming interface, refer to librosa.stream.

Read file-like objects

If you want to read audio from file-like objects (also called virtual files) you can use soundfile as well. (This will also work with librosa.load and librosa.stream, provided that the underlying codec is supported by soundfile.)

For example, to read a file from a zip-compressed archive:

import zipfile as zf
import soundfile as sf
import io

with zf.ZipFile('test.zip') as myzip:
    with myzip.open('stereo_file.wav') as myfile:
        # Copy the archive member into an in-memory buffer for soundfile
        tmp = io.BytesIO(myfile.read())
        data, samplerate = sf.read(tmp)
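The zip example assumes that an archive containing stereo_file.wav already exists. To try the pattern without one, a small fixture could be built entirely in memory. The following is a sketch using only the Python standard library (wave, zipfile, io); the 8000 Hz rate and the 440 Hz tone are arbitrary assumptions, and the wave module stands in for soundfile when reading the buffer back.

```python
import io
import math
import struct
import wave
import zipfile

rate = 8000  # hypothetical sample rate for the fixture

# One second of a 440 Hz tone, duplicated to two channels, as 16-bit PCM
frames = bytearray()
for n in range(rate):
    sample = int(0.5 * 32767 * math.sin(2 * math.pi * 440 * n / rate))
    frames += struct.pack('<hh', sample, sample)

# Encode the WAV entirely in memory
wav_buffer = io.BytesIO()
with wave.open(wav_buffer, 'wb') as w:
    w.setnchannels(2)
    w.setsampwidth(2)
    w.setframerate(rate)
    w.writeframes(bytes(frames))

# Store it in an in-memory zip archive under the member name used above
zip_buffer = io.BytesIO()
with zipfile.ZipFile(zip_buffer, 'w') as myzip:
    myzip.writestr('stereo_file.wav', wav_buffer.getvalue())

# Read the member back into a seekable buffer, as in the example above
zip_buffer.seek(0)
with zipfile.ZipFile(zip_buffer) as myzip:
    with myzip.open('stereo_file.wav') as myfile:
        tmp = io.BytesIO(myfile.read())

# Inspect the buffer: (channels, sample rate, number of frames)
with wave.open(tmp, 'rb') as r:
    info = (r.getnchannels(), r.getframerate(), r.getnframes())

print(info)
```

The resulting tmp buffer is the same kind of object that is passed to sf.read(tmp) in the zip example.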

Download and read from URL:

import soundfile as sf
import io

from urllib.request import urlopen

url = "https://raw.githubusercontent.com/librosa/librosa/master/tests/data/test1_44100.wav"

# Download the file into memory and decode it from the buffer
data, samplerate = sf.read(io.BytesIO(urlopen(url).read()))

Write out audio files

PySoundFile provides output functionality that can be used directly with numpy array audio buffers:

import numpy as np
import soundfile as sf

rate = 44100
# Ten seconds of stereo noise, shape (nb_samples, nb_channels)
data = np.random.uniform(-1, 1, size=(rate * 10, 2))

# Write out audio as 24bit PCM WAV
sf.write('stereo_file.wav', data, rate, subtype='PCM_24')

# Write out audio as 24bit Flac
sf.write('stereo_file.flac', data, rate, format='flac', subtype='PCM_24')

# Write out audio as OGG Vorbis
sf.write('stereo_file.ogg', data, rate, format='ogg', subtype='vorbis')