Caution

You're reading the documentation for a development version. For the latest released version, please have a look at 0.11.0.

librosa.stream

librosa.stream(path, *, block_length, frame_length, hop_length, sr=None, mono=True, offset=0.0, duration=None, fill_value=None, res_type='soxr_hq', dtype=<class 'numpy.float32'>)

Stream audio in fixed-length buffers.

This is primarily useful for processing large files that won’t fit entirely in memory at once.

Instead of loading the entire audio signal into memory (as in load), this function produces blocks of audio spanning a fixed number of frames at a specified frame length and hop length.

While this function strives for similar behavior to load, there are a few caveats that users should be aware of:

This function does not return audio buffers directly. It returns a generator, which you can iterate over to produce blocks of audio. A block, in this context, refers to a buffer of audio which spans a given number of (potentially overlapping) frames.

Automatic sample-rate conversion is supported, but not all combinations of block_length, frame_length, and hop_length will be compatible with all sampling rates. Specifically, (block_length * hop_length * native_sr) / sr must evaluate to an exact integer, where native_sr is the original sampling rate of the stream prior to resampling.

Many analyses, such as resample, cqt, or beat_track, require access to the entire signal to behave correctly, so they are not appropriate for streamed data.

The block_length parameter specifies how many frames of audio will be produced per block. Larger values consume more memory but are more efficient to process downstream. The best value ultimately depends on your application and other system constraints.

By default, most librosa analyses (e.g., short-time Fourier transform) assume centered frames, which requires padding the signal at the beginning and end. This will not work correctly when the signal is carved into blocks, because it would introduce padding in the middle of the signal. To disable this feature, use center=False in all frame-based analyses.

If you break out of the generator loop early, the underlying audio file handle and resampling buffers will remain open until the generator object is garbage-collected. To explicitly release resources, call the generator’s .close() method.
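A minimal sketch of the early-exit pattern uses contextlib.closing, which calls .close() on any object that has one, including generators. To keep the sketch self-contained, fake_stream is a hypothetical stand-in for a librosa.stream(...) call that records when its cleanup code runs; it does not open any file.

```python
import contextlib

import numpy as np


def fake_stream(n_blocks, block_size, log):
    """Hypothetical stand-in for librosa.stream(...) that records its cleanup."""
    try:
        for _ in range(n_blocks):
            yield np.zeros(block_size, dtype=np.float32)
    finally:
        # Runs when the generator is exhausted, closed, or garbage-collected.
        log.append("closed")


log = []
# contextlib.closing calls stream.close() when the with-block exits,
# even if the loop below stops early.
with contextlib.closing(fake_stream(100, 1024, log)) as stream:
    for i, y_block in enumerate(stream):
        if i == 2:
            break  # stop early; the generator is still suspended at this point
    print(log)  # [] (cleanup has not run yet)
print(log)  # ['closed']
```

With a real stream, the with-statement would wrap the same librosa.stream(...) call used in the examples; calling stream.close() directly after the loop has the same effect.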

See the examples below for proper usage of this function.

Parameters:

path : str, int, sf.SoundFile, or file-like object

Path to the input file to stream.

Any codec supported by soundfile is permitted here.

An existing soundfile.SoundFile object may also be provided.

block_length : int > 0

The number of frames to include in each block.

Note that at the end of the file, there may not be enough data to fill an entire block, resulting in a shorter block by default. To pad the signal out so that blocks are always full length, set fill_value (see below).
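Because each full block spans (block_length - 1) * hop_length + frame_length samples (see Yields below), the memory needed for one yielded buffer grows linearly with block_length. The helper block_memory_bytes below is hypothetical and only counts the yielded array; decoder and resampler buffers are not included, and n_channels=2 is only a rough figure for a stereo file streamed with mono=False.

```python
import numpy as np


def block_memory_bytes(block_length, frame_length, hop_length,
                       n_channels=1, dtype=np.float32):
    """Size in bytes of one full-length block (hypothetical helper)."""
    # Samples spanned by block_length frames of frame_length, hop_length apart.
    n_samples = (block_length - 1) * hop_length + frame_length
    return n_samples * n_channels * np.dtype(dtype).itemsize


# Compare a few block lengths for 4096-sample frames with a 1024-sample hop.
for block_length in (64, 256, 1024):
    mib = block_memory_bytes(block_length, 4096, 1024) / 2**20
    print(f"block_length={block_length}: {mib:.2f} MiB")
```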

frame_length : int > 0

The number of samples per frame.

hop_length : int > 0

The number of samples to advance between frames.

Note that when hop_length < frame_length, neighboring frames will overlap. Similarly, the last frame of one block will overlap with the first frame of the next block.
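The size of that overlap can be worked out directly. Assuming block k begins at frame k * block_length (consistent with the block size given under Yields), it starts at sample k * block_length * hop_length, so neighboring blocks share frame_length - hop_length samples. The helper block_span is hypothetical and only does this arithmetic.

```python
def block_span(k, block_length, frame_length, hop_length):
    """Half-open sample range [start, end) covered by block k (hypothetical helper)."""
    start = k * block_length * hop_length  # first sample of frame k * block_length
    end = start + (block_length - 1) * hop_length + frame_length
    return start, end


# Overlapping frames: consecutive blocks share frame_length - hop_length samples.
_, end0 = block_span(0, 256, 4096, 1024)
start1, _ = block_span(1, 256, 4096, 1024)
print(end0 - start1)  # 3072

# Non-overlapping frames (hop_length == frame_length): blocks just touch.
_, end0 = block_span(0, 256, 2048, 2048)
start1, _ = block_span(1, 256, 2048, 2048)
print(end0 - start1)  # 0
```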

sr : number > 0 [scalar]

Target sampling rate. If not provided, the original sampling rate of the file will be used.
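When sr differs from the file's native rate, the caveat above requires (block_length * hop_length * native_sr) / sr to be an exact integer. The hypothetical helper below checks that condition with exact rational arithmetic (fractions.Fraction), which avoids floating-point round-off. It assumes integer sampling rates; native_sr would typically come from get_samplerate.

```python
from fractions import Fraction


def is_compatible(block_length, hop_length, native_sr, sr):
    """True if one block's advance is a whole number of native samples.

    Hypothetical helper; assumes native_sr and sr are integers.
    """
    native_samples = Fraction(block_length * hop_length * native_sr, sr)
    return native_samples.denominator == 1


# 44.1 kHz -> 22.05 kHz: 256 * 512 * 44100 / 22050 = 262144, an integer.
print(is_compatible(256, 512, 44100, 22050))  # True
# 44.1 kHz -> 16 kHz: 256 * 512 * 44100 / 16000 = 361267.2, not an integer.
print(is_compatible(256, 512, 44100, 16000))  # False
# Changing hop_length to 640 gives 451584 native samples per block.
print(is_compatible(256, 640, 44100, 16000))  # True
```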

mono : bool

Convert the signal to mono during streaming.

offset : float

Start reading after this time (in seconds).

If negative, it will be interpreted relative to the end of the file.

duration : float

Only load up to this much audio (in seconds).

fill_value : float [optional]

If padding the signal to produce constant-length blocks, this value will be used at the end of the signal.

In most cases, fill_value=0 (silence) is expected, but you may specify any value here.
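The effect of fill_value on the final block can be mimicked on an in-memory array with numpy.pad. This is a sketch of the behavior described above, not librosa's own code, and the short last_block array is made-up data.

```python
import numpy as np

block_length, frame_length, hop_length = 4, 8, 4
# Length of a full block: (block_length - 1) * hop_length + frame_length = 20.
full_size = (block_length - 1) * hop_length + frame_length

last_block = np.arange(13, dtype=np.float32)  # made-up short final block
fill_value = 0.0

# Pad on the right only, so the final block matches the full block length.
padded = np.pad(last_block, (0, full_size - len(last_block)),
                mode="constant", constant_values=fill_value)
print(len(last_block), len(padded))  # 13 20
```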

res_type : str

Resample type, must be one of the following:

'soxr_vhq', 'soxr_hq', 'soxr_mq' or 'soxr_lq': soxr Very high-, High-, Medium-, Low-quality FFT-based bandlimited interpolation. 'soxr_hq' is the default setting of soxr.
'soxr_qq': soxr Quick cubic interpolation (very fast, but not bandlimited)

dtype : numeric type

Data type of the audio buffers to be produced.

Yields:

y : np.ndarray

An audio buffer of (at most) (block_length-1) * hop_length + frame_length samples.

See also

load
get_samplerate
soundfile.blocks

Examples

Apply a short-time Fourier transform to blocks of 256 frames at a time. Note that streaming operation requires left-aligned frames, so we must set center=False to avoid padding artifacts. The STFT's n_fft and hop_length are set to match the stream's frame_length and hop_length, so that each full block yields 256 STFT frames.

>>> import librosa
>>> filename = librosa.ex('brahms')
>>> sr = librosa.get_samplerate(filename)
>>> stream = librosa.stream(filename,
...                         block_length=256,
...                         frame_length=4096,
...                         hop_length=1024)
>>> for y_block in stream:
...     D_block = librosa.stft(y_block, n_fft=4096, hop_length=1024, center=False)

Or compute a mel spectrogram over a stream, using a shorter frame and non-overlapping windows:

>>> filename = librosa.ex('brahms')
>>> sr = librosa.get_samplerate(filename)
>>> stream = librosa.stream(filename,
...                         block_length=256,
...                         frame_length=2048,
...                         hop_length=2048)
>>> for y_block in stream:
...     m_block = librosa.feature.melspectrogram(y=y_block, sr=sr,
...                                              n_fft=2048,
...                                              hop_length=2048,
...                                              center=False)
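To see why center=False matters, the numpy-only sketch below slices an in-memory signal into blocks laid out as described above, frames each block with left-aligned frames, and checks that the concatenated frames match framing the whole signal at once. It then zero-pads one block by frame_length // 2 on each side, a stand-in for centered framing, to show the extra frames that would mix padding into the middle of the signal. iter_blocks and frame are hypothetical helpers, not librosa functions, and iter_blocks assumes a trailing piece shorter than one frame can be dropped.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def iter_blocks(y, block_length, frame_length, hop_length):
    """Hypothetical in-memory analogue of the stream's block layout."""
    block_size = (block_length - 1) * hop_length + frame_length
    step = block_length * hop_length
    for start in range(0, len(y), step):
        block = y[start:start + block_size]
        if len(block) < frame_length:
            break  # too short for even one frame
        yield block


def frame(y, frame_length, hop_length):
    """Left-aligned frames (the center=False convention), one per row."""
    return sliding_window_view(y, frame_length)[::hop_length]


block_length, frame_length, hop_length = 16, 2048, 512
y = np.random.default_rng(1).standard_normal(50_000)

blocks = list(iter_blocks(y, block_length, frame_length, hop_length))
streamed = np.concatenate([frame(b, frame_length, hop_length) for b in blocks])
whole = frame(y, frame_length, hop_length)
print(np.array_equal(streamed, whole))  # True

# Centering pads frame_length // 2 samples on both sides of each block
# (zeros here), which adds frames that straddle the block boundaries.
centered = frame(np.pad(blocks[0], frame_length // 2), frame_length, hop_length)
print(len(frame(blocks[0], frame_length, hop_length)), len(centered))  # 16 20
```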