- librosa.stream(path, *, block_length, frame_length, hop_length, mono=True, offset=0.0, duration=None, fill_value=None, dtype=<class 'numpy.float32'>)
Stream audio in fixed-length buffers.
This is primarily useful for processing large files that won’t fit entirely in memory at once.
Instead of loading the entire audio signal into memory (as in load), this function produces blocks of audio spanning a fixed number of frames at a specified frame length and hop length.
While this function strives for similar behavior to load, there are a few caveats that users should be aware of:
- This function does not return audio buffers directly. It returns a generator, which you can iterate over to produce blocks of audio. A block, in this context, refers to a buffer of audio which spans a given number of (potentially overlapping) frames.
- Automatic sample-rate conversion is not supported. Audio will be streamed in its native sample rate, so no default values are provided for frame_length and hop_length. It is recommended that you first get the sampling rate for the file in question, using get_samplerate, and set these parameters accordingly.
- The block_length parameter specifies how many frames of audio will be produced per block. Larger values will consume more memory, but will be more efficient to process downstream. The best value will ultimately depend on your application and other system constraints.
- By default, most librosa analyses (e.g., short-time Fourier transform) assume centered frames, which requires padding the signal at the beginning and end. This will not work correctly when the signal is carved into blocks, because it would introduce padding in the middle of the signal. To disable this feature, use center=False in all frame-based analyses.
See the examples below for proper usage of this function.
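Because audio is streamed at its native sample rate, the frame and hop sizes have to be chosen per file. The following is a minimal sketch, assuming you prefer to think in durations (seconds) rather than samples. The helper name frame_params is hypothetical and not part of librosa, and the sample rate is passed in as a plain integer; in practice it would come from get_samplerate.

```python
def frame_params(sr, frame_seconds, hop_seconds):
    """Convert frame and hop durations (in seconds) to sample counts.

    Hypothetical helper for illustration; not a librosa function.
    """
    # Both values must be positive integers, so round and clamp to 1.
    frame_length = max(1, round(sr * frame_seconds))
    hop_length = max(1, round(sr * hop_seconds))
    return frame_length, hop_length


# 50 ms frames with a 12.5 ms hop at two common native sample rates.
print(frame_params(48000, 0.05, 0.0125))  # (2400, 600)
print(frame_params(44100, 0.05, 0.0125))  # (2205, 551)
```

The resulting pair can then be passed as frame_length and hop_length when opening the stream, and again to any frame-based analysis applied to each block.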
- path : string, int, sf.SoundFile, or file-like object
Path to the input file to stream.
Any codec supported by soundfile is permitted here.
A soundfile.SoundFile object may also be provided.
- block_length : int > 0
The number of frames to include in each block.
Note that at the end of the file, there may not be enough data to fill an entire block, resulting in a shorter block by default. To pad the signal out so that blocks are always full length, set fill_value (see below).
- frame_length : int > 0
The number of samples per frame.
- hop_length : int > 0
The number of samples to advance between frames.
Note that when hop_length < frame_length, neighboring frames will overlap. Similarly, the last frame of one block will overlap with the first frame of the next block.
- mono : bool
Convert the signal to mono during streaming
- offset : float
Start reading after this time (in seconds)
- duration : float
Only load up to this much audio (in seconds)
- fill_value : float [optional]
If padding the signal to produce constant-length blocks, this value will be used at the end of the signal.
In most cases, fill_value=0 (silence) is expected, but you may specify any value here.
- dtype : numeric type
Data type of audio buffers to be produced
An audio buffer of (at most) (block_length-1) * hop_length + frame_length samples.
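The buffer size formula above can be checked with a little arithmetic. This is a sketch using the parameter values from the first example below; block_samples and frames_in_buffer are hypothetical helper names, and the memory figure assumes mono float32 audio (4 bytes per sample).

```python
def block_samples(block_length, frame_length, hop_length):
    """Maximum number of samples in one streamed block."""
    return (block_length - 1) * hop_length + frame_length


def frames_in_buffer(n_samples, frame_length, hop_length):
    """Number of left-aligned (uncentered) frames that fit in a buffer."""
    if n_samples < frame_length:
        return 0
    return 1 + (n_samples - frame_length) // hop_length


n = block_samples(block_length=256, frame_length=4096, hop_length=1024)
print(n)                                    # 265216
print(frames_in_buffer(n, 4096, 1024))      # 256
print(n * 4)                                # 1060864 bytes for mono float32
```

A full block therefore holds exactly block_length uncentered frames, and a shorter final block holds fewer.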
Apply a short-time Fourier transform to blocks of 256 frames at a time. Note that streaming operation requires left-aligned frames, so we must set center=False to avoid padding artifacts.
>>> import librosa
>>> filename = librosa.ex('brahms')
>>> sr = librosa.get_samplerate(filename)
>>> stream = librosa.stream(filename,
...                         block_length=256,
...                         frame_length=4096,
...                         hop_length=1024)
>>> for y_block in stream:
...     # Match the STFT framing to the stream's frame_length and hop_length
...     D_block = librosa.stft(y_block, n_fft=4096, hop_length=1024,
...                            center=False)
Or compute a mel spectrogram over a stream, using a shorter frame and non-overlapping windows:
>>> filename = librosa.ex('brahms')
>>> sr = librosa.get_samplerate(filename)
>>> stream = librosa.stream(filename,
...                         block_length=256,
...                         frame_length=2048,
...                         hop_length=2048)
>>> for y_block in stream:
...     m_block = librosa.feature.melspectrogram(y=y_block, sr=sr,
...                                              n_fft=2048,
...                                              hop_length=2048,
...                                              center=False)
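To illustrate why left-aligned frames are needed and how neighboring blocks overlap, the following pure-Python sketch emulates the block layout on a plain list of sample indices. It is not librosa's implementation: frame and split_blocks are hypothetical helpers, and the assumption that consecutive blocks start block_length * hop_length samples apart is inferred from the buffer size and frame overlap described in the parameters.

```python
def frame(signal, frame_length, hop_length):
    """Left-aligned frames (no centering, no padding) of a sequence."""
    if len(signal) < frame_length:
        return []
    n_frames = 1 + (len(signal) - frame_length) // hop_length
    return [signal[i * hop_length: i * hop_length + frame_length]
            for i in range(n_frames)]


def split_blocks(signal, block_length, frame_length, hop_length):
    """Yield overlapping blocks; the last ones may be shorter (no padding)."""
    block_size = (block_length - 1) * hop_length + frame_length
    step = block_length * hop_length  # assumed distance between block starts
    for start in range(0, len(signal), step):
        yield signal[start: start + block_size]


signal = list(range(50))

# Frames computed over the whole signal at once.
whole = frame(signal, frame_length=8, hop_length=4)

# Frames computed block by block, then concatenated.
blockwise = [f
             for block in split_blocks(signal, block_length=3,
                                       frame_length=8, hop_length=4)
             for f in frame(block, frame_length=8, hop_length=4)]

print(len(whole), len(blockwise))  # 11 11
print(whole == blockwise)          # True
```

Under these assumptions, block-wise framing reproduces the frames of the full signal exactly. Padding each block, as centered analyses do, would insert extra samples at every block boundary and break this equivalence.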