You're reading an old version of this documentation. If you want up-to-date information, please have a look at 0.10.1.


librosa.util.frame(x, frame_length, hop_length, axis=-1)

Slice a data array into (overlapping) frames.

This implementation uses low-level stride manipulation to avoid making a copy of the data. The resulting frame representation is a new view of the same input data.
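You can see what "a new view of the same input data" means in practice without librosa. The sketch below uses NumPy's own `sliding_window_view` (NumPy 1.20 or later), not librosa's implementation. It frames the seven-sample example used on this page with frame length 3 and hop length 2, then checks that no data was copied.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view  # NumPy >= 1.20

x = np.arange(7)
frame_length, hop_length = 3, 2

# Every length-3 window at a hop of 1, as a read-only view: shape (5, 3)
windows = sliding_window_view(x, frame_length)
# Keep every hop_length-th window; basic slicing still returns a view
frames_rows = windows[::hop_length]   # rows are frames (like axis=0)
frames_cols = frames_rows.T           # columns are frames (like axis=-1)

print(frames_cols.tolist())              # [[0, 2, 4], [1, 3, 5], [2, 4, 6]]
print(np.shares_memory(x, frames_cols))  # True: no data was copied
```

Basic slicing (`[::hop_length]`) and transposition both return views. That is why the final frames still share memory with `x`.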

However, if the input data is not contiguous in memory, a warning will be issued and the output will be a full copy, rather than a view of the input data.
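Here is a minimal sketch of that fallback. It assumes a hypothetical helper `ensure_layout` (not part of librosa) that checks the layout flag each axis setting requires, warns if the flag is not set, and converts the array with NumPy's `np.asfortranarray` or `np.ascontiguousarray`:

```python
import warnings
import numpy as np

def ensure_layout(x, axis=-1):
    """Hypothetical helper: return x in the layout framing needs, copying (with a warning) if necessary."""
    flag = "F_CONTIGUOUS" if axis == -1 else "C_CONTIGUOUS"
    if x.flags[flag]:
        return x  # already suitable: a view can be built directly, no copy
    warnings.warn(f"input is not {flag}; making a copy", stacklevel=2)
    return np.asfortranarray(x) if axis == -1 else np.ascontiguousarray(x)

x = np.arange(12.0).reshape(3, 4)  # C-ordered 2-D array, so not F-contiguous
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    x_f = ensure_layout(x, axis=-1)

print(len(caught), np.shares_memory(x, x_f))  # 1 False
```

The converted array occupies new memory. Any frames built from it are views of the copy, not of the caller's original array.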

For example, a one-dimensional input x = [0, 1, 2, 3, 4, 5, 6] can be framed with frame length 3 and hop length 2 in two ways. The first (axis=-1) results in the array x_frames:

[[0, 2, 4],
 [1, 3, 5],
 [2, 4, 6]]

where each column x_frames[:, i] contains a contiguous slice of the input x[i * hop_length : i * hop_length + frame_length].

The second way (axis=0) results in the array x_frames:

[[0, 1, 2],
 [2, 3, 4],
 [4, 5, 6]]

where each row x_frames[i] contains the same contiguous slice of the input, x[i * hop_length : i * hop_length + frame_length].
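Plain slicing reproduces both layouts and makes the relationship explicit. This sketch is for illustration only: it builds copies with `np.stack`, unlike the strided view described above.

```python
import numpy as np

x = np.arange(7)
frame_length, hop_length = 3, 2
n_frames = 1 + (len(x) - frame_length) // hop_length  # 3 complete frames

# Frame i is the slice x[i * hop_length : i * hop_length + frame_length]
slices = [x[i * hop_length : i * hop_length + frame_length] for i in range(n_frames)]

frames_last = np.stack(slices, axis=-1)  # axis=-1 layout: frames are columns
frames_first = np.stack(slices, axis=0)  # axis=0 layout: frames are rows

print(frames_last.tolist())   # [[0, 2, 4], [1, 3, 5], [2, 4, 6]]
print(frames_first.tolist())  # [[0, 1, 2], [2, 3, 4], [4, 5, 6]]
```

For one-dimensional input, `frames_first` is simply the transpose of `frames_last`.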

This generalizes to higher-dimensional inputs, as shown in the examples below. In general, framing increases the number of dimensions by one, adding a new “frame axis” either at the end of the array (axis=-1) or at the beginning (axis=0).
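The shape rule can be sketched with `numpy.lib.stride_tricks.as_strided`. The helper `frame_nd` below is hypothetical. It is a minimal sketch that assumes only complete frames are kept. It reuses the input's own strides, so unlike the documented function it does not need a particular memory layout. `writeable=False` guards against writes through overlapping frames.

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

def frame_nd(x, frame_length, hop_length, axis=-1):
    """Hypothetical helper: frame x along its first or last axis as a read-only view."""
    n_frames = 1 + (x.shape[axis] - frame_length) // hop_length
    if axis == -1:
        # (..., n) -> (..., frame_length, n_frames): new frame axis at the end
        shape = x.shape[:-1] + (frame_length, n_frames)
        strides = x.strides + (hop_length * x.strides[-1],)
    elif axis == 0:
        # (n, ...) -> (n_frames, frame_length, ...): new frame axis at the front
        shape = (n_frames, frame_length) + x.shape[1:]
        strides = (hop_length * x.strides[0],) + x.strides
    else:
        raise ValueError("axis must be 0 or -1")
    return as_strided(x, shape=shape, strides=strides, writeable=False)

x = np.arange(2 * 10).reshape(2, 10)  # e.g. a two-channel signal of 10 samples
f_last = frame_nd(x, frame_length=4, hop_length=2, axis=-1)
f_first = frame_nd(x.T, frame_length=4, hop_length=2, axis=0)
print(f_last.shape)   # (2, 4, 4): (channels, frame_length, n_frames)
print(f_first.shape)  # (4, 4, 2): (n_frames, frame_length, channels)
```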


Parameters

x : np.ndarray

Array to frame

frame_length : int > 0 [scalar]

Length of the frame

hop_length : int > 0 [scalar]

Number of steps to advance between frames

axis : 0 or -1

The axis along which to frame.

If axis=-1 (the default), then x is framed along its last dimension. x must be “F-contiguous” in this case.

If axis=0, then x is framed along its first dimension. x must be “C-contiguous” in this case.
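You can inspect the flags that these requirements refer to directly on any NumPy array. This short illustration uses only NumPy, and the array sizes are arbitrary:

```python
import numpy as np

# A contiguous 1-D array is both C- and F-contiguous.
y_mono = np.zeros(1000, dtype=np.float32)
print(y_mono.flags["C_CONTIGUOUS"], y_mono.flags["F_CONTIGUOUS"])  # True True

# NumPy allocates multi-dimensional arrays in C order by default.
y_stereo = np.zeros((2, 1000), dtype=np.float32)
print(y_stereo.flags["C_CONTIGUOUS"], y_stereo.flags["F_CONTIGUOUS"])  # True False

# np.asfortranarray copies the data into F order, as axis=-1 requires.
y_stereo_f = np.asfortranarray(y_stereo)
print(y_stereo_f.flags["F_CONTIGUOUS"])  # True
```

A contiguous one-dimensional signal therefore satisfies both requirements. A C-ordered multichannel array framed with axis=-1 does not.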

Returns

x_frames : np.ndarray [shape=(…, frame_length, N_FRAMES) or (N_FRAMES, frame_length, …)]

A framed view of x, for example with axis=-1 (framing on the last dimension):

x_frames[..., j] == x[..., j * hop_length : j * hop_length + frame_length]

If axis=0 (framing on the first dimension), then:

x_frames[j] == x[j * hop_length : j * hop_length + frame_length]
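The value of N_FRAMES is not stated explicitly. Assume that only frames lying entirely inside the input are kept, with no padding, which is consistent with the slice formulas above. Then N_FRAMES = 1 + (n - frame_length) // hop_length, where n = x.shape[axis]. The hypothetical helper below checks this formula against the shapes shown in the examples:

```python
def n_frames(n, frame_length, hop_length):
    """Hypothetical helper: number of complete frames of frame_length that fit in n samples."""
    # Frame j starts at j * hop_length; the last start must leave room for a full frame.
    return 1 + (n - frame_length) // hop_length

print(n_frames(7, 3, 2))           # 3     (the seven-sample example)
print(n_frames(117601, 2048, 64))  # 1806  (the trumpet signal, 2048/64 framing)
print(n_frames(230, 32, 16))       # 13    (the 230-frame STFT, 32/16 patches)
```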

Raises

If x is not an np.ndarray.

If x.shape[axis] < frame_length, there is not enough data to fill one frame.

If hop_length < 1, frames cannot advance.

If axis is not 0 or -1. Framing is only supported along the first or last axis.
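A small validator can mirror these conditions. The function `check_frame_args` below is hypothetical, and it raises Python's built-in `TypeError` and `ValueError` as stand-ins; librosa's own exception type may differ. It checks the axis before reading `x.shape[axis]`, so an invalid axis produces the intended message rather than an `IndexError`.

```python
import numpy as np

def check_frame_args(x, frame_length, hop_length, axis=-1):
    """Hypothetical validator mirroring the documented error conditions."""
    if not isinstance(x, np.ndarray):
        raise TypeError(f"x must be a numpy.ndarray, got {type(x).__name__}")
    if axis not in (0, -1):
        # Checked before x.shape[axis] so a bad axis cannot raise IndexError.
        raise ValueError(f"axis={axis} must be 0 or -1")
    if x.shape[axis] < frame_length:
        raise ValueError(
            f"x.shape[{axis}]={x.shape[axis]} is shorter than frame_length={frame_length}"
        )
    if hop_length < 1:
        raise ValueError(f"hop_length={hop_length} must be at least 1")

check_frame_args(np.arange(7), frame_length=3, hop_length=2)  # passes silently
try:
    check_frame_args(np.arange(7), frame_length=3, hop_length=0)
except ValueError as err:
    print(err)  # hop_length=0 must be at least 1
```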

See also


Convert data to F-contiguous representation


Convert data to C-contiguous representation


information about the memory layout of a numpy ndarray.


Examples

Extract 2048-sample frames from a monophonic signal with a hop of 64 samples per frame:

>>> y, sr = librosa.load(librosa.ex('trumpet'))
>>> frames = librosa.util.frame(y, frame_length=2048, hop_length=64)
>>> frames
array([[-1.407e-03, -2.604e-02, ..., -1.795e-05, -8.108e-06],
       [-4.461e-04, -3.721e-02, ..., -1.573e-05, -1.652e-05],
       ...,
       [ 7.960e-02, -2.335e-01, ..., -6.815e-06,  1.266e-05],
       [ 9.568e-02, -1.252e-01, ...,  7.397e-06, -1.921e-05]],
      dtype=float32)
>>> y.shape
(117601,)
>>> frames.shape
(2048, 1806)

Or frame along the first axis instead of the last:

>>> frames = librosa.util.frame(y, frame_length=2048, hop_length=64, axis=0)
>>> frames.shape
(1806, 2048)

Frame a stereo signal:

>>> y, sr = librosa.load(librosa.ex('trumpet', hq=True), mono=False)
>>> y.shape
(2, 117601)
>>> frames = librosa.util.frame(y, frame_length=2048, hop_length=64)
>>> frames.shape
(2, 2048, 1806)

Carve an STFT into fixed-length patches of 32 frames with 50% overlap:

>>> y, sr = librosa.load(librosa.ex('trumpet'))
>>> S = np.abs(librosa.stft(y))
>>> S.shape
(1025, 230)
>>> S_patch = librosa.util.frame(S, frame_length=32, hop_length=16)
>>> S_patch.shape
(1025, 32, 13)
>>> # The first patch contains the first 32 frames of S
>>> np.allclose(S_patch[:, :, 0], S[:, :32])
True
>>> # The second patch contains frames 16 up to (but not including) 16+32=48, and so on
>>> np.allclose(S_patch[:, :, 1], S[:, 16:48])
True