AudioFile

AudioFile extends File and provides additional methods for working with audio files.

AudioFile instances are created when a DataChain is initialized from storage with the type="audio" parameter:

import datachain as dc

chain = dc.read_storage("s3://bucket-name/", type="audio")

There are additional models for working with audio files:

  • AudioFragment - represents a fragment of an audio file.

These are virtual models that do not create physical files. Instead, they describe a region of the underlying AudioFile. If you need to save the data, you can use the save method of these models to write it locally or upload it to a storage service.
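
For example, a fragment can be defined and then materialized like this (a minimal sketch; audio_file stands for any AudioFile instance and the output path is a placeholder):

fragment = audio_file.get_fragment(0.0, 5.0)  # virtual: no data is read yet
saved = fragment.save("/tmp/clips", format="wav")  # materializes the fragment as a new AudioFile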

For a complete example of audio processing with DataChain, see Audio-to-Text with Whisper: a speech recognition pipeline that uses AudioFile, AudioFragment, and Audio to chunk audio files and transcribe them.

AudioFile

AudioFile(**kwargs)

Bases: File

A data model for handling audio files.

This model inherits from the File model and provides additional functionality for reading audio files, extracting audio fragments, and splitting audio into fragments.

Source code in datachain/lib/file.py
def __init__(self, **kwargs):
    super().__init__(**kwargs)
    self._catalog = None
    self._caching_enabled: bool = False
    self._download_cb: Callback = DEFAULT_CALLBACK

get_fragment

get_fragment(start: float, end: float) -> AudioFragment

Returns an audio fragment for the specified time range. It neither downloads the file nor actually extracts the fragment; it returns a Model representing the audio fragment, which can be used to read or save it later.

Parameters:

  • start (float) –

    The start time of the fragment in seconds.

  • end (float) –

    The end time of the fragment in seconds.

Returns:

  • AudioFragment ( AudioFragment ) –

    A Model representing the audio fragment.

Source code in datachain/lib/file.py
def get_fragment(self, start: float, end: float) -> "AudioFragment":
    """
    Returns an audio fragment for the specified time range. It neither
    downloads the file nor actually extracts the fragment; it returns a
    Model representing the audio fragment, which can be used to read or
    save it later.

    Args:
        start (float): The start time of the fragment in seconds.
        end (float): The end time of the fragment in seconds.

    Returns:
        AudioFragment: A Model representing the audio fragment.
    """
    if start < 0 or end < 0 or start >= end:
        raise ValueError(
            f"Can't get audio fragment for '{self.path}', "
            f"invalid time range: ({start:.3f}, {end:.3f})"
        )

    return AudioFragment(audio=self, start=start, end=end)
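
For instance, given an AudioFile named audio_file (the time range below is illustrative; a reversed or negative range raises ValueError):

fragment = audio_file.get_fragment(2.5, 7.5)  # AudioFragment(audio=audio_file, start=2.5, end=7.5)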

get_fragments

get_fragments(
    duration: float,
    start: float = 0,
    end: float | None = None,
) -> Iterator[AudioFragment]

Splits the audio into multiple fragments of a specified duration.

Parameters:

  • duration (float) –

    The duration of each audio fragment in seconds.

  • start (float, default: 0 ) –

    The starting time in seconds (default: 0).

  • end (float, default: None ) –

    The ending time in seconds. If None, the entire remaining audio is processed (default: None).

Returns:

  • Iterator[AudioFragment] –

    An iterator yielding audio fragments.

Note

If end is not specified, the duration is read from the audio file's metadata via get_info, which means the audio file may need to be downloaded.

Source code in datachain/lib/file.py
def get_fragments(
    self,
    duration: float,
    start: float = 0,
    end: float | None = None,
) -> "Iterator[AudioFragment]":
    """
    Splits the audio into multiple fragments of a specified duration.

    Args:
        duration (float): The duration of each audio fragment in seconds.
        start (float): The starting time in seconds (default: 0).
        end (float, optional): The ending time in seconds. If None, the entire
                               remaining audio is processed (default: None).

    Returns:
        Iterator[AudioFragment]: An iterator yielding audio fragments.

    Note:
        If end is not specified, the duration is read from the audio file's
        metadata via get_info, which means the file may need to be downloaded.
    """
    if duration <= 0:
        raise ValueError("duration must be a positive float")
    if start < 0:
        raise ValueError("start must be a non-negative float")

    if end is None:
        end = self.get_info().duration

    if end < 0:
        raise ValueError("end must be a non-negative float")
    if start >= end:
        raise ValueError("start must be less than end")

    while start < end:
        yield self.get_fragment(start, min(start + duration, end))
        start += duration
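
For example, splitting a file into 30-second chunks (a sketch; audio_file is any AudioFile instance, and the final fragment may be shorter than 30 seconds):

for fragment in audio_file.get_fragments(duration=30.0):
    print(fragment.start, fragment.end)  # 0.0-30.0, 30.0-60.0, ... up to the file's duration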

get_info

get_info() -> Audio

Retrieves metadata about the audio file. Where possible it does not download the file and reads only its header, so it might be a good idea to disable caching and prefetching in a UDF if you only need audio metadata.

Returns:

  • Audio ( Audio ) –

    A Model containing audio metadata such as duration, sample rate, channels, and codec details.

Source code in datachain/lib/file.py
def get_info(self) -> "Audio":
    """
    Retrieves metadata about the audio file. Where possible it does not
    download the file and reads only its header, so it might be a good
    idea to disable caching and prefetching in a UDF if you only need
    audio metadata.

    Returns:
        Audio: A Model containing audio metadata such as duration,
               sample rate, channels, and codec details.
    """
    from .audio import audio_info

    return audio_info(self)
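
A sketch of a metadata-only pipeline; it assumes the chain-level settings(cache=..., prefetch=...) options for disabling caching and prefetching, and the datachain.lib.file import path shown above:

import datachain as dc
from datachain.lib.file import Audio, AudioFile

def info(file: AudioFile) -> Audio:
    return file.get_info()

meta = (
    dc.read_storage("s3://bucket-name/", type="audio")
    .settings(cache=False, prefetch=0)  # read headers where possible; skip full downloads
    .map(info=info)
)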

save

save(
    output: str,
    format: str | None = None,
    start: float = 0,
    end: float | None = None,
    client_config: dict | None = None,
) -> AudioFile

Save audio file or extract fragment to specified format.

Parameters:

  • output (str) –

    Output directory path

  • format (str | None, default: None ) –

    Output format ('wav', 'mp3', etc). Defaults to source format

  • start (float, default: 0 ) –

    Start time in seconds (>= 0). Defaults to 0

  • end (float | None, default: None ) –

    End time in seconds. If None, extracts to end of file

  • client_config (dict | None, default: None ) –

    Optional client configuration

Returns:

  • AudioFile ( AudioFile ) –

    New audio file with format conversion/extraction applied

Examples:

audio.save("/path", "mp3")                        # Entire file to MP3
audio.save("s3://bucket/path", "wav", start=2.5)  # From 2.5s to end as WAV
audio.save("/path", "flac", start=1, end=3)       # 1-3s fragment as FLAC

Source code in datachain/lib/file.py
def save(  # type: ignore[override]
    self,
    output: str,
    format: str | None = None,
    start: float = 0,
    end: float | None = None,
    client_config: dict | None = None,
) -> "AudioFile":
    """Save audio file or extract fragment to specified format.

    Args:
        output: Output directory path
        format: Output format ('wav', 'mp3', etc). Defaults to source format
        start: Start time in seconds (>= 0). Defaults to 0
        end: End time in seconds. If None, extracts to end of file
        client_config: Optional client configuration

    Returns:
        AudioFile: New audio file with format conversion/extraction applied

    Examples:
        audio.save("/path", "mp3")                        # Entire file to MP3
        audio.save("s3://bucket/path", "wav", start=2.5)  # From 2.5s to end as WAV
        audio.save("/path", "flac", start=1, end=3)       # 1-3s fragment as FLAC
    """
    from .audio import save_audio

    return save_audio(self, output, format, start, end)

AudioFragment

Bases: DataModel

A data model for representing an audio fragment.

This model represents a specific fragment within an audio file with defined start and end times. It allows access to individual fragments and provides functionality for reading and saving audio fragments as separate audio files.

Attributes:

  • audio (AudioFile) –

    The audio file containing the audio fragment.

  • start (float) –

    The starting time of the audio fragment in seconds.

  • end (float) –

    The ending time of the audio fragment in seconds.

get_np

get_np() -> tuple[ndarray, int]

Returns the audio fragment as a NumPy array with sample rate.

Returns:

  • tuple[ndarray, int] –

    A tuple containing the audio data as a NumPy array and the sample rate.

Source code in datachain/lib/file.py
def get_np(self) -> tuple["ndarray", int]:
    """
    Returns the audio fragment as a NumPy array with sample rate.

    Returns:
        tuple[ndarray, int]: A tuple containing the audio data as a NumPy array
                           and the sample rate.
    """
    from .audio import audio_to_np

    duration = self.end - self.start
    return audio_to_np(self.audio, self.start, duration)
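
For example, computing the duration and RMS level of a fragment from the returned array (a sketch; fragment is an AudioFragment instance, NumPy is assumed to be installed, and the first array axis is assumed to be time):

import numpy as np

data, sample_rate = fragment.get_np()
rms = float(np.sqrt(np.mean(np.square(data, dtype=np.float64))))
print(f"{data.shape[0] / sample_rate:.2f}s at {sample_rate} Hz, RMS {rms:.4f}")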

read_bytes

read_bytes(format: str = 'wav') -> bytes

Returns the audio fragment as audio bytes.

Parameters:

  • format (str, default: 'wav' ) –

    The desired audio format (e.g., 'wav', 'mp3'). Defaults to 'wav'.

Returns:

  • bytes ( bytes ) –

    The encoded audio fragment as bytes.

Source code in datachain/lib/file.py
def read_bytes(self, format: str = "wav") -> bytes:
    """
    Returns the audio fragment as audio bytes.

    Args:
        format (str): The desired audio format (e.g., 'wav', 'mp3').
                     Defaults to 'wav'.

    Returns:
        bytes: The encoded audio fragment as bytes.
    """
    from .audio import audio_to_bytes

    duration = self.end - self.start
    return audio_to_bytes(self.audio, format, self.start, duration)
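
For instance, encoding a fragment and writing it to a local file (fragment is an AudioFragment as above):

mp3_bytes = fragment.read_bytes(format="mp3")
with open("/tmp/fragment.mp3", "wb") as f:
    f.write(mp3_bytes)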

save

save(output: str, format: str | None = None) -> AudioFile

Saves the audio fragment as a new audio file.

If output is a remote path, the audio file will be uploaded to remote storage.

Parameters:

  • output (str) –

    The destination path, which can be a local file path or a remote URL.

  • format (str | None, default: None ) –

    The output audio format (e.g., 'wav', 'mp3'). If None, the format is inferred from the file extension.

Returns:

  • AudioFile ( AudioFile ) –

    A Model representing the saved audio file.

Source code in datachain/lib/file.py
def save(self, output: str, format: str | None = None) -> "AudioFile":
    """
    Saves the audio fragment as a new audio file.

    If `output` is a remote path, the audio file will be uploaded to remote storage.

    Args:
        output (str): The destination path, which can be a local file path
                      or a remote URL.
        format (str, optional): The output audio format (e.g., 'wav', 'mp3').
                                If None, the format is inferred from the
                                file extension.

    Returns:
        AudioFile: A Model representing the saved audio file.
    """
    from .audio import save_audio

    return save_audio(self.audio, output, format, self.start, self.end)
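
For example, uploading a fragment to remote storage (the bucket path is a placeholder):

saved = fragment.save("s3://bucket-name/clips", format="mp3")  # returns the new AudioFile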

Audio

Bases: DataModel

A data model representing metadata for an audio file.

Attributes:

  • sample_rate (int) –

    The sample rate of the audio (samples per second). Defaults to -1 if unknown.

  • channels (int) –

    The number of audio channels. Defaults to -1 if unknown.

  • duration (float) –

    The total duration of the audio in seconds. Defaults to -1.0 if unknown.

  • samples (int) –

    The total number of samples in the audio. Defaults to -1 if unknown.

  • format (str) –

    The format of the audio file (e.g., 'wav', 'mp3'). Defaults to an empty string.

  • codec (str) –

    The codec used for encoding the audio. Defaults to an empty string.

  • bit_rate (int) –

    The bit rate of the audio in bits per second. Defaults to -1 if unknown.

get_channel_name staticmethod

get_channel_name(
    num_channels: int, channel_idx: int
) -> str

Map channel index to meaningful name based on common audio formats

Source code in datachain/lib/file.py
@staticmethod
def get_channel_name(num_channels: int, channel_idx: int) -> str:
    """Map channel index to meaningful name based on common audio formats"""
    channel_mappings = {
        1: ["Mono"],
        2: ["Left", "Right"],
        4: ["W", "X", "Y", "Z"],  # First-order Ambisonics
        6: ["FL", "FR", "FC", "LFE", "BL", "BR"],  # 5.1 surround
        8: ["FL", "FR", "FC", "LFE", "BL", "BR", "SL", "SR"],  # 7.1 surround
    }

    if num_channels in channel_mappings:
        channels = channel_mappings[num_channels]
        if 0 <= channel_idx < len(channels):
            return channels[channel_idx]

    return f"Ch{channel_idx + 1}"