AudioFile
AudioFile extends File and provides additional methods for working with audio files.
AudioFile instances are created when a DataChain is initialized from storage with the type="audio" parameter:
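A minimal sketch, assuming the current read_storage entry point and a placeholder bucket URI:

import datachain as dc

chain = dc.read_storage("s3://my-bucket/audio/", type="audio")  # placeholder URI; each row's file column holds an AudioFile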
There are additional models for working with audio files:
- AudioFragment – represents a fragment of an audio file.
- Audio – represents metadata of an audio file.
These are virtual models that do not create physical files.
Instead, they represent data within the AudioFile they refer to.
If you need to persist that data, use their save method,
which can write it locally or upload it to a storage service.
For a complete example of audio processing with DataChain, see
Audio-to-Text with Whisper,
a speech recognition pipeline that uses AudioFile, AudioFragment, and Audio
to chunk audio files and transcribe them.
AudioFile
Bases: File
A data model for handling audio files.
This model inherits from the File model and provides additional functionality
for reading audio files, extracting audio fragments, and splitting audio into
fragments.
Source code in datachain/lib/file.py
get_fragment
get_fragment(start: float, end: float) -> AudioFragment
Returns an audio fragment for the specified time range. It does not download the file, nor does it actually extract the fragment; it returns a model representing the fragment, which can be used to read or save it later.
Parameters:
- start (float) – The start time of the fragment in seconds.
- end (float) – The end time of the fragment in seconds.

Returns:
- AudioFragment (AudioFragment) – A model representing the audio fragment.
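For instance (a sketch; audio_file stands for an AudioFile instance from a chain):

fragment = audio_file.get_fragment(2.5, 7.5)  # lazy: nothing is downloaded or decoded yet
data, sr = fragment.get_np()                  # the audio is read only here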
Source code in datachain/lib/file.py
get_fragments
get_fragments(
duration: float,
start: float = 0,
end: float | None = None,
) -> Iterator[AudioFragment]
Splits the audio into multiple fragments of a specified duration.
Parameters:
- duration (float) – The duration of each audio fragment in seconds.
- start (float, default: 0) – The starting time in seconds.
- end (float | None, default: None) – The ending time in seconds. If None, the entire remaining audio is processed.

Returns:
- Iterator[AudioFragment] – An iterator yielding audio fragments.
Note
If end is not specified, the total number of samples is read from the audio file, which means the file must be downloaded.
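A sketch of chunking, assuming audio_file is an AudioFile instance; passing an explicit end avoids the download mentioned in the note:

for fragment in audio_file.get_fragments(duration=30.0, end=300.0):
    data, sr = fragment.get_np()  # each 30-second chunk, decoded on demand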
Source code in datachain/lib/file.py
get_info
get_info() -> Audio
Retrieves metadata and information about the audio file. When possible, it does not download the file and only reads its header. It might therefore be a good idea to disable caching and prefetching for the UDF if you only need audio metadata.
Returns:
- Audio (Audio) – A model containing audio metadata such as duration, sample rate, channels, and codec details.
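A sketch of inspecting metadata (audio_file is an assumed AudioFile instance):

info = audio_file.get_info()
print(info.duration, info.sample_rate, info.channels)  # e.g. 12.5 44100 2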
Source code in datachain/lib/file.py
save
save(
output: str,
format: str | None = None,
start: float = 0,
end: float | None = None,
client_config: dict | None = None,
) -> AudioFile
Save audio file or extract fragment to specified format.
Parameters:
- output (str) – Output directory path.
- format (str | None, default: None) – Output format ('wav', 'mp3', etc.). Defaults to the source format.
- start (float, default: 0) – Start time in seconds (>= 0). Defaults to 0.
- end (float | None, default: None) – End time in seconds. If None, extracts to the end of the file.
- client_config (dict | None, default: None) – Optional client configuration.

Returns:
- AudioFile (AudioFile) – New audio file with format conversion/extraction applied.
Examples:
audio.save("/path", "mp3") # Entire file to MP3 audio.save("s3://bucket/path", "wav", start=2.5) # From 2.5s to end as WAV audio.save("/path", "flac", start=1, end=3) # 1-3s fragment as FLAC
Source code in datachain/lib/file.py
AudioFragment
Bases: DataModel
A data model for representing an audio fragment.
This model represents a specific fragment within an audio file with defined start and end times. It allows access to individual fragments and provides functionality for reading and saving audio fragments as separate audio files.
Attributes:
- audio (AudioFile) – The audio file containing the audio fragment.
- start (float) – The starting time of the audio fragment in seconds.
- end (float) – The ending time of the audio fragment in seconds.
get_np
get_np() -> tuple[ndarray, int]
Returns the audio fragment as a NumPy array together with its sample rate.
Returns:
- tuple[ndarray, int] – A tuple containing the audio data as a NumPy array and the sample rate.
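For example, a loudness sketch (fragment is an assumed AudioFragment; only NumPy is used):

import numpy as np

data, sample_rate = fragment.get_np()
rms = np.sqrt(np.mean(np.square(data.astype(np.float64))))  # rough loudness of the fragment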
Source code in datachain/lib/file.py
read_bytes
read_bytes(format: str = 'wav') -> bytes
Returns the audio fragment as encoded audio bytes.
Parameters:
- format (str, default: 'wav') – The desired audio format (e.g., 'wav', 'mp3'). Defaults to 'wav'.

Returns:
- bytes (bytes) – The encoded audio fragment as bytes.
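A sketch of encoding a fragment and writing the bytes yourself (the output path is a placeholder):

wav_bytes = fragment.read_bytes(format="wav")
with open("/tmp/fragment.wav", "wb") as f:  # placeholder path
    f.write(wav_bytes)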
Source code in datachain/lib/file.py
save
save(output: str, format: str | None = None) -> AudioFile
Saves the audio fragment as a new audio file.
If output is a remote path, the audio file is uploaded to remote storage.
Parameters:
- output (str) – The destination path, which can be a local file path or a remote URL.
- format (str, default: None) – The output audio format (e.g., 'wav', 'mp3'). If None, the format is inferred from the file extension.

Returns:
- AudioFile (AudioFile) – A model representing the saved audio file.
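For instance (a sketch; the bucket URI is a placeholder):

saved = fragment.save("s3://my-bucket/fragments/", format="mp3")
print(saved.path)  # the returned AudioFile points at the uploaded fragment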
Source code in datachain/lib/file.py
Audio
Bases: DataModel
A data model representing metadata for an audio file.
Attributes:
- sample_rate (int) – The sample rate of the audio (samples per second). Defaults to -1 if unknown.
- channels (int) – The number of audio channels. Defaults to -1 if unknown.
- duration (float) – The total duration of the audio in seconds. Defaults to -1.0 if unknown.
- samples (int) – The total number of samples in the audio. Defaults to -1 if unknown.
- format (str) – The format of the audio file (e.g., 'wav', 'mp3'). Defaults to an empty string.
- codec (str) – The codec used for encoding the audio. Defaults to an empty string.
- bit_rate (int) – The bit rate of the audio in bits per second. Defaults to -1 if unknown.
get_channel_name
staticmethod
Maps a channel index to a meaningful channel name based on common audio formats.