Audio

Description

Default class for audio recordings matching a Sequence, typically the voice of the subject of the motion capture. This class allows to perform a variety of transformations of the audio stream, such as getting the envelope, pitch and formants of the speech.

Initialisation

class krajjat.classes.audio.Audio(path_or_samples, frequency=None, name=None, condition=None, verbosity=1)

Default class for audio clips matching a Sequence, typically the voice of the subject of the motion capture. This class allows to perform a variety of transformations of the audio stream, such as getting the envelope, pitch and formants of the speech.

New in version 2.0.

Parameters:
  • path_or_samples (str or list(int) or numpy.ndarray(int)) – The path to the audio file, or a list containing the samples of an audio file. If the file is a path, it should either point to a .wav file, or to a file containing the timestamps and samples in a text form (.json, .csv, .tsv, .txt or .mat). It is also possible to point to a folder containing one file per sample. See Audio formats for the acceptable file types.

  • frequency (int or float, optional) – The frequency, in Hz (or samples per sec) at which the parameter path_or_samples is set. This parameter will be ignored if path_or_samples is a path, but will be used to define the timestamps of the Audio object if path_or_samples is a list of samples.

  • name (str, optional) – Defines a name for the Audio instance. If a string is provided, the attribute name will take its value. If not, see Audio._define_name_init().

  • condition (str or None, optional) – Optional field to represent in which experimental condition the audio was recorded.

  • verbosity (int, optional) –

    Sets how much feedback the code will provide in the console output:

    • 0: Silent mode. The code won’t provide any feedback, apart from error messages.

    • 1: Normal mode (default). The code will provide essential feedback such as progression markers and current steps.

    • 2: Chatty mode. The code will provide all possible information on the events happening. Note that this may clutter the output and slow down the execution.

samples

A list containing the audio samples, in chronological order.

Type:

np.ndarray(int)

timestamps

A list containing the timestamps matching each audio sample. Consequently, samples and timestamps should have the same length.

Type:

np.ndarray(float)

frequency

The amount of samples per second.

Type:

int or float

name

Custom name given to the audio. If no name has been provided upon initialisation, it will be defined by Audio._define_name_init().

Type:

str

condition

Defines in which experimental condition the audio clip was recorded.

Type:

str

metadata

A dictionary containing metadata about the recording, extracted from the file.

Type:

dict

path

Path to the audio file passed as a parameter upon creation; if samples were provided, this attribute will be None.

Type:

str

files

List of files contained in the path. The list will be of size 1 if the path points to a single file.

Type:

list(str)

kind

A parameter that is set on "Audio", to differentiate it from the different types of AudioDerivative.

Type:

str

Examples

>>> seq1 = Audio("sequences/Chris/seq_001.tsv")
>>> seq2 = Audio("sequences/Will/seq_001.xlsx", name="Will_001")
>>> seq3 = Audio(name="Jonny_001")
>>> from scipy.io import wavfile
>>> freq, samples = wavfile.read("sequences/Guy/seq_001.wav")
>>> seq4 = Audio(samples, freq, name="Guy_001", condition="English", time_unit="ms", system="Kinect", verbosity=0)

Magic methods

Audio.__len__()

Returns the number of samples in the audio clip (i.e., the length of the attribute samples).

New in version 2.0.

Returns:

The number of samples in the audio clip.

Return type:

int

Example

>>> audio = Audio("Recordings/Din/recording_35.wav")
>>> len(audio)
88200
Audio.__getitem__(index)

Returns the sample of index specified by the parameter index.

Parameters:

index (int) – The index of the sample to return.

Returns:

A sample from the attribute samples.

Return type:

float

Example

>>> audio = Audio("Recordings/Nayru/recording_36.wav")
>>> audio[820]
417
Audio.__eq__(other)

Returns True if all the samples in the attribute samples have identical values between the two Audio objects, and if the frequency is identical.

New in version 2.0.

Parameters:

other (Audio) – Another Audio object.

Audio.__repr__()

Returns the name attribute of the audio clip.

Returns:

The attribute name of the Audio instance.

Return type:

str

Examples

>>> audio = Audio("Recordings/Farore/recording_37.wav")
>>> print(audio)
recording_37
>>> audio = Audio("Recordings/Farore/recording_37.wav", name="audio_37")
>>> print(audio)
audio_37

Public methods

Setter functions

Audio.set_name(name)

Sets the name attribute of the Audio instance. This name can be used as display functions or as a means to identify the audio.

New in version 2.0.

Parameters:

name (str) – A name to describe the audio clip.

Example

>>> aud = Audio("C:/Users/Walter/Sequences/audio.wav")
>>> aud.set_name("Audio 28980")
Audio.set_condition(condition)

Sets the condition attribute of the Audio instance. This attribute can be used to save the experimental condition in which the Audio instance was recorded.

New in version 2.0.

Parameters:

condition (str) – The experimental condition in which the audio clip was recorded.

Example

>>> aud1 = Audio("C:/Users/Dwight/Sequences/English/seq.wav")
>>> aud1.set_condition("English")
>>> aud2 = Audio("C:/Users/Dwight/Sequences/Spanish/seq.wav")
>>> aud2.set_condition("Spanish")

Getter functions

Audio.get_path()

Returns the attribute path of the Audio instance.

. versionadded:: 2.0

Returns:

The path of the Audio instance.

Return type:

str

Example

>>> audio = Audio("Recordings/Giacchino/01.wav")
>>> audio.get_path()
Recordings/Giacchino/01.wav
Audio.get_name()

Returns the attribute name of the Audio instance.

. versionadded:: 2.0

Returns:

The name of the Audio instance.

Return type:

str

Example

>>> audio = Audio("Recordings/Williams/recording_02.wav")
>>> audio.get_name()
recording_02
Audio.get_condition()

Returns the attribute condition of the Audio instance.

New in version 2.0.

Returns:

The experimental condition in which the recording of the audio clip was performed.

Return type:

str

Examples

>>> audio = Audio("Recordings/Zimmer/recording_03.wav")
>>> audio.get_condition()
None
>>> audio = Audio("Recordings/Djawadi/recording_04.wav", condition="Basque")
>>> audio.get_condition()
Basque
Audio.get_samples()

Returns the attribute samples of the Audio instance.

. versionadded:: 2.0

Returns:

The samples of the Audio instance.

Return type:

np.ndarray(int|float)

Examples

>>> audio = Audio("Recordings/Elfman/recording_05.wav")
>>> audio.get_samples()
array([0, 1, 3, 6, 2, 7, 13, 20, 12, 21, 11, 22, 10, 23])
Audio.get_sample(sample_index)

Returns the sample corresponding to the index passed as parameter.

New in version 2.0.

Parameters:

sample_index (int) – The index of the sample.

Returns:

A sample from the sequence.

Return type:

int

Examples

>>> audio = Audio("Recordings/Morricone/recording_06.wav")
>>> audio.get_sample(42)
7418880
Audio.get_number_of_samples()

Returns the number of samples in the audio clip.

New in version 2.0.

Returns:

The amount of samples in the audio clip.

Return type:

int

Example

>>> audio = Audio("Recordings/Richter/recording_07.wav")
>>> audio.get_number_of_samples()
22050
Audio.get_timestamps()

Returns a list of the timestamps for every sample, in seconds.

New in version 2.0.

Returns:

List of the timestamps of all the samples of the audio clip, in seconds.

Return type:

np.ndarray(int|float)

Example

>>> audio = Audio("Recordings/Shapiro/recording_08.wav")
>>> audio.get_timestamps()
array([0.00000000e+00 2.26757370e-05 4.53514739e-05 6.80272109e-05
       9.07029478e-05 1.13378685e-04 1.36054422e-04 1.58730159e-04])
Audio.get_duration()

Returns the duration of the audio clip, in seconds.

New in version 2.0.

Returns:

The duration of the audio clip, in seconds.

Return type:

float

Example

>>> audio = Audio("Recordings/Desplat/recording_09.wav")
>>> audio.get_duration()
42.48151
Audio.get_frequency()

Returns the frequency of the audio clip, in hertz.

New in version 2.0.

Returns:

The frequency of the audio clip, in hertz.

Return type:

int or float

Example

>>> audio = Audio("Recordings/MacQuayle/recording_10.wav")
>>> audio.get_frequency()
44100
Audio.get_info(return_type='dict', include_path=True)

Returns information regarding the Audio clip.

New in version 2.0.

Parameters:
  • return_type (bool, optional) – If set on "dict" (default), the info is returned as an OrderedDict. If set on "table", the info is returned as a two-dimensional list, ready to be exported as a table. If set on "str", a printable string is returned.

  • include_path (bool, optional) – If set on True, the path of the audio clip is included in the returned info (default).

Returns:

An ordered dictionary where each descriptor is associated to its value. The included information fields are:

Return type:

OrderedDict

Example

>>> audio = Audio("Recordings/Reznor/recording_11.wav", include_path=False)
>>> audio.get_info()
Name: recording_11 · Duration: 0.5 · Frequency: 44100 · Number of samples: 22150

Transformation functions

Audio.get_envelope(window_size=1000000.0, overlap_ratio=0.5, filter_below=None, filter_over=None, padtype='constant', padlen=None, name=None, verbosity=1)

Calculates the envelope of an array, and returns it. The function can also optionally perform a band-pass filtering, if the corresponding parameters are provided.

Parameters:
  • window_size (int or None, optional) – The size of the windows (in samples) in which to cut the audio clip to calculate the envelope. Cutting the audio clips in windows allows, in the case where they are long, to speed up the computation. If this parameter is set on None, the window size will be set on the number of samples. A good value for this parameter is generally 1 million. If this parameter is set on 0, on None or on a number of samples bigger than the amount of samples in the Audio instance, the window size is set on the length of the samples.

  • overlap_ratio (float or None, optional) – The ratio of samples overlapping between each window. If this parameter is not None, each window will overlap with the previous (and, logically, the next) for an amount of samples equal to the number of samples in a window times the overlap ratio. Then, only the central values of each window will be preserved and concatenated; this allows to discard any “edge” effect due to the windowing. If the parameter is set on None or 0, the windows will not overlap.

  • filter_below (int, float or None, optional) – If not None nor 0, this value will be provided as the lowest frequency of the band-pass filter.

  • filter_over (int, float or None, optional) – If not None nor 0, this value will be provided as the highest frequency of the band-pass filter.

  • padtype (str, optional) – What type of padding to use. See the documentation of scipy.signal.filtfilt for more information (default: "constant" - warning: this default is not scipy’s default ("odd").)

  • padlen (int, optional) –

    The number of elements for the padding. See the documentation of scipy.signal.filtfilt for more information.

  • name (str or None, optional) – Defines the name of the envelope. If set on None, the name will be the same as the original Audio instance, with the suffix "(ENV)".

  • verbosity (int, optional) –

    Sets how much feedback the code will provide in the console output:

    • 0: Silent mode. The code won’t provide any feedback, apart from error messages.

    • 1: Normal mode (default). The code will provide essential feedback such as progression markers and current steps.

    • 2: Chatty mode. The code will provide all possible information on the events happening. Note that this may clutter the output and slow down the execution.

Returns:

The envelope of the original array.

Return type:

np.array

Example

>>> audio = Audio("Recordings/Ross/recording_12.wav")
>>> envelope = audio.get_envelope(filter_over=50)
Audio.get_pitch(method='parselmouth', filter_below=None, filter_over=None, padtype='constant', padlen=None, name=None, zeros_as_nan=False, verbosity=1)

Calculates the pitch of the voice in the audio clip, and returns a Pitch object.

New in version 2.0.

Parameters:
  • method (str, optional) – Defines the pitch tracking method used. If set on "parselmouth" (default), the to_pitch method from Parselmouth will be used to get the pitch. If set on “crepe”, the CREPE Python module will be used.

  • filter_below (int, float or None, optional) – If not None nor 0, this value will be provided as the lowest frequency of the band-pass filter.

  • filter_over (int, float or None, optional) – If not None nor 0, this value will be provided as the highest frequency of the band-pass filter.

  • padtype (str, optional) –

    What type of padding to use. See the documentation of scipy.signal.filtfilt for more information (default: "constant" - warning: this default is not scipy’s default ("odd").)

  • padlen (int, optional) –

    The number of elements for the padding. See the documentation of scipy.signal.filtfilt for more information.

  • name (str or None, optional) – Defines the name of the pitch. If set on None, the name will be the same as the original Audio instance, with the suffix "(PIT)".

  • zeros_as_nan (bool, optional) – If set on True, the values where the pitch is equal to 0 will be replaced by numpy.nan objects.

  • verbosity (int, optional) –

    Sets how much feedback the code will provide in the console output:

    • 0: Silent mode. The code won’t provide any feedback, apart from error messages.

    • 1: Normal mode (default). The code will provide essential feedback such as progression markers and current steps.

    • 2: Chatty mode. The code will provide all possible information on the events happening. Note that this may clutter the output and slow down the execution.

Returns:

The pitch of the voice in the audio clip.

Return type:

Pitch

Example

>>> audio = Audio("Recordings/Silvestri/recording_13.wav")
>>> pitch = audio.get_pitch(filter_over=50)
Audio.get_intensity(filter_below=None, filter_over=None, padtype='constant', padlen=None, name=None, zeros_as_nan=False, verbosity=1)

Calculates the intensity of the voice in the audio clip, and returns an Intensity object. The function can also optionally perform a band-pass filtering and a resampling, if the corresponding parameters are provided.

New in version 2.0.

Parameters:
  • filter_below (int, float or None, optional) – If not None nor 0, this value will be provided as the lowest frequency of the band-pass filter.

  • filter_over (int, float or None, optional) – If not None nor 0, this value will be provided as the highest frequency of the band-pass filter.

  • padtype (str, optional) –

    What type of padding to use. See the documentation of scipy.signal.filtfilt for more information (default: "constant" - warning: this default is not scipy’s default ("odd").)

  • padlen (int, optional) –

    The number of elements for the padding. See the documentation of scipy.signal.filtfilt for more information.

  • name (str or None, optional) – Defines the name of the intensity. If set on None, the name will be the same as the original Audio instance, with the suffix "(INT)".

  • zeros_as_nan (bool, optional) –

    If set on True, the values where the intensity is equal to 0 will be replaced by numpy.nan objects.

  • verbosity (int, optional) –

    Sets how much feedback the code will provide in the console output:

    • 0: Silent mode. The code won’t provide any feedback, apart from error messages.

    • 1: Normal mode (default). The code will provide essential feedback such as progression markers and current steps.

    • 2: Chatty mode. The code will provide all possible information on the events happening. Note that this may clutter the output and slow down the execution.

Returns:

The intensity of the voice in the audio clip.

Return type:

Intensity

Example

>>> audio = Audio("Recordings/Horner/recording_14.wav")
>>> intensity = audio.get_intensity(filter_over=50)
Audio.get_formant(formant_number=1, filter_below=None, filter_over=None, padtype='constant', padlen=None, name=None, zeros_as_nan=False, verbosity=1)

Calculates the formants of the voice in the audio clip, and returns a Formant object.

New in version 2.0.

Parameters:
  • formant_number (int, optional.) – One of the formants of the voice in the audio clip (1 (default), 2, 3, 4 or 5).

  • filter_below (int, float or None, optional) – If not None nor 0, this value will be provided as the lowest frequency of the band-pass filter.

  • filter_over (int, float or None, optional) – If not None nor 0, this value will be provided as the highest frequency of the band-pass filter.

  • padtype (str, optional) –

    What type of padding to use. See the documentation of scipy.signal.filtfilt for more information (default: "constant" - warning: this default is not scipy’s default ("odd").)

  • padlen (int, optional) –

    The number of elements for the padding. See the documentation of scipy.signal.filtfilt for more information.

  • name (str or None, optional) – Defines the name of the intensity. If set on None, the name will be the same as the original Audio instance, with the suffix "(Fn)", with n being the formant number.

  • zeros_as_nan (bool, optional) –

    If set on True, the values where the pitch is equal to 0 will be replaced by numpy.nan objects.

  • verbosity (int, optional) –

    Sets how much feedback the code will provide in the console output:

    • 0: Silent mode. The code won’t provide any feedback, apart from error messages.

    • 1: Normal mode (default). The code will provide essential feedback such as progression markers and current steps.

    • 2: Chatty mode. The code will provide all possible information on the events happening. Note that this may clutter the output and slow down the execution.

Returns:

The value of a formant of the voice in the audio clip.

Return type:

Formant

Example

>>> audio = Audio("Recordings/Beck/recording_15.wav")
>>> formant = audio.get_formant(formant_number=1, filter_over=50)
Audio.get_derivative(derivative, filter_below=None, filter_over=None, padtype='constant', padlen=None, resampling_frequency=None, resampling_mode='pchip', res_window_size=10000000.0, res_overlap_ratio=0.5, timestamp_start=None, timestamp_end=None, name=None, verbosity=1, **kwargs)

Computes and returns the requested AudioDerivative.

New in version 2.0.

Parameters:
  • derivative (str) –

    The time series to be returned, among:

    • "audio", for the original sample values.

    • "envelope"

    • "pitch"

    • "f1", "f2", "f3", "f4", "f5" for the values of the corresponding formant.

    • "intensity"

  • filter_below (int, float or None, optional) – If not None nor 0, this value will be provided as the lowest frequency of the band-pass filter.

  • filter_over (int, float or None, optional) – If not None nor 0, this value will be provided as the highest frequency of the band-pass filter.`

  • padtype (str, optional) –

    What type of padding to use. See the documentation of scipy.signal.filtfilt for more information (default: "constant" - warning: this default is not scipy’s default ("odd").)

  • padlen (int, optional) –

    The number of elements for the padding. See the documentation of scipy.signal.filtfilt for more information.

  • resampling_frequency (float) – The frequency, in hertz, at which you want to resample the audio clip. A frequency of 4 will return samples at 0.25 s intervals.

  • resampling_mode (str, optional) – This parameter allows for all the values accepted for the kind parameter in the function scipy.interpolate.interp1d(): "linear", "nearest", "nearest-up", "zero", "slinear", "quadratic", "cubic", "previous", and "next". See the documentation for this Python module for more. This parameter also allows another special value, "take", which keeps one out of \(n\) samples, where \(n\) is equal to the original frequency divided by the resampling frequency. This allows for faster computation. Note that this function will return a warning if the resampling frequency is not an integer divider of the original frequency.

  • res_window_size (int, optional) – The size of the windows in which to cut the audio samples to perform the resampling. Cutting long arrays in windows allows to speed up the computation. If this parameter is set on None, the window size will be set on the number of samples. A good value for this parameter is generally 10 million (1e7). If this parameter is set on 0, on None or on a number of samples bigger than the amount of samples in the Audio instance, the window size is set on the length of the samples.

  • res_overlap_ratio (float, optional) – The ratio of samples overlapping between each window. If this parameter is not None, each window will overlap with the previous (and, logically, the next) for an amount of samples equal to the number of samples in a window times the overlap ratio. Then, only the central values of each window will be preserved and concatenated; this allows to discard any “edge” effect due to the windowing. If the parameter is set on None or 0, the windows will not overlap. By default, this parameter is set on 0.5, meaning that each window will overlap for half of their values with the previous, and half of their values with the next.

  • timestamp_start (float or None, optional) – If provided, the return values having a timestamp below the one provided will be ignored from the output.

  • timestamp_end (float or None, optional) – If provided, the return values having a timestamp above the one provided will be ignored from the output.

  • name (str or None, optional) – Defines the name of the output audio derivative. If set on None, the name will be the same as the input audio clip, with suffixes matching the applied procedure ("(ENV)" for envelope, "(PIT)" for pitch, "(INT)" for intensity, "(Fn)" for formant (with the n matching the requested formant), "+RS" if a resampling was performed, "+TR" if a trimming was performed).

  • verbosity (int, optional) –

    Sets how much feedback the code will provide in the console output:

    • 0: Silent mode. The code won’t provide any feedback, apart from error messages.

    • 1: Normal mode (default). The code will provide essential feedback such as progression markers and current steps.

    • 2: Chatty mode. The code will provide all possible information on the events happening. Note that this may clutter the output and slow down the execution.

  • **kwargs (dict) – Any of the parameters needed for the specific sub-functions: Audio.get_envelope(), Audio.get_pitch(), Audio.get_intensity(), and Audio.get_formant().

Returns:

The requested AudioDerivative instance.

Return type:

AudioDerivative

Examples

>>> audio = Audio("Recordings/Kondo/recording_16.wav")
>>> envelope = audio.get_derivative("envelope", filter_over=50)
>>> pitch = audio.get_derivative("pitch", filter_over=50, zeros_as_nan=True)
>>> intensity = audio.get_derivative("intensity", filter_over=50, name="Kondo_intensity")
>>> f1 = audio.get_derivative("formant", formant_number=1, filter_over=50)
>>> f2 = audio.get_derivative("f2", filter_over=50)

Correction functions

Audio.filter_frequencies(filter_below=None, filter_over=None, padtype='constant', padlen=None, name=None, verbosity=1)

Applies a low-pass, high-pass or band-pass filter to the data in the attribute samples.

Parameters:
  • filter_below (float or None, optional) – The value below which you want to filter the data. If set on None or 0, this parameter will be ignored. If this parameter is the only one provided, a high-pass filter will be applied to the samples; if filter_over is also provided, a band-pass filter will be applied to the samples.

  • filter_over (float or None, optional) – The value over which you want to filter the data. If set on None or 0, this parameter will be ignored. If this parameter is the only one provided, a low-pass filter will be applied to the samples; if filter_below is also provided, a band-pass filter will be applied to the samples.

  • padtype (str, optional) –

    What type of padding to use. See the documentation of scipy.signal.filtfilt for more information (default: "constant" - warning: this default is not scipy’s default ("odd").)

  • padlen (int, optional) –

    The number of elements for the padding. See the documentation of scipy.signal.filtfilt for more information.

  • name (str or None, optional) – Defines the name of the output audio. If set on None, the name will be the same as the original audio derivative, with the suffix "+FF".

  • verbosity (int, optional) –

    Sets how much feedback the code will provide in the console output:

    • 0: Silent mode. The code won’t provide any feedback, apart from error messages.

    • 1: Normal mode (default). The code will provide essential feedback such as progression markers and current steps.

    • 2: Chatty mode. The code will provide all possible information on the events happening. Note that this may clutter the output and slow down the execution.

Returns:

The Audio instance, with filtered values.

Return type:

Audio

Example

>>> audio = Audio("Recordings/Shore/recording_17.wav")
>>> audio_ff = audio.filter_frequencies(filter_below=10, filter_over=50)
Audio.resample(frequency, method='cubic', window_size=10000000.0, overlap_ratio=0.5, name=None, verbosity=1)

Resamples an audio clip to the frequency parameter. It first creates a new set of timestamps at the desired frequency, and then interpolates the original data to the new timestamps.

New in version 2.0.

Parameters:
  • frequency (float) – The frequency, in hertz, at which you want to resample the audio clip. A frequency of 4 will return samples at 0.25 s intervals.

  • method (str, optional) –

    This parameter allows for all the values accepted for the kind parameter in the function scipy.interpolate.interp1d(): "linear", "nearest", "nearest-up", "zero", "slinear", "quadratic", "cubic", "previous", and "next". See the documentation for this Python module for more. This parameter also allows another special value, "take", which keeps one out of \(n\) samples, where \(n\) is equal to the original frequency divided by the resampling frequency. This allows for faster computation. Note that this function will return a warning if the resampling frequency is not an integer divider of the original frequency.

  • window_size (int, optional) – The size of the windows in which to cut the audio samples to perform the resampling. Cutting long arrays in windows allows to speed up the computation. If this parameter is set on None, the window size will be set on the number of samples. A good value for this parameter is generally 10 million (1e7). If this parameter is set on 0, on None or on a number of samples bigger than the amount of samples in the Audio instance, the window size is set on the length of the samples.

  • overlap_ratio (float, optional) – The ratio of samples overlapping between each window. If this parameter is not None, each window will overlap with the previous (and, logically, the next) for an amount of samples equal to the number of samples in a window times the overlap ratio. Then, only the central values of each window will be preserved and concatenated; this allows to discard any “edge” effect due to the windowing. If the parameter is set on None or 0, the windows will not overlap. By default, this parameter is set on 0.5, meaning that each window will overlap for half of their values with the previous, and half of their values with the next.

  • name (str or None, optional) – Defines the name of the output audio clip. If set on None, the name will be the same as the input audio clip, with the suffix "+RS n" with n being the value of frequency.

  • verbosity (int, optional) –

    Sets how much feedback the code will provide in the console output:

    • 0: Silent mode. The code won’t provide any feedback, apart from error messages.

    • 1: Normal mode (default). The code will provide essential feedback such as progression markers and current steps.

    • 2: Chatty mode. The code will provide all possible information on the events happening. Note that this may clutter the output and slow down the execution.

Returns:

A new audio clip containing resampled timestamps and samples.

Return type:

Audio

Warning

This function allows both the upsampling and the downsampling of audio clips. However, during any of these operations, the algorithm only estimates the real values of the samples. You should then consider the upsampling (and the downsampling, to a lesser extent) with care. You can control the frequency of the original audio clip with Audio.get_frequency().

Example

>>> audio = Audio("Recordings/Raine/recording_18.wav")
>>> audio_resampled = audio.resample(2000, "cubic")
Audio.trim(start=None, end=None, name=None, error_if_out_of_bounds=False, verbosity=1, add_tabs=0)

Trims an audio clip according to a starting and an ending timestamps. Timestamps must be provided in seconds.

New in version 2.0.

Parameters:
  • start (int or None, optional) – The timestamp after which the samples will be preserved (inclusive). If set on None, or if set on a value lower than the first timestamp, the beginning of the audio will be set as the timestamp of the first sample.

  • end (int or None, optional) – The timestamp before which the samples will be preserved (inclusive). If set on None, or if set on a value higher than the last timestamp, the end of the audio will be set as the timestamp of the last sample.

  • name (str or None, optional) – Defines the name of the output audio clip. If set on None, the name will be the same as the input audio clip, with the suffix "+TR".

  • error_if_out_of_bounds (bool, optional) – Defines if to return an error if the timestamps are out of bounds. If set on True, the function will raise an Exception if start is below 0, or if end is above the length of the audio.

  • verbosity (int, optional) –

    Sets how much feedback the code will provide in the console output:

    • 0: Silent mode. The code won’t provide any feedback, apart from error messages.

    • 1: Normal mode (default). The code will provide essential feedback such as progression markers and current steps.

    • 2: Chatty mode. The code will provide all possible information on the events happening. Note that this may clutter the output and slow down the execution.

  • add_tabs (int, optional) – Adds the specified amount of tabulations to the verbosity outputs. This parameter may be used by other functions to encapsulate the verbosity outputs by indenting them. In a normal use, it shouldn’t be set by the user.

Returns:

A new Audio instance containing a subset of the samples of the original.

Return type:

Audio

Example

>>> audio = Audio("Recordings/Holt/recording_19.wav")
>>> audio_trimmed = audio.trim(10, 15)

Delay finding

Audio.find_excerpt(other, **kwargs)

This function tries to find the timestamp at which an excerpt of the current Audio instance begins. The computation is performed through cross-correlation, by first turning the audio clips into filtered envelopes and downsampling them to accelerate the processing. The function returns the timestamp or the index of the maximal correlation value, or None if this value is below threshold.

New in version 2.0.

Parameters:
  • other (Audio) – An Audio instance, smaller than or of equal size to the current object, that is allegedly an excerpt from the current Audio instance. The amplitude, frequency or values do not have to match exactly the ones from the current Audio instance.

  • **kwargs (dict) – Any of the parameters needed for the function find_delay.

Returns:

  • int, float, timedelta or None – The sample index, timestamp or timedelta of the current Audio instance at which the excerpt can be found, or None if the excerpt is not contained in the current Audio instance.

  • float or None, optional – Optionally, if return_correlation_value is True, the correlation value at the corresponding index/timestamp.

Example

>>> audio = Audio("Recordings/Davis/recording_20.wav")
>>> audio_excerpt = audio.trim(12, 15)
>>> audio.find_excerpt(audio_excerpt, return_delay_format="s")
12
Audio.find_excerpts(excerpts, **kwargs)

This function tries to find the timestamp at which multiple excerpts of the current Audio instance begin. The computation is performed through cross-correlation, by first turning the audio clips into downsampled and filtered envelopes to accelerate the processing. For each excerpt, the function returns the timestamp of the maximal correlation value, or None if this value is below threshold.

New in version 2.0.

Parameters:
  • excerpts (list(Audio)) – A list of Audio instances, smaller than or of equal size to the current object, that are all allegedly excerpts from current Audio instance. The amplitude, frequency or values do not have to match exactly the ones from the current Audio instance.

  • **kwargs (dict) – Any of the parameters needed for the function find_delay.

Returns:

  • int|float|timedelta|None – The sample index, timestamp or timedelta of array1 at which array2 can be found (defined by the parameter return_delay_format), or None if array1 is not contained in array2.

  • float|None, optional – Optionally, if return_correlation_value is True, the correlation value at the corresponding index/timestamp.

Example

>>> audio = Audio("Recordings/Beal/recording_21.wav")
>>> audio_excerpt_1 = audio.trim(12, 15)
>>> audio_excerpt_2 = audio.trim(15, 18)
>>> audio_excerpt_3 = audio.trim(18, 21)
>>> audio.find_excerpt([audio_excerpt_1, audio_excerpt_2, audio_excerpt_3], return_delay_format="s")
[12, 15, 18]

Conversion functions

Audio.to_table()

Returns a list of lists where each sublist contains a timestamp and a sample. The first sublist contains the headers of the table. The output then resembles the table found in Tabled formats.

New in version 2.0.

Returns:

A list of lists that can be interpreted as a table, containing headers, and with the timestamps and the sample value on each row.

Return type:

list(list)

Example

>>> audio = Audio("Recordings/Elfman/recording_22.wav")
>>> table = audio.to_table()
>>> table[0:3]
[["Timestamp", "Sample"],
 [0, 0.4815],
 [0.001, 0.1623],
 [0.002, 0.42]]
Audio.to_json(include_metadata=True)

Returns a list ready to be exported in JSON. The returned JSON data is a dictionary with three keys: “Timestamp” is the key to the list of timestamps, “Sample” is the key to the list of samples, and “Frequency” is the key to the sampling frequency of the audio clip.

New in version 2.0.

Parameters:

include_metadata (bool, optional) – Whether to include the metadata in the file (default: True).

Returns:

A dictionary containing the data of the audio clip, ready to be exported in JSON.

Return type:

dict

Example

>>> audio = Audio("Recordings/Newton-Howard/recording_23.wav")
>>> json_data = audio.to_json()
>>> json_data
{"Timestamp": [0, 0.001, 0.002, 0.003], "Sample": [0, 0.4815, 0.1623, 0.42], "Frequency": 1000.0,
"processing_steps": []}
Audio.to_dict(include_frequency=True)

Returns a dictionary containing the data of the audio clip.

New in version 2.0.

Parameters:

include_frequency (bool, optional) – If set on True, includes the frequency of the audio clip in the output dictionary.

Returns:

  • dict – A dataframe containing the timestamps and samples of the audio clip.

    Example

  • ——-

  • >>> audio = Audio(“Recordings/Young/recording_24.wav”)

  • >>> dict_data = audio.to_dict()

  • >>> dict_data

  • {“Timestamp” ([0, 0.001, 0.002, 0.003], “Sample”: [0, 0.4815, 0.1623, 0.42], “Frequency”: 1000.0})

Audio.to_dataframe()

Returns a Pandas dataframe containing the data of the audio clip.

New in version 2.0.

Returns:

A dataframe containing the timestamps and samples of the audio clip.

Return type:

Pandas.dataframe

Example

>>> audio = Audio("Recordings/Arnold/recording_25.tsv")
>>> audio.to_dataframe()
    Timestamp   Sample
0      0.000      4815
1      0.001      1623
2      0.002        42

Saving function

Audio.save(folder_out, name=None, file_format='json', encoding='utf-8', individual=False, include_metadata=True, verbosity=1)

Saves an audio clip in a file or a folder. The function saves the sequence under folder_out/name.file_format. All the non-existent subfolders present in the folder_out path will be created by the function. The function also updates the path attribute of the Audio clip.

New in version 2.0.

Parameters:
  • folder_out (str, optional) – The path to the folder where to save the file or files. If one or more subfolders of the path do not exist, the function will create them. If the string provided is empty (by default), the audio clip will be saved in the current working directory. If the string provided contains a file with an extension, the fields name and file_format will be ignored.

  • name (str or None, optional) – Defines the name of the file or files where to save the audio clip. If set on None, the name will be set on the attribute name of the audio clip; if that attribute is also set on None, the name will be set on "out". If individual is set on True, each sample will be saved as a different file, having the index of the pose as a suffix after the name (e.g. if the name is "sample" and the file format is "txt", the samples will be saved as sample_0.txt, sample_1.txt, sample_2.txt, etc.).

  • file_format (str or None, optional) –

    The file format in which to save the audio clip. The file format must be "json" (default), "xlsx", "txt", "csv", "tsv", "wav", or, if you are a masochist, "mat". Notes:

    • "xls" will save the file with an .xlsx extension.

    • Any string starting with a dot will be accepted (e.g. ".csv" instead of "csv").

    • "csv;" will force the value separator on ;, while "csv," will force the separator on ,. By default, the function will detect which separator the system uses.

    • "txt" and "tsv" both separate the values by a tabulation.

    • Any other string will not return an error, but rather be used as a custom extension. The data will be saved as in a text file (using tabulations as values separators).

  • encoding (str, optional) – The encoding of the file to save (applicable for json and text-based files). By default, the file is saved in UTF-8 encoding. This input can take any of the official Python accepted formats <https://docs.python.org/3/library/codecs.html#standard-encodings>`_.

  • individual (bool, optional) –

    If set on False (default), the function will save the audio clip in a unique file. If set on True, the function will save each sample of the audio clip in an individual file, appending an underscore and the index of the sample (starting at 0) after the name. This option is not available and will be ignored if file_format is set on "wav".

    Warning

    It is not recommended to save each sample in a different file. This incredibly tedious way of handling audio files has only been implemented to follow the same logic as for the Sequence files, and should be avoided.

  • include_metadata (bool, optional) –

    Whether to include the metadata in the file (default: True). This parameter does not apply to individually saved files.

    • For json files, the metadata is saved at the top level. Metadata keys will be saved next to the "Poses" key.

    • For mat files, the metadata is saved at the top level of the structure.

    • For xlsx files, the metadata is saved in a second sheet.

    • For pkl files, the metadata will always be saved as the object is saved as-is - this parameter is thus ignored.

    • For wav files, the metadata is saved as tags in the file.

    • For all the other formats, the metadata is saved at the beginning of the file.

  • verbosity (int, optional) –

    Sets how much feedback the code will provide in the console output:

    • 0: Silent mode. The code won’t provide any feedback, apart from error messages.

    • 1: Normal mode (default). The code will provide essential feedback such as progression markers and current steps.

    • 2: Chatty mode. The code will provide all possible information on the events happening. Note that this may clutter the output and slow down the execution.

Examples

>>> audio = Audio("Recordings/Bach/recording_26.wav")
>>> audio.save("Recordings/Bach/recording_26.tsv")
>>> audio.save("Recordings/Bach/recording_26.mat", include_metadata=False)
Audio.save_json(folder_out, name=None, individual=False, include_metadata=True, encoding='utf-8', verbosity=1)

Saves an audio clip as a json file or files. This function saves the Audio instance as folder_out/name.file_format.

New in version 2.0.

Parameters:
  • folder_out (str) – The path to the folder where to save the file or files. If one or more subfolders of the path do not exist, the function will create them.

  • name (str or None, optional) – Defines the name of the file or files where to save the audio clip. If set on None, the name will be set on "out" if individual is False, or on "sample" if individual is True.

  • individual (bool, optional) –

    If set on False (default), the function will save the audio clip in a unique file. If set on True, the function will save each sample of the audio clip in an individual file, appending an underscore and the index of the sample (starting at 0) after the name.

    Warning

    It is not recommended to save each sample in a different file. This incredibly tedious way of handling audio files has only been implemented to follow the same logic as for the Sequence files, and should be avoided.

  • include_metadata (bool, optional) – Whether to include the metadata in the file (default: True).

  • encoding (str, optional) – The encoding of the file to save. By default, the file is saved in UTF-8 encoding. This input can take any of the official Python accepted formats.

  • verbosity (int, optional) –

    Sets how much feedback the code will provide in the console output:

    • 0: Silent mode. The code won’t provide any feedback, apart from error messages.

    • 1: Normal mode (default). The code will provide essential feedback such as progression markers and current steps.

    • 2: Chatty mode. The code will provide all possible information on the events happening. Note that this may clutter the output and slow down the execution.

Example

>>> audio = Audio("Recordings/Beethoven/recording_27.wav")
>>> audio.save_json("Recordings/Beethoven/recording_27.json")
Audio.save_mat(folder_out, name=None, individual=False, include_metadata=True, verbosity=1)

Saves an audio clip as a Matlab .mat file or files. This function saves the Audio instance as folder_out/name.file_format.

New in version 2.0.

Important

This function is dependent of the module scipy.

Parameters:
  • folder_out (str) – The path to the folder where to save the file or files. If one or more subfolders of the path do not exist, the function will create them.

  • name (str or None, optional) – Defines the name of the file or files where to save the audio clip. If set on None, the name will be set on "out" if individual is False, or on "sample" if individual is True.

  • individual (bool, optional) –

    If set on False (default), the function will save the audio clip in a unique file. If set on True, the function will save each sample of the audio clip in an individual file, appending an underscore and the index of the sample (starting at 0) after the name.

    Warning

    It is not recommended to save each sample in a different file. This incredibly tedious way of handling audio files has only been implemented to follow the same logic as for the Sequence files, and should be avoided.

  • include_metadata (bool, optional) – Whether to include the metadata in the file (default: True).

  • verbosity (int, optional) –

    Sets how much feedback the code will provide in the console output:

    • 0: Silent mode. The code won’t provide any feedback, apart from error messages.

    • 1: Normal mode (default). The code will provide essential feedback such as progression markers and current steps.

    • 2: Chatty mode. The code will provide all possible information on the events happening. Note that this may clutter the output and slow down the execution.

Example

>>> audio = Audio("Recordings/Shostakovich/recording_28.wav")
>>> audio.save_mat("Recordings/Shostakovich/recording_28.mat")
Audio.save_excel(folder_out, name=None, individual=False, sheet_name='Data', include_metadata=True, metadata_sheet_name='Metadata', verbosity=1)

Saves an audio clip as an Excel .xlsx file or files. This function is called by the Audio.save() method, and saves the Audio instance as folder_out/name.file_format.

New in version 2.0.

Important

This function is dependent of the module openpyxl.

Parameters:
  • folder_out (str) – The path to the folder where to save the file or files. If one or more subfolders of the path do not exist, the function will create them.

  • name (str or None, optional) – Defines the name of the file or files where to save the audio clip. If set on None, the name will be set on "out" if individual is False, or on "sample" if individual is True.

  • individual (bool, optional) –

    If set on False (default), the function will save the audio clip in a unique file. If set on True, the function will save each sample of the audio clip in an individual file, appending an underscore and the index of the sample (starting at 0) after the name.

    Warning

    It is not recommended to save each sample in a different file. This incredibly tedious way of handling audio files has only been implemented to follow the same logic as for the Sequence files, and should be avoided.

  • sheet_name (str|None, optional) – The name of the sheet containing the data. If None, a default name will be attributed to the sheet ("Sheet").

  • include_metadata (bool, optional) – Whether to include the metadata in the Excel file (default: True). The metadata is saved in a separate sheet in the same Excel file.

  • metadata_sheet_name (str|None, optional) – The name of the sheet containing the metadata. If None, a default name will be attributed to the sheet ("Metadata").

  • verbosity (int, optional) –

    Sets how much feedback the code will provide in the console output:

    • 0: Silent mode. The code won’t provide any feedback, apart from error messages.

    • 1: Normal mode (default). The code will provide essential feedback such as progression markers and current steps.

    • 2: Chatty mode. The code will provide all possible information on the events happening. Note that this may clutter the output and slow down the execution.

Example

>>> audio = Audio("Recordings/Chopin/recording_29.wav")
>>> audio.save_excel("Recordings/Chopin/recording_29.xlsx")
Audio.save_pickle(folder_out, name=None, individual=False, verbosity=1)

Saves an audio clip by pickling it. This allows to reopen the audio clip as an Audio object.

New in version 2.0.

Parameters:
  • folder_out (str) – The path to the folder where to save the file or files, or the complete path to the file. If one or more subfolders of the path do not exist, the function will create them.

  • name (str or None, optional) – Defines the name of the file or files where to save the audio. If set on None, the name will be set on "out" if individual is False, or on "sample" if individual is True. This parameter is ignored if folder_out already contains the name of the file.

  • individual (bool, optional) – If set on False (default), the function will save the audio clip in a unique file. If set on True, the function will save each sample of the audio clip in an individual file, appending an underscore and the index of the sample (starting at 0) after the name.

  • verbosity (int, optional) –

    Sets how much feedback the code will provide in the console output:

    • 0: Silent mode. The code won’t provide any feedback, apart from error messages.

    • 1: Normal mode (default). The code will provide essential feedback such as progression markers and current steps.

    • 2: Chatty mode. The code will provide all possible information on the events happening. Note that this may clutter the output and slow down the execution.

Example

>>> audio = Audio("Recordings/Saint-Saëns/recording_30.wav")
>>> audio.save_pickle("Recordings/Saint-Saëns/recording_30.pkl")
Audio.save_wav(folder_out, name=None, include_metadata=True, verbosity=1)

Saves an audio clip as a .wav file or files. This function is called by the Audio.save() method, and saves the Audio instance as folder_out/name.file_format.

New in version 2.0.

Parameters:
  • folder_out (str) – The path to the folder where to save the file or files. If one or more subfolders of the path do not exist, the function will create them.

  • name (str or None, optional) – Defines the name of the file or files where to save the audio clip. If set on None, the name will be set on "out".

  • include_metadata (bool, optional) – Whether to include the metadata in the file (default: True). The metadata is saved on the first lines of the file. Note: due to WAV tags limitations, the case of the metadata might be modified.

  • verbosity (int, optional) –

    Sets how much feedback the code will provide in the console output:

    • 0: Silent mode. The code won’t provide any feedback, apart from error messages.

    • 1: Normal mode (default). The code will provide essential feedback such as progression markers and current steps.

    • 2: Chatty mode. The code will provide all possible information on the events happening. Note that this may clutter the output and slow down the execution.

Example

>>> audio = Audio("Recordings/Mozart/recording_32.json")
>>> audio.save_wav("Recordings/Mozart/recording_32.wav")
Audio.save_txt(folder_out, name=None, file_format='csv', encoding='utf-8', individual=False, include_metadata=True, verbosity=1)

Saves an audio clip as a .txt, .csv, .tsv, or custom extension file or files. This function is called by the Audio.save() method, and saves the Audio instance as folder_out/name.file_format.

New in version 2.0.

Parameters:
  • folder_out (str) – The path to the folder where to save the file or files. If one or more subfolders of the path do not exist, the function will create them.

  • name (str or None, optional) – Defines the name of the file or files where to save the audio clip. If set on None, the name will be set on "out" if individual is False, or on "sample" if individual is True.

  • file_format (str, optional) – The file format in which to save the audio clip. The file format can be "txt", "csv" (default) or "tsv". "csv;" will force the value separator on ";", while "csv," will force the separator on ",". By default, the function will detect which separator the system uses. "txt" and "tsv" both separate the values by a tabulation. Any other string will not return an error, but rather be used as a custom extension. The data will be saved as in a text file (using tabulations as values separators).

  • encoding (str, optional) –

    The encoding of the file to save. By default, the file is saved in UTF-8 encoding. This input can take any of the official Python accepted formats.

  • individual (bool, optional) –

    If set on False (default), the function will save the audio clip in a unique file. If set on True, the function will save each sample of the audio clip in an individual file, appending an underscore and the index of the sample (starting at 0) after the name.

    Warning

    It is not recommended to save each sample in a different file. This incredibly tedious way of handling audio files has only been implemented to follow the same logic as for the Sequence files, and should be avoided.

  • include_metadata (bool, optional) – Whether to include the metadata in the file (default: True). The metadata is saved on the first lines of the file.

  • verbosity (int, optional) –

    Sets how much feedback the code will provide in the console output:

    • 0: Silent mode. The code won’t provide any feedback, apart from error messages.

    • 1: Normal mode (default). The code will provide essential feedback such as progression markers and current steps.

    • 2: Chatty mode. The code will provide all possible information on the events happening. Note that this may clutter the output and slow down the execution.

Examples

>>> audio = Audio("Recordings/Vivaldi/recording_33.wav")
>>> audio.save_txt("Recordings/Vivaldi/recording_33.txt")
>>> audio.save_txt("Recordings/Vivaldi/recording_33.tsv", include_metadata=False)
>>> audio.save_txt("Recordings/Vivaldi", "recording_33", "aaa", include_metadata=False)

Copy function

Audio.copy()

Returns a deep copy of the Audio object.

New in version 2.0.

Returns:

A new Audio object, copy of the original.

Return type:

Audio

Example

>>> audio = Audio("Recordings/Tchaikovsky/recording_34.tsv")
>>> audio_copy = audio.copy()