Analysis functions

Description

These functions allow the analysis of Sequence and AudioDerivative instances in relation to each other.

Functions

krajjat.analysis_functions.power_spectrum(experiment_or_dataframe, method='welch', group=None, condition=None, subjects=None, trials=None, series=None, average='subject', return_values='raw', permutation_level=None, number_of_randperms=0, sequence_measure='distance', audio_measure='envelope', sampling_frequency=50, specific_frequency=None, freq_atol=1e-08, include_audio=False, width_line=1, color_line=None, verbosity=1, **kwargs)

Returns the power spectrum values for all the variables (joints and audio) of the given dataframe or experiment. The function also plots these power spectrum values.

New in version 2.0.

Parameters:

experiment_or_dataframe (Experiment, pandas.DataFrame, str or list(any)) –
This parameter can be:
- An Experiment instance, containing the full dataset to be analyzed.
- A pandas DataFrame, generally generated from Experiment.get_dataframe().
- The path of a file containing a pandas DataFrame, generally generated from Experiment.save_dataframe().
- A list combining any of the above types. In that case, all the dataframes will be merged sequentially.
method (str, optional) –
The method to use to calculate the power spectrum. It can be either:
- "fft": in that case, the power spectrum will be calculated using the Fast Fourier Transform algorithm from numpy.
- "welch" (default): in that case, the power spectrum will be calculated using the Welch algorithm from scipy. This method is more robust to noise than the FFT.
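The difference between the two methods can be illustrated with a minimal, standard-library sketch. This is not the krajjat implementation (which, as stated above, relies on numpy and scipy): the helper names periodogram and welch are hypothetical, the transform is a slow plain DFT, and no window function is applied. The signal is made up for the illustration.

```python
import cmath
import math
import random


def periodogram(x, fs):
    """Power spectrum of one segment, from a plain (slow) discrete Fourier transform."""
    n = len(x)
    mean = sum(x) / n
    x = [v - mean for v in x]  # remove the constant offset
    freqs, power = [], []
    for k in range(n // 2 + 1):
        coeff = sum(v * cmath.exp(-2j * math.pi * k * t / n) for t, v in enumerate(x))
        freqs.append(k * fs / n)
        power.append(abs(coeff) ** 2 / n)
    return freqs, power


def welch(x, fs, segment_length):
    """Average the periodograms of half-overlapping segments (no window, for brevity)."""
    step = segment_length // 2
    starts = range(0, len(x) - segment_length + 1, step)
    spectra = [periodogram(x[s:s + segment_length], fs)[1] for s in starts]
    freqs = [k * fs / segment_length for k in range(segment_length // 2 + 1)]
    averaged = [sum(column) / len(spectra) for column in zip(*spectra)]
    return freqs, averaged


random.seed(0)
fs = 50  # Hz, same as the default sampling_frequency
# Hypothetical signal: a 2 Hz oscillation buried in noise, 8 seconds long.
signal = [math.sin(2 * math.pi * 2 * t / fs) + random.gauss(0, 1) for t in range(8 * fs)]

freqs_fft, power_fft = periodogram(signal, fs)
freqs_welch, power_welch = welch(signal, fs, segment_length=100)

peak_fft = freqs_fft[power_fft.index(max(power_fft))]
peak_welch = freqs_welch[power_welch.index(max(power_welch))]
print(peak_fft, peak_welch)
```

Averaging over segments trades frequency resolution (0.5 Hz bins here instead of 0.125 Hz) for a smoother noise floor, which is why the Welch estimate is the more noise-robust of the two.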
group (str or None) – If specified, the analysis will focus exclusively on subjects whose group attribute matches the value provided for this parameter. Otherwise, if this parameter is set on None (default), subjects from all groups will be considered.
condition (str or None) – If specified, the analysis will focus exclusively on sequences whose condition attribute matches the value provided for this parameter. Otherwise, if this parameter is set on None (default), all sequences will be considered.
subjects (list(str), str or None) – If specified, the analysis will focus exclusively on subjects whose name attribute matches the value(s) provided for this parameter. This parameter can be a string (for one subject), or a list of strings (for multiple subjects). Otherwise, if this parameter is set on None (default), all subjects will be considered. This parameter can be combined with the parameter group, to perform the analysis on certain subjects from a certain group.
trials (dict(str: list(str)), list(str), str or None) –
If specified, the analysis will discard the trials whose name attribute does not match the value(s) provided for this parameter. This parameter can be:
- A dictionary where each key is a subject name, and each value is a list containing trial names. This allows discarding or selecting specific trials for individual subjects.
- A list where each element is a trial name. This will select only the trials matching the given name, for each subject.
- The name of a single trial. This will select the given trial for all subjects.
- None (default). In that case, all trials will be considered.
This parameter can be combined with the other parameters to select specific subjects or conditions.

Note
In the case where at least two of the parameters group, condition, subjects or trials are set, the selected trials will be the ones that match all the selected categories. For example, if subjects is set on [“sub_001”, “sub_002”, “sub_003”] and trials is set on {“sub_001”: [“trial_001”, “trial_002”], “sub_004”: [“trial_001”]}, the analysis will run on the trials that intersect both requirements, i.e., trials 1 and 2 for subject 1. Trials from subjects 2, 3 and 4 will be discarded.
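The intersection rule described in this note can be sketched as follows. This is only an illustration of the selection logic, not the library's code: select_trials and the available mapping are hypothetical names, and the subjects and trials are the ones from the example above.

```python
def select_trials(available, subjects=None, trials=None):
    """Return {subject: [trial, ...]} for the trials that satisfy both filters.

    `available` maps each subject name to the list of its trial names.
    """
    if isinstance(subjects, str):
        subjects = [subjects]
    if isinstance(trials, str):
        trials = [trials]

    selected = {}
    for subject, subject_trials in available.items():
        if subjects is not None and subject not in subjects:
            continue  # subject filtered out by `subjects`
        if isinstance(trials, dict):
            wanted = trials.get(subject, [])  # subjects absent from the dict keep no trial
        elif trials is not None:
            wanted = trials
        else:
            wanted = subject_trials
        kept = [trial for trial in subject_trials if trial in wanted]
        if kept:
            selected[subject] = kept
    return selected


available = {f"sub_00{i}": ["trial_001", "trial_002", "trial_003"] for i in range(1, 5)}
result = select_trials(
    available,
    subjects=["sub_001", "sub_002", "sub_003"],
    trials={"sub_001": ["trial_001", "trial_002"], "sub_004": ["trial_001"]},
)
print(result)
```

Only trials 1 and 2 of subject 1 survive: subjects 2 and 3 have no entry in the trials dictionary, and subject 4 is not in the subjects list.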
series (str, optional) – Defines the series that divide the data for comparison. This value can take any of the column names from the dataframe (apart from the values indicated in sequence_measure and audio_measure). For instance, if group is selected, the power spectrum will be calculated and plotted for each individual group of the dataframe. Alternatively, setting this parameter on None (default) or “Full dataset” will not perform any comparison, but rather run the analysis on all the data.
average (str or None, optional) –
Defines if an average power spectrum is computed. This parameter can be:
- "subject": the power spectrum is calculated for each subject and averaged across all subjects.
- "trial": the power spectrum is calculated for each trial and averaged across all trials.
- None: the power spectrum is calculated for the whole dataset.
sequence_measure (str, optional) –
The measure used for each sequence instance, can be either:
- "x", for the values on the x-axis (in meters)
- "y", for the values on the y-axis (in meters)
- "z", for the values on the z axis (in meters)
- "distance_hands", for the distance between the hands (in meters)
- "distance", for the distance travelled (in meters, default)
- "distance_x" for the distance travelled on the x-axis (in meters)
- "distance_y" for the distance travelled on the y-axis (in meters)
- "distance_z" for the distance travelled on the z axis (in meters)
- "velocity" for the velocity (in meters per second)
- "acceleration" for the acceleration (in meters per second squared)
- "acceleration_abs" for the absolute acceleration (in meters per second squared)
Note

This parameter will be used to generate a dataframe if the parameter experiment_or_dataframe is an Experiment instance. In any other case, this parameter has to be equal to the title of the column containing the sequence data in the dataframe.
audio_measure (str|None, optional) –
The audio measure, among:
- "audio", for the original sample values.
- "envelope" (default)
- "pitch"
- "f1", "f2", "f3", "f4", "f5" for the values of the corresponding formant.
- "intensity"
- None: in that case, the power spectrum of the audio will not be computed.
Note

In the case where the value is an audio value, this parameter will be used to generate a dataframe if the parameter experiment_or_dataframe is an Experiment instance.
sampling_frequency (int or float, optional) – The sampling frequency of the sequence and audio measures, used to resample the data when generating the dataframe, if the parameter experiment_or_dataframe is an Experiment instance. If the parameter experiment_or_dataframe is a dataframe, this parameter is ignored.
specific_frequency (float|list|None, optional) – If set, the function will return the power spectrum at the specified frequency (or frequencies, if a list is given) and plot a silhouette graph rather than a body graph.
freq_atol (float|int, optional) – The absolute tolerance of the frequency set on specific_frequency (default: 1e-8). If set, the function will look for the closest matching frequency in the range [specific_frequency - freq_atol, specific_frequency + freq_atol].
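As an illustration of the tolerance rule, the lookup could be sketched as below. The helper closest_frequency_index is hypothetical, and returning None when no bin falls inside the tolerance range is an assumption made for this sketch; the actual behaviour of the library in that case is not documented here.

```python
def closest_frequency_index(frequencies, specific_frequency, freq_atol=1e-8):
    """Index of the bin closest to specific_frequency, or None if no bin is within freq_atol."""
    candidates = [
        (abs(frequency - specific_frequency), index)
        for index, frequency in enumerate(frequencies)
        if abs(frequency - specific_frequency) <= freq_atol
    ]
    return min(candidates)[1] if candidates else None


frequencies = [i * 0.25 for i in range(101)]  # hypothetical bins: 0 to 25 Hz, every 0.25 Hz

exact = closest_frequency_index(frequencies, 2.0)
too_strict = closest_frequency_index(frequencies, 2.1)
relaxed = closest_frequency_index(frequencies, 2.1, freq_atol=0.2)
print(exact, too_strict, relaxed)
```

With the default tolerance, 2.1 Hz matches no bin; widening freq_atol to 0.2 lets the search fall back on the 2.0 Hz bin, which is closer than the 2.25 Hz one.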
include_audio (bool, optional) – Whether to include the audio in the power spectrum calculation and plot. Default: False.
width_line (int or float, optional) – Defines the width of the plotted lines (default: 1).
color_line (list or None, optional) – A list containing the colors for the different variables of interest. If the list contains fewer colors than there are plotted series, the colors loop through the list.
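The looping of colors can be pictured with a short sketch. The function assign_colors and the series names are hypothetical; the sketch only shows the cycling behaviour described for color_line, not how the plotting functions apply it.

```python
from itertools import cycle


def assign_colors(series_names, color_line):
    """Pair each series with a color, restarting from the first color when the list runs out."""
    return dict(zip(series_names, cycle(color_line)))


colors = assign_colors(["group_a", "group_b", "group_c"], ["red", "blue"])
print(colors)
```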
verbosity (int, optional) –
Sets how much feedback the code will provide in the console output:
- 0: Silent mode. The code won’t provide any feedback, apart from error messages.
- 1: Normal mode (default). The code will provide essential feedback such as progression markers and current steps.
- 2: Chatty mode. The code will provide all possible information on the events happening. Note that this may clutter the output and slow down the execution.
**kwargs (optional) – Any parameter accepted by either plot_functions.plot_silhouette() or plot_functions.plot_body_graphs().

Returns:

frequencies (np.ndarray) – Frequency bins.
averages (dict) – Average power per label and series.
stds (dict) – Standard deviations of the power per label and series.

krajjat.analysis_functions.correlation(experiment_or_dataframe, method='corr', group=None, condition=None, subjects=None, trials=None, series=None, average='subject', sequence_measure='distance', audio_measure='envelope', correlation_with='envelope', return_values='z-scores', permutation_level='whole', number_of_randperms=1000, sampling_frequency=50, include_audio=False, random_seed=None, verbosity=1, **kwargs)

Calculates and plots the correlation between one metric derived from the sequences, and the same metric from a given joint, or another metric derived from the corresponding audio clips.

New in version 2.0.

Parameters:

experiment_or_dataframe (Experiment, pandas.DataFrame, str or list(any)) –
This parameter can be:
- An Experiment instance, containing the full dataset to be analyzed.
- A pandas DataFrame, generally generated from Experiment.get_dataframe().
- The path of a file containing a pandas DataFrame, generally generated from Experiment.save_dataframe().
- A list combining any of the above types. In that case, all the dataframes will be merged sequentially.
method (str, optional) – Can be either “corr” (default, uses pingouin.corr), “rm_corr” (uses pingouin.rm_corr), or "numpy" (manual Pearson correlation).
group (str or None) – If specified, the analysis will focus exclusively on subjects whose group attribute matches the value provided for this parameter. Otherwise, if this parameter is set on None (default), subjects from all groups will be considered.
condition (str or None) – If specified, the analysis will focus exclusively on sequences whose condition attribute matches the value provided for this parameter. Otherwise, if this parameter is set on None (default), all sequences will be considered.
subjects (list(str), str or None) – If specified, the analysis will focus exclusively on subjects whose name attribute matches the value(s) provided for this parameter. This parameter can be a string (for one subject), or a list of strings (for multiple subjects). Otherwise, if this parameter is set on None (default), all subjects will be considered. This parameter can be combined with the parameter group, to perform the analysis on certain subjects from a certain group.
trials (dict(str: list(str)), list(str), str or None) –
If specified, the analysis will discard the trials whose name attribute does not match the value(s) provided for this parameter. This parameter can be:
- A dictionary where each key is a subject name, and each value is a list containing trial names. This allows discarding or selecting specific trials for individual subjects.
- A list where each element is a trial name. This will select only the trials matching the given name, for each subject.
- The name of a single trial. This will select the given trial for all subjects.
- None (default). In that case, all trials will be considered.
This parameter can be combined with the other parameters to select specific subjects or conditions.

Note
In the case where at least two of the parameters group, condition, subjects or trials are set, the selected trials will be the ones that match all the selected categories. For example, if subjects is set on [“sub_001”, “sub_002”, “sub_003”] and trials is set on {“sub_001”: [“trial_001”, “trial_002”], “sub_004”: [“trial_001”]}, the analysis will run on the trials that intersect both requirements, i.e., trials 1 and 2 for subject 1. Trials from subjects 2, 3 and 4 will be discarded.
series (str, optional) – Defines the series that divide the data for comparison. This value can take any of the column names from the dataframe (apart from the values indicated in sequence_measure and audio_measure). For instance, if group is selected, the correlation will be calculated and plotted for each individual group of the dataframe.
average (str or None, optional) –
Defines if an average correlation is returned. This parameter can be:
- "subject": the correlation is calculated for each subject and averaged across all subjects.
- "trial": the correlation is calculated for each trial and averaged across all trials.
- None: the correlation is calculated for the whole dataset.
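The three averaging modes can give different values for the same data, as the following sketch suggests. It uses a hand-written Pearson correlation and made-up numbers; the row layout (dictionaries with subject, trial, movement and audio keys) and the function names are assumptions for the illustration and do not reflect the structure of the krajjat dataframe.

```python
import math


def pearson(x, y):
    """Plain Pearson correlation coefficient between two equally long lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def average_correlation(rows, average="subject"):
    """Correlate per subject, per trial, or on the pooled data, then average the results."""
    if average is None:
        groups = {"whole dataset": rows}
    else:
        groups = {}
        for row in rows:
            key = row["subject"] if average == "subject" else (row["subject"], row["trial"])
            groups.setdefault(key, []).append(row)
    values = []
    for group in groups.values():
        movement = [v for row in group for v in row["movement"]]
        audio = [v for row in group for v in row["audio"]]
        values.append(pearson(movement, audio))
    return sum(values) / len(values)


# Hypothetical data: two subjects, two trials each, three samples per trial.
rows = [
    {"subject": "sub_001", "trial": "trial_001", "movement": [1, 2, 3], "audio": [1, 2, 3]},
    {"subject": "sub_001", "trial": "trial_002", "movement": [1, 2, 3], "audio": [2, 4, 6]},
    {"subject": "sub_002", "trial": "trial_001", "movement": [1, 2, 3], "audio": [11, 12, 13]},
    {"subject": "sub_002", "trial": "trial_002", "movement": [1, 2, 3], "audio": [12, 14, 16]},
]

per_trial = average_correlation(rows, average="trial")
per_subject = average_correlation(rows, average="subject")
pooled = average_correlation(rows, average=None)
print(per_trial, per_subject, pooled)
```

Here every trial is perfectly correlated on its own, but differences in scale between trials and in offset between subjects lower the value once the data is pooled at a coarser level.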
sequence_measure (str, optional) –
The measure used for each sequence instance, can be either:
- "x", for the values on the x-axis (in meters)
- "y", for the values on the y-axis (in meters)
- "z", for the values on the z axis (in meters)
- "distance_hands", for the distance between the hands (in meters)
- "distance", for the distance travelled (in meters, default)
- "distance_x" for the distance travelled on the x-axis (in meters)
- "distance_y" for the distance travelled on the y-axis (in meters)
- "distance_z" for the distance travelled on the z axis (in meters)
- "velocity" for the velocity (in meters per second)
- "acceleration" for the acceleration (in meters per second squared)
- "acceleration_abs" for the absolute acceleration (in meters per second squared)
Note

This parameter will be used to generate a dataframe if the parameter experiment_or_dataframe is an Experiment instance. In any other case, this parameter has to be equal to the title of the column containing the sequence data in the dataframe.
audio_measure (str, optional) –
The measure of the audio, can be either:
- "audio", for the original sample values.
- "envelope" (default)
- "pitch"
- "f1", "f2", "f3", "f4", "f5" for the values of the corresponding formant.
- "intensity"
Note

This parameter will be used to generate a dataframe if the parameter experiment_or_dataframe is an Experiment instance. In any other case, this parameter has to be equal to the title of the column containing the audio data in the dataframe.
correlation_with (str) – The joint label or audio measure to correlate against (default: "envelope").
return_values (str) –
Defines which values are returned and plotted. This parameter can be:
- "average" or "raw": the correlation is returned, averaged across subjects, trials or the whole dataset, depending on the value for the parameter average.
- "z-scores" (default): z-scores are returned, calculated against an average of randomly permuted arrays.
permutation_level (str) –
This determines how permutations are applied:
- ”whole”: permutations are done on the pooled data.
- ”individual”: permutations are done separately for each subject or trial.
- None: no permutations are calculated. This value is not allowed if return_values == “z-scores”.
number_of_randperms (int, optional) – How many random permutations to calculate. Only used if permutation_level is set to "whole" or "individual". The results of the random permutations are then averaged, in order to calculate a Z-score for the correlation.
sampling_frequency (int or float, optional) – The sampling frequency of the sequence and audio measures, used to resample the data when generating the dataframe, if the parameter experiment_or_dataframe is an Experiment instance, and to perform the correlation.
include_audio (bool, optional) – Whether to include the audio in the correlation calculation and plot. Default: False.
random_seed (int, optional) – Sets a fixed seed for the random number generator. Only used if random permutations are computed.
verbosity (int, optional) –
Sets how much feedback the code will provide in the console output:
- 0: Silent mode. The code won’t provide any feedback, apart from error messages.
- 1: Normal mode (default). The code will provide essential feedback such as progression markers and current steps.
- 2: Chatty mode. The code will provide all possible information on the events happening. Note that this may clutter the output and slow down the execution.

Returns:

dict – A nested dictionary containing the average correlation values for each series and each joint. Structure: {series_value: {joint_label: average_correlation_value}}.
dict – A nested dictionary containing the standard deviation of correlation values for each series and joint. Structure: {series_value: {joint_label: std_deviation_value}}.
dict, optional – A nested dictionary containing the Z-scores of the correlation values, computed as: (average - mean of random permutations) / std of permutations. Structure: {series_value: {joint_label: z_score_value}}.
dict, optional – A nested dictionary containing the p-values of the z-scores.
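The Z-score formula given above, (average - mean of random permutations) / std of permutations, can be sketched with the standard library alone. Several points are assumptions of this sketch and not confirmed by the documentation: the permutations shuffle the samples of one signal, the sample standard deviation is used, and the p-value is a two-sided value from the normal distribution. The function name permutation_z_score and the test signals are hypothetical.

```python
import math
import random
import statistics


def pearson(x, y):
    """Plain Pearson correlation coefficient between two equally long lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def permutation_z_score(x, y, number_of_randperms=1000, random_seed=None):
    """Z-score of the observed correlation against correlations with shuffled copies of y."""
    rng = random.Random(random_seed)
    observed = pearson(x, y)
    shuffled = list(y)
    permuted = []
    for _ in range(number_of_randperms):
        rng.shuffle(shuffled)
        permuted.append(pearson(x, shuffled))
    z_score = (observed - statistics.mean(permuted)) / statistics.stdev(permuted)
    # Two-sided p-value, assuming the permuted correlations are normally distributed.
    p_value = math.erfc(abs(z_score) / math.sqrt(2))
    return observed, z_score, p_value


noise = random.Random(1)
movement = [math.sin(0.3 * t) for t in range(100)]
audio = [value + noise.gauss(0, 0.1) for value in movement]

observed, z_score, p_value = permutation_z_score(movement, audio, random_seed=42)
print(observed, z_score, p_value)
```

Passing the same random_seed twice yields the same permutations, and therefore the same Z-score, which is the purpose of the random_seed parameter.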

krajjat.analysis_functions.coherence(experiment_or_dataframe, group=None, condition=None, subjects=None, trials=None, series=None, average='subject', sequence_measure='distance', audio_measure='envelope', coherence_with='envelope', return_values='z-scores', permutation_level='whole', number_of_randperms=1000, sampling_frequency=50, specific_frequency=None, freq_atol=1e-08, step_segments=0.25, include_audio=False, random_seed=None, color_line=None, width_line=1, verbosity=1, **kwargs)

Calculates and plots the coherence between one metric derived from the sequences, and the same metric from a given joint, or another metric derived from the corresponding audio clips.

New in version 2.0.

Parameters:

experiment_or_dataframe (Experiment, pandas.DataFrame, str or list(any)) –
This parameter can be:
- An Experiment instance, containing the full dataset to be analyzed.
- A pandas DataFrame, generally generated from Experiment.get_dataframe().
- The path of a file containing a pandas DataFrame, generally generated from Experiment.save_dataframe().
- A list combining any of the above types. In that case, all the dataframes will be merged sequentially.
group (str or None) – If specified, the analysis will focus exclusively on subjects whose group attribute matches the value provided for this parameter. Otherwise, if this parameter is set on None (default), subjects from all groups will be considered.
condition (str or None) – If specified, the analysis will focus exclusively on sequences whose condition attribute matches the value provided for this parameter. Otherwise, if this parameter is set on None (default), all sequences will be considered.
subjects (list(str), str or None) – If specified, the analysis will focus exclusively on subjects whose name attribute matches the value(s) provided for this parameter. This parameter can be a string (for one subject), or a list of strings (for multiple subjects). Otherwise, if this parameter is set on None (default), all subjects will be considered. This parameter can be combined with the parameter group, to perform the analysis on certain subjects from a certain group.
trials (dict(str: list(str)), list(str), str or None) –
If specified, the analysis will discard the trials whose name attribute does not match the value(s) provided for this parameter. This parameter can be:
- A dictionary where each key is a subject name, and each value is a list containing trial names. This allows discarding or selecting specific trials for individual subjects.
- A list where each element is a trial name. This will select only the trials matching the given name, for each subject.
- The name of a single trial. This will select the given trial for all subjects.
- None (default). In that case, all trials will be considered.
This parameter can be combined with the other parameters to select specific subjects or conditions.

Note
In the case where at least two of the parameters group, condition, subjects or trials are set, the selected trials will be the ones that match all the selected categories. For example, if subjects is set on [“sub_001”, “sub_002”, “sub_003”] and trials is set on {“sub_001”: [“trial_001”, “trial_002”], “sub_004”: [“trial_001”]}, the analysis will run on the trials that intersect both requirements, i.e., trials 1 and 2 for subject 1. Trials from subjects 2, 3 and 4 will be discarded.
series (str, optional) – Defines the series that divide the data for comparison. This value can take any of the column names from the dataframe (apart from the values indicated in sequence_measure and audio_measure). For instance, if group is selected, the coherence will be calculated and plotted for each individual group of the dataframe.
average (str or None, optional) –
Defines if an average coherence is returned. This parameter can be:
- "subject": the coherence is calculated for each subject and averaged across all subjects.
- "trial": the coherence is calculated for each trial and averaged across all trials.
- None: the coherence is calculated for the whole dataset.
sequence_measure (str, optional) –
The measure used for each sequence instance, can be either:
- "x", for the values on the x-axis (in meters)
- "y", for the values on the y-axis (in meters)
- "z", for the values on the z axis (in meters)
- "distance_hands", for the distance between the hands (in meters)
- "distance", for the distance travelled (in meters, default)
- "distance_x" for the distance travelled on the x-axis (in meters)
- "distance_y" for the distance travelled on the y-axis (in meters)
- "distance_z" for the distance travelled on the z axis (in meters)
- "velocity" for the velocity (in meters per second)
- "acceleration" for the acceleration (in meters per second squared)
- "acceleration_abs" for the absolute acceleration (in meters per second squared)
Note

This parameter will be used to generate a dataframe if the parameter experiment_or_dataframe is an Experiment instance. In any other case, this parameter has to be equal to the title of the column containing the sequence data in the dataframe.
audio_measure (str, optional) –
The measure of the audio, can be either:
- "audio", for the original sample values.
- "envelope" (default)
- "pitch"
- "f1", "f2", "f3", "f4", "f5" for the values of the corresponding formant.
- "intensity"
Note

This parameter will be used to generate a dataframe if the parameter experiment_or_dataframe is an Experiment instance. In any other case, this parameter has to be equal to the title of the column containing the audio data in the dataframe.
coherence_with (str) – The joint label or audio measure to compute the coherence against (default: "envelope").
return_values (str) –
Defines which values are returned and plotted. This parameter can be:
- "average" or "raw": the coherence is returned, averaged across subjects, trials or the whole dataset, depending on the value for the parameter average.
- "z-scores" (default): z-scores are returned, calculated against an average of randomly permuted arrays.
permutation_level (str) –
This parameter determines how permutations are applied:
- ”whole”: permutations are done on the pooled data.
- ”individual”: permutations are done separately for each subject or trial.
- None: no permutations are calculated. This value is not allowed if return_values == “z-scores”.
number_of_randperms (int, optional) – How many random permutations to calculate. Only used if permutation_level is set to "whole" or "individual". The results of the random permutations are then averaged, in order to calculate a Z-score for the coherence.
sampling_frequency (int or float, optional) – The sampling frequency of the sequence and audio measures, used to resample the data when generating the dataframe, if the parameter experiment_or_dataframe is an Experiment instance, and to compute the coherence.
specific_frequency (int or float, optional) – If specified, the function will return the coherence values for the specified frequency alone, and will display a silhouette plot instead of a body graph.
freq_atol (float|int, optional) – The absolute tolerance of the frequency set on specific_frequency (default: 1e-8). If set, the function will look for the closest matching frequency in the range [specific_frequency - freq_atol, specific_frequency + freq_atol].
step_segments (int or float, optional) – Defines how large each frequency segment will be for the analysis. If set on 0.25 (default), the coherence will be calculated at intervals of 0.25 Hz.
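In segment-averaged spectral estimates, the spacing between frequency bins equals the sampling frequency divided by the segment length. The sketch below shows what a 0.25 Hz step implies at 50 Hz under that relation. Whether the library derives its segment length exactly this way is an assumption, and frequency_bins is a hypothetical helper.

```python
def frequency_bins(sampling_frequency=50, step_segments=0.25):
    """Segment length (in samples) and frequency bins implied by a given frequency step."""
    samples_per_segment = round(sampling_frequency / step_segments)
    number_of_bins = samples_per_segment // 2 + 1  # from 0 Hz up to the Nyquist frequency
    bins = [k * step_segments for k in range(number_of_bins)]
    return samples_per_segment, bins


samples_per_segment, bins = frequency_bins()
print(samples_per_segment, len(bins), bins[:3], bins[-1])
```

Under these assumptions, each segment spans 200 samples (4 seconds), so recordings shorter than that could not be resolved at a 0.25 Hz step.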
include_audio (bool, optional) – Whether to include the audio in the coherence calculation and plot. Default: False.
random_seed (int, optional) – Sets a fixed seed for the random number generator. Only used if permutation_level is set.
color_line (list or None, optional) – A list containing the colors for the different variables of interest. If the list contains fewer colors than there are plotted series, the colors loop through the list.
width_line (int or float, optional) – Defines the width of the plotted lines (default: 1).
verbosity (int, optional) –
Sets how much feedback the code will provide in the console output:
- 0: Silent mode. The code won’t provide any feedback, apart from error messages.
- 1: Normal mode (default). The code will provide essential feedback such as progression markers and current steps.
- 2: Chatty mode. The code will provide all possible information on the events happening. Note that this may clutter the output and slow down the execution.
**kwargs (optional) – Any parameter accepted by either plot_functions.plot_silhouette() or plot_functions.plot_body_graphs().

Returns:

np.ndarray(float) – An array of all the frequencies at which the coherence was computed.
dict(str: np.ndarray(float)) – A dictionary containing joint labels as keys, and coherence averages (or raw coherence, if no average was requested) as values.
dict(str: np.ndarray(float)) – A dictionary containing the standard deviations matching the averages from the previous dictionary. If no average was requested, this dictionary will only contain zeros.
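For readers unfamiliar with the measure, the values returned per frequency can be understood through the following standard-library sketch of magnitude-squared coherence: the squared cross-spectrum divided by the product of the two power spectra, all averaged over overlapping segments. It is not the library's implementation; the function names, the absence of a window function and the test signals are assumptions of the sketch.

```python
import cmath
import math
import random


def dft(x):
    """One-sided discrete Fourier transform (slow, for illustration only)."""
    n = len(x)
    return [
        sum(v * cmath.exp(-2j * math.pi * k * t / n) for t, v in enumerate(x))
        for k in range(n // 2 + 1)
    ]


def coherence(x, y, fs, segment_length):
    """Magnitude-squared coherence from half-overlapping segments."""
    step = segment_length // 2
    number_of_bins = segment_length // 2 + 1
    pxx = [0.0] * number_of_bins
    pyy = [0.0] * number_of_bins
    pxy = [0j] * number_of_bins
    for start in range(0, len(x) - segment_length + 1, step):
        fx = dft(x[start:start + segment_length])
        fy = dft(y[start:start + segment_length])
        for k in range(number_of_bins):
            pxx[k] += abs(fx[k]) ** 2
            pyy[k] += abs(fy[k]) ** 2
            pxy[k] += fx[k].conjugate() * fy[k]
    freqs = [k * fs / segment_length for k in range(number_of_bins)]
    values = [abs(pxy[k]) ** 2 / (pxx[k] * pyy[k]) for k in range(number_of_bins)]
    return freqs, values


rng = random.Random(0)
fs = 50
# Hypothetical signals sharing a 2 Hz rhythm (with a phase lag), each with its own noise.
movement = [math.sin(2 * math.pi * 2 * t / fs) + rng.gauss(0, 0.5) for t in range(400)]
audio = [0.5 * math.sin(2 * math.pi * 2 * t / fs + 0.8) + rng.gauss(0, 0.5) for t in range(400)]

freqs, values = coherence(movement, audio, fs, segment_length=100)
coherence_at_2hz = values[freqs.index(2.0)]
print(coherence_at_2hz)
```

Coherence is insensitive to the amplitude difference and to the constant phase lag between the two signals, which distinguishes it from a plain correlation.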

krajjat.analysis_functions.pca(experiment_or_dataframe, n_components, group=None, condition=None, subjects=None, trials=None, sequence_measure='distance', audio_measure='envelope', include_audio=False, sampling_frequency=50, show_graph=True, selected_components=None, nan_behaviour='ignore', verbosity=1)

Performs a principal component analysis (PCA) on the measures from the experiment, reducing the dimensionality of the data. Each joint_label is used as a feature for the PCA, and, if specified, the audio measure too. Relies on the PCA class from scikit-learn.

New in version 2.0.

Parameters:

experiment_or_dataframe (Experiment, pandas.DataFrame, str or list(any)) –
This parameter can be:
- An Experiment instance, containing the full dataset to be analyzed.
- A pandas DataFrame, generally generated from Experiment.get_dataframe().
- The path of a file containing a pandas DataFrame, generally generated from Experiment.save_dataframe().
- A list combining any of the above types. In that case, all the dataframes will be merged sequentially.
n_components (int) – The number of components to generate from the PCA.
group (str or None) – If specified, the analysis will discard the trials whose group attribute does not match the value provided for this parameter. Otherwise, if this parameter is set on None (default), subjects from all groups will be considered.
condition (str or None) – If specified, the analysis will discard the trials whose condition attribute does not match the value provided for this parameter. Otherwise, if this parameter is set on None (default), all trials will be considered.
subjects (list(str), str or None) – If specified, the analysis will discard the subjects whose name attribute does not match the value(s) provided for this parameter. This parameter can be a string (for one subject), or a list of strings (for multiple subjects). Otherwise, if this parameter is set on None (default), all subjects will be considered. This parameter can be combined with the parameter group, to perform the analysis on certain subjects from a certain group.
trials (dict(str: list(str)), list(str), str or None) –
If specified, the analysis will discard the trials whose name attribute does not match the value(s) provided for this parameter. This parameter can be:
- A dictionary where each key is a subject name, and each value is a list containing trial names. This allows discarding or selecting specific trials for individual subjects.
- A list where each element is a trial name. This will select only the trials matching the given name, for each subject.
- The name of a single trial. This will select the given trial for all subjects.
- None (default). In that case, all trials will be considered.
This parameter can be combined with the other parameters to select specific subjects or conditions.

Note
In the case where at least two of the parameters group, condition, subjects or trials are set, the selected trials will be the ones that match all the selected categories. For example, if subjects is set on [“sub_001”, “sub_002”, “sub_003”] and trials is set on {“sub_001”: [“trial_001”, “trial_002”], “sub_004”: [“trial_001”]}, the analysis will run on the trials that intersect both requirements, i.e., trials 1 and 2 for subject 1. Trials from subjects 2, 3 and 4 will be discarded.
sequence_measure (str, optional) –
The measure used for each sequence instance, can be either:
- "x", for the values on the x-axis (in meters)
- "y", for the values on the y-axis (in meters)
- "z", for the values on the z axis (in meters)
- "distance_hands", for the distance between the hands (in meters)
- "distance", for the distance travelled (in meters, default)
- "distance_x" for the distance travelled on the x-axis (in meters)
- "distance_y" for the distance travelled on the y-axis (in meters)
- "distance_z" for the distance travelled on the z axis (in meters)
- "velocity" for the velocity (in meters per second)
- "acceleration" for the acceleration (in meters per second squared)
- "acceleration_abs" for the absolute acceleration (in meters per second squared)
Note

This parameter will be used to generate a dataframe if the parameter experiment_or_dataframe is an Experiment instance. In any other case, this parameter has to be equal to the title of the column containing the sequence data in the dataframe.
audio_measure (str, optional) –
The measure used for each audio instance, can be either:
- "audio", for the original sample values.
- "envelope" (default)
- "pitch"
- "f1", "f2", "f3", "f4", "f5" for the values of the corresponding formant.
- "intensity"
Note

This parameter will be used to generate a dataframe if the parameter experiment_or_dataframe is an Experiment instance. In any other case, this parameter has to be equal to the title of the column containing the audio data in the dataframe.
include_audio (bool, optional) – If set on True, includes the audio channel as one of the features for the PCA. By default, this parameter is set on False.
sampling_frequency (int or float, optional) – The sampling frequency of the sequence and audio measures, used to resample the data when generating the dataframe, if the parameter experiment_or_dataframe is an Experiment instance (otherwise, the parameter sampling_frequency is unused).
show_graph (bool, optional) –
If set on True (default), shows the selected components (see next parameter) and the contribution of each joint to the components.

Note

Even if include_audio is set on True, the audio will not appear on the contribution silhouette on the left of each graph.
selected_components (list, int or None, optional) – Defines the components to plot. It can be a single component (e.g. 2) or a list of components (e.g. [0, 2, 5]). If set on None (default), all the components are plotted.
nan_behaviour (str) – If “ignore” (default), the labels containing values equal to numpy.NaN will be removed from the PCA. If “zero”, all the numpy.NaN will be turned to zero.
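The two options of nan_behaviour can be sketched as follows. The helper handle_nans, the dictionary layout and the joint labels are hypothetical; the sketch only mirrors the behaviour described above (dropping a whole label versus replacing individual values).

```python
import math


def handle_nans(data, nan_behaviour="ignore"):
    """Apply the NaN policy to a {label: [values]} mapping."""
    if nan_behaviour == "ignore":
        # Drop every label that contains at least one NaN.
        return {
            label: values
            for label, values in data.items()
            if not any(math.isnan(value) for value in values)
        }
    if nan_behaviour == "zero":
        # Keep every label, replacing each NaN by zero.
        return {
            label: [0.0 if math.isnan(value) else value for value in values]
            for label, values in data.items()
        }
    raise ValueError(f"Unknown nan_behaviour: {nan_behaviour!r}")


data = {"HandRight": [0.1, float("nan"), 0.3], "Head": [0.2, 0.2, 0.1]}
ignored = handle_nans(data, "ignore")
zeroed = handle_nans(data, "zero")
print(ignored, zeroed)
```

With "ignore", the feature count of the analysis shrinks; with "zero", it stays the same but the replaced values can distort the components.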
verbosity (int, optional) –
Sets how much feedback the code will provide in the console output:
- 0: Silent mode. The code won’t provide any feedback, apart from error messages.
- 1: Normal mode (default). The code will provide essential feedback such as progression markers and current steps.
- 2: Chatty mode. The code will provide all possible information on the events happening. Note that this may clutter the output and slow down the execution.
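To give an intuition of what a component and the contribution of each joint represent, the sketch below extracts the first principal component of a small, made-up dataset with the standard library only. It uses power iteration on the covariance matrix, a deliberately simple stand-in for the decomposition performed by scikit-learn; the function name and the joint labels are hypothetical.

```python
import math


def first_principal_component(features, iterations=200):
    """Return ({label: loading}, explained variance ratio) for the first component."""
    labels = list(features)
    n = len(features[labels[0]])
    centered = []
    for label in labels:
        mean = sum(features[label]) / n
        centered.append([value - mean for value in features[label]])
    # Covariance matrix between features (each joint label is one feature).
    cov = [
        [sum(a * b for a, b in zip(row_p, row_q)) / (n - 1) for row_q in centered]
        for row_p in centered
    ]
    # Power iteration: repeated multiplication converges to the dominant eigenvector.
    vector = [1.0] * len(labels)
    for _ in range(iterations):
        vector = [sum(c * v for c, v in zip(row, vector)) for row in cov]
        norm = math.sqrt(sum(v * v for v in vector))
        vector = [v / norm for v in vector]
    projected = [sum(c * v for c, v in zip(row, vector)) for row in cov]
    eigenvalue = sum(v * p for v, p in zip(vector, projected))
    total_variance = sum(cov[i][i] for i in range(len(labels)))
    return dict(zip(labels, vector)), eigenvalue / total_variance


# Hypothetical measures: the elbow moves twice as much as the hand, the head barely moves.
features = {
    "HandRight": [0.0, 1.0, 2.0, 3.0, 4.0],
    "ElbowRight": [0.0, 2.0, 4.0, 6.0, 8.0],
    "Head": [0.1, -0.1, 0.1, -0.1, 0.1],
}
loadings, explained = first_principal_component(features)
print(loadings, explained)
```

The loadings play the role of the per-joint contributions shown on the silhouette: here the first component is carried by the hand and elbow, with the elbow weighing twice as much, and the head contributes almost nothing.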

krajjat.analysis_functions.ica(experiment_or_dataframe, n_components, group=None, condition=None, subjects=None, trials=None, sequence_measure='distance', audio_measure='envelope', include_audio=False, sampling_frequency=50, show_graph=True, selected_components=None, nan_behaviour='ignore', verbosity=1)

Performs an independent component analysis (ICA) on the measures from the experiment, trying to separate them into subcomponents. Relies on the FastICA implementation from scikit-learn.

New in version 2.0.

Parameters:

experiment_or_dataframe (Experiment, pandas.DataFrame, str or list(any)) –
This parameter can be:
- An Experiment instance, containing the full dataset to be analyzed.
- A pandas DataFrame, generally generated from Experiment.get_dataframe().
- The path of a file containing a pandas DataFrame, generally generated from Experiment.save_dataframe().
- A list combining any of the above types. In that case, all the dataframes will be merged sequentially.
n_components (int) – The number of components to generate from the ICA.
group (str or None) – If specified, the analysis will discard the trials whose group attribute does not match the value provided for this parameter. Otherwise, if this parameter is set on None (default), subjects from all groups will be considered.
condition (str or None) – If specified, the analysis will discard the trials whose condition attribute does not match the value provided for this parameter. Otherwise, if this parameter is set on None (default), all trials will be considered.
subjects (list(str), str or None) – If specified, the analysis will discard the subjects whose name attribute does not match the value(s) provided for this parameter. This parameter can be a string (for one subject), or a list of strings (for multiple subjects). Otherwise, if this parameter is set on None (default), all subjects will be considered. This parameter can be combined with the parameter group to perform the analysis on certain subjects from a certain group.
trials (dict(str: list(str)), list(str), str or None) –
If specified, the analysis will discard the trials whose name attribute does not match the value(s) provided for this parameter. This parameter can be:
- A dictionary where each key is a subject name, and each value is a list containing trial names. This allows discarding or selecting specific trials for individual subjects.
- A list where each element is a trial name. This will select only the trials matching the given name, for each subject.
- The name of a single trial. This will select the given trial for all subjects.
- None (default). In that case, all trials will be considered.
This parameter can be combined with the other parameters to select specific subjects or conditions.

Note

In the case where at least two of the parameters group, condition, subjects or trials are set, the selected trials will be the ones that match all the selected categories. For example, if subjects is set on ["sub_001", "sub_002", "sub_003"] and trials is set on {"sub_001": ["trial_001", "trial_002"], "sub_004": ["trial_001"]}, the analysis will run on the trials that intersect both requirements, i.e., trials 1 and 2 for subject 1. Trials from subjects 2, 3 and 4 will be discarded.
sequence_measure (str, optional) –
The measure used for each sequence instance, can be either:
- "x", for the values on the x-axis (in meters)
- "y", for the values on the y-axis (in meters)
- "z", for the values on the z axis (in meters)
- "distance_hands", for the distance between the hands (in meters)
- "distance", for the distance travelled (in meters, default)
- "distance_x" for the distance travelled on the x-axis (in meters)
- "distance_y" for the distance travelled on the y-axis (in meters)
- "distance_z" for the distance travelled on the z axis (in meters)
- "velocity" for the velocity (in meters per second)
- "acceleration" for the acceleration (in meters per second squared)
- "acceleration_abs" for the absolute acceleration (in meters per second squared)
Note

This parameter will be used to generate a dataframe if the parameter experiment_or_dataframe is an Experiment instance. In any other case, this parameter has to be equal to the title of the column containing the sequence data in the dataframe.
audio_measure (str, optional) –
The measure used for each audio instance, can be either:
- "audio", for the original sample values.
- "envelope" (default)
- "pitch"
- "f1", "f2", "f3", "f4", "f5" for the values of the corresponding formant.
- "intensity"
Note

This parameter will be used to generate a dataframe if the parameter experiment_or_dataframe is an Experiment instance. In any other case, this parameter has to be equal to the title of the column containing the audio data in the dataframe.
include_audio (bool, optional) – If set on True, includes the audio channel as one of the features for the ICA. By default, this parameter is set on False.
sampling_frequency (int or float, optional) – The sampling frequency of the sequence and audio measures, used to resample the data when generating the dataframe if the parameter experiment_or_dataframe is an Experiment instance (otherwise, this parameter is unused).
show_graph (bool, optional) –
If set on True (default), shows the selected components (see next parameter) and the contribution of each joint to the components.

Note

Even if include_audio is set on True, the audio will not appear on the contribution silhouette on the left of each graph.
selected_components (list, int or None, optional) – Defines the components to plot. It can be a single component (e.g. 2) or a list of components (e.g. [0, 2, 5]). If set on None (default), all the components are plotted.
nan_behaviour (str) – If "ignore" (default), the labels containing values equal to numpy.nan will be removed from the ICA. If "zero", all the numpy.nan values will be turned to zero.
verbosity (int, optional) –
Sets how much feedback the code will provide in the console output:
- 0: Silent mode. The code won’t provide any feedback, apart from error messages.
- 1: Normal mode (default). The code will provide essential feedback such as progression markers and current steps.
- 2: Chatty mode. The code will provide all possible information on the events happening. Note that this may clutter the output and slow down the execution.
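
As a rough illustration of the decomposition step, the sketch below runs scikit-learn's FastICA directly on synthetic signals. It does not reproduce the internals of ica(): the sources, the mixing matrix and the three "joint" signals are assumptions made only for the example.

```python
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(0)
t = np.linspace(0, 8, 400)

# Two synthetic independent sources standing in for underlying movement patterns.
sources = np.column_stack([np.sin(2 * t), np.sign(np.sin(3 * t))])

# Hypothetical mixing of the sources into three "joint" signals, plus a little noise.
mixing = np.array([[1.0, 0.5], [0.5, 1.0], [0.3, 0.8]])
observed = sources @ mixing.T + 0.02 * rng.standard_normal((400, 3))

# n_components plays the same role as the n_components parameter described above.
ica_model = FastICA(n_components=2, random_state=0)
components = ica_model.fit_transform(observed)  # shape: (n_samples, n_components)
contributions = ica_model.mixing_               # shape: (n_features, n_components)

print(components.shape, contributions.shape)
```

Each column of the mixing matrix estimated by FastICA indicates how strongly each input signal contributes to one component, which is the kind of information a per-joint contribution plot would display.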

krajjat.analysis_functions.mutual_information(experiment_or_dataframe, group=None, condition=None, subjects=None, trials=None, series=None, average='subject', include_randperm='whole', sequence_measure='distance', regression_with='envelope', sampling_frequency=50, nan_behaviour='ignore', verbosity=1, **kwargs)

Performs a mutual information regression between the sequence measure and the audio. Relies on scikit-learn.

New in version 2.0.

Parameters:

experiment_or_dataframe (Experiment, pandas.DataFrame, str or list(any)) –
This parameter can be:
- An Experiment instance, containing the full dataset to be analyzed.
- A pandas DataFrame, generally generated from Experiment.get_dataframe().
- The path of a file containing a pandas DataFrame, generally generated from Experiment.save_dataframe().
- A list combining any of the above types. In that case, all the dataframes will be merged sequentially.
group (str or None) – If specified, the analysis will discard the trials whose group attribute does not match the value provided for this parameter. Otherwise, if this parameter is set on None (default), subjects from all groups will be considered.
condition (str or None) – If specified, the analysis will discard the trials whose condition attribute does not match the value provided for this parameter. Otherwise, if this parameter is set on None (default), all trials will be considered.
subjects (list(str), str or None) – If specified, the analysis will discard the subjects whose name attribute does not match the value(s) provided for this parameter. This parameter can be a string (for one subject), or a list of strings (for multiple subjects). Otherwise, if this parameter is set on None (default), all subjects will be considered. This parameter can be combined with the parameter group to perform the analysis on certain subjects from a certain group.
trials (dict(str: list(str)), list(str), str or None) –
If specified, the analysis will discard the trials whose name attribute does not match the value(s) provided for this parameter. This parameter can be:
- A dictionary where each key is a subject name, and each value is a list containing trial names. This allows discarding or selecting specific trials for individual subjects.
- A list where each element is a trial name. This will select only the trials matching the given name, for each subject.
- The name of a single trial. This will select the given trial for all subjects.
- None (default). In that case, all trials will be considered.
This parameter can be combined with the other parameters to select specific subjects or conditions.

Note

In the case where at least two of the parameters group, condition, subjects or trials are set, the selected trials will be the ones that match all the selected categories. For example, if subjects is set on ["sub_001", "sub_002", "sub_003"] and trials is set on {"sub_001": ["trial_001", "trial_002"], "sub_004": ["trial_001"]}, the analysis will run on the trials that intersect both requirements, i.e., trials 1 and 2 for subject 1. Trials from subjects 2, 3 and 4 will be discarded.
series (str, optional) – Defines the series that divide the data for comparison. This value can be any of the column names from the dataframe (apart from the values indicated in sequence_measure and regression_with). For instance, if group is selected, the mutual information will be calculated and plotted for each individual group of the dataframe.
average (str or None, optional) –
Defines whether an averaged regression is returned. This parameter can be:
- "subject" (default): the regression is calculated for each subject and averaged across all subjects.
- "trial": the regression is calculated for each trial and averaged across all trials.
- None: the regression is calculated for the whole dataset.
include_randperm (bool or str, optional) –
Defines whether to include the calculation of the regression for randomly permuted data. This parameter can be:
- False: in that case, no regression on a random permutation of the data will be calculated.
- "whole" (default): calculates a random permutation on the whole data.
- "individual": calculates a random permutation for each series.
sequence_measure (str, optional) –
The measure used for each sequence instance, can be either:
- "x", for the values on the x-axis (in meters)
- "y", for the values on the y-axis (in meters)
- "z", for the values on the z axis (in meters)
- "distance_hands", for the distance between the hands (in meters)
- "distance", for the distance travelled (in meters, default)
- "distance_x" for the distance travelled on the x-axis (in meters)
- "distance_y" for the distance travelled on the y-axis (in meters)
- "distance_z" for the distance travelled on the z axis (in meters)
- "velocity" for the velocity (in meters per second)
- "acceleration" for the acceleration (in meters per second squared)
- "acceleration_abs" for the absolute acceleration (in meters per second squared)
Note

This parameter will be used to generate a dataframe if the parameter experiment_or_dataframe is an Experiment instance. In any other case, this parameter has to be equal to the title of the column containing the sequence data in the dataframe.
regression_with (str, optional) –
The measure against which the sequence measure is regressed. It can be either a joint label (in that case, the regression uses the sequence measure of that joint), or an audio measure, among:
- "audio", for the original sample values.
- "envelope" (default)
- "pitch"
- "f1", "f2", "f3", "f4", "f5" for the values of the corresponding formant.
- "intensity"
Note

In the case where the value is an audio value, this parameter will be used to generate a dataframe if the parameter experiment_or_dataframe is an Experiment instance.
sampling_frequency (int or float, optional) – The sampling frequency of the sequence and audio measures, used to resample the data when generating the dataframe if the parameter experiment_or_dataframe is an Experiment instance (otherwise, this parameter is unused).
nan_behaviour (str) – If "ignore" (default), the labels containing values equal to numpy.nan will be removed from the regression. If "zero", all the numpy.nan values will be turned to zero.
verbosity (int, optional) –
Sets how much feedback the code will provide in the console output:
- 0: Silent mode. The code won’t provide any feedback, apart from error messages.
- 1: Normal mode (default). The code will provide essential feedback such as progression markers and current steps.
- 2: Chatty mode. The code will provide all possible information on the events happening. Note that this may clutter the output and slow down the execution.
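
The idea behind the regression and its random-permutation baseline can be sketched with scikit-learn's mutual_info_regression. The data below is synthetic, and the way mutual_information() builds its inputs or averages its results is not shown here; the sketch only illustrates why a permuted baseline is useful.

```python
import numpy as np
from sklearn.feature_selection import mutual_info_regression

rng = np.random.default_rng(0)
n_samples = 500

# Synthetic stand-in for the audio measure (e.g. the envelope).
envelope = rng.standard_normal(n_samples)

# Two synthetic "joints": the first follows the envelope, the second does not.
joint_measures = np.column_stack([
    envelope + 0.1 * rng.standard_normal(n_samples),
    rng.standard_normal(n_samples),
])

# Mutual information between each joint and the envelope.
mi = mutual_info_regression(joint_measures, envelope, random_state=0)

# Baseline: shuffling the envelope breaks any temporal relation with the joints.
permuted_envelope = rng.permutation(envelope)
mi_randperm = mutual_info_regression(joint_measures, permuted_envelope, random_state=0)

print(mi, mi_randperm)
```

A joint that genuinely relates to the audio should show a mutual information clearly above the value obtained on the permuted data.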

Private functions

krajjat.analysis_functions._make_dataframe(experiment_or_dataframe, sequence_measure, audio_measure, sampling_frequency, verbosity=1)

Loads a dataframe from a variety of inputs.

New in version 2.0.

Parameters:

experiment_or_dataframe (Experiment, pandas.DataFrame, str or list(any)) –
This parameter can be:
- An Experiment instance, containing the full dataset to be analyzed.
- A pandas DataFrame, generally generated from Experiment.get_dataframe().
- The path of a file containing a pandas DataFrame, generally generated from Experiment.save_dataframe().
- A list combining any of the above types. In that case, all the dataframes will be merged sequentially.
sequence_measure (str) –
The measure used for each sequence instance, can be either:
- "x", for the values on the x-axis (in meters)
- "y", for the values on the y-axis (in meters)
- "z", for the values on the z axis (in meters)
- "distance_hands", for the distance between the hands (in meters)
- "distance", for the distance travelled (in meters)
- "distance_x" for the distance travelled on the x-axis (in meters)
- "distance_y" for the distance travelled on the y-axis (in meters)
- "distance_z" for the distance travelled on the z axis (in meters)
- "velocity" for the velocity (in meters per second)
- "acceleration" for the acceleration (in meters per second squared)
- "acceleration_abs" for the absolute acceleration (in meters per second squared)
Note

This parameter will be used to generate a dataframe if the parameter experiment_or_dataframe is an Experiment instance. In any other case, this parameter has to be equal to the title of the column containing the sequence data in the dataframe.
audio_measure (str) –
The measure used for each audio instance, can be either:
- "audio", for the original sample values.
- "envelope"
- "pitch"
- "f1", "f2", "f3", "f4", "f5" for the values of the corresponding formant.
- "intensity"
Note

This parameter will be used to generate a dataframe if the parameter experiment_or_dataframe is an Experiment instance. In any other case, this parameter has to be equal to the title of the column containing the audio data in the dataframe.
sampling_frequency (int or float) – The sampling frequency of the sequence and audio measures, used to resample the data when generating the dataframe if the parameter experiment_or_dataframe is an Experiment instance.
verbosity (int, optional) –
Sets how much feedback the code will provide in the console output:
- 0: Silent mode. The code won’t provide any feedback, apart from error messages.
- 1: Normal mode (default). The code will provide essential feedback such as progression markers and current steps.
- 2: Chatty mode. The code will provide all possible information on the events happening. Note that this may clutter the output and slow down the execution.

Returns:

dataframe – A dataframe containing the loaded data.

Return type:

pandas.DataFrame
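
When a list is passed, the dataframes are described as being merged sequentially. A minimal sketch of that behaviour, assuming it amounts to a row-wise concatenation in list order (the column names below are hypothetical), could look like this:

```python
import pandas as pd

# Two hypothetical dataframes, e.g. produced from two separate recordings.
df_a = pd.DataFrame({
    "subject": ["sub_001", "sub_001"],
    "trial": ["trial_001", "trial_001"],
    "distance": [0.10, 0.12],
})
df_b = pd.DataFrame({
    "subject": ["sub_002", "sub_002"],
    "trial": ["trial_001", "trial_001"],
    "distance": [0.20, 0.18],
})

# Stack the dataframes one after the other and rebuild a continuous index.
merged = pd.concat([df_a, df_b], ignore_index=True)

print(merged)
```

Rebuilding the index avoids duplicated row labels, which would otherwise make label-based selection on the merged dataframe ambiguous.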

krajjat.analysis_functions._get_dataframe_from_requirements(dataframe, group=None, condition=None, subjects=None, trials=None, verbosity=1)

Returns a sub-dataframe containing only the data where the group, condition, subjects and trials match the given parameters.

New in version 2.0.

Parameters:

dataframe (pandas.DataFrame) –
A pandas DataFrame, generally generated from Experiment.get_dataframe().
group (list(str), str or None) – If specified, the analysis will discard the trials whose group attribute does not match the value or values provided for this parameter. Otherwise, if this parameter is set on None (default), subjects from all groups will be considered.
condition (list(str), str or None) – If specified, the analysis will discard the trials whose condition attribute does not match the value or values provided for this parameter. Otherwise, if this parameter is set on None (default), all trials will be considered.
subjects (list(str), str or None) – If specified, the analysis will discard the subjects whose name attribute does not match the value(s) provided for this parameter. This parameter can be a string (for one subject), or a list of strings (for multiple subjects). Otherwise, if this parameter is set on None (default), all subjects will be considered. This parameter can be combined with the parameter group to perform the analysis on certain subjects from a certain group.
trials (dict(str: list(str)), list(str), str or None) –
If specified, the analysis will discard the trials whose name attribute does not match the value(s) provided for this parameter. This parameter can be:
- A dictionary where each key is a subject name, and each value is a list containing trial names. This allows discarding or selecting specific trials for individual subjects.
- A list where each element is a trial name. This will select only the trials matching the given name, for each subject.
- The name of a single trial. This will select the given trial for all subjects.
- None (default). In that case, all trials will be considered.
This parameter can be combined with the other parameters to select specific subjects or conditions.

Note

In the case where at least two of the parameters group, condition, subjects or trials are set, the selected trials will be the ones that match all the selected categories. For example, if subjects is set on ["sub_001", "sub_002", "sub_003"] and trials is set on {"sub_001": ["trial_001", "trial_002"], "sub_004": ["trial_001"]}, the analysis will run on the trials that intersect both requirements, i.e., trials 1 and 2 for subject 1. Trials from subjects 2, 3 and 4 will be discarded.
verbosity (int, optional) –
Sets how much feedback the code will provide in the console output:
- 0: Silent mode. The code won’t provide any feedback, apart from error messages.
- 1: Normal mode (default). The code will provide essential feedback such as progression markers and current steps.
- 2: Chatty mode. The code will provide all possible information on the events happening. Note that this may clutter the output and slow down the execution.

Returns:

dataframe – A dataframe containing a subset of the original dataframe.

Return type:

pandas.DataFrame
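
The intersection rule described in the note can be reproduced with plain pandas boolean masks. This is a sketch of the selection logic only, not the actual implementation; the column names "subject" and "trial" are assumptions about the dataframe layout.

```python
import pandas as pd

# Hypothetical dataframe with one row per (subject, trial) sample.
df = pd.DataFrame({
    "subject": ["sub_001", "sub_001", "sub_001", "sub_002", "sub_004"],
    "trial": ["trial_001", "trial_002", "trial_003", "trial_001", "trial_001"],
    "distance": [0.10, 0.20, 0.30, 0.40, 0.50],
})

# Same values as in the example of the note.
subjects = ["sub_001", "sub_002", "sub_003"]
trials = {"sub_001": ["trial_001", "trial_002"], "sub_004": ["trial_001"]}

# Requirement 1: the subject must be in the list of subjects.
subject_mask = df["subject"].isin(subjects)

# Requirement 2: the trial must be listed for that subject in the dictionary.
trial_mask = pd.Series(
    [trial in trials.get(subject, []) for subject, trial in zip(df["subject"], df["trial"])],
    index=df.index,
)

# Only the rows that satisfy both requirements are kept.
selected = df[subject_mask & trial_mask]

print(selected)
```

Only trials 1 and 2 of sub_001 remain: sub_002 has no entry in the trials dictionary, and sub_004 is absent from the subjects list.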