Ready-to-use scripts
Pergola provides a series of scripts that are available in your system after its installation.
These scripts wrap up the most common functionalities of the Pergola library.
pergola
pergola enables the user to execute many of the main pergola functionalities.
If you prefer to code your own scripts, you can find some examples in the tutorials section.
Tip
To reproduce all pergola commands shown in this section you can download the following data set: feeding_behavior_C57BL6_mice_HF.tar.gz
The data consist of a series of recordings covering three weeks of the feeding behavior of C57BL/6 mice fed either a high-fat or a standard chow diet. To download and uncompress the data you can use the following commands:
mkdir data
wget -O- https://zenodo.org/record/1168982/files/feeding_behavior_C57BL6_mice_HF.tar.gz | tar xz -C data
If you install pergola from GitHub or use the Docker image, you can also find this data set in:
/path_to_installation/pergola/sample_data
Pergola options allow the user to access the main features of the Pergola library from a ready-to-use script.
The available arguments are divided into five sections: data input, file formats, filtering, temporal arguments and data output.
Note
All the command line examples can be reproduced using the files found in the feeding_behavior_C57BL6_mice_HF.tar.gz tarball.
Data input
The available data input parameters are listed in the table below:
Argument | short | Description | Example |
---|---|---|---|
--input | -i | Path of input data file | -i /foo/feeding_behavior_HF_mice.csv |
--mapping_file | -m | Path of mapping file | -m /foo/my_mappings.txt |
--field_separator | -fs | Field separator of input data file | -fs " " |
--no_header | -nh | The input file has no header (column names) | -nh |
--fields_read | -s | List of column names (used in mapping file) | -s 'CAGE' 'EndT' 'Nature' 'StartT' 'Value' |
Only two of the data input arguments are mandatory to run pergola:
- the -i, --input argument specifies the file to convert.
- the -m, --mapping_file argument specifies the file containing the mappings between the input file fields and the terms defined in the pergola ontology.
Tip
pergola can take input files in CSV and xlsx format. You can see examples of both in the input data section.
In this manner, the minimal command to run pergola, provided that the input and mapping files are correctly formatted, would be:
pergola -i ./data/feeding_behavior_HF_mice.csv -m ./data/b2p.txt
As mentioned before, you can also use an xlsx file as input instead:
pergola -i ./data/feeding_behavior_HF_mice.xlsx -m ./data/b2p.txt
Note
The mapping file is a text file in which each line contains the correspondence between a field of the input data and a term of the pergola ontology. For more details, refer to the pergola ontology section and to the example of a mapping file.
Tip
Any field of the input data that should not be used by pergola must be set to the dummy term in the mapping file.
The rest of the arguments are optional and enable the user to provide additional information about the input data when it does not entirely fit the default pergola input data format.
For instance, the -fs, --field_separator argument sets the delimiter that separates fields inside the input data file when it is not a tab (the default). As an example, if fields are delimited by commas (,), you can specify it as shown below:
pergola -i ./data/feeding_behavior_HF_mice_commas.csv -m ./data/b2p.txt -fs ","
Pergola requires the columns of the input file to be mapped to pergola ontology terms. Thus, if the input file has no header, you should provide an ordered list with the corresponding fields of your file using the -nh, --no_header argument together with the -s, --fields_read argument, as in the example below:
pergola -i ./data/feeding_behavior_HF_mice_no_header.csv -m ./data/b2p.txt -nh -s 'CAGE' 'EndT' 'Nature' 'StartT' 'Value'
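To illustrate what supplying an ordered list of field names to a headerless file amounts to, here is a minimal Python sketch. It is not pergola's implementation: the function name `read_headerless` and the two sample rows are hypothetical, and it only assumes a tab-delimited file, as described above.

```python
import csv
import io


def read_headerless(handle, field_names, delimiter="\t"):
    """Return one dict per row, keyed by the supplied column names."""
    reader = csv.DictReader(handle, fieldnames=field_names, delimiter=delimiter)
    return list(reader)


# Hypothetical two-row sample standing in for a file without a header line.
sample = io.StringIO(
    "1\t1335986261\tfood_sc\t1335986151\t0.06\n"
    "1\t1335986330\tfood_sc\t1335986275\t0.02\n"
)
rows = read_headerless(sample, ["CAGE", "EndT", "Nature", "StartT", "Value"])
print(rows[0]["StartT"])
```

The order of the names matters: each name is attached to the column found at the same position in the file.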
Tip
To avoid setting field names you can use the reserved word ordinal with the -s argument. This allows you to use numbers instead of terms in the mapping file. Below you can see an example of an assignment in the mapping file using this option:
behavioral_file:1 > pergola:track
The command would result in:
pergola -i ./data/feeding_behavior_HF_mice_no_header.csv -m ./data/b2p_ordinal.txt -nh -s 'ordinal'
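The mapping line shown above follows the pattern `behavioral_file:<field> > pergola:<term>`. The sketch below parses lines of that shape into a dictionary. It is an illustration only: `parse_mapping` is a hypothetical helper, the regular expression is inferred from the single example line, and the sample assignments are assumptions rather than the contents of b2p_ordinal.txt.

```python
import re

# Pattern inferred from the example line "behavioral_file:1 > pergola:track".
MAPPING_LINE = re.compile(r"^\s*behavioral_file:(\S+)\s*>\s*pergola:(\S+)\s*$")


def parse_mapping(lines):
    """Map each input field (name or ordinal) to its pergola ontology term."""
    mapping = {}
    for line in lines:
        match = MAPPING_LINE.match(line)
        if match:  # lines that do not follow the pattern are ignored
            mapping[match.group(1)] = match.group(2)
    return mapping


# Hypothetical assignments, using only terms mentioned in this page.
example = [
    "behavioral_file:1 > pergola:track",
    "behavioral_file:3 > pergola:data_type",
]
print(parse_mapping(example))
```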
File formats
Pergola can convert your data to several genomic file formats. The BED (default option) and GFF file formats are well suited to encode events in the form of discrete time intervals, such as a meal. On the other hand, the BedGraph format is well suited to store continuous data, such as any behavioral feature measured continuously along time (for instance speed along a trajectory), or any score derived from the original data (for instance cumulative values after binning, or a statistical parameter).
Argument | short | Options | Description | Example |
---|---|---|---|---|
--format | -f | bed | Converts data to BED format | -f bed |
--format | -f | gff | Converts data to GFF format | -f gff |
--format | -f | bedGraph | Converts data to BedGraph format | -f bedGraph |
Following our previous example the command line to convert our data to BedGraph format will be:
pergola -i ./data/feeding_behavior_HF_mice.csv -m ./data/b2p.txt -f bedGraph
Note
Pergola converts data by default to BED file format. Refer to the mapping file section to see pergola’s adapted genomic formats.
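As a rough illustration of how the two kinds of records differ, the sketch below formats one interval as a BED-style line (with a name and a strand) and as a BedGraph-style line (coordinates plus a value). The helper names are hypothetical and the column order is inferred from the output samples shown elsewhere on this page, not taken from pergola's code.

```python
def to_bed_line(chrom, start, end, name, score, strand="+"):
    """Tab-separated record with the first six BED columns."""
    return "\t".join(str(field) for field in (chrom, start, end, name, score, strand))


def to_bedgraph_line(chrom, start, end, value):
    """Tab-separated BedGraph record: coordinates followed by a value."""
    return "\t".join(str(field) for field in (chrom, start, end, value))


print(to_bed_line("chr1", 1335986151, 1335986261, "food_sc", 0.06))
print(to_bedgraph_line("chr1", 20, 30, 0.02))
```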
Filtering
Filtering arguments allow you to select a part of your input data based on the fields assigned to pergola ontology terms.
Argument | short | Description | Example |
---|---|---|---|
--tracks | -t | List of tracks to keep | -t track_id_1 track_id_2 |
--range | -r | Track range to keep (numerical) | -r 1-10 |
--track_actions | -a | Action to perform on selected tracks | -t track_id_1 track_id_2 -a split_all |
--data_types_list | -dl | List of data types to keep | -dl data_type_one data_type_2 |
--data_types_actions | -d | Action to perform on selected data types | -dl data_type_one data_type_2 -d one_per_channel |
Pergola allows you to filter a subset of your input data based on the field set as track in your mapping file.
The example below shows how to get only the data from animals 1, 4 and 7 (tracks), keeping the food_sc and food_fat data types:
pergola -i ./data/feeding_behavior_HF_mice.csv -m ./data/b2p.txt -t 1 4 7 -dl food_sc food_fat
If you want to get all tracks from 1 to 4, you can use the -r option, provided your track field is numeric:
pergola -i ./data/feeding_behavior_HF_mice.csv -m ./data/b2p.txt -r 1-4
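A range such as `1-4` can be expanded into the list of track identifiers it covers. The sketch below shows one way to do so. `parse_track_range` is a hypothetical helper, and it assumes that both ends of the range are included, which matches "all tracks from 1 to 4" but is not confirmed against pergola's own parser.

```python
import re


def parse_track_range(text):
    """Expand 'first-last' into the inclusive list of integer track ids."""
    match = re.fullmatch(r"(\d+)-(\d+)", text)
    if match is None:
        raise ValueError("expected a range such as '1-4', got %r" % text)
    first, last = int(match.group(1)), int(match.group(2))
    if first > last:
        raise ValueError("range start must not exceed range end")
    return list(range(first, last + 1))


print(parse_track_range("1-4"))
```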
Tip
By default, tracks selected by the -r option are joined together in a single output track. You can use the -a option to change this behavior.
The -a option sets how tracks are distributed among output files, either joined in the same file or split into different files. The available -a options are:
track_actions | Description |
---|---|
split_all | Split all tracks into different files |
join_all | Join all tracks in a single file |
join_odd | Join only odd tracks in a single file |
join_even | Join only even tracks in a single file |
An example of how to join all tracks in the same file would be:
pergola -i ./data/feeding_behavior_HF_mice.csv -m ./data/b2p.txt -a join_all
Tip
You can combine the -t or -r options with -a in order to filter tracks and join them as you prefer:
pergola -i ./data/feeding_behavior_HF_mice.csv -m ./data/b2p.txt -t 1 2 3 -a join_all
It is possible to provide pergola with a list of the values of the field assigned to the data_type pergola ontology term that should be kept, using the -dl argument.
For instance, in the command below only events assigned to the “food_fat” data_type are kept:
pergola -i ./data/feeding_behavior_HF_mice.csv -m ./data/b2p.txt -dl food_fat
In addition, the -d option allows you to combine all data types into a single output file or to split them into different files:
data_types_actions | Description |
---|---|
all | Join all data_type into a single file |
one_per_channel | Split each data_type into different files |
Both the -dl and -d options can be combined in a single command. In the example below only events tagged as food_sc and food_fat are kept, and they are joined for each mouse id (track):
pergola -i ./data/feeding_behavior_HF_mice.csv -m ./data/b2p.txt -dl food_sc food_fat -d all
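Conceptually, this combination filters events by data type and then decides how the remaining events are grouped for output. The sketch below models that idea with plain tuples. It is only an approximation under stated assumptions: `group_events` is a hypothetical helper, the `water` data type is invented for the example, and the grouping keys are a guess at how `all` and `one_per_channel` behave per track.

```python
from collections import defaultdict


def group_events(events, keep_types, action="all"):
    """Filter (track, data_type, value) events and group them for output.

    ASSUMPTION: 'all' yields one group per track, 'one_per_channel' yields
    one group per (track, data_type) pair.
    """
    if action not in ("all", "one_per_channel"):
        raise ValueError("unknown action: %r" % action)
    groups = defaultdict(list)
    for track, data_type, value in events:
        if data_type not in keep_types:
            continue  # dropped by the data type filter
        key = (track,) if action == "all" else (track, data_type)
        groups[key].append((track, data_type, value))
    return dict(groups)


# Hypothetical events; 'water' is not part of the sample data set.
events = [
    (1, "food_sc", 0.06),
    (1, "food_fat", 0.02),
    (1, "water", 0.10),
    (2, "food_sc", 0.04),
]
joined = group_events(events, {"food_sc", "food_fat"}, action="all")
split = group_events(events, {"food_sc", "food_fat"}, action="one_per_channel")
print(sorted(joined), sorted(split))
```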
Temporal arguments
Given the prominent temporal nature of longitudinal data, pergola provides several arguments to obtain time-based features or to process time intervals.
Argument | short | Options | Description | Example |
---|---|---|---|---|
--relative_coord | -e | | Time relative to first time point | -e |
--window_size | -w | integer | Bins the data in time windows of the selected size | -w 300 |
--window_mean | -wm | | Averages by the window size | -wm |
--value_mean | -vm | | Averages by the data items within the window | -vm |
--min_time | -min | integer | Min time point from which data will be processed | -min 10 |
--max_time | -max | integer | Max time point up to which data will be processed | -max 1000 |
--intervals_gen | -n | | Creates two time points from an original input with a single one | -n |
--interval_step | -ns | | Sets the step to create end time points when -n option is set | -ns 100 |
--multiply_intervals | -mi | integer | Multiplies time points by the selected value | -mi 1000 |
Input files do not necessarily start at time 0. The -e, --relative_coord argument transforms the time points so that they are relative to the first time point inside the file.
For instance, if time inside the file is expressed as epoch time as in the example below:
CAGE StartT EndT Value Nature
1 1335986151 1335986261 0.06 food_sc
1 1335986275 1335986330 0.02 food_sc
1 1335986341 1335986427 0.02 food_sc
Applying the -e argument with the command below results in the following time coordinates:
pergola -i ./data/file_0.csv -m data/b2p.txt -e
0 110
124 179
190 276
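The transformation above amounts to subtracting the first time point from every coordinate. A minimal sketch, using the three epoch intervals from the example (the function name `to_relative` is hypothetical and this is not pergola's own code):

```python
def to_relative(intervals):
    """Shift (start, end) pairs so that the earliest start becomes 0."""
    if not intervals:
        return []
    origin = min(start for start, _ in intervals)
    return [(start - origin, end - origin) for start, end in intervals]


epoch_intervals = [
    (1335986151, 1335986261),
    (1335986275, 1335986330),
    (1335986341, 1335986427),
]
print(to_relative(epoch_intervals))  # [(0, 110), (124, 179), (190, 276)]
```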
Pergola enables the user to bin the data using equidistant time windows when formatting data as BedGraph files. The -w argument sets the size of these windows.
For example:
pergola -i ./data/feeding_behavior_HF_mice.csv -m ./data/b2p.txt -f bedGraph -w 300
Note
The -w argument can only be used together with the -f bedGraph option.
The -wm argument calculates the mean value inside each time window:
pergola -i ./data/feeding_behavior_HF_mice.csv -m ./data/b2p.txt -f bedGraph -w 300 -wm
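To give an intuition of windowed binning, the sketch below distributes interval values over fixed-size windows and optionally divides each window total by the window size. Two points are assumptions and may differ from what pergola actually does: a value is split across windows in proportion to the overlap, and the mean is the window total divided by the window size. `bin_intervals` is a hypothetical name.

```python
def bin_intervals(intervals, window, window_mean=False):
    """Distribute (start, end, value) intervals over fixed-size windows.

    ASSUMPTION: starts are non-negative integers, end > start, and a value
    is shared among windows in proportion to the overlap with each one.
    """
    if not intervals:
        return []
    last_end = max(end for _, end, _ in intervals)
    n_windows = -(-last_end // window)  # ceiling division
    totals = [0.0] * n_windows
    for start, end, value in intervals:
        for index in range(start // window, (end - 1) // window + 1):
            w_start = index * window
            overlap = min(end, w_start + window) - max(start, w_start)
            totals[index] += value * overlap / (end - start)
    if window_mean:
        # ASSUMPTION: the mean is the window total divided by the window size.
        totals = [total / window for total in totals]
    return [(i * window, (i + 1) * window, total) for i, total in enumerate(totals)]


print(bin_intervals([(0, 10, 1.0), (5, 15, 2.0)], 10))
```

In this toy input the second interval straddles two windows, so half of its value is assigned to each of them.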
The -min and -max arguments set the first and the last time point present in the pergola output file.
This can be used, for instance, to unify the beginning (example below) or the end of files:
file_1.csv
CAGE StartT EndT Value Nature
1 20 30 0.02 food_sc
1 50 60 0.02 food_fat
pergola -i ./data/file_1.csv -m ./data/b2p.txt -f bedGraph -w 10 -min 0
Note
The time points inserted at the beginning or the end of the file using -min and -max are set to a value of zero. In the example above, the beginning of the output file will then look as follows:
chr1 0 10 0
chr1 10 20 0.0
chr1 20 30 0.02
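The padding described in this note can be pictured as prepending zero-valued windows from the requested minimum time up to the first window that holds data. The following sketch reproduces the three output lines above under that reading. `pad_with_zero_windows` is a hypothetical helper, not part of pergola.

```python
def pad_with_zero_windows(windows, window_size, min_time):
    """Prepend zero-valued (start, end, value) windows starting at min_time."""
    if not windows:
        return []
    first_start = windows[0][0]
    padding = [
        (start, start + window_size, 0.0)
        for start in range(min_time, first_start, window_size)
    ]
    return padding + list(windows)


print(pad_with_zero_windows([(20, 30, 0.02)], window_size=10, min_time=0))
```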
If each record of the input file has only a single time point, pergola can process it using the -n argument. This situation is common in files encoding data sampled at equidistant time points, such as the following one:
id time value
1 1 8
1 2 13
1 3 21
In this case the -n argument generates an interval for each of the items of the file:
pergola -i ./data/file_2.csv -m ./data/file_2_to_p.txt -f bedGraph -n
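Generating an interval from a single time point can be sketched as adding a step to obtain the end coordinate. The example below uses the three records of the sample file. It assumes half-open intervals ending at `time + step`; the exact end coordinate produced by pergola may differ (for instance, it may end one unit earlier), and `points_to_intervals` is a hypothetical name.

```python
def points_to_intervals(points, step=1):
    """Turn (time, value) records into (start, end, value) intervals.

    ASSUMPTION: each interval ends at time + step.
    """
    return [(time, time + step, value) for time, value in points]


print(points_to_intervals([(1, 8), (2, 13), (3, 21)]))
```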
In some cases the time inside the input file is encoded as decimal values (for instance tenths of seconds):
time value
1 -30.98
2 -5.19
3 23.96
4 -2.75
It is possible to multiply the time stamps inside this input file by a given factor using the -mi argument, for instance to obtain the time stamps in milliseconds:
pergola -i ./data/file_3.csv -m ./data/file_2_to_p.txt -n -mi 1000 -f bedGraph
As a result, intervals defined by two time points are returned in the output file:
chr1 0 9 -30.98
chr1 10 19 -5.19
chr1 20 29 23.96
chr1 30 31 -2.75
Note
This last argument is useful because coordinates in genomic tools are always expressed as integer values, so time points expressed as decimals sometimes need to be converted to integers.
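The conversion from decimal time stamps to integer coordinates can be sketched as a multiplication followed by rounding. The rounding step is an assumption added here to absorb floating-point error (0.1 * 1000 is not exactly 100 in binary floating point); `scale_times` is a hypothetical helper and not pergola's implementation.

```python
def scale_times(times, factor):
    """Multiply time stamps by factor and round to the nearest integer."""
    return [int(round(time * factor)) for time in times]


print(scale_times([0.1, 0.2, 0.3], 1000))
```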
Data output
There are several arguments related to optional fields of the genomic file formats. These arguments control how the data are visualized in genomic tools.
Argument | short | Options | Description | Example |
---|---|---|---|---|
--no_track_line | -nt | | When set, the BED file does not include a track line (browser configuration) | -nt |
--bed_label | -bl | | BED files include labels describing each interval (data type) | -bl |
--color_file | -c | | Path to a file setting a color for the different data types to be displayed | -c /your_path/color.txt |
Some genomic software, for example genome browsers, uses the track line to get parameters about the visualization of the data. To omit the track line you can use the -nt option:
pergola -i ./data/feeding_behavior_HF_mice.csv -m ./data/b2p.txt -nt
The name field of the BED file makes it possible to display a label for each record encoded inside the file. Pergola uses this field to display the data_type of each line of the file when the -bl option is set:
pergola -i ./data/feeding_behavior_HF_mice.csv -m ./data/b2p.txt -bl
The beginning of a BED file produced with this option will look like the following:
track name="1_food_sc" description="1 food_sc" visibility=2 itemRgb="On" priority=20
chr1 1335986151 1335986261 food_sc 0.06 + 1335986151 1335986261 113,113,113
chr1 1335986275 1335986330 food_sc 0.02 + 1335986275 1335986330 170,170,170
To choose which color will be used to display each of the data types inside the file, it is possible to provide pergola with a file coding the colors to be used. Each line of the file pairs a data type with a color:
food_sc orange
food_fat blue
How to use it is shown in the following example:
pergola -i ./data/feeding_behavior_HF_mice.csv -m ./data/b2p.txt -c ./data/color_code.txt
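A color file with the layout shown above (a data type and a color name on each line) can be read into a dictionary as in the sketch below. The parsing rule, two whitespace-separated fields per line, is inferred from the example and `parse_color_file` is a hypothetical helper.

```python
def parse_color_file(lines):
    """Map each data type to its color from 'data_type color' lines."""
    colors = {}
    for line in lines:
        fields = line.split()
        if len(fields) != 2:
            continue  # skip blank or malformed lines
        data_type, color = fields
        colors[data_type] = color
    return colors


print(parse_color_file(["food_sc orange\n", "food_fat blue\n"]))
```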
Tip
To see all available options, you can simply type pergola -h
jaaba_to_pergola
Jaaba annotates behavior using video recordings of animals. jaaba_to_pergola is available in your system after you have installed pergola. This script allows the user to adapt Jaaba data using Pergola for their visualization and analysis.
The available jaaba_to_pergola modes deal with two types of Jaaba data: features and scores.
Note
To see all available options, you can simply type jaaba_to_pergola -h
Jaaba features
Jaaba uses a series of features or variables derived from the video-based trajectories of behaving animals to annotate behavior.
Pergola allows you to obtain these features as CSV files using the fc mode. Users can also directly process them with pergola by using the fp mode.
The available arguments are:
Argument | short | Description |
---|---|---|
--input | -i | Directory where jaaba features files are placed |
--jaaba_features | -jf | Features to extract |
--dumping_directory | -dd | Directory for dumping csv files |
For example, it is possible to obtain JAABA features formatted as CSV files using the fc mode:
jaaba_to_pergola fc -i "/jaaba_data/perframe/" -jf velmag dtheta -dd "/output_dir/"
Note
The above example shows how to obtain the velmag and dtheta features from the perframe folder, where the jaaba MAT feature files are stored, and dump them in the directory output_dir.
The fp mode makes it possible to convert the selected features into bed or bedGraph files. At the same time, it is possible to process the data using any of the pergola options:
jaaba_to_pergola fp -i "/jaaba_data/perframe/" -jf velmag dtheta -dd "/output_dir/" -m "jaaba2pergola_mapping.txt" -f bedGraph -w 300
Jaaba scores
Pergola can convert Jaaba annotations of animal behavior for their visualization and analysis. Jaaba predicts the periods of time within which animals display a given behavior along a trajectory. These predictions can be dumped into a MAT-file that contains both the predicted behavioral events and the scores of the reliability of each event.
Jaaba predictions can also be stored in CSV files or processed into bed or bedGraph files applying any pergola option. To choose between these two options, users can set the sc or the sp mode, respectively.
The possible arguments for these modes are:
Argument | short | Description |
---|---|---|
--input | -i | Path to jaaba scores file |
Hence, the command line to process a Jaaba scores file into a CSV formatted file using the sc mode will be:
jaaba_to_pergola sc -i predicted_behavior.mat
In the case of the sp mode, we can additionally use any pergola option:
jaaba_to_pergola sp -i predicted_behavior.mat -m jaaba_scores2pergola_mapping.txt -f bed