FROGSFUNC_3_pathways
Context
PICRUSt2 is a software tool for predicting functional abundances based only on marker gene sequences. It is integrated into the FROGS suite as the FROGSFUNC tools, which are split into 3 steps:
- FROGSFUNC_1_placeseqs_copynumber : Places the ASVs into a reference phylogenetic tree and predicts the copy numbers of the marker gene (16S, ITS or 18S).
- FROGSFUNC_2_functions : Predicts the copy number of each function in each ASV, then calculates function abundances in each sample and ASV abundances normalized by marker copy number.
- FROGSFUNC_3_pathways : Calculates pathway abundances in each sample.
These data can be useful for generating hypotheses, but should always be interpreted cautiously, especially when focusing on a single function or on predictions for a single ASV.
PICRUSt2 supports 3 markers only: 16S, ITS and 18S. If you used another one (rpob, 23S, coi, ef1, etc.), you cannot use these 3 tools.
What it does
FROGSFUNC_3_pathways is the last step of PICRUSt2. It infers MetaCyc/KEGG pathway abundances based on EC/KO number abundances. There are three steps performed at this stage:
- Regroups EC or KO numbers into MetaCyc or KEGG reactions, depending on the unstratified abundance input file.
- Infers which MetaCyc/KEGG pathways are present based on these reactions, using MinPath.
- Calculates and returns the abundance of pathways identified as present.
Command line
v4.1.0
usage: frogsfunc_pathways.py [-h] [--debug] [--per-sequence-contrib] -i
INPUT_FILE [-m MAP]
[--per-sequence-abun PER_SEQUENCE_ABUN]
[--per-sequence-function PER_SEQUENCE_FUNCTION]
[--hierarchy-ranks [HIERARCHY_RANKS [HIERARCHY_RANKS ...]]]
[--normalisation] [-o OUTPUT_PATHWAYS_ABUND]
[--output-pathways-contrib OUTPUT_PATHWAYS_CONTRIB]
[--output-pathways-predictions OUTPUT_PATHWAYS_PREDICTIONS]
[--output-pathways-abund-per-seq OUTPUT_PATHWAYS_ABUND_PER_SEQ]
[-v] [-l LOG_FILE] [-t SUMMARY]
Infer the presence and abundances of pathways based on gene family abundances
in a sample.
optional arguments:
-h, --help show this help message and exit
--debug Keep temporary files to debug program.
--per-sequence-contrib
If stratified option is activated, a new table is
built. It will contain the abundances of each function
of each OTU in each sample. (in contrast to the
default stratified output, which is the contribution
to the community-wide pathway abundances.) Options
--per-sequence-abun and --per-sequence-function need
to be set when this option is used (default: False)
Inputs:
-i INPUT_FILE, --input-file INPUT_FILE
Input TSV function abundances table from
FROGSFUNC_step3_function (unstratified table :
frogsfunc_functions_unstrat.tsv).
-m MAP, --map MAP File required if you are not analyzing 16S sequences
with the Metacyc ("EC" function in the previous step)
database. IF MARKER STUDYED STILL 16S: it must
indicate the path to the PICRUSt2 KEGG pathways
mapfile, if you chose "KO" in the previous step (the
mapfile is available here :
$PICRUSt2_PATH/default_files/pathway_mapfiles/KEGG_pathways_to_KO.tsv)
IF MARKER STUDYED IS ITS OR 18S: Path to mapping file of
pathways to fungi reactions (the mapfile is available here :
$PICRUSt2_PATH/default_files/pathway_mapfiles/metacyc_path2rxn_struc_filt_fungi.txt ).
--per-sequence-abun PER_SEQUENCE_ABUN
Path to table of sequence abundances across samples
normalized by marker copy number (typically the
normalized sequence abundance table output at the
metagenome pipeline step:
frogsfunc_functions_marker_norm.tsv by default). This
input is required when the --per-sequence-contrib
option is set. (default: None).
--per-sequence-function PER_SEQUENCE_FUNCTION
Path to table of function abundances per sequence,
which was outputted at the hidden-state prediction
step (frogsfunc_copynumbers_predicted_functions.tsv by
default). This input is required when the
--per-sequence-contrib option is set. Note that this file
should be the same input table as used for the
metagenome pipeline step (default: None).
--hierarchy-ranks [HIERARCHY_RANKS [HIERARCHY_RANKS ...]]
The ordered ranks levels used in the metadata
hierarchy pathways. [Default: ['Level1', 'Level2',
'Level3', 'Pathway']]
--normalisation To normalise data after analysis. Values are divided
by sum of columns , then multiplied by 10^6 (CPM
values). [Default: False]
Outputs:
-o OUTPUT_PATHWAYS_ABUND, --output-pathways-abund OUTPUT_PATHWAYS_ABUND
Pathway abundance file output. Default:
frogsfunc_pathways_unstrat.tsv]
--output-pathways-contrib OUTPUT_PATHWAYS_CONTRIB
Stratified output corresponding to contribution of
predicted gene family abundances within each predicted
genome.
--output-pathways-predictions OUTPUT_PATHWAYS_PREDICTIONS
Stratified output corresponding to contribution of
predicted gene family abundances within each predicted
genome.
--output-pathways-abund-per-seq OUTPUT_PATHWAYS_ABUND_PER_SEQ
Pathway abundance file output per sequences (if
--per-sequence-contrib set)
-v, --version show program's version number and exit
-l LOG_FILE, --log-file LOG_FILE
This output file will contain several information on
executed commands.
-t SUMMARY, --summary SUMMARY
Path to store resulting html file. [Default:
frogsfunc_pathways_summary.html]
Example of command line:
./frogsfunc_pathways.py \
--input-file frogsfunc_functions_unstrat_EC.tsv \
--normalisation \
--per-sequence-contrib \
--per-sequence-abun frogsfunc_functions_marker_norm.tsv \
--per-sequence-function EC_copynumbers_predicted.tsv \
--output-pathways-abund frogsfunc_pathways_unstrat.tsv \
--output-pathways-contrib frogsfunc_pathways_strat.tsv \
--output-pathways-predictions frogsfunc_pathways_predictions.tsv \
--output-pathways-abund-per-seq frogsfunc_pathways_unstrat_per_seq.tsv \
--summary frogsfunc_pathways_summary.html
Stratified output (the --per-sequence-contrib, --per-sequence-abun and --per-sequence-function parameters) is optional.
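The dependency between the stratified options can be sketched in Python. This is only an illustration: `build_pathways_command` is a hypothetical helper, not part of FROGS, and it simply assembles an argument list from flags documented in the help text above. It assumes that the two per-sequence inputs are always supplied together, as the help text requires when --per-sequence-contrib is set.

```python
from typing import List, Optional


def build_pathways_command(
    input_file: str,
    map_file: Optional[str] = None,
    per_sequence_abun: Optional[str] = None,
    per_sequence_function: Optional[str] = None,
    normalisation: bool = False,
) -> List[str]:
    """Assemble a frogsfunc_pathways.py argument list (hypothetical helper)."""
    cmd = ["frogsfunc_pathways.py", "--input-file", input_file]
    if map_file is not None:
        # Per the help text: needed for KO input on 16S, or for ITS/18S markers.
        cmd += ["--map", map_file]
    if normalisation:
        cmd.append("--normalisation")
    # Assumption of this sketch: the two per-sequence inputs go together.
    given = [per_sequence_abun, per_sequence_function]
    if any(given) and not all(given):
        raise ValueError(
            "--per-sequence-abun and --per-sequence-function must be set together"
        )
    if all(given):
        cmd += [
            "--per-sequence-contrib",
            "--per-sequence-abun", per_sequence_abun,
            "--per-sequence-function", per_sequence_function,
        ]
    return cmd


if __name__ == "__main__":
    # Unstratified run: only the input table is required.
    print(" ".join(build_pathways_command("frogsfunc_functions_unstrat_EC.tsv")))
```

The resulting list could be passed to `subprocess.run` on a machine where FROGS is installed; the sketch itself does not execute the tool.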
Galaxy
Function abundance file:
TSV function abundances table from FROGSFUNC_2_functions tool, frogsfunc_functions_unstrat_EC.tsv or frogsfunc_functions_unstrat_KO.tsv (unstratified table).
This input must be the unstratified table (the default table)
Taxonomic marker:
Output table of predicted marker gene copy numbers per sequence from the FROGSFUNC_1_placeseqs_copynumber tool.
Pathway reference:
Mapping of pathways to reactions.
- For the 16S marker, choose MetaCyc or KEGG in accordance with your choice in the FROGSFUNC_2_functions tool. If you want both, run this tool twice.
- For the ITS or 18S marker, MetaCyc is the only valid option.
Do you want to normalize the final output table ?
Normalization: values are divided by the sum of their column, then multiplied by 10^6 (Counts Per Million, CPM).
If this option is set, the pathway abundances file (frogsfunc_pathways_unstrat.tsv) is normalized in this way.
Normalization makes the samples comparable with each other. However, some statistical tools, such as DESeq2, perform their own normalization and need the non-normalized abundance table. Be careful to choose the appropriate table for further analysis.
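The arithmetic of this normalization can be illustrated with a minimal Python sketch. It is not the FROGS implementation: `cpm_normalise` is a hypothetical function, the table is a plain dictionary mapping each sample to its column of pathway abundances, and leaving an all-zero column at zero is an assumption made here to avoid a division by zero.

```python
from typing import Dict, List


def cpm_normalise(table: Dict[str, List[float]]) -> Dict[str, List[float]]:
    """Divide each sample column by its sum, then multiply by 10^6."""
    normalised = {}
    for sample, values in table.items():
        total = sum(values)
        # Assumption: an all-zero column stays at zero instead of failing.
        normalised[sample] = [v / total * 1e6 if total else 0.0 for v in values]
    return normalised


if __name__ == "__main__":
    # Toy abundances for three pathways in two samples (made-up numbers).
    raw = {"sampleA": [10.0, 30.0, 60.0], "sampleB": [5.0, 5.0, 0.0]}
    for sample, values in cpm_normalise(raw).items():
        print(sample, values)
```

After normalization every non-empty column sums to 10^6, which is why the values can be compared across samples of different depth.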
Outputs
HTML report
The HTML file summarizes information about pathway abundances within each sample.
What is the distribution of pathway abundances in the samples ?
- Samples: Mean of NSTI values of ASVs present in the sample, normalized by their abundances.
- Nb pathway retrieved : Number of pathways present in the sample.
- The “Display global distribution” button shows the distribution of pathway abundances across all samples.
To view this distribution for selected samples only, check the boxes of those samples (first column of the table above) and click on the “Show distribution” button at the bottom of the table.
The innermost circle represents the highest hierarchical level of pathways according to the MetaCyc or KEGG databases. Moving outwards, each circle gives a more specific hierarchical level, down to the identifier of the pathway.
For example:
Generation of Precursor Metabolites and Energy > Fermentation > Fermentation of Pyruvate > PWY-6588
For more details on a pathway, double-click on its name.
Pathway abundance tables
Pathway abundances table - “unstratified”.
It contains the predicted pathway abundances of the metagenome, per sample.
- Classification column: the hierarchy classification of the pathway.
- db_link column: the URL of the database entry for the accession ID (observation_name) of the pathway.
- observation_name: accession identifier of the pathway.
- last columns: abundance of each pathway in each sample.
Pathway abundances table - stratified (optional and command only).
Optional and only available from the command line, not in the Galaxy version.
This default stratified pathway abundance table represents how much each ASV contributes to the community-wide pathway abundance, not what the pathway abundance is predicted to be within the predicted genome of that ASV alone.
N.B.: In the example above, the first N lines of the file correspond to the N ASVs in the sample SC1703-104TTGCCC-B6TMLL001R, and so on for each sample.
Please note that requesting the stratified output files increases the processing time. This file is also very large: it has as many lines as there are samples x ASVs x pathways.
- sample: sample names
- function: accession ID from pathway database
- taxon: ASVs names
- taxon_abun: number of sequences of the ASV in the sample divided by its marker copy number.
- taxon_rel_abun: This is the same as the “taxon_abun” column, but in terms of relative abundance (so that the sum of all ASV abundances per sample is 100).
- genome_function_count: Predicted copy number of this pathway per ASV.
- taxon_function_abun: Multiplication of “taxon_abun” column by “genome_function_count” column.
- taxon_rel_function_abun: Multiplication of “taxon_rel_abun” column by “genome_function_count” column.
- norm_taxon_function_contrib: This is the same as the “taxon_rel_function_abun” column, but in terms of relative abundance in the sample (so that the values of this column sum to 1).
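The relationships between the arithmetic columns listed above can be sketched in Python for one sample and one pathway. This is an illustration under stated assumptions, not the code used by FROGS or PICRUSt2: `stratified_columns` is a hypothetical function, its inputs are plain dictionaries keyed by ASV name, and the sample is assumed to contain at least one ASV with a non-zero abundance.

```python
from typing import Dict, List


def stratified_columns(
    taxon_abun: Dict[str, float], genome_function_count: Dict[str, float]
) -> List[dict]:
    """Derive the per-ASV columns for one sample and one pathway."""
    # Assumption: total > 0, i.e. the sample is not empty.
    total = sum(taxon_abun.values())
    rows = []
    for taxon, abun in taxon_abun.items():
        # Relative abundance, so that all ASVs of the sample sum to 100.
        rel_abun = abun / total * 100
        count = genome_function_count.get(taxon, 0.0)
        rows.append({
            "taxon": taxon,
            "taxon_abun": abun,
            "taxon_rel_abun": rel_abun,
            "genome_function_count": count,
            "taxon_function_abun": abun * count,
            "taxon_rel_function_abun": rel_abun * count,
        })
    return rows


if __name__ == "__main__":
    # Made-up values: two ASVs, already divided by their marker copy number.
    for row in stratified_columns({"ASV1": 30.0, "ASV2": 10.0},
                                  {"ASV1": 2.0, "ASV2": 1.0}):
        print(row)
```

With these made-up values, ASV1 has a relative abundance of 75 and ASV2 of 25, so an ASV that is both abundant and carries more predicted copies of the pathway dominates the contribution columns.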
Abundance table of pathways per ASV (only with stratified option).
A work by FROGS team