Welcome to IsoAnalyst

IsoAnalyst is a mass spectrometry (MS) metabolomics data analysis program designed to determine the number of stable isotopically labeled (SIL) tracers incorporated into metabolites across parallel SIL tracer experiments. Unlabeled control samples are required for each SIL tracer used, and separate pre-processed datasets from both labeled and unlabeled samples are required input. IsoAnalyst compares isotopologue distributions between MS features in the unlabeled and labeled datasets and generates a summary file containing the number of heavy isotopes incorporated into every MS feature in each SIL condition.

Installation

Create a conda environment with dependencies

conda env create -f environment.yml

This should install a Python 3.8+ environment with all the necessary packages.

You then need to activate the virtual environment.

conda activate isoanalyst

Then to install the IsoAnalyst CLI program run

python setup.py install

To test whether the program has been properly installed, you can run

isoanalyst --version

Note: This program uses the rtree library, which requires bindings for libspatialindex. This has been known to cause problems on Windows installs. rtree installed from PyPI (pip install rtree) should work out of the box, and the environment.yml installation specification has been configured to do this for you. If you receive an error when running isoanalyst --version, this library is the most likely culprit. You can try to install it manually again with conda install -c conda-forge rtree (this has been found to be required on macOS).
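
If it is unclear whether rtree is the culprit, a quick check from inside the activated environment may help. The following is a minimal sketch using only the Python standard library; the helper name module_available is hypothetical and not part of IsoAnalyst. It only confirms that the package can be located, not that the libspatialindex bindings load correctly.

```python
import importlib.util


def module_available(name: str) -> bool:
    """Return True if a top-level module can be located in this environment."""
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    # A False here points at the installation rather than at IsoAnalyst itself.
    # A True only means the package was found; importing it may still fail if
    # the libspatialindex bindings are broken.
    print("rtree found:", module_available("rtree"))
```

If the package is found but isoanalyst --version still fails, running python -c "import rtree" should surface the underlying binding error.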

CLI Options

The isoanalyst program is split into four sub-applications which are intended to be run in sequence. To run all steps in a simple, reproducible manner, the Snakemake pipeline is recommended.

Both the root app and each sub-app have their own CLI help context, which is accessible through the --help flag.

Root app help

[jvansan@cpu ~]$ isoanalyst --help
Usage: isoanalyst [OPTIONS] COMMAND [ARGS]...

  Isoanalyst CLI entrypoint

  Processing steps:

  validate - Step 0 : validate input file structure

  prep - Step 1 : Prepare ground truth list of features including
  dereplication and optional blank removal

  scrape - Step 2 : Scrape all scan data for each of the feature

  analyze - Step 3 : Analyze all scan data for all of the data

Options:
  --version  Show the version and exit.
  --help     Show this message and exit.

Commands:
  analyze   Performs Stable Isotope Labelling detecting and analysis
  prep      Prepares the ground truth feature list
  scrape    Collects relevant scan data for all members of the ground truth...
  validate  Performs some simple checks on your input specification file

Two common parameters are required for all steps of the program:

-n or --name is the experiment name and is used as the output path and for marking mass features.

-i or --input_specification is the path to the input specification file.

An option for the number of parallel jobs is available for all steps except the validate step.

-j or --jobs

The number of parallel jobs to run. A job for each condition will be run in parallel. Defaults to -1, indicating all available detected CPU cores/threads.
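
The -1 convention can be illustrated with a short sketch. This is an assumption about how such a value is typically resolved, not a description of IsoAnalyst's internals; the function name resolve_jobs is hypothetical.

```python
import os


def resolve_jobs(jobs: int = -1) -> int:
    """Translate a --jobs style value into a concrete worker count."""
    available = os.cpu_count() or 1  # os.cpu_count() may return None
    if jobs == -1:
        # -1 means "use every detected core/thread"
        return available
    if jobs < 1:
        raise ValueError("jobs must be -1 or a positive integer")
    return jobs


if __name__ == "__main__":
    print("workers with default setting:", resolve_jobs())
```

Since one job is run per condition, requesting more jobs than there are conditions is unlikely to speed anything up.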

Validate

The isoanalyst validate step performs some simple checks on your input specification file, such as making sure all the required information is present, and that there are no missing specified input files.

Validate step help

[jvansan@cpu ~]$ isoanalyst validate --help
Welcome to IsoAnalyst!
Usage: isoanalyst validate [OPTIONS]

  Performs some simple checks on your input specification file

Options:
  -i, --input_specification PATH  Input specification filename  [required]
  -n, --name TEXT                 Experiment name - used as output file
                                  directory  [required]

  --help                          Show this message and exit.

There are no additional options for the validate step.

Prep

The isoanalyst prep step collects all your feature lists to produce a single ground truth feature list of unlabelled features.

This step performs a replicate comparison on each condition, and can remove features present in a “BLANK” condition. All the features from the conditions are then aggregated into the ground truth feature list, which is output as NAME/NAME_all_features.csv (where NAME is the specified experiment name).

Individual lists of ions from each condition (COND) are all stored in NAME/COND/all_ions_sorted.csv for all the listed features and NAME/COND/all_ions_averaged.csv for all the replicate compared features.
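
For downstream scripting it can be handy to compute these locations rather than type them by hand. The sketch below builds the paths described above with pathlib; the function name prep_output_paths is hypothetical, and it assumes NAME is a plain directory name relative to the working directory.

```python
from pathlib import Path


def prep_output_paths(name, conditions):
    """Build the expected prep-step output paths for an experiment name."""
    root = Path(name)
    # Aggregated ground truth feature list: NAME/NAME_all_features.csv
    paths = {"ground_truth": root / f"{name}_all_features.csv"}
    for cond in conditions:
        paths[cond] = {
            # All listed features for the condition
            "sorted": root / cond / "all_ions_sorted.csv",
            # Features remaining after replicate comparison
            "averaged": root / cond / "all_ions_averaged.csv",
        }
    return paths


if __name__ == "__main__":
    for key, value in prep_output_paths("myexp", ["ACE", "GLU"]).items():
        print(key, value)
```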

Prep step help

[jvansan@cpu ~]$ isoanalyst prep --help
Welcome to IsoAnalyst!
Usage: isoanalyst prep [OPTIONS]

  Prepares the ground truth feature list

Options:
  -i, --input_specification PATH  Input specification filename  [required]
  -n, --name TEXT                 Experiment name - used as output file
                                  directory  [required]

  --blank-remove / --no-blank-remove
                                  Perform blank removal during feature
                                  aggregation (or not).  [default: True]

  --minreps INTEGER               Minimum reps to consider in replication
                                  comparison  [default: 3]

  --mztol FLOAT                   M/Z tolerance in PPM  [default: 10.0]
  --rttol FLOAT                   rettime tolerance in min  [default: 0.03]
  -j, --jobs INTEGER              Maximum number of parallel processes
                                  [default: -1]

  --help                          Show this message and exit.

OPTIONS

--blank-remove / --no-blank-remove

Perform blank removal on the ground truth feature list. This only has an effect if “BLANK” condition feature files are present in the input specification. Blank removal is enabled by default (--blank-remove), as blanks are highly recommended.

--minreps

The minimum number of replicates within a single condition in which a feature must be present to be considered a real feature. Default value is 3.

--mztol

The M/Z tolerance for replicate comparison in PPM. Defaults to 10.0.

--rttol

The retention time tolerance for replicate comparison in minutes. Defaults to 0.03.
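
To make the three replicate comparison options concrete, here is a minimal sketch of how an m/z tolerance in PPM, a retention time tolerance in minutes, and a minimum replicate count could combine. It is an illustration under stated assumptions, not IsoAnalyst's actual matching code: the function names are hypothetical, features are simplified to (m/z, retention time) tuples, and the PPM difference is taken relative to the first m/z value.

```python
def ppm_difference(mz_a, mz_b):
    """Absolute m/z difference in parts per million, relative to mz_a."""
    return abs(mz_a - mz_b) / mz_a * 1e6


def features_match(a, b, mztol=10.0, rttol=0.03):
    """Compare two (mz, rt_minutes) tuples against both tolerances."""
    return ppm_difference(a[0], b[0]) <= mztol and abs(a[1] - b[1]) <= rttol


def passes_minreps(feature, replicates, minreps=3, mztol=10.0, rttol=0.03):
    """Count the replicates that contain at least one matching feature."""
    hits = sum(
        any(features_match(feature, other, mztol, rttol) for other in rep)
        for rep in replicates
    )
    return hits >= minreps


if __name__ == "__main__":
    # 500.000 vs 500.004 differ by about 8 PPM, inside the 10 PPM default.
    print(features_match((500.0, 2.50), (500.004, 2.52)))
```

With the defaults, a feature at m/z 500 tolerates a shift of up to 0.005 m/z units and 0.03 minutes between replicates.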

Scrape

The isoanalyst scrape step collects all the scan by scan data for each of the features in the ground truth feature list.

This step collects all scans for each of the ground truth features from the prepared labelled and unlabelled all scan data. It collects the scans for each replicate individually to make the analysis step possible.

Pared-down all scan data for each condition (COND), after import and optional intensity filtering, are stored in NAME/all_scan_data/all_scans_COND.csv, and extracted scans are stored in NAME/all_scan_data/all_ions_COND.csv.
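
The import-time filtering can be pictured as a simple row filter. The sketch below is an assumption about the behaviour, not the program's code: the function name filter_scans is hypothetical, rows are plain dictionaries keyed by the "RT" and "Intensity" scan data column names, and both thresholds are treated as inclusive.

```python
def filter_scans(rows, minintensity=0, minrt=0.8):
    """Keep scan rows at or above the intensity and retention time thresholds."""
    return [
        row
        for row in rows
        if row["Intensity"] >= minintensity and row["RT"] >= minrt
    ]


if __name__ == "__main__":
    scans = [
        {"RT": 0.5, "MZ": 300.1, "Intensity": 1000},  # before minrt
        {"RT": 1.2, "MZ": 300.1, "Intensity": 50},    # low intensity
        {"RT": 1.5, "MZ": 300.1, "Intensity": 500},
    ]
    print(len(filter_scans(scans, minintensity=100)))
```

Raising the intensity threshold shrinks the pared-down scan files, which is where the processing time savings come from.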

Scrape step help

[jvansan@cpu ~]$ isoanalyst scrape --help
Welcome to IsoAnalyst!
Usage: isoanalyst scrape [OPTIONS]

  Collects relevant scan data for all members of the ground truth feature
  list

Options:
  -i, --input_specification PATH  Input specification filename  [required]
  -n, --name TEXT                 Experiment name - used as output file
                                  directory  [required]

  --minscans INTEGER              Minimum number of scans  [default: 2]
  --mztol FLOAT                   M/Z tolerance in PPM  [default: 10.0]
  --minintensity INTEGER          Minimum intensity threshold for data
                                  [default: 0]

  --minrt FLOAT                   Ignores data before minimum RT (minutes)
                                  [default: 0.8]

  --scanwindow INTEGER            Number of scans to consider for isotope
                                  alignment. This only applies if you data is
                                  missing scan ranges.  [default: 10]

  -j, --jobs INTEGER              Maximum number of parallel processes
                                  [default: -1]

  --help                          Show this message and exit.

OPTIONS

--minintensity

The minimum intensity threshold for scan data. This is applied during scan data import. While you may be able to detect more labelled peaks with a lower intensity, this can drastically slow down processing. Defaults to 0, i.e. no filtering.

--minrt

The minimum retention time value for considering scan data. In our analysis we ignored the first 0.8 minutes, thus this is the default value.

--minscans

The minimum number of scans to be considered for collection. Fewer than two scans will not work during analysis, thus the default is 2.

--mztol

The M/Z tolerance for scan scraping in PPM. Defaults to 10.0.

--scanwindow

The number of scans to consider on either side of the central scan of a ground truth feature. This is only applicable if you import data without a scan range for features, such as when importing from MZmine feature lists.
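
As an illustration of the --scanwindow option, the sketch below derives a scan range from a central scan index. The function name scan_range is hypothetical, and both the inclusive bounds and the clamping at the first scan are assumptions made for the example.

```python
def scan_range(center_scan, scanwindow=10, first_scan=0):
    """Return inclusive (low, high) scan indices around a central scan."""
    # Never reach below the first available scan index.
    low = max(first_scan, center_scan - scanwindow)
    high = center_scan + scanwindow
    return low, high


if __name__ == "__main__":
    print(scan_range(50))  # default window of 10 scans on either side
```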

Analyze

The isoanalyst analyze step uses the ground truth feature list and collected scans to detect stable isotope labelling and summarize the extent of labelling of each feature in each of the specified conditions.

For each condition (COND), the algorithm computes slope data between each of the isotopomers, output in NAME/all_slope_data/all_slope_data_COND.csv. These slopes are then used to determine the extent of labelling, output in NAME/all_isotope_analysis/iso_analysis_COND.csv. These results are aggregated in NAME/NAME_data_summary.csv, and a filtered version (based on the --minconditions flag) is written to NAME/NAME_data_summary_filtered.csv.
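
One way to picture a slope between two isotope peaks is an ordinary least-squares fit of one peak's intensity against the other's across the collected scans. The sketch below is only that picture, under the assumption of a plain linear regression; the exact computation IsoAnalyst performs may differ, and the function name regression_slope is hypothetical. It also shows why a slope cannot be computed from fewer than two scans.

```python
def regression_slope(x, y):
    """Ordinary least-squares slope of y against x."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length series with at least two points")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x values are all identical; slope is undefined")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx


if __name__ == "__main__":
    # Hypothetical intensities of the M0 and M1 peaks over three scans.
    m0 = [100.0, 200.0, 300.0]
    m1 = [10.0, 20.0, 30.0]
    print(regression_slope(m0, m1))
```

Comparing such a slope between the unlabelled and labelled data for the same feature would indicate whether the heavier peak is enriched.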

Analyze step help

[jvansan@cpu ~]$ isoanalyst analyze --help
Welcome to IsoAnalyst!
Usage: isoanalyst analyze [OPTIONS]

  Performs Stable Isotope Labelling detecting and analysis

Options:
  -i, --input_specification PATH  Input specification filename  [required]
  -n, --name TEXT                 Experiment name - used as output file
                                  directory  [required]

  --minconditions INTEGER         Minimum number of conditions to output in
                                  filtered output  [default: 1]

  --minscans INTEGER              Minimum number of scans  [default: 5]
  -j, --jobs INTEGER              Maximum number of parallel processes
                                  [default: -1]

  --help                          Show this message and exit.

OPTIONS

--minconditions

The minimum number of conditions to be considered for annotation in the final filtered output file.

--minscans

The minimum number of scans to be considered for slope analysis used in isotope labelling detection. This number must be greater than or equal to the number of minscans from the scrape step.

Input Specification

To avoid error-prone inference by the program, a simple CSV input specification has been devised for input files and related parameters. An example which matches the example_workflow in the repository is given below.

All columns are required. The columns are defined in the following manner

filepath
- This is the path to the input file, either complete or relative to where the isoanalyst program is run from.
organism
- The name of the organism for that particular experiment. This may commonly be all the same value.
type
- The type of input file: either f (feature list) or s (all scan data).
element
- The element involved in the SIL being detected in the given condition (currently the program only supports C and N).
isotope
- The isotope being detected in the given condition.
condition
- The name of the experimental SIL condition (or “BLANK”).
replicate
- The replicate number for a given condition.

All fields are required for all conditions except isotope and element for the “BLANK” condition.
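
The column rules above can be checked before running anything. The following is a minimal sketch, not the implementation behind isoanalyst validate: the function name check_specification is hypothetical, a comma delimiter is assumed, and the check for missing input files is left out.

```python
import csv
import io

REQUIRED_COLUMNS = [
    "filepath", "organism", "type", "element",
    "isotope", "condition", "replicate",
]


def check_specification(text, delimiter=","):
    """Return a list of human-readable problems found in a specification."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    header = reader.fieldnames or []
    missing = [col for col in REQUIRED_COLUMNS if col not in header]
    if missing:
        return ["missing columns: " + ", ".join(missing)]
    problems = []
    for line_no, row in enumerate(reader, start=2):
        is_blank = (row["condition"] or "").strip().upper() == "BLANK"
        for col in REQUIRED_COLUMNS:
            # element and isotope may be left empty for the BLANK condition
            optional = is_blank and col in ("element", "isotope")
            if not optional and not (row[col] or "").strip():
                problems.append(f"line {line_no}: empty '{col}'")
        if row["type"] and row["type"] not in ("f", "s"):
            problems.append(f"line {line_no}: type must be 'f' or 's'")
        if row["element"] and row["element"] not in ("C", "N"):
            problems.append(f"line {line_no}: element must be 'C' or 'N'")
    return problems


if __name__ == "__main__":
    # The file names below are placeholders, not files from the repository.
    spec = (
        "filepath,organism,type,element,isotope,condition,replicate\n"
        "feature_lists/a.csv,RLUS1353,f,C,12,ACE,1\n"
        "feature_lists/blank.csv,RLUS1353,f,,,BLANK,1\n"
    )
    print(check_specification(spec))
```

If the specification is tab-separated, pass delimiter="\t".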

Example Input Specification

filepath	organism	type	element	isotope	condition	replicate
feature_lists/20180409_RLUS135312ACED0-1_seen.mzml_chromatograms_deconvoluted.csv	RLUS1353	f	C	12	ACE	1
feature_lists/20180409_RLUS135312ACED0-2_seen.mzml_chromatograms_deconvoluted.csv	RLUS1353	f	C	12	ACE	2
feature_lists/20180409_RLUS135312ACED0-3_seen.mzml_chromatograms_deconvoluted.csv	RLUS1353	f	C	12	ACE	3
feature_lists/20180409_RLUS135312ACED0-4_seen.mzml_chromatograms_deconvoluted.csv	RLUS1353	f	C	12	ACE	4
feature_lists/20180409_RLUS135314GLUD0-1_seen.mzml_chromatograms_deconvoluted.csv	RLUS1353	f	C	12	GLU	1
feature_lists/20180409_RLUS135314GLUD0-2_seen.mzml_chromatograms_deconvoluted.csv	RLUS1353	f	C	12	GLU	2
feature_lists/20180409_RLUS135314GLUD0-3_seen.mzml_chromatograms_deconvoluted.csv	RLUS1353	f	C	12	GLU	3
feature_lists/20180409_RLUS135314GLUD0-4_seen.mzml_chromatograms_deconvoluted.csv	RLUS1353	f	C	12	GLU	4
mzmls/20180409_RLUS135312ACED0-1_seen.mzml	RLUS1353	s	C	12	ACE	1
mzmls/20180409_RLUS135312ACED0-2_seen.mzml	RLUS1353	s	C	12	ACE	2
mzmls/20180409_RLUS135312ACED0-3_seen.mzml	RLUS1353	s	C	12	ACE	3
mzmls/20180409_RLUS135312ACED0-4_seen.mzml	RLUS1353	s	C	12	ACE	4
mzmls/20180409_RLUS135314GLUD0-1_seen.mzml	RLUS1353	s	C	12	GLU	1
mzmls/20180409_RLUS135314GLUD0-2_seen.mzml	RLUS1353	s	C	12	GLU	2
mzmls/20180409_RLUS135314GLUD0-3_seen.mzml	RLUS1353	s	C	12	GLU	3
mzmls/20180409_RLUS135314GLUD0-4_seen.mzml	RLUS1353	s	C	12	GLU	4
mzmls/20180410_RLUS135313ACED0-1_seen.mzml	RLUS1353	s	C	13	ACE	1
mzmls/20180410_RLUS135313ACED0-2_seen.mzml	RLUS1353	s	C	13	ACE	2
mzmls/20180410_RLUS135313ACED0-3_seen.mzml	RLUS1353	s	C	13	ACE	3
mzmls/20180410_RLUS135313ACED0-4_seen.mzml	RLUS1353	s	C	13	ACE	4
mzmls/20180410_RLUS135315GLUD0-1_seen.mzml	RLUS1353	s	N	15	GLU	1
mzmls/20180410_RLUS135315GLUD0-2_seen.mzml	RLUS1353	s	N	15	GLU	2
mzmls/20180410_RLUS135315GLUD0-3_seen.mzml	RLUS1353	s	N	15	GLU	3
mzmls/20180410_RLUS135315GLUD0-4_seen.mzml	RLUS1353	s	N	15	GLU	4

Notes

Although this is discussed in greater detail in the Data Requirements section, here are some important guidelines for organizing your input specification.

- Blank data should only be present as a feature list.
- Include only unlabelled data as feature lists. Doing otherwise is likely to corrupt your analysis.
- Detection of the M0 peak in the unlabelled condition should always be done with the 12/13 C isotope. This is why in the example_workflow input specification, the isotope = 12 and element = C for the GLU unlabelled condition, even though the labelling is done with 15N (isotope = 15, element = N for the GLU labelled condition).

Data Requirements

1. Scan Data

Files containing centroided scan data are required for all unlabeled controls and labeled samples. Four technical replicates are recommended for each condition.

Generic CSV tabular inputs, minimally with ["FunctionScanIndex", "RT", "MZ", "Intensity"] column headers, are accepted. These are compatible with msExpress func001 files.

Scan data can also be imported as mzML files from MSConvert. Standard GNPS settings are recommended for MSConvert.

2. Feature Lists

Feature lists are required for unlabeled control samples only. Four technical replicates are recommended for every feedstock condition used. Three additional feature lists are required for solvent injection blanks.

Generic CSV tabular inputs, minimally with ["Sample", "PrecMz", "PrecZ", "PrecIntensity", "RetTime", "ScanLowRange", "ScanHighRange"] column headers, are accepted. These are compatible with msExpress CPPIS files.
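
A quick header check can catch a mislabelled generic CSV before it reaches the pipeline. The sketch below uses the minimal column lists given for scan data and feature lists; the function name missing_columns is hypothetical, and the check is independent of whatever IsoAnalyst does on import.

```python
import csv
import io

SCAN_COLUMNS = ["FunctionScanIndex", "RT", "MZ", "Intensity"]
FEATURE_COLUMNS = [
    "Sample", "PrecMz", "PrecZ", "PrecIntensity",
    "RetTime", "ScanLowRange", "ScanHighRange",
]


def missing_columns(header_line, required):
    """Return the required column names absent from a CSV header line."""
    header = next(csv.reader(io.StringIO(header_line)), [])
    present = {name.strip() for name in header}
    return [name for name in required if name not in present]


if __name__ == "__main__":
    print(missing_columns("FunctionScanIndex,RT,MZ,Intensity", SCAN_COLUMNS))
    print(missing_columns("Sample,PrecMz,RetTime", FEATURE_COLUMNS))
```

Extra columns are ignored, in keeping with the lists being minimal requirements.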

Feature lists may be made using the standard MZmine 2 workflow. Parameters for mass detection, chromatogram building, and deconvolution are dependent on the MS instrument used and data quality (i.e. signal to noise). The isotope peak grouper function must be used in order to deisotope all features in the list. Every feature in this list should be identified by the monoisotopic m/z value.

Export .csv files from MZmine 2 with the columns ["row m/z", "row retention time"]

Peak intensity is not required for feature lists; m/z and retention time values are used to get the isotopologue peak intensities from the scan data. The program will then set a window of 10 scans on both sides of the center scan index detected during all scan importing.

Data Collection Recommendation

Data-dependent acquisition (DDA) is not supported.

Use either data-independent acquisition (DIA) with interleaved scans or MS1-only acquisition.

Development

Unit tests run on the pytest framework. Be sure to install pytest in your development environment (pip install pytest or conda install pytest) and make sure all tests pass before committing code.

Running pytest is as simple as running pytest from the root directory of the code repository.

To install the isoanalyst toolchain in a development environment, during installation use

python setup.py develop

Data Output and Visualization

The summary output file from IsoAnalyst contains rows corresponding to every MS feature from the ground truth feature list which has SIL incorporation detected in two or more conditions. Using the parameters in Snakemake, this summary list can be left unfiltered (containing all MS features from the ground truth list regardless of SIL incorporation) or filtered to contain only features labeled in a minimum number of conditions. We recommend a minimum of two conditions in order to filter out features with labeling in only a single condition.

The columns of the summary file represent each SIL condition used. In these data the following abbreviations are used: ACE ([1-13C]acetate), PROP ([1-13C]propionate), MET ([methyl-13C]methionine), and GLU ([1-15N]glutamate). For each condition there is a column containing a boolean value for whether SIL incorporation was detected and a column containing an integer representing how many positions enriched by SIL were detected in that condition.

Summary Output Example

Exp_ID	RetTime	PrecMz	PrecZ	ACE_unlabelled	ACE_labelled	ACE_count	PROP_unlabelled	PROP_labelled	PROP_count	MET_unlabelled	MET_labelled	MET_count	GLU_unlabelled	GLU_labelled	GLU_count
SERY_32	2.54	734.4764	1	TRUE	TRUE	4	TRUE	TRUE	7	TRUE	TRUE	4	TRUE	TRUE	1
SERY_25	2.54	716.4649	1	TRUE	TRUE	4	TRUE	TRUE	7	TRUE	TRUE	4	TRUE	TRUE	1
SERY_23	2.54	576.3832	1	TRUE	TRUE	4	TRUE	TRUE	7	TRUE	TRUE	2	TRUE	TRUE	1
SERY_26	2.54	558.3732	1	TRUE	TRUE	4	TRUE	TRUE	7	TRUE	TRUE	2	TRUE	TRUE	1
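
Filtering by a minimum number of labelled conditions can be reproduced on the summary file with a few lines. The sketch below is an illustration rather than the program's own filter: the function names are hypothetical, rows are plain dictionaries as produced by csv.DictReader, and it assumes that a TRUE value in the COND_labelled column is what marks detected incorporation.

```python
def labelled_conditions(row, conditions):
    """Conditions whose '<COND>_labelled' flag reads TRUE in a summary row."""
    return [
        cond
        for cond in conditions
        if str(row.get(f"{cond}_labelled", "")).strip().upper() == "TRUE"
    ]


def filter_summary(rows, conditions, minconditions=2):
    """Keep rows labelled in at least `minconditions` conditions."""
    return [
        row
        for row in rows
        if len(labelled_conditions(row, conditions)) >= minconditions
    ]


if __name__ == "__main__":
    rows = [
        {"Exp_ID": "SERY_32", "ACE_labelled": "TRUE", "PROP_labelled": "TRUE"},
        # Hypothetical row, labelled in a single condition only.
        {"Exp_ID": "EXAMPLE_1", "ACE_labelled": "TRUE", "PROP_labelled": "FALSE"},
    ]
    kept = filter_summary(rows, ["ACE", "PROP"], minconditions=2)
    print([row["Exp_ID"] for row in kept])
```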

Visualizing Results in Tableau

These data are generated in .csv format and can be inspected manually. We use Tableau, data visualization software with a free version available to academics, to plot the data in the summary file. We found that it makes it much easier to compare SIL incorporation across related MS features. The following step-by-step guide shows how to use the IsoAnalyst output file in Tableau.

Import .csv file into Tableau.

1_data_import

To set up the data overview, drag measures (green) and dimensions (blue) from the left panel to rows and columns.

2_blank_sheet

When initially dragging a measure onto the graph, use the dropdown menu to change to a dimension

3_change_dimension

The data overview shown below allows visualization of related masses and their retention times. Vertically aligned MS features can be observed and compared to associate related features based on RT and SIL incorporation.

4_mz_display

Additional measures can be added to the marks toolbar as details which appear when hovering over mz features in the overview graph. Drag the exp_id dimension to the color tab in the mark toolbar to add a unique identifier to every feature.

5_exp_id_colors

Features can then be selected and grouped in sets

6_exp_id_set_eryA

These sets can be used to filter the data in additional worksheets such as the bar graphs indicating SIL incorporation shown below. Extensive tutorials on creating and manipulating worksheets are available on the Tableau website. In particular this article contains information on modifying colors, formatting, and other details to achieve desired visualizations of the data. (https://help.tableau.com/current/pro/desktop/en-us/visual_best_practices.htm)

7_bar_graphs_ery_A