
Introduction

Objective

The core objective of the NP Analyst platform is to identify bioactive molecules directly from complex mixtures.

The platform combines bioassay data from extract libraries with MS-based metabolomics data on the same set of extracts to determine which MS features correlate with specific biological phenotypes. The platform is assay-independent: it works with assay data from any platform, provided that the results are either numerical (e.g. percent growth) or binary (e.g. hit/no hit, live/dead). It returns both tables of MS features with predicted activity values and network graphs of extracts and bioactive MS features.
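As a rough illustration of the format requirement above, the following sketch checks whether one bioassay readout column is binary or numerical. The function name `classify_readout` and its rules are assumptions made for this example; they are not part of NP Analyst, which may validate input differently.

```python
def classify_readout(values):
    """Classify one bioassay readout column as 'binary' or 'numerical'.

    Hypothetical helper for illustration only, not NP Analyst code.
    """
    # Normalise labels so that "Hit" and "hit " count as the same value.
    distinct = {str(v).strip().lower() for v in values}

    # Two or fewer distinct values (hit/no hit, live/dead, 0/1) are treated
    # as binary, including numeric columns that only take two values.
    if len(distinct) <= 2:
        return "binary"

    # Otherwise every entry must be interpretable as a number.
    try:
        for v in values:
            float(v)
    except (TypeError, ValueError):
        raise ValueError("readout is neither binary nor numerical")
    return "numerical"


print(classify_readout(["hit", "no hit", "hit"]))
print(classify_readout([12.5, 87.0, 43.1]))
```

A column with three or more text labels (for example low/medium/high) raises `ValueError` in this sketch, because it fits neither supported format.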

Design

NP Analyst is designed to work with data from extract libraries ranging in size from a minimum of 10 samples to over 1,000 samples in a single analysis. Biological data can be from any assay type. Including data from multiple assays is both supported and encouraged, because it improves prediction accuracy relative to analyses based on only a small number of assays.

The analysis proceeds in three steps. First, untargeted metabolomics MS data are replicate-compared and feature-aligned to create a list of all unique MS features in the sample set. This can be done either in-line by uploading mzML files, or offline using functionality in either MZmine or GNPS. Second, the bioassay data are used to calculate the magnitude and consistency of the biological assay profiles for the samples containing each MS feature. Finally, these scores are used to filter the MS feature list to retain only features of biological interest. Results are exported in both table and graph formats to enable lead selection and prioritization.
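The scoring and filtering steps can be sketched as follows. This is a minimal illustration under stated assumptions, not the platform's actual scoring: here "magnitude" is taken to be the mean Euclidean norm of the assay profiles of the samples containing a feature, and "consistency" the mean pairwise Pearson correlation between those profiles. The function names, thresholds, and example data are all hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def score_feature(samples, assay_profiles):
    """Return (magnitude, consistency) for one MS feature.

    Assumed definitions for illustration, not NP Analyst's own formulas.
    """
    profiles = [assay_profiles[s] for s in samples]
    if not profiles:
        return 0.0, 0.0
    # Magnitude: mean Euclidean norm of the profiles.
    magnitude = sum(sqrt(sum(v * v for v in p)) for p in profiles) / len(profiles)
    # Consistency: mean pairwise correlation; 0.0 when there is only one sample.
    pairs = list(combinations(profiles, 2))
    consistency = sum(pearson(a, b) for a, b in pairs) / len(pairs) if pairs else 0.0
    return magnitude, consistency


def filter_features(feature_samples, assay_profiles, min_magnitude, min_consistency):
    """Keep only features whose scores reach both (hypothetical) thresholds."""
    kept = {}
    for feature, samples in feature_samples.items():
        magnitude, consistency = score_feature(samples, assay_profiles)
        if magnitude >= min_magnitude and consistency >= min_consistency:
            kept[feature] = (magnitude, consistency)
    return kept


# Made-up example: one assay profile (three readouts) per extract.
assay_profiles = {
    "extract_A": [3.0, 0.0, 4.0],
    "extract_B": [6.0, 0.0, 8.0],
    "extract_C": [0.0, 0.0, 0.0],
    "extract_D": [0.0, 5.0, 0.0],
}
# Which extracts contain each aligned MS feature.
feature_samples = {
    "feature_1": ["extract_A", "extract_B"],  # strong, similar profiles
    "feature_2": ["extract_A", "extract_D"],  # strong but dissimilar profiles
    "feature_3": ["extract_C"],               # inactive extract
}

kept = filter_features(feature_samples, assay_profiles,
                       min_magnitude=2.0, min_consistency=0.5)
print(sorted(kept))  # only feature_1 passes both thresholds
```

In this toy data, `feature_2` is rejected because the two extracts containing it show unrelated activity profiles, and `feature_3` because its only extract is inactive. A feature found in a single sample has no profile pairs to compare, which is one way to see why larger sample sets give more reliable scores.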

Because NP Analyst predicts biological activities based on MS feature distributions in the sample set, prediction accuracy improves as the sample set increases in size. A set of 100 samples will afford more accurate predictions than a set containing just 10 samples. Similarly, a set derived from related organisms (e.g. Streptomyces) will perform better than a sample set derived from many different classes of source organisms. Finally, including multiple bioassay results in the bioassay file increases the resolution of the biological profiles; bioassay files containing a single data column have low resolving power and often yield higher false positive rates than analyses with more bioassay readouts.

