Data Visualization

Scatter Plot View

Overview

The scatter plot displays bioactive MS features in a plot of retention time (x-axis) vs. m/z ratio (y-axis). The color scheme of the nodes indicates their Cluster Score values (blue = negative, red = positive). The diameter of each node indicates its relative Activity Score (larger nodes = higher Activity Scores).

Data Filtration

Data in the scatter plot can be filtered in two ways. Below the plot on the left is a set of sliders and data entry boxes that allow users to dynamically modify cutoffs for key values. For example, increasing the Minimum Cluster Score value removes MS features with inconsistent biological profiles. Similarly, reducing the Maximum Feature Count slider removes features that are present with high frequency in the dataset (e.g. background contaminants or primary metabolites).

Alternatively, data can be filtered to include or exclude specific samples using the check box list below the scatter plot on the right. If specific samples from the sample set are of particular interest, they can be selected and the include/exclude slider set to ‘include’. This removes all MS features from the plot that are not present in the selected samples. The two filtration methods can be used together, for example to highlight strongly bioactive features from a specific subset of samples in the sample set.
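The combined effect of the two filtration methods can be sketched in a few lines of Python. This is only an illustration of the logic, not the NP Analyst implementation: the Feature record, its field names, and the filter_features function are hypothetical, Feature Count is assumed to mean the number of samples containing a feature, and ‘include’ is assumed to keep features found in at least one selected sample.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Feature:
    """Hypothetical record for one MS feature (field names are assumptions)."""
    feature_id: str
    cluster_score: float
    activity_score: float
    samples: frozenset  # codes of the samples in which the feature was detected


def filter_features(features, min_cluster_score=float("-inf"),
                    min_activity_score=float("-inf"), max_feature_count=None,
                    selected_samples=(), mode="include"):
    """Apply score cutoffs and an optional sample include/exclude filter."""
    selected = set(selected_samples)
    kept = []
    for feature in features:
        # Slider-style cutoffs on the key values.
        if feature.cluster_score < min_cluster_score:
            continue
        if feature.activity_score < min_activity_score:
            continue
        # Assumption: Feature Count = number of samples containing the feature.
        if max_feature_count is not None and len(feature.samples) > max_feature_count:
            continue
        # Check-box-style sample filter, only active when samples are selected.
        if selected:
            present = bool(feature.samples & selected)
            if mode == "include" and not present:
                continue
            if mode == "exclude" and present:
                continue
        kept.append(feature)
    return kept
```

Calling filter_features with both a Minimum Cluster Score cutoff and a set of selected samples corresponds to using the sliders and the check box list together.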

Data Interpretation

Often, molecules in sample sets generate more than one MS feature in the metabolomics data. These features derive from different adducts (e.g. [M+H]⁺, [M+Na]⁺), different charge states ([M+2H]²⁺), and in-source fragments ([M-H₂O+H]⁺). Each of these different features will possess a different m/z value, but the same retention time. In addition, each of them should possess similar Activity and Cluster Scores, given that they all derive from the same molecule. In the scatter plot these sets of features appear as vertical stripes, and can be a useful marker for identifying compounds of interest for further evaluation.
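For users working with exported feature tables, vertical stripes can also be located programmatically by grouping features whose retention times fall close together. The sketch below uses simple gap-based grouping on sorted retention times. The find_vertical_stripes function, the tuple layout, and the default tolerance (assumed to be in the same units as the retention times) are illustrative assumptions and not part of NP Analyst.

```python
def find_vertical_stripes(features, rt_tolerance=0.02, min_size=2):
    """Group features that share (nearly) the same retention time.

    features: iterable of (feature_id, retention_time, mz) tuples.
    rt_tolerance: largest retention time gap allowed between neighbouring
        features in one stripe (hypothetical default).
    min_size: smallest number of features that counts as a stripe.
    """
    ordered = sorted(features, key=lambda feature: feature[1])
    stripes = []
    current = []
    for feature in ordered:
        # Start a new group when the gap to the previous feature is too large.
        if current and feature[1] - current[-1][1] > rt_tolerance:
            if len(current) >= min_size:
                stripes.append(current)
            current = []
        current.append(feature)
    if len(current) >= min_size:
        stripes.append(current)
    return stripes
```

Features within one stripe can then be checked for similar Activity and Cluster Scores before being treated as adducts or fragments of a single molecule.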

Network View

Overview

The network view contains nodes for both samples and bioactive MS features, connected by edges denoting presence of a given MS feature in a given sample. These networks form communities based on the presence of shared biologically active MS features. Unlike other networking tools (e.g. molecular networking) that group nodes based on chemical similarity, communities in the NP Analyst network view are formed between samples that share MS features above the defined Activity and Cluster Score cutoffs. Consequently, the network view is most valuable when limited to MS features with strong Activity and Cluster Scores. Inclusion of large numbers of low activity features can create ‘hairball’ networks that obscure the biologically relevant MS features.

Network Styling

The default network layout is Force Spring. By default, samples are represented by dark grey nodes with sample labels displayed at all zoom levels. MS features are displayed as colored nodes with the same styling as the Scatter Plot: node color indicates Cluster Score (blue = negative, red = positive) and node diameter indicates relative Activity Score (larger nodes = higher Activity Scores). Edges are bidirectional and unweighted. Metadata for MS feature nodes is accessible via a popup box by hovering over the node of interest. The network color scheme can be switched to community membership using the Community color scheme slider.

Community Detection

Communities are detected using the Louvain community detection algorithm in the NetworkX library in Python.
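As a rough illustration of this step, the sketch below builds a small sample/feature network and runs Louvain community detection with the louvain_communities function available in recent NetworkX releases. The exact call, parameters, and node attributes used by NP Analyst are not stated here, so the helper functions build_network and detect_communities, the "type" and "community" attribute names, and the fixed seed are all assumptions.

```python
import networkx as nx
from networkx.algorithms.community import louvain_communities


def build_network(presence):
    """Build an undirected, unweighted sample/feature network.

    presence: dict mapping a sample code to the IDs of the bioactive
        MS features detected in that sample (hypothetical input format).
    """
    graph = nx.Graph()
    for sample, feature_ids in presence.items():
        graph.add_node(sample, type="sample")
        for feature_id in feature_ids:
            graph.add_node(feature_id, type="feature")
            graph.add_edge(sample, feature_id)
    return graph


def detect_communities(graph, seed=0):
    """Run Louvain and store the community index on every node."""
    communities = louvain_communities(graph, seed=seed)
    for index, members in enumerate(communities):
        for node in members:
            graph.nodes[node]["community"] = index
    return communities
```

Louvain is a randomized heuristic, so fixing the seed (an assumption in this sketch) makes repeated runs return the same partition.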

Data Interpretation

Samples that are related by the presence of shared bioactive molecules will form communities in the network view. If several samples are connected by one or more MS features, this is either because one molecule generates several adducts or fragments (i.e. several MS features for one compound) or because several bioactive molecules are present as a compound family (one or more MS features from multiple compounds). In both cases, users should take the retention time and m/z values for these priority MS features to examine their original MS data for the presence of HPLC peaks corresponding to bioactive analytes. Alternatively, users can note the related sample codes and then navigate to the Scatter Plot view and filter for the samples of interest in this view to highlight priority MS features.

Unlike in the scatter plot, nodes cannot be filtered based on Activity or Cluster Scores in the network view. For advanced filtering analyses we recommend that users download the network GraphML file from the downloads page and perform these analyses in a local network analysis tool such as Cytoscape or Gephi.
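Users who prefer scripting can apply the same kind of score filtering to the downloaded file in Python with NetworkX. The following is a minimal sketch under stated assumptions: the node attribute names ("type", "activity_score", "cluster_score") and the filter_network function are hypothetical, so the attribute keys in the actual GraphML file should be checked and passed in as needed.

```python
import networkx as nx


def filter_network(graph, min_activity=0.0, min_cluster=0.0,
                   activity_key="activity_score", cluster_key="cluster_score",
                   type_key="type"):
    """Keep feature nodes above both cutoffs, plus the samples they connect to.

    The attribute keys are assumptions; inspect the GraphML file for the
    names that are actually used.
    """
    keep = []
    for node, data in graph.nodes(data=True):
        if data.get(type_key) == "sample":
            keep.append(node)
            continue
        activity = float(data.get(activity_key, 0.0))
        cluster = float(data.get(cluster_key, 0.0))
        if activity >= min_activity and cluster >= min_cluster:
            keep.append(node)
    filtered = graph.subgraph(keep).copy()
    # Remove nodes (e.g. samples) left without any connection after filtering.
    isolated = [node for node in list(filtered.nodes) if filtered.degree(node) == 0]
    filtered.remove_nodes_from(isolated)
    return filtered
```

A downloaded file could be loaded with nx.read_graphml("network.graphml") (file name hypothetical), passed through filter_network, and written back out with nx.write_graphml for use in Cytoscape or Gephi.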

Community View

The community view includes four plots containing all of the data for a specific community from the full network. These plots are:

Community Network

A network view illustrating the samples and MS features present in a specific community.

Scatter Plot

A plot of retention time vs. m/z ratio for bioactive MS features. Mirrors the styling for the main Scatter Plot.

Activity Plot

A plot of Cluster Score (x-axis) vs. Activity Score (y-axis). Color scheme mirrors the styling for the main Scatter Plot.

Activity Heatmap

A heatmap of the original bioactivity file values for all samples in the community.

Data Interpretation

In principle, each community should contain samples with a discrete and consistent biological phenotype. However, due to the method used for community detection, a single community can sometimes include multiple phenotypes. Reviewing the heatmap to assess phenotype consistency, and then examining the MS features that link samples with consistent biological signatures, is a powerful strategy for directly highlighting bioactive MS features for isolation/dereplication efforts.

Visualization in Cytoscape

NP Analyst creates .graphml files that can easily be imported into Cytoscape or Gephi. These external tools enable in-depth analysis and modification of the retrieved network. To allow users to continue the network analysis offline, an .xml style file is provided here (tested with Cytoscape 3.8.2). This style colors the feature/basket nodes according to the assigned Cluster Score and adjusts the node diameter according to the Activity Score.

In Cytoscape, select the downloadable style file under File -> Import -> Styles from File… and then choose the loaded style from the dropdown menu in the Style branch.

Because this Cytoscape style was created for the BioMAP dataset, the expected Activity Score range will differ for other datasets, and the node size range must be adjusted accordingly in Cytoscape. To adjust the size range, select the Size option in the Style branch of Cytoscape. Clicking on the Current Mapping plot opens the Continuous Mapping Editor for Node Size window, where the range can be adjusted using the Set Min and Max… option.

To color the nodes by assigned community, the Fill Color parameter needs to be changed. Right-click on the parameter to reset the current fill coloring scheme. Then select ‘community’ as the Column and ‘Discrete Mapping’ as the Mapping Type. Right-clicking on the empty cells that appear opens a menu, where ‘Mapping Value Generators’ needs to be selected, followed by any desired color scheme (e.g. ‘Rainbow’).

Download NP Analyst Cytoscape style