Friday, November 16, 2012

Paper of the week: Visualizing associations between paired data sets

This week's paper of the week is by González, et al., entitled "Visualising associations between paired `omics' data sets," and published in BioData Mining (vol 5:19). The pdf of this report can be found here.

The authors demonstrate that graphical outputs such as Correlation Circle plots, Relevance Networks and Clustered Image Maps are useful for visualizing and interpreting the output of integrative analysis tools. The goal is to facilitate an understanding of a system as a whole, since complex data often force analysts to put on blinders and lose sight of the forest for the trees.

The graphical tools described in the report are implemented in the freely available R package mixOmics and in its associated web application.

As an example of what the authors have built, consider their presentation of the Nutrimouse data, which shows correlations (or the lack of them) between two large data sets, in this case gene expression and metabolite levels in liver, as taken from their figure 5.

The Nutrimouse data are from a nutrigenomic study in which 40 mice from two genotypes (wild-type and Ppara -/-) were fed five diets with different fatty acid compositions. Details are in the Methods section. Expression of 120 genes in liver cells was obtained with microarrays, and concentrations of 21 hepatic fatty acids were measured by gas chromatography. Hence, the data matrices are of size (40 × 120) for the gene expression and (40 × 21) for the fatty acid measurements.
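To make the shape of the problem concrete, here is a minimal Python sketch that builds two placeholder matrices for 40 samples, 120 genes and 21 fatty acids, then computes the plain Pearson correlation between every gene and every fatty acid. This is not the mixOmics implementation: the data are random numbers, the function names (`pearson`, `column`, `cross_correlation`) are illustrative assumptions, and the integrative methods used in the paper, such as sPLS, do considerably more than pairwise correlation.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    cov = sum(a * b for a, b in zip(dx, dy))
    norm = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    return cov / norm if norm > 0 else 0.0


def column(matrix, j):
    """Extract column j from a list-of-rows matrix."""
    return [row[j] for row in matrix]


def cross_correlation(X, Y):
    """Correlate every column of X with every column of Y (p x q result)."""
    x_cols = [column(X, j) for j in range(len(X[0]))]
    y_cols = [column(Y, k) for k in range(len(Y[0]))]
    return [[pearson(xc, yc) for yc in y_cols] for xc in x_cols]


# Placeholder data only: random values with the Nutrimouse dimensions.
rng = random.Random(0)
n_mice, n_genes, n_fatty_acids = 40, 120, 21
genes = [[rng.gauss(0, 1) for _ in range(n_genes)] for _ in range(n_mice)]
fatty_acids = [[rng.gauss(0, 1) for _ in range(n_fatty_acids)] for _ in range(n_mice)]

corr = cross_correlation(genes, fatty_acids)
print(len(corr), len(corr[0]))  # 120 21
```

Even this naive version yields 120 × 21 = 2,520 pairwise values, which hints at why a dedicated graphical summary is helpful.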

The authors write: The Correlation Circle plot (above) displays all fatty acids and the genes selected on each component (100 in total in this plot). Highlighted are subsets of variables important in defining each component. For example, C18:2ω6, C20:2ω6 and C16:0 are fatty acids for which variation allows the definition of the sPLS component 2 (top and bottom of the y-axis). Similarly, genes such as Car1, Acoth, Siat4c, Scarb1 (SR.BI) and Slc10a1 (Ntcp, or Ntop [sic]) are positively correlated to each other, and to the fatty acid C16:1ω9 and their variation participate in defining the sPLS component 1 (left-hand side of the x-axis).
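The coordinates in a Correlation Circle plot can be sketched in a few lines of Python: each variable is placed at its correlation with the first component (x-axis) and with the second component (y-axis). The sketch below is only an illustration under stated assumptions. The two components are synthetic, uncorrelated score vectors rather than real sPLS components, the variables are made-up mixtures of them, and the helper names (`center`, `pearson`, `circle_coordinates`) are hypothetical.

```python
import math
import random


def center(v):
    """Subtract the mean from every element."""
    m = sum(v) / len(v)
    return [a - m for a in v]


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    dx, dy = center(x), center(y)
    norm = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    return sum(a * b for a, b in zip(dx, dy)) / norm if norm > 0 else 0.0


def circle_coordinates(variable, comp1, comp2):
    """Position of one variable: (cor with component 1, cor with component 2)."""
    return pearson(variable, comp1), pearson(variable, comp2)


rng = random.Random(1)
n = 40  # number of samples (mice)

# Synthetic stand-ins for two component score vectors (assumption, not sPLS output).
comp1 = center([rng.gauss(0, 1) for _ in range(n)])
raw2 = center([rng.gauss(0, 1) for _ in range(n)])
# Remove the part of raw2 explained by comp1 so the two components are uncorrelated.
beta = sum(a * b for a, b in zip(raw2, comp1)) / sum(a * a for a in comp1)
comp2 = [b - beta * a for a, b in zip(comp1, raw2)]

# Made-up variables: one tracks component 1, one tracks component 2, one is noise.
variables = {
    "var_on_comp1": [2.0 * a + rng.gauss(0, 0.3) for a in comp1],
    "var_on_comp2": [-1.5 * b + rng.gauss(0, 0.3) for b in comp2],
    "var_noise": [rng.gauss(0, 1) for _ in range(n)],
}

coords = {name: circle_coordinates(v, comp1, comp2) for name, v in variables.items()}
for name, (x, y) in coords.items():
    # With uncorrelated components, every point falls inside the unit circle.
    print(f"{name}: x={x:+.2f}, y={y:+.2f}, radius={math.hypot(x, y):.2f}")
```

Variables that land near the edge of the circle are well represented by the two components, while variables near the origin are not. Points that sit close together indicate positively correlated variables, which matches how the authors read the gene cluster on the left-hand side of the x-axis.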

I find such analysis and depiction of results useful and look forward to trying this with our GWAS data.
