Wednesday, June 19, 2013

GTEx Community Meeting - notes

Yesterday I attended the GTEx (Genotype-Tissue Expression project) Community Meeting held at the Broad Institute in Cambridge, MA, which hosts the GTEx portal. The meeting gave GTEx researchers and scientists outside this NIH Common Fund program a chance to discuss new aspects of the GTEx project. An overview of the project is here. A main impetus for GTEx is that many GWAS signals linking genotype to disease phenotype have a role in the regulation of gene expression. Thus, learning more about gene expression can assist in the interpretation of GWAS results. Below are my notes from this one-day meeting.

Simona Volpi – NIH. See for details. Samples are from biobanked tissues, which may make it difficult to run challenge experiments. The goal is to establish a database of genotype-gene expression relationships, collecting from 900 donors. They use PAXgene, an alcohol-based fixative, for the tissues in 0.2 – 0.5 gram aliquots. Tissue processing includes histopathologic review, FPPE paraffin embedding, RNA extraction. A U01 RFA seeking applications that propose ways to enhance GTEx is being prepared and will soon be announced.
  • BMI range for donors is greater than 18.5 and less than 35.
  • Cause of death of donors: 34% cerebrovascular, 13% cardiac, 22% respiratory, 21% from accidents (transportation and non-transportation).
  • RFA-RM-12-009 – eGTEx RFA from NIH – perhaps this is closed. They are working on liberalizing the access policy.
  • Data will be housed in dbGaP. Need to apply for access to get a lot of info, but some basic info is available at the GTEx portal at the Broad.

Kristin Ardlie – LDACC – Laboratory, data analysis and coordinating center. There are 47 tissues: 35 PAXgene tissues + blood + 11 frozen brain sub-regions. Blood is collected and processed pre-mortem. Goal by Jan 2014: 9534 RNA samples from 430+ donors. Goal for RNA-seq is 50 million aligned reads and no fewer than 15 million. Input is 200 ng of total RNA with a RIN of 6.0 or higher. RIN = RNA integrity number. Skeletal muscle and lung have high RINs, but pancreas, adipose and other enzyme-rich tissues have lower RINs and decay more quickly with increasing ischemic time (or post-mortem interval).
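The thresholds in these notes can be read as a simple sample filter. The following is a minimal sketch only: the `RnaSample` fields, the function names, and the status labels are hypothetical, and treating the 200 ng input as a minimum is an assumption, not something stated at the meeting.

```python
from dataclasses import dataclass

# Thresholds taken from the notes above.
MIN_RIN = 6.0                      # RNA integrity number cutoff
MIN_INPUT_NG = 200.0               # total RNA input (assumed here to be a minimum)
MIN_ALIGNED_READS = 15_000_000     # floor for aligned RNA-seq reads
TARGET_ALIGNED_READS = 50_000_000  # goal for aligned RNA-seq reads


@dataclass
class RnaSample:
    """Hypothetical record for one RNA sample."""
    sample_id: str
    tissue: str
    rin: float
    input_ng: float
    aligned_reads: int


def passes_input_qc(sample: RnaSample) -> bool:
    """True if the RNA meets the RIN and input-amount thresholds."""
    return sample.rin >= MIN_RIN and sample.input_ng >= MIN_INPUT_NG


def sequencing_status(sample: RnaSample) -> str:
    """Classify read depth against the floor and the target."""
    if sample.aligned_reads < MIN_ALIGNED_READS:
        return "below_minimum"
    if sample.aligned_reads < TARGET_ALIGNED_READS:
        return "acceptable"
    return "target_met"
```

For example, a hypothetical pancreas sample with a RIN of 5.4 would fail `passes_input_qc` regardless of how deeply it was sequenced.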

Manolis Dermitzakis uses an FDR of 5% based on Storey to identify eQTL SNPs. They use a 1 Mb window around the TSS but also a 100 kbp window. Using the established 15 PEER factors (to account for population ancestry structure), about 800 or so eQTL for adipose are expected. Overall, there are about 6200 eQTL genes for subcutaneous adipose. Skeletal muscle gets toward 7800 eQTL genes. He urges caution when seeing overlap between eQTL and GWAS hits, because such overlaps become almost certain as data increase in volume; take the effect size into consideration (a 20% increase of mRNA and protein is not the same in terms of biological consequences as a 2-fold increase). Can they borrow power from a “related” tissue to look at a specific tissue? Estimates of tissue sharing are high for adipose; nerve is best at 0.92, artery is 0.91. About 0.56 is the probability for a SNP to be active in all nine tissues. Probability of being active in a single tissue is just 0.03.
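The cis window idea can be sketched as a distance check between a SNP and a gene's transcription start site. This is an illustration under assumptions: the notes do not say whether the window is 1 Mb on each side of the TSS or 1 Mb in total, so a symmetric window of `window_bp` on each side is assumed here, and the function names and tuple layouts are hypothetical.

```python
def in_cis_window(snp_pos: int, tss_pos: int, window_bp: int = 1_000_000) -> bool:
    """True if the SNP lies within window_bp of the TSS (assumed symmetric)."""
    return abs(snp_pos - tss_pos) <= window_bp


def cis_pairs(snps, genes, window_bp: int = 1_000_000):
    """List (snp_id, gene_id) pairs to test as candidate cis-eQTL.

    snps:  iterable of (snp_id, chrom, position)
    genes: iterable of (gene_id, chrom, tss_position)
    """
    pairs = []
    for gene_id, gene_chrom, tss in genes:
        for snp_id, snp_chrom, pos in snps:
            # Only SNPs on the same chromosome and inside the window qualify.
            if snp_chrom == gene_chrom and in_cis_window(pos, tss, window_bp):
                pairs.append((snp_id, gene_id))
    return pairs


# Made-up coordinates, for illustration only.
snps = [("snpA", "1", 1_050_000), ("snpB", "1", 1_900_000), ("snpC", "2", 1_050_000)]
genes = [("geneX", "1", 1_000_000)]

wide = cis_pairs(snps, genes, window_bp=1_000_000)
narrow = cis_pairs(snps, genes, window_bp=100_000)
```

With these made-up positions, the 1 Mb window pairs both chromosome 1 SNPs with `geneX`, while the 100 kbp window keeps only `snpA`. A narrower window means fewer tests and a lighter multiple-testing burden, at the cost of missing more distal regulatory variants.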

Roderic Guigo. There are about 15000 to 20000 expressed genes in most tissues, with blood having fewer and testis more. Most tissues have 2000 to 3000 expressed lncRNAs, with testis having many more. 3820 genes are expressed in only one tissue. Most genes express about half of their annotated isoforms in a given tissue. With two isoforms, the major isoform dominates with 90% of the expression of that gene; when there are 5 or so isoforms, ~40-50% of expression comes from the dominant mRNA isoform. Splicing QTL are SNPs that affect the splicing pattern of a gene but may or may not affect its expression. Their group had to develop software to detect these in the RNA-seq data and also to account for the complex phenotype: isoforms and expression counts.

Mike Weale. Using arrays for eQTL studies can generate false positives. See Ramasamy et al 2013 Nucl Acids Res for an analysis of probe-dropping with better reference data. Something like 5.5% of probes map to ref genome SNPs but account for 90% of brain eQTL hits. See Trabzuni Hum Mol Genet 21:4094 for the famous example of a false positive eQTL for MAPT. Their PiP finder tool is at

Barbara Engelhardt. Replication of cis- and trans-eQTL across cell types. Her goal is to predict cis-eQTL as functional SNPs. This method is soon to be out in PLoS Genetics. Study size and replicate arrays account for >95% of the variability in the fraction of genes showing an eQTL. They found no false positives when using replicate expression arrays, but false negatives persisted. Replication strengthened cis-eQTL discovery.

Yaniv Erlich. STRs (short tandem repeats) of 2-6 bp. Sometimes these occur in promoter regions. Using 1000G data, they saw that about 80% of STRs are polymorphic with MAF >1%. There are many more STRs in introns than in exons, and a loss of heterozygosity in populations not of African origin. Looking for effects of STR variation on gene expression, they found 2673 eSTRs; with replication in orthogonal data (use arrays when the original data came from RNA-seq, eg), 81% of eSTRs showed the same direction of effect. Were they tagging SNPs? 77% of eSTRs had the same slope as before when conditioned on the most likely cis-eQTL SNP, meaning that in most cases they were not tagging SNPs. They do not see any dose-dependence with the STRs, that is, no effect of repeat length on the size of the expression effect. He speculates that the STRs create Z-DNA.

João Fadista. Prediction of individual-level genotypes based solely on GTEx gene expression. Assuming each gene has at least one cis-eQTL and there are 20,000 genes, there will be 3^20,000 genotype combinations, and this combination or pattern could be used to predict a person’s genotype. His examples come from the Nordic Network for Islet Transplantation and include other tissues/organs. 89 islet donors, 61% men, HbA1c levels of 5.8 ± 0.9%. Found 136 eQTL. Only 22 of these had genotype prediction data in all GTEx samples, but that could be sufficient to predict genotype: 3^22 is greater than the current world population by more than 4-fold. See also work by Eric Schadt (Nat Genet, Bayesian method to predict individual SNP…) on their replication of liver and adipose eQTL. Why do this and jeopardize GTEx, asks M. Dermitzakis, and JF replies that a blood gene expression test combined with eQTL data can predict disease. M. Dermitzakis states that heritability of gene expression is about 0.3 and so predictions of tissue-specific gene expression will be limited.
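The identifiability argument rests on simple counting: a biallelic SNP has three possible genotypes, so n independent SNPs give 3 to the power n combinations. The sketch below checks the numbers for 22 and 20,000 SNPs. The world population figure of roughly 7.1 billion for 2013 is an assumption added here, as is the independence of the SNPs; neither comes from the talk.

```python
import math

GENOTYPES_PER_SNP = 3                  # e.g. AA, AB, BB for a biallelic SNP
WORLD_POPULATION_2013 = 7_100_000_000  # assumption: approximate 2013 figure


def genotype_combinations(n_snps: int) -> int:
    """Number of distinct multi-SNP genotype patterns, assuming independence."""
    return GENOTYPES_PER_SNP ** n_snps


def decimal_digits_of_combinations(n_snps: int) -> int:
    """Digit count of 3**n_snps, computed via logarithms to avoid huge strings."""
    return math.floor(n_snps * math.log10(GENOTYPES_PER_SNP)) + 1


combos_22 = genotype_combinations(22)                   # 31_381_059_609
fold_over_population = combos_22 / WORLD_POPULATION_2013  # about 4.4
digits_20000 = decimal_digits_of_combinations(20_000)
```

So 22 informative eQTL already yield about 31 billion patterns, a bit over four times the assumed world population. In practice linkage between SNPs and imperfect genotype prediction from expression would reduce the effective number of distinguishable patterns.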

Stephen Montgomery. He looks at allele-specific expression (ASE) and GTEx data to discover eQTL. See Wei Sun on the TReCASE tool in Biometrics from 2012. Even with 15 million RNA-seq reads, 52.2% of sites have depth less than 30 and so offer lowered ability to confidently call ASE. Allelic ratios of the mRNAs are highly heritable, as seen by looking at a 3-generation, 17-member family. This holds across low and moderately expressed genes. Can ASEs say anything about deleterious variants? Looked at 10 tissues in one 25-yo Chinese male, and looked at deleterious sites (50), loss-of-function (LoF, stop-gain) sites (74) and ? (very few). The LoF variant is lowly expressed across the tissues, as reported by Dan MacArthur.

Tuuli Lappalainen. Uses the December release of GTEx data to look at ASEs. ASE analysis can help identify eQTL even in under-powered studies. Master data to be released with upcoming paper: ≥ 8 RNA-seq reads over a site. For most analyses, reads are downsampled to exactly 30 per site in order to avoid coverage issues. Note: only relatively highly expressed, perhaps ubiquitously expressed, genes can be analyzed. She’s examining the distribution of allelic effects between individuals and between tissues. She wants to quantify regulatory variation in each tissue by looking across all tissues and samples. Thyroid has a large proportion of cis-eQTL and ASEs relative to other tissues. Her data are progressing to descriptions of proxy tissues for eQTL analysis. She is asking, How likely is a second tissue in the same individual to show ASE? eQTL work is done in populations; the focus now is on the individual and that person’s ASE effect. Because of wide variation in expression levels and ASE effects across individuals, the variants are not great predictors of individual phenotypes even at the cellular level.
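Downsampling every site to the same depth can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the speaker's pipeline: the notes do not say how reads were subsampled or which test was applied, so sampling without replacement, the two-sided exact binomial test against a 50:50 allelic ratio, and the names `downsample_counts` and `binomial_two_sided_p` are all assumptions made here.

```python
import random
from math import comb


def downsample_counts(ref_reads, alt_reads, depth=30, rng=None):
    """Subsample a site to exactly `depth` reads without replacement.

    Returns (ref, alt) counts after subsampling, or None when the site
    has fewer than `depth` reads and so cannot be analyzed at this depth.
    """
    rng = rng or random.Random()
    total = ref_reads + alt_reads
    if total < depth:
        return None
    # Treat read indices 0..ref_reads-1 as reference-allele reads.
    picked = rng.sample(range(total), depth)
    ref_kept = sum(1 for i in picked if i < ref_reads)
    return ref_kept, depth - ref_kept


def binomial_two_sided_p(k, n, p=0.5):
    """Exact two-sided binomial p-value for k successes in n trials."""
    probs = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    observed = probs[k]
    # Sum the probability of every outcome at least as unlikely as the observed one.
    tail = sum(q for q in probs if q <= observed * (1 + 1e-9))
    return min(1.0, tail)


# Hypothetical site with 100 reference and 50 alternate reads.
counts = downsample_counts(100, 50, depth=30, rng=random.Random(0))
p_value = binomial_two_sided_p(counts[0], sum(counts))
```

Fixing the depth means every site has the same power to reveal an imbalance, so differences in ASE rates between tissues or individuals are not driven by coverage. The trade-off is the one noted above: sites in lowly expressed genes drop out entirely.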

Manuel Rivas. Transcriptome analysis of the functional impact of putative loss-of-function variants. Looks to annotate exome resequencing data and rare variants to isoforms from RNA-seq. LMNA provides a nice example of a gene with muscle-specific mRNA isoforms; only these two isoforms should be used in explaining muscle disorders, as the other isoforms are not expressed in this tissue.

Chris Fuller. GWAS variants as eQTL based on analysis of GTEx data. Sherlock is their tool. It uses all GWAS SNPs, even those below genome-wide significance, and both cis- and trans-eQTL loci. Sherlock maps disease-SNP associations to disease-gene groups. Linkage is very important in this work. The stronger results come from cis-eQTL, as shown by looking at Crohn disease GWAS. He also implicates genes through trans-eQTL data. See Many small GWAS may remain unpublished for lack of strong single-SNP results. Aggregating SNPs boosts statistical power. They then implicate a relevant leukemia gene (FLI1) through multiple (n=6) trans-eQTL SNPs.

Eric Gamazon. GTEx – Expanding on GWAS. Uses the Wellcome Trust Case Control Consortium and their 7 diseases, including CAD. Adipose cis-eQTL are enriched for Crohn disease, CAD, hypertension and rheumatoid arthritis variants. GTEx adipose eQTL improve HOMA-IR GWAS. What proportion of variability in expression is captured by eQTL? He claims that he can capture 30-50% of the heritability from genome-wide marker SNPs (> 200,000) for type 1 diabetes and Crohn disease when using a small number of informative cis-eQTL GTEx SNPs (2883 SNPs for T1D, ~3000 for CD). He makes no claims about causal variants with this approach.

Jason Wright. Chasing causal loci: Genome engineering of a non-coding region of 9p21 to identify mechanisms of diabetes predisposition. Enhancer assay of a standard type was used to look at various 2-kbp sequences across the region. Little or no enhancer activity seen until they transfected rodent islet cells. Promoter regions of all three genes (CDKN2A, eg) physically interact with the 10-kbp region containing the risk haplotype. TALEN (TAL effector nucleases) genome engineering gives near isogenic cell lines with or without specific alleles; he did this to delete the entire risk region. It looks as though the 9p21 region affects expression of CDKN2A and not CDKN2B and CDKN2AB-AS by about 20% per risk allele and in cis. He is now trying CRISPR genome engineering to fine map the 10-kbp region.

Daniel MacArthur. Gene expression data and PPIs are used to inform clinical exome sequencing. IBAS is a protein-protein interaction score and a way to score placement within a PPI network.

Luke Ward. Systematic annotation of GTEx eQTL using ENCODE and Roadmap data. His slide on genetic variant, tissue/cell type, molecular phenotypes (histone methylation, eg) and organismal phenotypes (lipids, heart rate) is neat; while it outlines the potential for a druggable path, it could also be used to outline a nutrition path to retain and maintain health, as opposed to recovering health. HaploReg is the portal where the HepG2 enhancer variants can be found – these are relevant for blood lipids.

GTEx & NIH panel. Audience participation in terms of data types/fields to make available and discussion of other tissues to sample.

Gad Getz. He gave a recap of the day's talks…
