Simona Volpi –
NIH. See commonfund.nih.gov/gtex for details. Samples come from biobanked
tissues, which may make challenge experiments difficult. The goal is to
establish a database of genotype-gene expression relationships, collected from
900 donors. Tissues are fixed in PAXgene, an alcohol-based fixative, in
0.2 – 0.5 gram aliquots. Tissue processing includes histopathologic review,
paraffin embedding of PAXgene-fixed tissue (PFPE) and RNA extraction. A U01 RFA
seeking applications that propose ways to enhance GTEx is being prepared and
will be announced soon.
- BMI range for donors is greater than 18.5 and less than 35.
- Cause of death of donors: 34% cerebrovascular, 13% cardiac, 22% respiratory, 21% from accidents (transportation and non-transportation).
- RFA-RM-12-009 – eGTEx RFA from NIH – perhaps this is closed. They are working on liberalizing the access policy.
- Data will be housed in dbGaP. Need to apply for access to get a lot of info, but some basic info is available at the GTEx portal at the Broad.
Kristin Ardlie – LDACC
– Laboratory, Data Analysis and Coordinating Center. There are 47 tissues: 35
PAXgene tissues + blood + 11 frozen brain sub-regions. Blood is collected and
processed pre-mortem. Goal by Jan 2014: 9534 RNA samples from 430+ donors. The
RNA-seq target is 50 million aligned reads, with no fewer than 15 million. Input
is 200 ng of total RNA with a RIN of 6.0 or higher (RIN = RNA integrity number).
Skeletal muscle and lung have high RINs, but pancreas, adipose and other
enzyme-rich tissues have lower RINs and decay faster with ischemic time (the
post-mortem interval).
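The sample-level thresholds quoted above (RIN of 6.0 or higher, at least 15 million aligned reads, a 50 million target) can be written as a simple filter. This is a minimal sketch: the class, function and sample names are hypothetical, and the actual LDACC QC surely checks more than these two metrics.

```python
from dataclasses import dataclass

# Thresholds quoted in the talk; everything else here is illustrative.
MIN_RIN = 6.0
MIN_ALIGNED_READS = 15_000_000
TARGET_ALIGNED_READS = 50_000_000

@dataclass
class RnaSample:
    sample_id: str
    tissue: str
    rin: float
    aligned_reads: int

def passes_qc(s: RnaSample) -> bool:
    """True if the sample meets both the RIN and the minimum-depth cutoff."""
    return s.rin >= MIN_RIN and s.aligned_reads >= MIN_ALIGNED_READS

def meets_target(s: RnaSample) -> bool:
    """True if the sample also reaches the 50 million aligned-read target."""
    return s.aligned_reads >= TARGET_ALIGNED_READS

samples = [
    RnaSample("S1", "Muscle - Skeletal", 8.1, 52_000_000),
    RnaSample("S2", "Pancreas", 5.2, 48_000_000),   # fails on RIN
    RnaSample("S3", "Lung", 7.4, 12_000_000),       # fails on depth
]
kept = [s.sample_id for s in samples if passes_qc(s)]
print(kept)  # ['S1']
```

Separating "passes" from "meets target" mirrors the talk's distinction between the 15 million floor and the 50 million goal.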
Manolis Dermitzakis
uses a 5% FDR based on Storey's q-value method to identify eQTL SNPs. They use a
1 Mb window around the TSS but also a 100 kbp window. Using the established 15
PEER factors (to account for population ancestry structure), about 800 or so
eQTL for adipose are expected. Overall, there are about 6200 eQTL genes for
subcutaneous adipose, and skeletal muscle approaches 7800 eQTL genes. He urges
caution about overlap between eQTL and GWAS hits, because such overlaps become
almost certain as data volume increases; take effect size into account (a 20%
increase in mRNA and protein does not have the same biological consequences as
a 2-fold increase). Can one borrow power from a “related” tissue to study a
specific tissue? Estimates of tissue sharing are high for adipose: nerve is best
at 0.92, artery is 0.91. The probability that a SNP is active in all nine tissues
is about 0.56; the probability of it being active in only a single tissue is
just 0.03.
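The 5% FDR criterion can be illustrated with a Storey-style q-value calculation. This is a minimal sketch, not the pipeline used in the talk: it estimates pi0 (the fraction of true nulls) from a single lambda cutoff, whereas Storey's qvalue package smooths pi0 over a grid of lambdas, and how the per-gene p-values are obtained (for example, by permutation within the cis window) is outside its scope. The p-values are hypothetical.

```python
import numpy as np

def storey_qvalues(pvals, lam=0.5):
    """Return (q-values, pi0) for an array of p-values.

    pi0 is estimated from a single lambda cutoff; the qvalue package
    instead smooths pi0 over a grid of lambdas.
    """
    p = np.asarray(pvals, dtype=float)
    m = p.size
    pi0 = min(1.0, float(np.mean(p > lam)) / (1.0 - lam))
    order = np.argsort(p)
    ranked = p[order]
    # q_(i) = pi0 * m * p_(i) / i for the i-th smallest p-value ...
    q = pi0 * m * ranked / np.arange(1, m + 1)
    # ... made monotone by a running minimum from the largest p-value down.
    q = np.minimum.accumulate(q[::-1])[::-1]
    q = np.minimum(q, 1.0)
    out = np.empty(m)
    out[order] = q  # restore the original order
    return out, pi0

# Hypothetical nominal p-values for eight gene-level tests.
pvals = [0.001, 0.002, 0.01, 0.04, 0.6, 0.7, 0.8, 0.9]
q, pi0 = storey_qvalues(pvals)
significant = [p for p, qi in zip(pvals, q) if qi <= 0.05]  # 5% FDR
print(pi0, significant)  # 1.0 [0.001, 0.002, 0.01]
```

When pi0 is estimated as 1 this reduces to the Benjamini-Hochberg procedure; a pi0 below 1 makes the cutoff less conservative.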
Roderic Guigo.
There are about 15,000 to 20,000 expressed genes in most tissues, with fewer in
blood and more in testis. Most tissues have 2000 to 3000 expressed lncRNAs, with
testis having many more. 3820 genes are expressed in only one tissue. Most genes
express about half of their annotated isoforms in a given tissue. For genes with
two isoforms, the major isoform dominates, accounting for 90% of the gene's
expression; for genes with 5 or so isoforms, ~40-50% of expression comes from
the dominant isoform. Splicing QTLs are SNPs that affect a gene's splicing
pattern and may or may not affect its expression. His group had to develop
software to detect these in the RNA-seq data and to account for the complex
phenotype: isoforms plus expression counts.
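The "dominant isoform" figures (90% with two isoforms, ~40-50% with about five) come down to the share of a gene's total expression carried by its top isoform. A minimal sketch, assuming isoform-level expression values such as TPM are already available; the gene and isoform IDs are hypothetical.

```python
from collections import defaultdict

def dominant_isoform_share(isoform_expr):
    """Map each gene to (dominant isoform, share of the gene's expression).

    isoform_expr maps (gene, isoform) -> expression value, e.g. TPM.
    Genes with zero total expression are skipped.
    """
    by_gene = defaultdict(dict)
    for (gene, isoform), value in isoform_expr.items():
        by_gene[gene][isoform] = value
    shares = {}
    for gene, isoforms in by_gene.items():
        total = sum(isoforms.values())
        if total == 0:
            continue
        top = max(isoforms, key=isoforms.get)
        shares[gene] = (top, isoforms[top] / total)
    return shares

# Hypothetical TPM values for one tissue.
expr = {
    ("GENE_A", "A-1"): 90.0, ("GENE_A", "A-2"): 10.0,
    ("GENE_B", "B-1"): 45.0, ("GENE_B", "B-2"): 20.0, ("GENE_B", "B-3"): 15.0,
    ("GENE_B", "B-4"): 10.0, ("GENE_B", "B-5"): 10.0,
}
shares = dominant_isoform_share(expr)
print(shares)  # GENE_A: A-1 carries 0.9; GENE_B: B-1 carries 0.45
```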
Mike Weale. Using arrays for eQTL studies can generate false positives. See
Ramasamy et al. 2013 Nucl Acids Res for an analysis of probe dropping with
better reference data. Roughly 5.5% of probes overlap known SNPs in the
reference genome but account for 90% of brain eQTL hits. See Trabzuni, Hum Mol
Genet 21:4094, for the well-known example of a false-positive eQTL for MAPT.
Their PiP finder tool is at
bitly.com/pipfinder.
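The probe-dropping step amounts to flagging array probes whose target sequence contains a known SNP, since a SNP under the probe changes hybridization and can mimic an eQTL. A minimal sketch of the overlap test, assuming probe and SNP coordinates on the same reference build; it is not the PiP finder tool, and the probe IDs and positions are hypothetical.

```python
import bisect

def probes_overlapping_snps(probes, snp_positions):
    """Return IDs of probes whose target interval contains at least one SNP.

    probes: iterable of (probe_id, chrom, start, end), inclusive coordinates.
    snp_positions: dict mapping chrom -> sorted list of SNP positions.
    """
    flagged = set()
    for probe_id, chrom, start, end in probes:
        positions = snp_positions.get(chrom, [])
        i = bisect.bisect_left(positions, start)  # first SNP at or after start
        if i < len(positions) and positions[i] <= end:
            flagged.add(probe_id)
    return flagged

# Hypothetical probes and SNP coordinates.
probes = [("p1", "chr17", 980, 1029), ("p2", "chr17", 2000, 2049), ("p3", "chr1", 100, 149)]
snps = {"chr17": [1000, 5000]}
flagged = probes_overlapping_snps(probes, snps)
print(flagged)  # {'p1'}
```

Flagged probes would then be dropped before eQTL mapping; binary search keeps the check fast even with millions of SNPs per chromosome.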
Barbara Engelhardt.
Replication of cis- and trans-eQTL across cell types. Her goal is to predict
which cis-eQTL SNPs are functional. The method will soon be published in PLoS
Genetics. Study size and replicate arrays account for >95% of the variability
in the fraction of genes showing an eQTL. They found no false positives when
using replicate expression arrays, but false negatives persisted. Replication
strengthened cis-eQTL discovery.
Yaniv Erlich.
STRs (short tandem repeats) have repeat units of 2-6 bp; some occur in promoter
regions. Using 1000 Genomes data, they saw that about 80% of STRs are
polymorphic with MAF >1%. There are many more STRs in introns than in exons,
and loss of heterozygosity in populations of non-African origin. Looking for
effects of STR variation on gene expression, they found 2673 eSTRs; on
replication in orthogonal data (e.g., arrays when the original data came from
RNA-seq), 81% of eSTRs showed the same direction of effect. Were they tagging
SNPs? 77% of eSTRs had the same slope as before when conditioned on the most
likely cis-eQTL SNP, meaning that in most cases they were not tagging SNPs. They
do not see any dose dependence for the STRs (i.e., an effect of repeat length on
expression). He speculates that the STRs create Z-DNA.
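The "were they tagging SNPs?" check compares an eSTR's regression slope with and without the lead cis-eQTL SNP as a covariate: a causal STR keeps its slope, while an STR that merely tags the SNP loses it. A minimal sketch on simulated data; coding the STR as a summed allele-length deviation and all effect sizes are assumptions for illustration, not the speakers' model.

```python
import numpy as np

def slope(y, x, covariate=None):
    """OLS coefficient on x in y ~ 1 + x (+ covariate)."""
    columns = [np.ones_like(x), x]
    if covariate is not None:
        columns.append(covariate)
    design = np.column_stack(columns)
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return beta[1]

rng = np.random.default_rng(0)
n = 2000
snp = rng.integers(0, 3, size=n).astype(float)  # lead cis-eQTL SNP dosage (0, 1, 2)

# Scenario 1: the STR acts on expression independently of the SNP.
str_len = rng.integers(-4, 5, size=n).astype(float)  # summed allele-length deviation
expr = 0.3 * str_len + 0.2 * snp + rng.normal(0.0, 1.0, n)
b_marginal = slope(expr, str_len)
b_conditional = slope(expr, str_len, covariate=snp)  # stays near 0.3

# Scenario 2: STR length merely tracks the SNP; only the SNP drives expression.
str_tag = 2.0 * snp + rng.normal(0.0, 1.0, n)
expr_tag = 0.5 * snp + rng.normal(0.0, 1.0, n)
t_marginal = slope(expr_tag, str_tag)                    # positive
t_conditional = slope(expr_tag, str_tag, covariate=snp)  # shrinks toward 0
```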
João Fadista.
Predicting individual-level genotypes based solely on GTEx gene expression.
Assuming 20,000 genes, each with at least one cis-eQTL, there are 3^20,000
genotype combinations, and this combination or pattern could be used to predict
a person’s genotype. His examples come from the Nordic Network for Islet
Transplantation and include other tissues/organs: 89 islet donors, 61% men,
HbA1c 5.8 ± 0.9%. They found 136 eQTL. Only 22 of these had genotype prediction
data in all GTEx samples, but that could be sufficient to predict genotype:
3^22 (about 3.1 × 10^10) is more than 4-fold greater than the current world
population. See also
work by Eric Schadt (Nat Genet, Bayesian method to predict individual SNP…) on
their replication of liver and adipose eQTL. M. Dermitzakis asked why do this
and jeopardize GTEx; JF replied that a blood gene expression test combined with
eQTL data can predict disease. M. Dermitzakis noted that the heritability of
gene expression is about 0.3, so predictions of tissue-specific gene
expression will be limited.
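The arithmetic behind the re-identification argument: each biallelic SNP has three genotypes (AA, AB, BB), so n independent eQTL SNPs give 3^n possible multi-locus patterns. The sketch below idealizes each genotype as perfectly readable from expression, and the world-population figure of 7.2 billion is an assumed approximation for the time.

```python
import math

GENOTYPES_PER_SNP = 3        # AA, AB, BB at a biallelic SNP
WORLD_POPULATION = 7.2e9     # assumed approximate world population at the time

def genotype_patterns(n_snps):
    """Number of distinct multi-locus genotype patterns over n biallelic SNPs."""
    return GENOTYPES_PER_SNP ** n_snps

patterns_22 = genotype_patterns(22)
ratio = patterns_22 / WORLD_POPULATION
# 3**20000 is too large to print usefully; count its decimal digits instead.
digits_20000 = math.floor(20_000 * math.log10(GENOTYPES_PER_SNP)) + 1

print(f"{patterns_22:,}")   # 31,381,059,609
print(f"{ratio:.1f}")       # 4.4
print(digits_20000)         # 9543
```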
Stephen Montgomery.
He uses allele-specific expression and GTEx data to discover eQTL. See Wei
Sun's TReCASE tool (Biometrics, 2012). Even with 15 million RNA-seq reads,
52.2% of sites have depth below 30 and so cannot be confidently labeled as
showing ASE (allele-specific expression). Allelic ratios of mRNAs are highly
heritable, as seen in a 3-generation, 17-member family, and this holds across
lowly and moderately expressed genes. Can ASE say anything about deleterious
variants? They looked at 10 tissues from one 25-year-old Chinese male: at
deleterious sites (50), loss-of-function (LoF, stop-gain) sites (74) and ?
(very few). LoF alleles are lowly expressed across the tissues, as reported by
Dan MacArthur.
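The depth-30 cutoff matters because an allelic-imbalance test has little power with few reads. A minimal sketch using an exact two-sided binomial test against a 50:50 allelic ratio; this is a common simple choice, not the model-based TReCASE approach, and it ignores reference mapping bias and overdispersion. Function names and read counts are hypothetical.

```python
from math import comb

MIN_DEPTH = 30  # sites below this depth are treated as uncallable

def binom_two_sided_p(k, n, p=0.5):
    """Exact two-sided binomial p-value: total probability of outcomes
    no more likely than the observed count k."""
    probs = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    cutoff = probs[k] * (1 + 1e-9)  # small tolerance for floating-point ties
    return min(1.0, sum(pr for pr in probs if pr <= cutoff))

def ase_call(ref_reads, alt_reads, min_depth=MIN_DEPTH, alpha=0.05):
    """True/False for allelic imbalance at a heterozygous site, or None if too shallow."""
    depth = ref_reads + alt_reads
    if depth < min_depth:
        return None
    return binom_two_sided_p(ref_reads, depth) < alpha

print(ase_call(10, 8))   # None (depth 18 < 30)
print(ase_call(25, 5))   # True
print(ase_call(16, 14))  # False
```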
Tuuli Lappalainen.
She uses the December release of GTEx data to look at ASE. ASE can recover eQTL
signal in under-powered studies. The master data set, to be released with an
upcoming paper, requires ≥ 8 RNA-seq reads over a site. Most analyses subsample
to exactly 30 reads to avoid coverage issues. Note: only relatively highly
expressed, perhaps ubiquitously expressed, genes can be analyzed. She is
examining the distribution of allelic effects between individuals and between
tissues, and wants to quantify regulatory variation in each tissue by looking
across all tissues and samples. Thyroid has a larger proportion of cis-eQTL and
ASE than other tissues. Her work is progressing toward describing proxy tissues
for eQTL analysis. She asks: how likely is a second tissue in the same
individual to show ASE? eQTL work is done in populations; she now looks at the
individual and that person’s ASE effects. Because of wide variation in
expression levels and ASE effects across individuals, the variants are not
great predictors of individual phenotypes, even at the cellular level.
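Subsampling every site to exactly 30 reads puts all sites on equal footing, so differences in allelic imbalance are not driven by coverage. A minimal sketch, assuming per-site reference and alternative read counts at heterozygous sites; the 8-read inclusion threshold and the 30-read depth come from the talk, while the function names and counts are hypothetical.

```python
import random

MIN_SITE_READS = 8    # inclusion threshold quoted for the master data set
TARGET_READS = 30     # fixed depth used for most analyses

def downsample_site(ref_reads, alt_reads, target=TARGET_READS, rng=None):
    """Draw exactly `target` reads without replacement from one heterozygous site.

    Returns (ref, alt) counts after subsampling, or None if the site has
    fewer than `target` reads.
    """
    rng = rng or random.Random()
    if ref_reads + alt_reads < target:
        return None
    reads = [1] * ref_reads + [0] * alt_reads   # 1 = reference-allele read
    ref = sum(rng.sample(reads, target))
    return ref, target - ref

def allelic_imbalance(ref, alt):
    """Distance of the reference-allele fraction from the balanced value 0.5."""
    return abs(ref / (ref + alt) - 0.5)

# Hypothetical per-site (ref, alt) read counts.
sites = {"site1": (120, 40), "site2": (4, 2), "site3": (10, 8), "site4": (18, 14)}
rng = random.Random(42)
for name, (ref, alt) in sites.items():
    if ref + alt < MIN_SITE_READS:
        continue  # below the master-data inclusion threshold
    sub = downsample_site(ref, alt, rng=rng)
    if sub is not None:  # site3 passes inclusion but is too shallow to subsample
        print(name, sub, round(allelic_imbalance(*sub), 3))
```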
Manuel Rivas. Transcriptome
analysis of the functional impact of putative loss-of-function variants. He aims
to annotate exome resequencing data and rare variants against isoforms from
RNA-seq. LMNA provides a nice example of a gene with muscle-specific mRNA
isoforms: only these two isoforms should be used in explaining muscle
disorders, as the other isoforms are not expressed in this tissue.
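The LMNA point generalizes to a simple rule: a putative LoF variant is only relevant in a tissue if it disrupts an isoform expressed there. A minimal sketch, assuming a precomputed set of isoforms each variant disrupts plus isoform-level expression for the tissue; the 1 TPM "expressed" cutoff, the function names and the isoform IDs are all hypothetical.

```python
def lof_hits_expressed_isoform(disrupted, tissue_expr, min_tpm=1.0):
    """True if the variant disrupts at least one isoform expressed in the tissue.

    disrupted: isoform IDs whose coding sequence the variant disrupts.
    tissue_expr: isoform ID -> expression (e.g. TPM) in the tissue of interest.
    min_tpm: assumed expression cutoff for calling an isoform "expressed".
    """
    return any(tissue_expr.get(iso, 0.0) >= min_tpm for iso in disrupted)

def disrupted_expression_share(disrupted, tissue_expr):
    """Fraction of the gene's expression in this tissue carried by disrupted isoforms."""
    total = sum(tissue_expr.values())
    if total == 0:
        return 0.0
    return sum(tissue_expr.get(iso, 0.0) for iso in set(disrupted)) / total

# Hypothetical isoform expression for one gene in skeletal muscle.
muscle = {"ISO_1": 40.0, "ISO_2": 10.0, "ISO_3": 0.0}
print(lof_hits_expressed_isoform({"ISO_3"}, muscle))   # False
print(lof_hits_expressed_isoform({"ISO_1"}, muscle))   # True
print(disrupted_expression_share({"ISO_1"}, muscle))   # 0.8
```

Reporting the expressed share alongside the yes/no call distinguishes a variant hitting the dominant isoform from one hitting a minor isoform.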
Eric Gamazon. GTEx
– Expanding on GWAS. He uses the Wellcome Trust Case Control Consortium and its
7 diseases, including CAD. Adipose cis-eQTL are enriched for Crohn disease,
CAD, hypertension and rheumatoid arthritis variants. GTEx adipose eQTL improve
HOMA-IR GWAS. What proportion of variability in expression is captured by eQTL?
He claims that he can capture 30-50% of the heritability captured by
genome-wide marker SNPs (>200,000) for type 1 diabetes and Crohn disease when
using a small number of informative GTEx cis-eQTL SNPs (2883 SNPs for T1D,
~3000 for CD). He makes no claims about causal variants with this approach.
Jason Wright. Chasing
causal loci: genome engineering of a non-coding region of 9p21 to identify
mechanisms of diabetes predisposition. A standard enhancer assay was used to
test various 2-kbp sequences across the region. Little or no enhancer activity
was seen until they transfected rodent islet cells. Promoter regions of all
three genes (e.g., CDKN2A) physically interact with the 10-kbp region containing
the risk haplotype. TALEN (TAL effector nuclease) genome engineering gives
near-isogenic cell lines with or without specific alleles; he used it to delete
the entire risk region. It looks as though the 9p21 region affects expression
of CDKN2A in cis, by about 20% per risk allele, but not expression of CDKN2B or
CDKN2B-AS. He is now trying CRISPR genome engineering to fine-map the 10-kbp
region.
Daniel MacArthur.
Gene expression data and PPIs (protein-protein interactions) are used to inform
clinical exome sequencing. IBAS: a protein-protein interaction score and a way
to score a gene's placement within a PPI network.
Luke Ward. Systematic
annotation of GTEx eQTL using ENCODE and Roadmap data. His slide linking genetic
variant, tissue/cell type, molecular phenotypes (e.g., histone methylation) and
organismal phenotypes (lipids, heart rate) is neat; while it outlines the
potential for a druggable path, it could also be used to outline a nutrition
path to retain and maintain health, as opposed to recovering health. HaploReg
(http://nar.oxfordjournals.org/content/40/D1/D930.short) is the portal where the
HepG2 enhancer variants can be found; these are relevant for blood lipids.
GTEx & NIH panel.
Audience participation in terms of data types/fields to make available and discussion of other tissues to sample.
Gad Getz. He gave a recap of the day's talks…