Wednesday, November 3, 2010
ASHG 2010 conference notes - 3 Nov 2010
Notes from ASHG 2010 (American Society of Human Genetics)
John Rossi (City of Hope National Medical Center) – SNPs in human microRNA genes affect biogenesis and function
miRNAs regulate translation and degradation of mRNAs. Identifying targets of the miRNAs is a major challenge.
Euan Ashley (Stanford University) – What to do with all the sequence data?
Examine the genome of S. Quake with its 6 billion data points.
A rare-variant algorithm is tough to build because no single database exists; the existing ones are private or in varying formats. Thus, they use catalogs of common variants for this Patient Zero prototype. With common variants, applying population data to the individual requires genotype frequencies much more than the odds ratio or p-value of association (in the population).
Dealing with novel variants presents another challenge but some new tools were built by their team (e.g., using SNP-based changes in free energy of RNA folding).
They want to put the genetic risk of the individual in the context of overall risk for that patient, a 40-year-old white male. For example, he already has a 50% increased risk for obesity given certain non-genetic parameters. It is also necessary to consider environmental risk. He showed an example figure of how such information on risk can be presented to the patient, where a bar indicates how risk changes for this person. In that example, the risk of obesity increases from about 10% to about 60%.
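One way to picture the step from a population baseline to an individual's risk is with likelihood ratios built from genotype frequencies. The sketch below is my own illustration and not necessarily the speaker's algorithm: the function names are made up, multiplying likelihood ratios assumes the variants act independently, and the only numbers used are the roughly 10% and 60% obesity figures mentioned above.

```python
def prob_to_odds(p):
    """Convert a probability to odds."""
    return p / (1.0 - p)


def odds_to_prob(odds):
    """Convert odds back to a probability."""
    return odds / (1.0 + odds)


def likelihood_ratio(freq_in_cases, freq_in_controls):
    """Likelihood ratio for a genotype: its frequency in cases over controls."""
    return freq_in_cases / freq_in_controls


def update_risk(pre_test_prob, likelihood_ratios):
    """Multiply pre-test odds by each likelihood ratio (assumes independence)."""
    odds = prob_to_odds(pre_test_prob)
    for lr in likelihood_ratios:
        odds *= lr
    return odds_to_prob(odds)


# Combined likelihood ratio implied by a move from ~10% to ~60% risk.
implied_lr = prob_to_odds(0.60) / prob_to_odds(0.10)
post_test = update_risk(0.10, [implied_lr])
print(round(implied_lr, 1), round(post_test, 2))
```

The point of working in odds is that each additional genotype contributes a simple multiplication, which is why genotype frequencies in cases and controls are the useful inputs here.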
- Data are coming, lots and lots!
- We need to deal with large amounts of data
- Databases need to be reconfigured to facilitate genome interpretation
- Physicians need to learn how to communicate such genetic results with patients
Russ Altman (Stanford University) – Pharmacogenomics
He started with a screenshot of www.pharmgkb.org and used it to highlight a few SNPs relevant to warfarin dosing.
The focus of the talk was to analyze S. Quake’s genome and evaluate ~2500 SNPs and CNVs with pharmacological implications. They used common variants. Within CYP2C19, Quake has a known variant resulting in 50% reduction in metabolizing rate (he’s heterozygous). He then presented a table with column headers of: Drug, Summary, Level of evidence, PMID, Gene, rsID.
Then on to the novel SNPs found in the Quake genome and organized in the same type of table. The focus was on those SNPs that change an amino acid and are predicted to be deleterious, with predicted potential drug impact. He, as a physician, cannot say, “These SNPs have not been studied before and we will ignore the data (on predicted impact).” Instead, acknowledge those SNPs and genes and drugs and go in a different but equivalent direction with regard to advice and treatment.
Job Dekker (University of Massachusetts Medical School) – Hi-C and higher-order folding of the human genome
They started with chromosome 21 to identify higher-order organization of the genome. The 5C method was employed to identify millions of chromatin-chromatin interactions across the entire genome. Their finding is that genes often become physically close to elements that are 1 to 10 Mb away from that gene. This is a long-range distance, but both map to the same chromosome. They have identified some 3000 such examples.
Arend Sidow (Stanford University) – What is the functional fraction of the portion of the variable part of the human genome?
How big is the functional fraction of our total genetic variation? “Our” is a key word: It could relate to population or to a single person or haploid genome. For the amount of total genetic variation, consider derived alleles.
0.5% of haploid genome is deviant – but what fraction is functional?
He used p53 (TP53) as an example, with its SNPs and repeats, which suggests to him that 10% of variants are functional. They use GERP, genomic evolutionary rate profiling (Cooper 2005 Genome Res). See Davydov (PLoS Comp Biol, in press). That work shows that 225 Mb, 7.3% of the genome, is functional.
What is the functional fraction of the variation in human?
0.5% of the genome, 3 million variants. Functional: 3-8%, 300,000 to 1,000,000 bp, with most (~90%) mapping to non-coding sites.
Erin Kaminsky (Emory University) – Towards evidence-based criteria for clinical interpretation of CNVs
15,749 subjects (from 7 different studies) were genotyped for CNVs as were ~10,400 controls. I think the pathology was for neurological disorders. Pathogenic CNVs were identified in ~17% of cases.
She presented a table of CNV deletions at 22q11.2 (found in 93 cases and 0 controls), 15q13.2-q13.3 (epilepsy, 46 cases, 0 controls), 15q11.2-q13.3 (Angelman, 41 cases, 0 controls), 16p11.2 (autism, 67 cases, 5 controls), and 1q21.1 (microcephaly, 55 cases, 3 controls). The group also looked at duplications.
They used p-value to classify the CNV as pathogenic or not. There was nothing like pathway analysis or gene expression data to go along with this.
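The talk did not say which test produced the p-values. As a hedged sketch, a one-sided Fisher's exact test on carrier counts is one standard choice; the function name below is hypothetical, and the control total of 10,400 is only the approximate figure from my notes.

```python
from math import comb


def fisher_enrichment_p(case_hits, n_cases, control_hits, n_controls):
    """One-sided Fisher's exact p-value for enrichment of carriers in cases.

    Sums the hypergeometric probabilities of seeing at least `case_hits`
    carriers among cases, given the total number of carriers.
    """
    total_hits = case_hits + control_hits
    total = n_cases + n_controls
    denominator = comb(total, total_hits)
    max_case_hits = min(total_hits, n_cases)
    tail = sum(
        comb(n_cases, k) * comb(n_controls, total_hits - k)
        for k in range(case_hits, max_case_hits + 1)
    )
    return tail / denominator


N_CASES = 15749
N_CONTROLS = 10400  # approximate, as noted above

# (cases with deletion, controls with deletion) from the table in the talk
deletions = {"22q11.2": (93, 0), "16p11.2": (67, 5), "1q21.1": (55, 3)}

p_values = {
    region: fisher_enrichment_p(cases, N_CASES, controls, N_CONTROLS)
    for region, (cases, controls) in deletions.items()
}
for region, p in p_values.items():
    print(region, f"{p:.2e}")
```

Even the regions with a few carriers among controls come out strongly enriched in cases under this test, which fits with a p-value alone being enough to call them pathogenic.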
N. Wasserman – MYC, GWAS for cancer and the nearby gene desert
This region near MYC is a gene desert, but it is a region of regulation (see Wasserman 2010 Genome Res).
How then to identify such long-range regulatory potential? They use BACs (bacterial artificial chromosomes) as enhancer traps!
FTO. The obesity associations fall within a 50-kbp block of LD that includes the last half of intron 1, exon 2 and most of intron 2. Fto-/- mice are smaller and leaner, and have less adipose than controls. Thus, tissue-specific upregulation of FTO should lead to the obese condition. The result is that enhancers in this 50-kbp region enhance expression in many tissues, just like normal Fto (mouse).
They then used 13 different contigs spanning this 50 kbp region to tile across the LD block to find tissue-specific enhancer elements in zebrafish, then to mouse. They found a brain enhancer and then deleted that enhancer from the BAC enhancer trap to show that that small segment is necessary to drive expression in brain.
Jared Maguire (Broad Institute) – Using conditional mutation rate to interpret variation in the genome
They use adjacent bases as an explanation for local variability. They look at 3-mers in the coding sequence but he offered an example of GCG > GTG as a known sequence-context-driven C > T change from CpG islands. (I thought CpG islands were not typically found in coding sequence.)
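To make the 3-mer idea concrete, here is a minimal sketch of a context-conditioned substitution rate: the number of observed changes in a given 3-mer divided by the number of times that 3-mer occurs. The sequence, the SNP list and the function name are all invented for illustration, and this is not the speaker's actual model.

```python
from collections import Counter


def context_rates(reference, snps):
    """Rate of each (3-mer -> mutated 3-mer) change per occurrence of the 3-mer.

    `snps` is a list of (0-based position, alternate base) pairs; the first
    and last bases are skipped because they lack a full 3-mer context.
    """
    contexts = Counter(
        reference[i - 1:i + 2] for i in range(1, len(reference) - 1)
    )
    changes = Counter()
    for pos, alt in snps:
        if 0 < pos < len(reference) - 1:
            ctx = reference[pos - 1:pos + 2]
            changes[(ctx, ctx[0] + alt + ctx[2])] += 1
    return {key: n / contexts[key[0]] for key, n in changes.items()}


# Toy data only: GCG occurs twice and one of the two carries a C > T change.
toy_reference = "AGCGTTGCGAACGT"
toy_snps = [(2, "T"), (10, "G")]

rates = context_rates(toy_reference, toy_snps)
for (before, after), rate in rates.items():
    print(before, ">", after, rate)
```

A gene's SNP burden could then be compared with what these context rates predict for its own sequence, rather than with a flat per-base rate.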
They look for genes with higher SNP burden than others. No specific genes were given.
M. Eberle (Illumina, Inc) – Illumina NextGen genotype arrays
15-20% increase in the number of common variants based on latest NextGen and 1000G data. Can they build haplotypes? They use 1.4 million SNPs for imputation based on 60 CEPH samples. He thinks this will improve when more samples are added. This process gives 7.7 million total SNPs. Many show concordance. Genotype calls for rare variants are very accurate: Rare variants show similar accuracy to common variants and overall concordance is 99.96%.
Li – Global patterns of RNA editing in humans
RDDs = RNA-DNA differences
Traditional RNA editors are the ADARs (A>I) and APOBECs (C>U). RDDs are not traditional.
RNA preps from 27 CEU B cell samples were sequenced along with the genomic DNA. From the DNA side, they retained only monomorphic sites not in dbSNP, HapMap or 1000G data. From the RNA side, they required greater than 20 reads per position, with greater than 20% of those reads showing a sequence different from the DNA.
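The RNA-side filter is simple enough to write down. This is a sketch of the thresholds as I noted them (more than 20 reads, more than 20% of reads differing from the DNA, site not a known variant); the function name, argument names and example sites are hypothetical.

```python
def is_rdd_candidate(total_reads, differing_reads, is_known_variant,
                     min_reads=20, min_fraction=0.20):
    """Apply the read-depth and read-fraction thresholds to one site."""
    if is_known_variant:
        # Sites in dbSNP, HapMap or 1000G data were excluded on the DNA side.
        return False
    if total_reads <= min_reads:
        # The requirement was strictly more than 20 reads.
        return False
    return differing_reads / total_reads > min_fraction


# Invented example sites: (total reads, reads differing from DNA, known variant)
example_sites = [
    (25, 6, False),   # deep enough, 24% differing
    (20, 10, False),  # not more than 20 reads
    (30, 5, False),   # under 20% differing
    (40, 30, True),   # known polymorphic site
]
calls = [is_rdd_candidate(*site) for site in example_sites]
print(calls)
```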
They find 3762 (+/-1647) RDD events per subject. Overall, there were 20,753 events in 4507 genes. When requiring that the event/gene be present in more than half the subjects, there were 10,117 events; 3776 events were detected in all the subjects.
30.8% of the 101,574 grand total events were A>G or T>C. 19.3% were C>T or G>A. But all others were seen. About 25% of the events are in coding sequence.
What percent of the reads show the RDD? Of all 101,574 events, median level is 97%! These affect splicing. These affect disease susceptibility. These modify disease manifestation. The question remains if these mRNAs are degraded or translated.
J. Knight – Psoriasis susceptibility loci and genetic interaction between HLA-C and ERAP1
Their GWAS identified many immune system genes. They then looked for pair-wise interactions between SNPs that replicated and those concordant with other studies. They used a dominant model to do this.
M. Hannibal – Identification of a gene involved in Kabuki syndrome
This is a rare syndrome, so they began the search by looking for a SNP present in exome data but not in HapMap or dbSNP. 78% of 104 kindreds have MLL2 mutations. MLL2 methylates histone H3 on lysine 4, H3K4.