Notes from ASHG 2010 (American Society of Human Genetics)
Washington, D.C. 4 November 2010
A Goldstein – Challenges to identification of high-risk alleles
High-risk alleles are rare to very rare and typically confer a relative risk greater than 5 (penetrance is a probability, so a risk ratio is presumably what was meant).
Challenges to finding high-risk alleles
There really is no major high-risk gene
Lack of power or informativeness
Underlying complexity of genetics
Clinical and epidemiological heterogeneity and/or misclassification
Follow-up of linkage results
Illustrations of challenges
BRCA1 – 10% of risk of breast cancer
BRCA2 – 12% of risk of breast cancer
Existence of a "BRCA3" with high-risk is rather unlikely
CDKN2A/ARF – ~20% risk for melanoma
CDK4 – ~1% risk for melanoma
So, increase power of the study. Better use or incorporate:
Molecular genetic data
Functional genomics data
Epidemiological and clinical data
New technology may help – such as NextGen sequencing
J. Bailey-Wilson – Complex traits really are complex
Major environmental risk factors may be common
Major genetic risk alleles for serious diseases tend to be rare in population
- Due to selection
- A major locus may have many “risk” alleles
She offers breast cancer as a model. Traditional approaches identified BRCA1 and BRCA2, but then came GWAS.
Linkage is very powerful for detecting high-penetrance risk alleles in families. Association is very powerful for detecting common risk alleles, but if each family carries a different rare or private allele/variant, association will not succeed.
Why has “the gene” not been found?
- False positive linkage
- Have the right gene but don’t understand it yet
- Haven’t yet sequenced fully the region defined by the linkage study
- It is not a gene but a regulatory region
- Could be a long, non-coding RNA
- MicroRNAs and intronic variants, too
Synonymous variants are interesting – change the kinetics of translation!
She is hopeful that more sequencing will be done under broad linkage peaks. But need to phenotype well to fully test for GxE influence.
E. Wijsman – Cardiovascular QTLs and large pedigrees
They are studying familial combined hyperlipidemia (FCHL) in 4 families with 253 subjects, typed for 600 STRs and 48K SNPs on the CVD chip. The phenotype of choice is plasma APOB, for which they noted a LOD score of 3.1 on chromosome 4.
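For scale: a LOD score is the base-10 log of the likelihood ratio for linkage versus no linkage, so 3.1 corresponds to odds of about 1,259 to 1. The sketch below shows that conversion plus the textbook two-point LOD for phase-known meioses. This is an illustration only; the talk concerned a quantitative trait in large pedigrees, and the method used there is not reproduced here. The function names are hypothetical.

```python
import math

def lod_to_odds(lod):
    """Convert a LOD score to the likelihood ratio (odds) in favour of linkage."""
    return 10 ** lod

def two_point_lod(recombinants, nonrecombinants, theta):
    """Two-point LOD for phase-known meioses at recombination fraction theta.

    LOD = log10 L(theta) - log10 L(0.5), with
    L(theta) = theta^recombinants * (1 - theta)^nonrecombinants.
    """
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must lie between 0 and 0.5")
    n = recombinants + nonrecombinants
    if theta == 0.0:
        # Any observed recombinant is impossible under theta = 0.
        if recombinants > 0:
            return float("-inf")
        log_l_theta = 0.0
    else:
        log_l_theta = (recombinants * math.log10(theta)
                       + nonrecombinants * math.log10(1.0 - theta))
    log_l_null = n * math.log10(0.5)
    return log_l_theta - log_l_null

if __name__ == "__main__":
    print(round(lod_to_odds(3.1)))       # odds implied by the reported LOD
    print(two_point_lod(0, 10, 0.0))     # ten informative meioses, no recombinants
```

Ten informative, fully linked meioses give a LOD just over 3, which is why large pedigrees are so valuable for this kind of study.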
Across this large APOB linkage peak, they used each SNP as a covariate to see which one(s) abolish the peak. Then, which gene? Do exome sequencing. All this identified a SNP in LRBP but direct genotyping of the entire pedigree brought the variance from 0.4 to ~0.18 – killed it. So, need to generate many candidate variants for quick screening by genotyping the entire pedigree – because finding one SNP and testing it in a one-by-one manner is not efficient.
The exome data may identify a haplotype which extends to the non-exome.
N. Camp – Analytical strategies to identify rare risk variants using extended high-risk pedigrees
They use Utah family data: 2.2 million individuals over three to eleven generations, with hospital records.
J. Degner – Using genome-wide sensitivity data to infer transcription factor binding
Transcription factor binding sites (TFBS) are poorly annotated. They use ENCODE’s DNase I data. See http://centipede.uchicago.edu for their tool – it uses 230 position weight matrices, 800,000 sites. They also have an article in press at Genome Research. So, use this to check GWAS hits. An example is a binding site QTL for PEBPI.
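The position weight matrix (PWM) part of this can be sketched as a log-odds scan along a sequence. This is only the motif-matching step; how the tool combines motif matches with the DNase I data is not reproduced here. The 4-bp matrix, the uniform background, and the function names are all made up for illustration.

```python
import math

# Hypothetical 4-position motif: probability of each base at each position.
TOY_PWM = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.1, "T": 0.7},
]
BACKGROUND = 0.25  # assumed uniform base composition

def pwm_log_odds(window, pwm, background=BACKGROUND):
    """Sum of log2(motif probability / background probability) over the window."""
    if len(window) != len(pwm):
        raise ValueError("window length must equal motif width")
    return sum(math.log2(column[base] / background)
               for base, column in zip(window, pwm))

def best_match(sequence, pwm):
    """Return (score, start) for the highest-scoring window in the sequence."""
    width = len(pwm)
    if len(sequence) < width:
        raise ValueError("sequence is shorter than the motif")
    scored = [(pwm_log_odds(sequence[i:i + width], pwm), i)
              for i in range(len(sequence) - width + 1)]
    return max(scored)

if __name__ == "__main__":
    score, start = best_match("TTACGTAA", TOY_PWM)
    print(start, round(score, 2))
```

A SNP that falls inside a high-scoring window changes the score of that window, which is one simple way to ask whether a GWAS hit might disrupt a binding site.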
I Aneas – What are the downstream targets of Tbx20?
- differential expression in Tbx20 wildtype vs knockout mice, in heart tissue
- ChIP-seq data from embryo gives 2000 binding sites, from adult gives 4000 binding sites
Combining the above gives 2000 genes. This set is enriched for ion transport and calcium homeostasis functions.
A Letourneau – Effect of trisomy 21 on gene expression
They used a twin study: monozygotic twins where one is trisomic for Chr21 and the other is not. Many genes on Chr21 and elsewhere in the genome show differential expression. Many Chr21 genes show a >1.5-fold increase in expression for the trisomic:normal comparison. 58 genes show Chr21-trisomy-specific alternative splicing. [LP: This has got to be a harbinger of what is possible with careful analysis of the effect of CNVs.]
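The 1.5-fold figure is the simple gene-dosage expectation: three copies instead of two. A minimal sketch of comparing an observed trisomic:normal ratio against that expectation follows; the tolerance and the expression values are hypothetical, not from the talk.

```python
# Three copies versus two gives the naive dosage expectation.
EXPECTED_DOSAGE_RATIO = 3 / 2

def fold_change(trisomic_expr, normal_expr):
    """Trisomic:normal expression ratio for one gene."""
    if normal_expr <= 0:
        raise ValueError("normal expression must be positive")
    return trisomic_expr / normal_expr

def classify(ratio, tolerance=0.1):
    """Compare a ratio with the 1.5-fold dosage expectation (tolerance is arbitrary)."""
    if ratio > EXPECTED_DOSAGE_RATIO + tolerance:
        return "above dosage expectation"
    if ratio < EXPECTED_DOSAGE_RATIO - tolerance:
        return "below dosage expectation"
    return "consistent with dosage"

if __name__ == "__main__":
    # Hypothetical expression values for three genes.
    for name, tri, norm in [("geneA", 30.0, 20.0),
                            ("geneB", 40.0, 20.0),
                            ("geneC", 20.0, 20.0)]:
        print(name, classify(fold_change(tri, norm)))
```

Genes above the expectation suggest more than a copy-number effect, and genes below it suggest some form of dosage compensation.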
T. Teslovich – Sequencing of 400 cases, 200 controls at 26 genes for type 2 diabetes
Goal: Identify rare variants in genes implicated by GWAS.
To date, the most interesting finding is the GCKR variant E584X (stop codon). In study #1, the minor allele frequency (MAF) was 0.56% in cases and 0.80% in controls. In study #2, the MAF was 0.08% in cases and 0.15% in controls. (I missed values for study #3.) The point is that these differences in allele frequency are not significant. So, go to the Metabochip with 14,000 cases and 17,000 controls. This is ongoing…
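To see why a rare variant at these frequencies does not reach significance in a few hundred subjects, here is a sketch of a two-sided Fisher's exact test on allele counts. The counts are hypothetical: 4 of 800 case chromosomes and 3 of 400 control chromosomes, chosen only to be in the rough range of the study #1 frequencies for 400 cases and 200 controls. The talk did not give counts, and the test the speakers actually used is not known.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows: cases, controls. Columns: minor-allele count, major-allele count.
    Sums the probabilities of all tables (with the same margins) that are
    no more likely than the observed one.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x minor alleles among the cases.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    total = sum(prob(x) for x in range(lo, hi + 1)
                if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, total)

if __name__ == "__main__":
    # Hypothetical counts: 4/800 case chromosomes vs 3/400 control chromosomes.
    p = fisher_exact_two_sided(4, 796, 3, 397)
    print(f"two-sided p = {p:.3f}")
```

With only a handful of minor alleles in the whole sample, almost any split between cases and controls is compatible with chance, hence the move to tens of thousands of subjects.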
H. Daoud – Exome sequencing in ALS families
Six candidate genes were identified that are shared in two ALS families, but none are shared in three families. This is indicative of the heterogeneity of ALS.
D. MacArthur – Loss-of-function mutations in healthy human genomes
LOF variants include premature stop codons, splice-site disruptions, small indels that cause a frameshift, and others.
Data from the 1000G pilot:
- 1088 stop SNPs
- 643 splice disruptors
- 956 small (< 40 bp) frameshift indels
- 147 genes disrupted by large indels
The implication is that each person carries many variants of these types. ~25% (453 of ~1743) of LOF variants did not pass manual validation. OK, so some of these LOF variants are actually artifacts of RefSeq errors and gene model errors. Gene models will be corrected in the next release of Gencode so that subsequent clinical sequencing won’t have to deal with this. In other words, this source of error should go away.
The estimate is that there are ~140 true LOF variants per individual, and about 35 of these are homozygous.