[Report] Mutations in the neverland Gene Turned Drosophila pachea into an Obligate Specialist Species: A few changes made the fly Drosophila pachea reliant on the steroid precursors produced by the senita cactus.
Authors: Michael Lang, Sophie Murat, Andrew G. Clark, Géraldine Gouppil, Catherine Blais, Luciano M. Matzkin, Émilie Guittard, Takuji Yoshiyama-Yanagawa, Hiroshi Kataoka, Ryusuke Niwa, René Lafont, Chantal Dauphin-Villemant, Virginie Orgogozo
Friday, September 28, 2012
Robust 4C-seq data analysis to screen for regulatory DNA interactions
Nature Methods 9, 969 (2012).
doi:10.1038/nmeth.2173
Authors: Harmen J G van de Werken, Gilad Landan, Sjoerd J B Holwerda, Michael Hoichman, Petra Klous, Ran Chachik, Erik Splinter, Christian Valdes-Quezada, Yuva Öz, Britta A M Bouwman, Marjon J A M Verstegen, Elzo de Wit, Amos Tanay & Wouter de Laat
Regulatory DNA elements can control the expression of distant genes via physical interactions. Here we present a cost-effective methodology and computational analysis pipeline for robust characterization of the physical organization around selected promoters and other functional elements using chromosome conformation capture combined with high-throughput sequencing (4C-seq). Our approach can be multiplexed and routinely integrated with other functional genomics assays to facilitate physical characterization of gene regulation.
Asymmetrically Modified Nucleosomes
Authors: Philipp Voigt, Gary LeRoy, William J. Drury, Barry M. Zee, Jinsook Son, David B. Beck, Nicolas L. Young, Benjamin A. Garcia, Danny Reinberg
Mononucleosomes, the basic building blocks of chromatin, contain two copies of each core histone. The associated posttranslational modifications regulate essential chromatin-dependent processes, y....
Ghost Loci Imply Hox and ParaHox Existence in the Last Common Ancestor of Animals
Authors: Olivia Mendivil Ramos, Daniel Barker, David E.K. Ferrier
Hox genes are renowned for patterning animal development, with widespread roles in developmental gene regulation. Despite this importance, their evolutionary origin remains obscure, due to absence....
Foxp3 Exploits a Pre-Existent Enhancer Landscape for Regulatory T Cell Lineage Specification
Authors: Robert M. Samstein, Aaron Arvey, Steven Z. Josefowicz, Xiao Peng, Alex Reynolds, Richard Sandstrom, Shane Neph, Peter Sabo, Jeong M. Kim, Will Liao, Ming O. Li, Christina Leslie, John A. Stamatoyannopoulos, Alexander Y. Rudensky
Regulatory T (Treg) cells, whose identity and function are defined by the transcription factor Foxp3, are indispensable for immune homeostasis. It is unclear whether Foxp3 exerts its Treg lineage ....
Thursday, September 27, 2012
Dynamics of transcription driven by the tetA promoter, one event at a time, in live Escherichia coli cells
In Escherichia coli, tetracycline prevents translation. When subject to tetracycline, E. coli express TetA to pump it out by a mechanism that is sensitive, while fairly independent of cellular metabolism. We constructed a target gene, PtetA-mRFP1-96BS, with a 96 MS2-GFP binding site array in a single-copy BAC vector, whose expression is controlled by the tetA promoter. We measured the in vivo kinetics of production of individual RNA molecules of the target gene as a function of inducer concentration and temperature. From the distributions of intervals between transcription events, we find that RNA production by PtetA is a sub-Poissonian process. Next, we infer the number and duration of the prominent sequential steps in transcription initiation by maximum likelihood estimation. Under full induction and at optimal temperature, we observe three major steps. We find that the kinetics of RNA production under the control of PtetA, including number and duration of the steps, varies with induction strength and temperature. The results are supported by a set of logical pairwise Kolmogorov-Smirnov tests. We conclude that the expression of TetA is controlled by a sequential mechanism that is robust, whereas sensitive to external signals.
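The link between sequential initiation steps and sub-Poissonian RNA production can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the authors' analysis: each interval between transcription events is modeled as a sum of independent exponential steps, and the step durations (in seconds) are hypothetical values chosen only for illustration. A single exponential step gives a squared coefficient of variation (CV²) near 1, the Poisson case, while several sequential steps of comparable duration push CV² below 1.

```python
import random
import statistics


def simulate_intervals(step_means, n_events, seed=0):
    """Draw intervals as sums of exponential steps (one draw per step mean)."""
    rng = random.Random(seed)
    return [
        sum(rng.expovariate(1.0 / mean) for mean in step_means)
        for _ in range(n_events)
    ]


def squared_cv(intervals):
    """Variance divided by squared mean; 1 for a Poisson process, <1 if sub-Poissonian."""
    mean = statistics.fmean(intervals)
    return statistics.pvariance(intervals) / mean ** 2


# Hypothetical step durations: both models share a 900 s mean interval.
one_step = simulate_intervals([900.0], 20000)
three_steps = simulate_intervals([300.0, 300.0, 300.0], 20000)

cv2_one = squared_cv(one_step)
cv2_three = squared_cv(three_steps)
print(f"one step:    CV^2 = {cv2_one:.2f}")
print(f"three steps: CV^2 = {cv2_three:.2f}")
```

For three steps of equal mean duration the theoretical CV² is 1/3, so the simulated value should land close to that; unequal step durations would give a value between 1/3 and 1.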
Wednesday, September 26, 2012
Genomic analysis of a key innovation in an experimental Escherichia coli population
Nature 489, 7417 (2012). doi:10.1038/nature11514
Authors: Zachary D. Blount, Jeffrey E. Barrick, Carla J. Davidson & Richard E. Lenski
Evolutionary novelties have been important in the history of life, but their origins are usually difficult to examine in detail. We previously described the evolution of a novel trait, aerobic citrate utilization (Cit+), in an experimental population of Escherichia coli. Here we
Tuesday, September 25, 2012
Chromatin conformation governs TCR segment usage [Immunology]
T cells play fundamental roles in adaptive immunity, relying on a diverse repertoire of T-cell receptor (TCR) α and β chains. Diversity of the TCR β chain is generated in part by a random yet intrinsically biased combinatorial rearrangement of variable (Vβ), diversity (Dβ), and joining (Jβ) gene segments. The...
Thursday, September 20, 2012
Coevolution within and between Regulatory Loci Can Preserve Promoter Function Despite Evolutionary Rate Acceleration
by Antoine Barrière, Kacy L. Gordon, Ilya Ruvinsky
Phenotypes that appear to be conserved could be maintained not only by strong purifying selection on the underlying genetic systems, but also by stabilizing selection acting via compensatory mutations with balanced effects. Such coevolution has been invoked to explain experimental results, but has rarely been the focus of study. Conserved expression driven by the unc-47 promoters of Caenorhabditis elegans and C. briggsae persists despite divergence within a cis-regulatory element and between this element and the trans-regulatory environment. Compensatory changes in cis and trans are revealed when these promoters are used to drive expression in the other species. Functional changes in the C. briggsae promoter, which has experienced accelerated sequence evolution, did not lead to alteration of gene expression in its endogenous environment. Coevolution among promoter elements suggests that complex epistatic interactions within cis-regulatory elements may facilitate their divergence. Our results offer a detailed picture of regulatory evolution in which subtle, lineage-specific, and compensatory modifications of interacting cis and trans regulators together maintain conserved gene expression patterns.
Set2 methylation of histone H3 lysine 36 suppresses histone exchange on transcribed genes
Nature 489, 7416 (2012). doi:10.1038/nature11326
Authors: Swaminathan Venkatesh, Michaela Smolle, Hua Li, Madelaine M. Gogol, Malika Saint, Shambhu Kumar, Krishnamurthy Natarajan & Jerry L. Workman
Set2-mediated methylation of histone H3 at Lys 36 (H3K36me) is a co-transcriptional event that is necessary for the activation of the Rpd3S histone deacetylase complex, thereby maintaining the coding region of genes in a hypoacetylated state. In the absence of Set2, H3K36 or Rpd3S, acetylated histones accumulate on open reading frames (ORFs), leading to transcription initiation from cryptic promoters within ORFs. Although the co-transcriptional deacetylation pathway is well characterized, the factors responsible for acetylation are as yet unknown. Here we show that, in yeast, co-transcriptional acetylation is achieved in part by histone exchange over ORFs. In addition to its function of targeting and activating the Rpd3S complex, H3K36 methylation suppresses the interaction of H3 with histone chaperones, histone exchange over coding regions and the incorporation of new acetylated histones. Thus, Set2 functions both to suppress the incorporation of acetylated histones and to signal for the deacetylation of these histones in transcribed genes. By suppressing spurious cryptic transcripts from initiating within ORFs, this pathway is essential to maintain the accuracy of transcription by RNA polymerase II.
A nuclear Argonaute promotes multigenerational epigenetic inheritance and germline immortality
Nature 489, 7416 (2012). doi:10.1038/nature11352
Authors: Bethany A. Buckley, Kirk B. Burkhart, Sam Guoping Gu, George Spracklin, Aaron Kershner, Heidi Fritz, Judith Kimble, Andrew Fire & Scott Kennedy
Epigenetic information is frequently erased near the start of each new generation. In some cases, however, epigenetic information can be transmitted from parent to progeny (multigenerational epigenetic inheritance). A particularly notable example of this type of epigenetic inheritance is double-stranded RNA-mediated gene silencing in Caenorhabditis elegans. This RNA-mediated interference (RNAi) can be inherited for more than five generations. To understand this process, here we conduct a genetic screen for nematodes defective in transmitting RNAi silencing signals to future generations. This screen identified the heritable RNAi defective 1 (hrde-1) gene. hrde-1 encodes an Argonaute protein that associates with small interfering RNAs in the germ cells of progeny of animals exposed to double-stranded RNA. In the nuclei of these germ cells, HRDE-1 engages the nuclear RNAi defective pathway to direct the trimethylation of histone H3 at Lys 9 (H3K9me3) at RNAi-targeted genomic loci and promote RNAi inheritance. Under normal growth conditions, HRDE-1 associates with endogenously expressed short interfering RNAs, which direct nuclear gene silencing in germ cells. In hrde-1- or nuclear RNAi-deficient animals, germline silencing is lost over generational time. Concurrently, these animals exhibit steadily worsening defects in gamete formation and function that ultimately lead to sterility. These results establish that the Argonaute protein HRDE-1 directs gene-silencing events in germ-cell nuclei that drive multigenerational RNAi inheritance and promote immortality of the germ-cell lineage. We propose that C. elegans use the RNAi inheritance machinery to transmit epigenetic information, accrued by past generations, into future generations to regulate important biological processes.
Friday, September 14, 2012
Dynamic and Coordinated Epigenetic Regulation of Developmental Transitions in the Cardiac Lineage
Authors: Joseph A. Wamstad, Jeffrey M. Alexander, Rebecca M. Truty, Avanti Shrikumar, Fugen Li, Kirsten E. Eilertson, Huiming Ding, John N. Wylie, Alexander R. Pico, John A. Capra, Genevieve Erwin, Steven J. Kattman, Gordon M. Keller, Deepak Srivastava, Stuart S. Levine, Katherine S. Pollard, Alisha K. Holloway, Laurie A. Boyer, Benoit G. Bruneau
Heart development is exquisitely sensitive to the precise temporal regulation of thousands of genes that govern developmental decisions during differentiation. However, we currently lack a detaile....
A Temporal Chromatin Signature in Human Embryonic Stem Cells Identifies Regulators of Cardiac Development
Authors: Sharon L. Paige, Sean Thomas, Cristi L. Stoick-Cooper, Hao Wang, Lisa Maves, Richard Sandstrom, Lil Pabon, Hans Reinecke, Gabriel Pratt, Gordon Keller, Randall T. Moon, John Stamatoyannopoulos, Charles E. Murry
Directed differentiation of human embryonic stem cells (ESCs) into cardiovascular cells provides a model for studying molecular mechanisms of human cardiovascular development. Although it is known....
Thursday, September 13, 2012
Multiple layers of complexity in cis-regulatory regions of developmental genes
Abstract
Genomes contain the necessary information to ensure that genes are expressed in the right place, at the right time, and with the proper rate. Metazoan developmental genes often possess long stretches of DNA flanking their coding sequences and/or large introns which contain elements that influence gene expression. Most of these regulatory elements are relatively small and can be studied in isolation. For example, transcriptional enhancers, the elements that generate the expression pattern of a gene, have been traditionally studied with reporter constructs in transgenic animals. These studies have provided and will provide invaluable insights into enhancer evolution and function. However, this experimental approach has its limits; often, enhancer elements do not faithfully recapitulate native expression patterns. This fact suggests that additional information in cis-regulatory regions modulates the activity of enhancers and other regulatory elements. Indeed, recent studies have revealed novel functional aspects at the level of whole cis-regulatory regions. First, the discovery of “shadow enhancers”. Second, the ubiquitous interactions between cis-regulatory elements. Third, the notion that some cis-regulatory regions may not function in a modular fashion. Last, the effect of chromatin conformation on cis-regulatory activity. In this article I describe these recent findings and discuss open questions in the field. Developmental Dynamics, 2012. © 2012 Wiley Periodicals, Inc.
Wednesday, September 12, 2012
Integration of Hi-C and ChIP-seq data reveals distinct types of chromatin linkages
We have analyzed publicly available K562 Hi-C data, which enable genome-wide unbiased capturing of chromatin interactions, using a Mixture Poisson Regression Model and a power-law decay background to define a highly specific set of interacting genomic regions. We integrated multiple ENCODE Consortium resources with the Hi-C data, using DNase-seq data and ChIP-seq data for 45 transcription factors and 9 histone modifications. We classified 12 different sets (clusters) of interacting loci that can be distinguished by their chromatin modifications and which can be categorized into two types of chromatin linkages. The different clusters of loci display very different relationships with transcription factor-binding sites. As expected, many of the transcription factors show binding patterns specific to clusters composed of interacting loci that encompass promoters or enhancers. However, cluster 9, which is distinguished by marks of open chromatin but not by active enhancer or promoter marks, was not bound by most transcription factors but was highly enriched for three transcription factors (GATA1, GATA2 and c-Jun) and three chromatin modifiers (BRG1, INI1 and SIRT6). To investigate the impact of chromatin organization on gene regulation, we performed RNA-seq analyses before and after knockdown of GATA1 or GATA2. We found that knockdown of the GATA factors not only alters the expression of genes having a nearby bound GATA but also affects expression of genes in interacting loci. Our work, in combination with previous studies linking regulation by GATA factors with c-Jun and BRG1, provides genome-wide evidence that Hi-C data identify sets of biologically relevant interacting loci.
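The idea of a power-law decay background can be sketched in a few lines. This is an illustrative simplification, not the Mixture Poisson Regression Model used in the paper: it assumes the expected contact count falls off as c · d^(−α) with genomic distance d, fits c and α by least squares on log-log values, and scores an observed count with a plain Poisson tail probability. The function names and the toy distances and counts are hypothetical.

```python
import math


def fit_power_law(distances, counts):
    """Fit count = c * distance**(-alpha) by least squares in log-log space.

    Assumes all distances and counts are strictly positive.
    """
    xs = [math.log(d) for d in distances]
    ys = [math.log(c) for c in counts]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope  # (c, alpha)


def poisson_tail(k, lam):
    """Return P(X >= k) for X ~ Poisson(lam), summing the lower tail directly."""
    term = math.exp(-lam)  # P(X = 0)
    cdf = 0.0
    for i in range(k):
        cdf += term
        term *= lam / (i + 1)  # P(X = i + 1) from P(X = i)
    return max(0.0, 1.0 - cdf)


# Toy background: average contact counts at increasing distances (bp).
distances = [10_000, 20_000, 50_000, 100_000, 200_000]
counts = [1000.0, 500.0, 200.0, 100.0, 50.0]
scale, alpha = fit_power_law(distances, counts)

# Score a hypothetical locus pair 40 kb apart with 300 observed contacts.
expected = scale * 40_000 ** (-alpha)
p_value = poisson_tail(300, expected)
print(f"alpha = {alpha:.2f}, expected = {expected:.1f}, p = {p_value:.3g}")
```

Subtracting the lower tail from 1 loses precision for very small p-values, so a real analysis would more likely use a statistics library's survival function and a correction for multiple testing.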
Friday, September 7, 2012
Genes Involved in the Evolution of Herbivory by a Leaf-Mining, Drosophilid Fly
Herbivorous insects are among the most successful radiations of life. However, we know little about the processes underpinning the evolution of herbivory. We examined the evolution of herbivory in the fly, Scaptomyza flava, whose larvae are leaf miners on species of Brassicaceae, including the widely studied reference plant, Arabidopsis thaliana (Arabidopsis). Scaptomyza flava is phylogenetically nested within the paraphyletic genus Drosophila, and the whole genome sequences available for 12 species of Drosophila facilitated phylogenetic analysis and assembly of a transcriptome for S. flava. A time-calibrated phylogeny indicated that leaf mining in Scaptomyza evolved between 6 and 16 million years ago. Feeding assays showed that biosynthesis of glucosinolates, the major class of antiherbivore chemical defense compounds in mustard leaves, was upregulated by S. flava larval feeding. The presence of glucosinolates in wild-type (WT) Arabidopsis plants reduced S. flava larval weight gain and increased egg–adult development time relative to flies reared in glucosinolate knockout (GKO) plants. An analysis of gene expression differences in 5-day-old larvae reared on WT versus GKO plants showed a total of 341 transcripts that were differentially regulated by glucosinolate uptake in larval S. flava. Of these, approximately a third corresponded to homologs of Drosophila melanogaster genes associated with starvation, dietary toxin-, heat-, oxidation-, and aging-related stress. The upregulated transcripts exhibited elevated rates of protein evolution compared with unregulated transcripts. The remaining differentially regulated transcripts also contained a higher proportion of novel genes than the unregulated transcripts. Thus, the transition to herbivory in Scaptomyza appears to be coupled with the evolution of novel genes and the co-option of conserved stress-related genes.
Thursday, September 6, 2012
[Research Article] Systematic Localization of Common Disease-Associated Variation in Regulatory DNA
Genetic variants that have been associated with diseases are concentrated in regulatory regions of the genome.
Authors: Matthew T. Maurano, Richard Humbert, Eric Rynes, Robert E. Thurman, Eric Haugen, Hao Wang, Alex P. Reynolds, Richard Sandstrom, Hongzhu Qu, Jennifer Brody, Anthony Shafer, Fidencio Neri, Kristen Lee, Tanya Kutyavin, Sandra Stehling-Sun, Audra K. Johnson, Theresa K. Canfield, Erika Giste, Morgan Diegel, Daniel Bates, R. Scott Hansen, Shane Neph, Peter J. Sabo, Shelly Heimfeld, Antony Raubitschek, Steven Ziegler, Chris Cotsapas, Nona Sotoodehnia, Ian Glass, Shamil R. Sunyaev, Rajinder Kaul, John A. Stamatoyannopoulos
Circuitry and Dynamics of Human Transcription Factor Regulatory Networks
Authors: Shane Neph, Andrew B. Stergachis, Alex Reynolds, Richard Sandstrom, Elhanan Borenstein, John A. Stamatoyannopoulos
The combinatorial cross-regulation of hundreds of sequence-specific transcription factors (TFs) defines a regulatory network that underlies cellular identity and function. Here we use genome-wide ....
Wednesday, September 5, 2012
The long-range interaction landscape of gene promoters
Nature 489, 7414 (2012). doi:10.1038/nature11279
Authors: Amartya Sanyal, Bryan R. Lajoie, Gaurav Jain & Job Dekker
The vast non-coding portion of the human genome is full of functional elements and disease-causing regulatory variants. The principles defining the relationships between these elements and distal target genes remain unknown. Promoters and distal elements can engage in looping interactions that have been implicated in gene regulation. Here we have applied chromosome conformation capture carbon copy (5C) to interrogate comprehensively interactions between transcription start sites (TSSs) and distal elements in 1% of the human genome representing the ENCODE pilot project regions. 5C maps were generated for GM12878, K562 and HeLa-S3 cells and results were integrated with data from the ENCODE consortium. In each cell line we discovered >1,000 long-range interactions between promoters and distal sites that include elements resembling enhancers, promoters and CTCF-bound sites. We observed significant correlations between gene expression, promoter–enhancer interactions and the presence of enhancer RNAs. Long-range interactions show marked asymmetry with a bias for interactions with elements located ∼120 kilobases upstream of the TSS. Long-range interactions are often not blocked by sites bound by CTCF and cohesin, indicating that many of these sites do not demarcate physically insulated gene domains. Furthermore, only ∼7% of looping interactions are with the nearest gene, indicating that genomic proximity is not a simple predictor for long-range interactions. Finally, promoters and distal elements are engaged in multiple long-range interactions to form complex networks. Our results start to place genes and regulatory elements in three-dimensional context, revealing their functional relationships.
Architecture of the human regulatory network derived from ENCODE data
Nature 489, 7414 (2012). doi:10.1038/nature11245
Authors: Mark B. Gerstein, Anshul Kundaje, Manoj Hariharan, Stephen G. Landt, Koon-Kiu Yan, Chao Cheng, Xinmeng Jasmine Mu, Ekta Khurana, Joel Rozowsky, Roger Alexander, Renqiang Min, Pedro Alves, Alexej Abyzov, Nick Addleman, Nitin Bhardwaj, Alan P. Boyle, Philip Cayting, Alexandra Charos, David Z. Chen, Yong Cheng, Declan Clarke, Catharine Eastman, Ghia Euskirchen, Seth Frietze, Yao Fu, Jason Gertz, Fabian Grubert, Arif Harmanci, Preti Jain, Maya Kasowski, Phil Lacroute, Jing Leng, Jin Lian, Hannah Monahan, Henriette O’Geen, Zhengqing Ouyang, E. Christopher Partridge, Dorrelyn Patacsil, Florencia Pauli, Debasish Raha, Lucia Ramirez, Timothy E. Reddy, Brian Reed, Minyi Shi, Teri Slifer, Jing Wang, Linfeng Wu, Xinqiong Yang, Kevin Y. Yip, Gili Zilberman-Schapira, Serafim Batzoglou, Arend Sidow, Peggy J. Farnham, Richard M. Myers, Sherman M. Weissman & Michael Snyder
Transcription factors bind in a combinatorial fashion to specify the on-and-off states of genes; the ensemble of these binding events forms a regulatory network, constituting the wiring diagram for a cell. To examine the principles of the human transcriptional regulatory network, we determined the genomic
Landscape of transcription in human cells
Nature 489, 7414 (2012). doi:10.1038/nature11233
Authors: Sarah Djebali, Carrie A. Davis, Angelika Merkel, Alex Dobin, Timo Lassmann, Ali Mortazavi, Andrea Tanzer, Julien Lagarde, Wei Lin, Felix Schlesinger, Chenghai Xue, Georgi K. Marinov, Jainab Khatun, Brian A. Williams, Chris Zaleski, Joel Rozowsky, Maik Röder, Felix Kokocinski, Rehab F. Abdelhamid, Tyler Alioto, Igor Antoshechkin, Michael T. Baer, Nadav S. Bar, Philippe Batut, Kimberly Bell, Ian Bell, Sudipto Chakrabortty, Xian Chen, Jacqueline Chrast, Joao Curado, Thomas Derrien, Jorg Drenkow, Erica Dumais, Jacqueline Dumais, Radha Duttagupta, Emilie Falconnet, Meagan Fastuca, Kata Fejes-Toth, Pedro Ferreira, Sylvain Foissac, Melissa J. Fullwood, Hui Gao, David Gonzalez, Assaf Gordon, Harsha Gunawardena, Cedric Howald, Sonali Jha, Rory Johnson, Philipp Kapranov, Brandon King, Colin Kingswood, Oscar J. Luo, Eddie Park, Kimberly Persaud, Jonathan B. Preall, Paolo Ribeca, Brian Risk, Daniel Robyr, Michael Sammeth, Lorian Schaffer, Lei-Hoon See, Atif Shahab, Jorgen Skancke, Ana Maria Suzuki, Hazuki Takahashi, Hagen Tilgner, Diane Trout, Nathalie Walters, Huaien Wang, John Wrobel, Yanbao Yu, Xiaoan Ruan, Yoshihide Hayashizaki, Jennifer Harrow, Mark Gerstein, Tim Hubbard, Alexandre Reymond, Stylianos E. Antonarakis, Gregory Hannon, Morgan C. Giddings, Yijun Ruan, Barbara Wold, Piero Carninci, Roderic Guigó & Thomas R. Gingeras
Eukaryotic cells make many types of primary and processed RNAs that are found either in specific subcellular compartments or throughout the cells. A complete catalogue of these RNAs is not yet available and their characteristic subcellular localizations are also poorly understood. Because RNA represents the
An expansive human regulatory lexicon encoded in transcription factor footprints
Nature 489, 7414 (2012). doi:10.1038/nature11212
Authors: Shane Neph, Jeff Vierstra, Andrew B. Stergachis, Alex P. Reynolds, Eric Haugen, Benjamin Vernot, Robert E. Thurman, Sam John, Richard Sandstrom, Audra K. Johnson, Matthew T. Maurano, Richard Humbert, Eric Rynes, Hao Wang, Shinny Vong, Kristen Lee, Daniel Bates, Morgan Diegel, Vaughn Roach, Douglas Dunn, Jun Neri, Anthony Schafer, R. Scott Hansen, Tanya Kutyavin, Erika Giste, Molly Weaver, Theresa Canfield, Peter Sabo, Miaohua Zhang, Gayathri Balasundaram, Rachel Byron, Michael J. MacCoss, Joshua M. Akey, M. A. Bender, Mark Groudine, Rajinder Kaul & John A. Stamatoyannopoulos
Regulatory factor binding to genomic DNA protects the underlying sequence from cleavage by DNase I, leaving nucleotide-resolution footprints. Using genomic DNase I footprinting across 41 diverse cell and tissue types, we detected 45 million transcription factor occupancy events within regulatory regions, representing differential binding to
The accessible chromatin landscape of the human genome
Nature 489, 7414 (2012). doi:10.1038/nature11232
Authors: Robert E. Thurman, Eric Rynes, Richard Humbert, Jeff Vierstra, Matthew T. Maurano, Eric Haugen, Nathan C. Sheffield, Andrew B. Stergachis, Hao Wang, Benjamin Vernot, Kavita Garg, Sam John, Richard Sandstrom, Daniel Bates, Lisa Boatman, Theresa K. Canfield, Morgan Diegel, Douglas Dunn, Abigail K. Ebersol, Tristan Frum, Erika Giste, Audra K. Johnson, Ericka M. Johnson, Tanya Kutyavin, Bryan Lajoie, Bum-Kyu Lee, Kristen Lee, Darin London, Dimitra Lotakis, Shane Neph, Fidencio Neri, Eric D. Nguyen, Hongzhu Qu, Alex P. Reynolds, Vaughn Roach, Alexias Safi, Minerva E. Sanchez, Amartya Sanyal, Anthony Shafer, Jeremy M. Simon, Lingyun Song, Shinny Vong, Molly Weaver, Yongqi Yan, Zhancheng Zhang, Zhuzhu Zhang, Boris Lenhard, Muneesh Tewari, Michael O. Dorschner, R. Scott Hansen, Patrick A. Navas, George Stamatoyannopoulos, Vishwanath R. Iyer, Jason D. Lieb, Shamil R. Sunyaev, Joshua M. Akey, Peter J. Sabo, Rajinder Kaul, Terrence S. Furey, Job Dekker, Gregory E. Crawford & John A. Stamatoyannopoulos
DNase I hypersensitive sites (DHSs) are markers of regulatory DNA and have underpinned the discovery of all classes of cis-regulatory elements including enhancers, promoters, insulators, silencers and locus control regions. Here we present the first extensive map of human DHSs identified through genome-wide profiling in
Sequence features and chromatin structure around the genomic regions bound by 119 human transcription factors [RESOURCES]
Chromatin immunoprecipitation coupled with high-throughput sequencing (ChIP-seq) has become the dominant technique for mapping transcription factor (TF) binding regions genome-wide. We performed an integrative analysis centered around 457 ChIP-seq data sets on 119 human TFs generated by the ENCODE Consortium. We identified highly enriched sequence motifs in most data sets, revealing new motifs and validating known ones. The motif sites (TF binding sites) are highly conserved evolutionarily and show distinct footprints upon DNase I digestion. We frequently detected secondary motifs in addition to the canonical motifs of the TFs, indicating tethered binding and cobinding between multiple TFs. We observed significant position and orientation preferences between many cobinding TFs. Genes specifically expressed in a cell line are often associated with a greater occurrence of nearby TF binding in that cell line. We observed cell-line–specific secondary motifs that mediate the binding of the histone deacetylase HDAC2 and the enhancer-binding protein EP300. TF binding sites are located in GC-rich, nucleosome-depleted, and DNase I sensitive regions, flanked by well-positioned nucleosomes, and many of these features show cell type specificity. The GC-richness may be beneficial for regulating TF binding because, when unoccupied by a TF, these regions are occupied by nucleosomes in vivo. We present the results of our analysis in a TF-centric web repository Factorbook (http://factorbook.org) and will continually update this repository as more ENCODE data are generated.
Ubiquitous heterogeneity and asymmetry of the chromatin environment at regulatory elements [METHOD]
Gene regulation at functional elements (e.g., enhancers, promoters, insulators) is governed by an interplay of nucleosome remodeling, histone modifications, and transcription factor binding. To enhance our understanding of gene regulation, the ENCODE Consortium has generated a wealth of ChIP-seq data on DNA-binding proteins and histone modifications. We additionally generated nucleosome positioning data on two cell lines, K562 and GM12878, by MNase digestion and high-depth sequencing. Here we relate 14 chromatin signals (12 histone marks, DNase, and nucleosome positioning) to the binding sites of 119 DNA-binding proteins across a large number of cell lines. We developed a new method for unsupervised pattern discovery, the Clustered AGgregation Tool (CAGT), which accounts for the inherent heterogeneity in signal magnitude, shape, and implicit strand orientation of chromatin marks. We applied CAGT on a total of 5084 data set pairs to obtain an exhaustive catalog of high-resolution patterns of histone modifications and nucleosome positioning signals around bound transcription factors. Our analyses reveal extensive heterogeneity in how histone modifications are deposited, and how nucleosomes are positioned around binding sites. With the exception of the CTCF/cohesin complex, asymmetry of nucleosome positioning is predominant. Asymmetry of histone modifications is also widespread, for all types of chromatin marks examined, including promoter, enhancer, elongation, and repressive marks. The fine-resolution signal shapes discovered by CAGT unveiled novel correlation patterns between chromatin marks, nucleosome positioning, and sequence content. Meta-analyses of the signal profiles revealed a common vocabulary of chromatin signals shared across multiple cell lines and binding proteins.
Sequence and chromatin determinants of cell-type-specific transcription factor binding [METHOD]
Gene regulatory programs in distinct cell types are maintained in large part through the cell-type–specific binding of transcription factors (TFs). The determinants of TF binding include direct DNA sequence preferences, DNA sequence preferences of cofactors, and the local cell-dependent chromatin context. To explore the contribution of DNA sequence signal, histone modifications, and DNase accessibility to cell-type–specific binding, we analyzed 286 ChIP-seq experiments performed by the ENCODE Consortium. This analysis included experiments for 67 transcriptional regulators, 15 of which were profiled in both the GM12878 (lymphoblastoid) and K562 (erythroleukemic) human hematopoietic cell lines. To model TF-bound regions, we trained support vector machines (SVMs) that use flexible k-mer patterns to capture DNA sequence signals more accurately than traditional motif approaches. In addition, we trained SVM spatial chromatin signatures to model local histone modifications and DNase accessibility, obtaining significantly more accurate TF occupancy predictions than simpler approaches. Consistent with previous studies, we find that DNase accessibility can explain cell-line–specific binding for many factors. However, we also find that of the 10 factors with prominent cell-type–specific binding patterns, four display distinct cell-type–specific DNA sequence preferences according to our models. Moreover, for two factors we identify cell-specific binding sites that are accessible in both cell types but bound only in one. For these sites, cell-type–specific sequence models, rather than DNase accessibility, are better able to explain differential binding. Our results suggest that using a single motif for each TF and filtering for chromatin accessible loci is not always sufficient to accurately account for cell-type–specific binding profiles.
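As a rough illustration of the kind of sequence features a k-mer-based classifier consumes, the sketch below builds a plain contiguous k-mer frequency vector in Python. This is a simplification and an assumption on our part: the abstract describes "flexible k-mer patterns", which this example does not implement, and the function names `kmer_counts` and `kmer_vector` are hypothetical.

```python
from collections import Counter
from itertools import product


def kmer_counts(seq, k):
    """Count overlapping contiguous k-mers, skipping windows with non-ACGT characters."""
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= set("ACGT"):
            counts[kmer] += 1
    return counts


def kmer_vector(seq, k):
    """Return a frequency-normalized vector over all 4**k k-mers in lexicographic order.

    Hypothetical feature layout for illustration only; not the paper's representation.
    """
    counts = kmer_counts(seq, k)
    total = sum(counts.values())
    vocab = ["".join(p) for p in product("ACGT", repeat=k)]
    return [counts[kmer] / total if total else 0.0 for kmer in vocab]
```

A vector like this could be passed to any standard classifier, such as an SVM, as one row per candidate bound region.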
Predicting cell-type-specific gene expression from regions of open chromatin [METHOD]
Complex patterns of cell-type–specific gene expression are thought to be achieved by combinatorial binding of transcription factors (TFs) to sequence elements in regulatory regions. Predicting cell-type–specific expression in mammals has been hindered by the oftentimes unknown location of distal regulatory regions. To alleviate this bottleneck, we used DNase-seq data from 19 diverse human cell types to identify proximal and distal regulatory elements at genome-wide scale. Matched expression data allowed us to separate genes into classes of cell-type–specific up-regulated, down-regulated, and constitutively expressed genes. CG dinucleotide content and DNA accessibility in the promoters of these three classes of genes displayed substantial differences, highlighting the importance of including these aspects in modeling gene expression. We associated DNase I hypersensitive sites (DHSs) with genes, and trained classifiers for different expression patterns. TF sequence motif matches in DHSs provided a strong performance improvement in predicting gene expression over the typical baseline approach of using proximal promoter sequences. In particular, we achieved competitive performance when discriminating up-regulated genes from different cell types or genes up- and down-regulated under the same conditions. We identified previously known and new candidate cell-type–specific regulators. The models generated testable predictions of activating or repressive functions of regulators. DNase I footprints for these regulators were indicative of their direct binding to DNA. In summary, we successfully used information of open chromatin obtained by a single assay, DNase-seq, to address the problem of predicting cell-type–specific gene expression in mammalian organisms directly from regulatory sequence.
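To make "CG dinucleotide content" concrete, here is a minimal Python sketch that computes the fraction of dinucleotide positions in a sequence that are CG. The exact definition used in the paper is not given in the abstract, so this formula and the function name `cg_dinucleotide_fraction` are assumptions for illustration.

```python
def cg_dinucleotide_fraction(seq):
    """Fraction of overlapping dinucleotide positions that are 'CG'.

    Assumed definition for illustration; the paper may normalize differently.
    """
    seq = seq.upper()
    positions = len(seq) - 1  # number of overlapping dinucleotide windows
    if positions <= 0:
        return 0.0
    cg = sum(1 for i in range(positions) if seq[i:i + 2] == "CG")
    return cg / positions
```

For example, a promoter sequence could be scored with this function and the value used as one feature alongside accessibility measurements.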
Personal and population genomics of human regulatory variation [RESEARCH]
The characteristics and evolutionary forces acting on regulatory variation in humans remains elusive because of the difficulty in defining functionally important noncoding DNA. Here, we combine genome-scale maps of regulatory DNA marked by DNase I hypersensitive sites (DHSs) from 138 cell and tissue types with whole-genome sequences of 53 geographically diverse individuals in order to better delimit the patterns of regulatory variation in humans. We estimate that individuals likely harbor many more functionally important variants in regulatory DNA compared with protein-coding regions, although they are likely to have, on average, smaller effect sizes. Moreover, we demonstrate that there is significant heterogeneity in the level of functional constraint in regulatory DNA among different cell types. We also find marked variability in functional constraint among transcription factor motifs in regulatory DNA, with sequence motifs for major developmental regulators, such as HOX proteins, exhibiting levels of constraint comparable to protein-coding regions. Finally, we perform a genome-wide scan of recent positive selection and identify hundreds of novel substrates of adaptive regulatory evolution that are enriched for biologically interesting pathways such as melanogenesis and adipocytokine signaling. These data and results provide new insights into patterns of regulatory variation in individuals and populations and demonstrate that a large proportion of functionally important variation lies beyond the exome.
Widespread plasticity in CTCF occupancy linked to DNA methylation [RESEARCH]
CTCF is a ubiquitously expressed regulator of fundamental genomic processes including transcription, intra- and interchromosomal interactions, and chromatin structure. Because of its critical role in genome function, CTCF binding patterns have long been assumed to be largely invariant across different cellular environments. Here we analyze genome-wide occupancy patterns of CTCF by ChIP-seq in 19 diverse human cell types, including normal primary cells and immortal lines. We observed highly reproducible yet surprisingly plastic genomic binding landscapes, indicative of strong cell-selective regulation of CTCF occupancy. Comparison with massively parallel bisulfite sequencing data indicates that 41% of variable CTCF binding is linked to differential DNA methylation, concentrated at two critical positions within the CTCF recognition sequence. Unexpectedly, CTCF binding patterns were markedly different in normal versus immortal cells, with the latter showing widespread disruption of CTCF binding associated with increased methylation. Strikingly, this disruption is accompanied by up-regulation of CTCF expression, with the result that both normal and immortal cells maintain the same average number of CTCF occupancy sites genome-wide. These results reveal a tight linkage between DNA methylation and the global occupancy patterns of a major sequence-specific regulatory factor.
Understanding transcriptional regulation by integrative analysis of transcription factor binding data [RESEARCH]
Statistical models have been used to quantify the relationship between gene expression and transcription factor (TF) binding signals. Here we apply the models to the large-scale data generated by the ENCODE project to study transcriptional regulation by TFs. Our results reveal a notable difference in the prediction accuracy of expression levels of transcription start sites (TSSs) captured by different technologies and RNA extraction protocols. In general, the expression levels of TSSs with high CpG content are more predictable than those with low CpG content. For genes with alternative TSSs, the expression levels of downstream TSSs are more predictable than those of the upstream ones. Different TF categories and specific TFs vary substantially in their contributions to predicting expression. Between two cell lines, the differential expression of TSS can be precisely reflected by the difference of TF-binding signals in a quantitative manner, arguing against the conventional on-and-off model of TF binding. Finally, we explore the relationships between TF-binding signals and other chromatin features such as histone modifications and DNase hypersensitivity for determining expression. The models imply that these features regulate transcription in a highly coordinated manner.
Analysis of variation at transcription factor binding sites in Drosophila and humans
Background:
Advances in sequencing technology have boosted population genomics and made it possible to map the positions of transcription factor binding sites (TFBSs) with high precision. Here we investigate TFBS variability by combining transcription factor binding maps generated by ENCODE, modENCODE, our previously published data and other sources with genomic variation data for human individuals and Drosophila isogenic lines.
Results:
We introduce a metric of TFBS variability that takes into account changes in motif match associated with mutation and makes it possible to investigate TFBS functional constraints instance-by-instance as well as in sets that share common biological properties. We also take advantage of the emerging per-individual transcription factor binding data to show evidence that TFBS mutations, particularly at evolutionarily conserved sites, can be efficiently buffered to ensure coherent levels of transcription factor binding.
Conclusions:
Our analyses provide insights into the relationship between individual and interspecies variation and show evidence for the functional buffering of TFBS mutations in both humans and flies. In a broad perspective, these results demonstrate the potential of combining functional genomics and population genetics approaches for understanding gene regulation.
Functional analysis of transcription factor binding sites in human promoters
Background:
The binding of transcription factors to specific locations in the genome is integral to the orchestration of transcriptional regulation in cells. To characterize transcription factor binding site function on a large scale, we predicted and mutagenized 455 binding sites in human promoters. We carried out functional tests on these sites in four different immortalized human cell lines using transient transfections with a luciferase reporter assay, primarily for the transcription factors CTCF, GABP, GATA2, E2F, STAT, and YY1.
Results:
In each cell line, between 36% and 49% of binding sites made a functional contribution to the promoter activity; the overall rate for observing function in any of the cell lines was 70%. Transcription factor binding resulted in transcriptional repression in more than a third of functional sites. When compared with predicted binding sites whose function was not experimentally verified, the functional binding sites had higher conservation and were located closer to transcriptional start sites (TSSs). Among functional sites, repressive sites tended to be located further from TSSs than were activating sites. Our data provide significant insight into the functional characteristics of YY1 binding sites, most notably the detection of distinct activating and repressing classes of YY1 binding sites. Repressing sites were located closer to, and often overlapped with, translational start sites and presented a distinctive variation on the canonical YY1 binding motif.
Conclusions:
The genomic properties that we found to associate with functional TF binding sites on promoters -- conservation, TSS proximity, motifs and their variations -- point the way to improved accuracy in future TFBS predictions.
Modeling gene expression using chromatin features in various cellular contexts
Background:
Previous work has demonstrated that chromatin feature levels correlate with gene expression. The ENCODE project enables us to further explore this relationship using an unprecedented volume of data. Expression levels from more than 100,000 promoters were measured using a variety of high-throughput techniques applied to RNA extracted by different protocols from different cellular compartments of several human cell lines. ENCODE also generated the genome-wide mapping of eleven histone marks, one histone variant, and DNase I hypersensitivity sites in seven cell lines.
Results:
We built a novel quantitative model to study the relationship between chromatin features and expression levels. Our study not only confirms that the general relationships found in previous studies hold across various cell lines, but also makes new suggestions about the relationship between chromatin features and gene expression levels. We found that expression status and expression levels can be predicted by different groups of chromatin features, both with high accuracy. We also found that expression levels measured by CAGE are better predicted than by RNA-PET or RNA-Seq, and different categories of chromatin features are the most predictive of expression for different RNA measurement methods. Additionally, PolyA+ RNA is overall more predictable than PolyA- RNA among different cell compartments, and PolyA+ cytosolic RNA measured with RNA-Seq is more predictable than PolyA+ nuclear RNA, while the opposite is true for PolyA- RNA.
Conclusions:
Our study provides new insights into transcriptional regulation by analyzing chromatin features in different cellular contexts.