Wednesday, July 18, 2012

Proto-genes and de novo gene birth

Proto-genes and de novo gene birth:

Nature 487, 7407 (2012). doi:10.1038/nature11184

Authors: Anne-Ruxandra Carvunis, Thomas Rolland, Ilan Wapinski, Michael A. Calderwood, Muhammed A. Yildirim, Nicolas Simonis, Benoit Charloteaux, César A. Hidalgo, Justin Barbette, Balaji Santhanam, Gloria A. Brar, Jonathan S. Weissman, Aviv Regev, Nicolas Thierry-Mieg, Michael E. Cusick & Marc Vidal



Novel protein-coding genes can arise either through re-organization of pre-existing genes or de novo. Processes involving re-organization of pre-existing genes, notably after gene duplication, have been extensively described. In contrast, de novo gene birth remains poorly understood, mainly because translation of sequences devoid of genes, or ‘non-genic’ sequences, is expected to produce insignificant polypeptides rather than proteins with specific biological functions. Here we formalize an evolutionary model according to which functional genes evolve de novo through transitory proto-genes generated by widespread translational activity in non-genic sequences. Testing this model at the genome scale in Saccharomyces cerevisiae, we detect translation of hundreds of short species-specific open reading frames (ORFs) located in non-genic sequences. These translation events seem to provide adaptive potential, as suggested by their differential regulation upon stress and by signatures of retention by natural selection. In line with our model, we establish that S. cerevisiae ORFs can be placed within an evolutionary continuum ranging from non-genic sequences to genes. We identify ∼1,900 candidate proto-genes among S. cerevisiae ORFs and find that de novo gene birth from such a reservoir may be more prevalent than sporadic gene duplication. Our work illustrates that evolution exploits seemingly dispensable sequences to generate adaptive functional innovation.
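
To make the notion of short ORFs in non-genic sequence concrete, here is a minimal sketch that enumerates candidate ORFs on the forward strand of a DNA string. It is not the authors' pipeline: the abstract says translation was detected experimentally, and a scan like this only lists sequences that could be translated. The function name `find_orfs`, the `min_codons` threshold, and the toy sequence are all assumptions made for illustration.

```python
from typing import List, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str, min_codons: int = 3) -> List[Tuple[int, int]]:
    """Return half-open (start, end) coordinates of forward-strand ORFs.

    In each of the three frames, an ORF starts at the first ATG seen after
    the previous stop codon and ends just after the next in-frame stop.
    min_codons counts the ATG but not the stop codon (hypothetical cutoff).
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None
    return sorted(orfs)


# Toy sequence (made up): one ORF, ATG AAA TTT TAG, beginning at index 2.
toy = "CCATGAAATTTTAGGG"
candidates = find_orfs(toy)
```

A real analysis would also scan the reverse complement and then ask which candidates show evidence of translation and which are species-specific; neither step is attempted here.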

Thursday, July 12, 2012

Improved Models for Transcription Factor Binding Site Identification Using Nonindependent Interactions [Gene Expression]

Improved Models for Transcription Factor Binding Site Identification Using Nonindependent Interactions [Gene Expression]:
Identifying transcription factor (TF) binding sites is essential for understanding regulatory networks. The specificity of most TFs is currently modeled using position weight matrices (PWMs) that assume the positions within a binding site contribute independently to binding affinity for any site. Extensive, high-throughput quantitative binding assays let us examine, for the first time, the independence assumption for many TFs. We find that the specificity of most TFs is well fit with the simple PWM model, but in some cases more complex models are required. We introduce a binding energy model (BEM) that can include energy parameters for nonindependent contributions to binding affinity. We show that in most cases where a PWM is not sufficient, a BEM that includes energy parameters for adjacent dinucleotide contributions models the specificity very well. Having more accurate models of specificity greatly improves the interpretation of in vivo TF localization data, such as from chromatin immunoprecipitation followed by sequencing (ChIP-seq) experiments.
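
The difference between the two models can be sketched in a few lines. Under a PWM, the energy of a site is a sum of independent per-position terms; a BEM of the kind described above adds terms for adjacent dinucleotides. The code below is an illustrative sketch, not the authors' implementation: the function names, the dictionary layout, and every energy value are invented for the example.

```python
from typing import Dict, List, Tuple


def pwm_energy(site: str, pwm: List[Dict[str, float]]) -> float:
    """Sum independent per-position energy contributions (PWM model)."""
    if len(site) != len(pwm):
        raise ValueError("site length must match the number of PWM columns")
    return sum(column[base] for column, base in zip(pwm, site))


def bem_energy(
    site: str,
    pwm: List[Dict[str, float]],
    dinuc: Dict[Tuple[int, str], float],
) -> float:
    """PWM energy plus correction terms for adjacent dinucleotides.

    dinuc maps (position, two-base string) to an extra energy term;
    pairs that are absent contribute nothing.
    """
    energy = pwm_energy(site, pwm)
    for i in range(len(site) - 1):
        energy += dinuc.get((i, site[i:i + 2]), 0.0)
    return energy


# Hypothetical 3-bp model: consensus ACG has energy 0, each mismatch costs 1.
pwm = [
    {"A": 0.0, "C": 1.0, "G": 1.0, "T": 1.0},
    {"A": 1.0, "C": 0.0, "G": 1.0, "T": 1.0},
    {"A": 1.0, "C": 1.0, "G": 0.0, "T": 1.0},
]
# Hypothetical nonindependent term: AA at positions 0-1 is less costly
# than the independent model predicts.
dinuc = {(0, "AA"): -0.5}

independent = pwm_energy("AAG", pwm)        # one mismatch
with_pairs = bem_energy("AAG", pwm, dinuc)  # same site with the pair term
```

With an empty `dinuc` dictionary the BEM reduces to the PWM, which mirrors the finding that the simple model is sufficient for most TFs.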

Efficient Reverse-Engineering of a Developmental Gene Regulatory Network

Efficient Reverse-Engineering of a Developmental Gene Regulatory Network:
by Anton Crombach, Karl R. Wotton, Damjan Cicin-Sain, Maksat Ashyraliyev, Johannes Jaeger



Understanding the complex regulatory networks underlying development and evolution of multi-cellular organisms is a major problem in biology. Computational models can be used as tools to extract the regulatory structure and dynamics of such networks from gene expression data. This approach is called reverse engineering. It has been successfully applied to many gene networks in various biological systems. However, to reconstitute the structure and non-linear dynamics of a developmental gene network in its spatial context remains a considerable challenge. Here, we address this challenge using a case study: the gap gene network involved in segment determination during early development of Drosophila melanogaster. A major problem for reverse-engineering pattern-forming networks is the significant amount of time and effort required to acquire and quantify spatial gene expression data. We have developed a simplified data processing pipeline that considerably increases the throughput of the method, but results in data of reduced accuracy compared to those previously used for gap gene network inference. We demonstrate that we can infer the correct network structure using our reduced data set, and investigate minimal data requirements for successful reverse engineering. Our results show that timing and position of expression domain boundaries are the crucial features for determining regulatory network structure from data, while it is less important to precisely measure expression levels. Based on this, we define minimal data requirements for gap gene network inference. Our results demonstrate the feasibility of reverse-engineering with much reduced experimental effort. This enables more widespread use of the method in different developmental contexts and organisms. Such systematic application of data-driven models to real-world networks has enormous potential. 
Only the quantitative investigation of a large number of developmental gene regulatory networks will allow us to discover whether there are rules or regularities governing development and evolution of complex multi-cellular organisms.

Accurate whole-genome sequencing and haplotyping from 10 to 20 human cells

Accurate whole-genome sequencing and haplotyping from 10 to 20 human cells:

Nature 487, 7406 (2012). doi:10.1038/nature11236

Authors: Brock A. Peters, Bahram G. Kermani, Andrew B. Sparks, Oleg Alferov, Peter Hong, Andrei Alexeev, Yuan Jiang, Fredrik Dahl, Y. Tom Tang, Juergen Haas, Kimberly Robasky, Alexander Wait Zaranek, Je-Hyuk Lee, Madeleine Price Ball, Joseph E. Peterson, Helena Perazich, George Yeung, Jia Liu, Linsu Chen, Michael I. Kennemer, Kaliprasad Pothuraju, Karel Konvicka, Mike Tsoupko-Sitnikov, Krishna P. Pant, Jessica C. Ebert, Geoffrey B. Nilsen, Jonathan Baccash, Aaron L. Halpern, George M. Church & Radoje Drmanac
Recent advances in whole-genome sequencing have brought the vision of personal genomics and genomic medicine closer to reality. However, current methods lack clinical accuracy and the ability to describe the context (haplotypes) in which genome variants co-occur in a cost-effective manner. Here we describe a

Independent evolution of striated muscles in cnidarians and bilaterians

Independent evolution of striated muscles in cnidarians and bilaterians:

Nature 487, 7406 (2012). doi:10.1038/nature11180

Authors: Patrick R. H. Steinmetz, Johanna E. M. Kraus, Claire Larroux, Jörg U. Hammel, Annette Amon-Hassenzahl, Evelyn Houliston, Gert Wörheide, Michael Nickel, Bernard M. Degnan & Ulrich Technau
Striated muscles are present in bilaterian animals (for example, vertebrates, insects and annelids) and some non-bilaterian eumetazoans (that is, cnidarians and ctenophores). The considerable ultrastructural similarity of striated muscles between these animal groups is thought to reflect a common evolutionary origin. Here we show that a muscle protein core set, including a type II myosin heavy chain (MyHC) motor protein characteristic of striated muscles in vertebrates, was already present in unicellular organisms before the origin of multicellular animals. Furthermore, ‘striated muscle’ and ‘non-muscle’ myhc orthologues are expressed differentially in two sponges, compatible with a functional diversification before the origin of true muscles and the subsequent use of striated muscle MyHC in fast-contracting smooth and striated muscle. Cnidarians and ctenophores possess striated muscle myhc orthologues but lack crucial components of bilaterian striated muscles, such as genes that code for titin and the troponin complex, suggesting the convergent evolution of striated muscles. Consistently, jellyfish orthologues of a shared set of bilaterian Z-disc proteins are not associated with striated muscles, but are instead expressed elsewhere or ubiquitously. The independent evolution of eumetazoan striated muscles through the addition of new proteins to a pre-existing, ancestral contractile apparatus may serve as a model for the evolution of complex animal cell types.

Transposon transformation into dPRL promoter [Evolution]

Transposon transformation into dPRL promoter [Evolution]: Transposable elements (TEs) are known to provide DNA for host regulatory functions, but the mechanisms underlying the transformation of TEs into cis-regulatory elements are unclear. In humans two TEs—MER20 and MER39—contribute the enhancer/promoter for decidual prolactin (dPRL), which is dramatically induced during pregnancy. We show that evolution of the strong human dPRL promoter was a multistep process that took millions of years. First, MER39 inserted near MER20 in the primate/rodent ancestor, and then there were two phases of activity enhancement in primates. Through the mapping of causal nucleotide substitutions, we demonstrate that strong promoter activity in apes involves epistasis between transcription factor binding sites (TFBSs) ancestral to MER39 and derived sites. We propose a mode of molecular evolution that describes the process by which MER20/MER39 was transformed into a strong promoter, called “epistatic capture.” Epistatic capture is the stabilization of a TFBS that is ancestral but variable in outgroup lineages, and is fixed in the ingroup because of epistatic interactions with derived TFBSs. Finally, we note that evolution of human promoter activity coincides with the emergence of a unique reproductive character in apes, highly invasive placentation. Because prolactin communicates with immune cells during pregnancy, which regulate fetal invasion into maternal tissues, we speculate that ape dPRL promoter activity evolved in response to increased invasiveness of ape fetal tissue.

Tuesday, July 10, 2012

EGFR-dependent network interactions that pattern Drosophila eggshell appendages [RESEARCH ARTICLES]

EGFR-dependent network interactions that pattern Drosophila eggshell appendages [RESEARCH ARTICLES]: David S. A. Simakov, Lily S. Cheung, Len M. Pismen, and Stanislav Y. Shvartsman


Similar to other organisms, Drosophila uses its Epidermal Growth Factor Receptor (EGFR) multiple times throughout development. One crucial EGFR-dependent event is patterning of the follicular epithelium during oogenesis. In addition to providing inductive cues necessary for body axes specification, patterning of the follicle cells initiates the formation of two respiratory eggshell appendages. Each appendage is derived from a primordium comprising a patch of cells expressing broad (br) and an adjacent stripe of cells expressing rhomboid (rho). Several mechanisms of eggshell patterning have been proposed in the past, but none of them can explain the highly coordinated expression of br and rho. To address some of the outstanding issues in this system, we synthesized the existing information into a revised mathematical model of follicle cell patterning. Based on the computational model analysis, we propose that dorsal appendage primordia are established by sequential action of feed-forward loops and juxtacrine signals activated by the gradient of EGFR signaling. The model describes pattern formation in a large number of mutants and points to several unanswered questions related to the dynamic interaction of the EGFR and Notch pathways.

Friday, July 6, 2012

The ontogeny of color: developmental origins of divergent pigmentation in Drosophila americana and D. novamexicana

The ontogeny of color: developmental origins of divergent pigmentation in Drosophila americana and D. novamexicana:

Pigmentation is a model trait for evolutionary and developmental analysis that is particularly amenable to molecular investigation in the genus Drosophila. To better understand how this phenotype evolves, we examined divergent pigmentation and gene expression over developmental time in the dark-bodied D. americana and its light-bodied sister species D. novamexicana. Prior genetic analysis implicated two enzyme-encoding genes, tan and ebony, in pigmentation divergence between these species, but questions remain about the underlying molecular mechanisms. Here, we describe stages of pupal development in both species and use this staging to determine when pigmentation develops and diverges between D. americana and D. novamexicana. For the developmental stages encompassing pigment divergence, we compare mRNA expression of tan and ebony over time and between species. Finally, we use allele-specific expression assays to determine whether interspecific differences in mRNA abundance have a cis-regulatory basis and find evidence of cis-regulatory divergence for both tan and ebony. cis-regulatory divergence affecting tan had a small effect on mRNA abundance and was limited to a few developmental stages, yet previous data suggest that this divergence is likely to be biologically meaningful. Our study suggests that small and developmentally transient expression changes may contribute to phenotypic diversification more often than commonly appreciated. Recognizing the potential phenotypic impact of such changes is important for a scientific community increasingly focused on dissecting quantitative variation, but detecting these types of changes will be a major challenge to elucidating the molecular basis of complex traits.
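
The logic of an allele-specific expression test can be sketched as follows. The abstract does not say how the assay was run, so this assumes the usual design, in which both species' alleles are measured in the same F1 hybrid cells. The two alleles then share one trans-acting environment, and a departure of their expression ratio from 1:1 points to cis-regulatory divergence. The function name, the pseudocount, and the counts are hypothetical and not taken from the study.

```python
import math


def log2_allelic_ratio(
    count_a: float, count_b: float, pseudocount: float = 1.0
) -> float:
    """Log2 ratio of allele A to allele B expression in one hybrid sample.

    A value near 0 suggests no cis-regulatory difference; the pseudocount
    (an assumption of this sketch) avoids division by zero at low counts.
    """
    return math.log2((count_a + pseudocount) / (count_b + pseudocount))


# Hypothetical allele-specific counts for one gene at two pupal stages.
early = log2_allelic_ratio(3, 3)  # balanced alleles
late = log2_allelic_ratio(7, 1)   # allele A favoured at this stage
```

Computing the ratio separately for each developmental stage is what would let a small, transient cis effect of the kind described for tan show up at all; pooling stages would average it away.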

Thursday, July 5, 2012

Role of Architecture in the Function and Specificity of Two Notch-Regulated Transcriptional Enhancer Modules

Role of Architecture in the Function and Specificity of Two Notch-Regulated Transcriptional Enhancer Modules:
by Feng Liu, James W. Posakony



In Drosophila melanogaster, cis-regulatory modules that are activated by the Notch cell–cell signaling pathway all contain two types of transcription factor binding sites: those for the pathway's transducing factor Suppressor of Hairless [Su(H)] and those for one or more tissue- or cell type–specific factors called “local activators.” The use of different “Su(H) plus local activator” motif combinations, or codes, is critical to ensure that only the correct subset of the broadly utilized Notch pathway's target genes are activated in each developmental context. However, much less is known about the role of enhancer “architecture”—the number, order, spacing, and orientation of its component transcription factor binding motifs—in determining the module's specificity. Here we investigate the relationship between architecture and function for two Notch-regulated enhancers with spatially distinct activities, each of which includes five high-affinity Su(H) sites. We find that the first, which is active specifically in the socket cells of external sensory organs, is largely resistant to perturbations of its architecture. By contrast, the second enhancer, active in the “non-SOP” cells of the proneural clusters from which neural precursors arise, is sensitive to even simple rearrangements of its transcription factor binding sites, responding with both loss of normal specificity and striking ectopic activity. Thus, diverse cryptic specificities can be inherent in an enhancer's particular combination of transcription factor binding motifs. We propose that for certain types of enhancer, architecture plays an essential role in determining specificity, not only by permitting factor–factor synergies necessary to generate the desired activity, but also by preventing other activator synergies that would otherwise lead to unwanted specificities.

Butterfly genome reveals promiscuous exchange of mimicry adaptations among species

Butterfly genome reveals promiscuous exchange of mimicry adaptations among species:

Nature 487, 7405 (2012). doi:10.1038/nature11041




The evolutionary importance of hybridization and introgression has long been debated. Hybrids are usually rare and unfit, but even infrequent hybridization can aid adaptation by transferring beneficial traits between species. Here we use genomic tools to investigate introgression in Heliconius, a rapidly radiating genus of neotropical butterflies widely used in studies of ecology, behaviour, mimicry and speciation. We sequenced the genome of Heliconius melpomene and compared it with other taxa to investigate chromosomal evolution in Lepidoptera and gene flow among multiple Heliconius species and races. Among 12,669 predicted genes, biologically important expansions of families of chemosensory and Hox genes are particularly noteworthy. Chromosomal organization has remained broadly conserved since the Cretaceous period, when butterflies split from the Bombyx (silkmoth) lineage. Using genomic resequencing, we show hybrid exchange of genes between three co-mimics, Heliconius melpomene, Heliconius timareta and Heliconius elevatus, especially at two genomic regions that control mimicry pattern. We infer that closely related Heliconius species exchange protective colour-pattern genes promiscuously, implying that hybridization has an important role in adaptive radiation.

Tuesday, July 3, 2012

Predictive regulatory models in Drosophila melanogaster by integrative inference of transcriptional networks [METHOD]

Predictive regulatory models in Drosophila melanogaster by integrative inference of transcriptional networks [METHOD]:



Gaining insights on gene regulation from large-scale functional data sets is a grand challenge in systems biology. In this article, we develop and apply methods for transcriptional regulatory network inference from diverse functional genomics data sets and demonstrate their value for gene function and gene expression prediction. We formulate the network inference problem in a machine-learning framework and use both supervised and unsupervised methods to predict regulatory edges by integrating transcription factor (TF) binding, evolutionarily conserved sequence motifs, gene expression, and chromatin modification data sets as input features. Applying these methods to Drosophila melanogaster, we predict ~300,000 regulatory edges in a network of ~600 TFs and 12,000 target genes. We validate our predictions using known regulatory interactions, gene functional annotations, tissue-specific expression, protein–protein interactions, and three-dimensional maps of chromosome conformation. We use the inferred network to identify putative functions for hundreds of previously uncharacterized genes, including many in nervous system development, which are independently confirmed based on their tissue-specific expression patterns. Last, we use the regulatory network to predict target gene expression levels as a function of TF expression, and find significantly higher predictive power for integrative networks than for motif or ChIP-based networks. Our work reveals the complementarity between physical evidence of regulatory interactions (TF binding, motif conservation) and functional evidence (coordinated expression or chromatin patterns) and demonstrates the power of data integration for network inference and studies of gene regulation at the systems level.