Gene Annotation: Finding and characterizing genes in genomes

Malcolm Arnott ’22, Blake Tellinghusen ’23, Vy Lam ’22, Jack Allen ’21, Sam Alper ’19, Juliana Choza ’20, and Sami Zimmerman ’19

  • A genome is a chain of molecules that includes genes (regions that code for proteins) and non-coding regions. Knowledge of gene structure allows researchers to determine where genes start and end, and which sequences correspond to exons (expressed sequence) versus introns (sequence that is removed before a gene is expressed). Drosophila melanogaster is a model organism whose genome was one of the first to be sequenced. The genes in the D. melanogaster genome have been identified through a process known as gene annotation. Genes in the genomes of other Drosophila species have yet to be annotated. For this project, we are using an annotation workflow developed by the Genomics Education Partnership, a multi-institutional collaboration that LC is part of. We are annotating a region (contig3) on the D element (chromosome 3) of the D. ananassae genome. Based on comparison against the D. melanogaster genome, this region may contain the following genes: Nopp140, P5CDh1, Dbp21E2, and CG14565. We are using sequence similarity to D. melanogaster, computational gene predictors, gene expression data, and intron predictors as evidence to annotate the genes in contig3. We have completed preliminary annotation of Nopp140 and Dbp21E2. Only two of the four annotated isoforms of Nopp140 are presented in the poster. We find that the structures of the genes are broadly similar to those of their orthologs (related genes) in D. melanogaster, though some intron and exon boundaries differ slightly. Each preliminary annotation completed so far will be repeated by a different student within the group to provide independent verification. The P5CDh1 and CG14565 genes on contig3 remain to be annotated, as do contig12 on the D element and contig59 on the F element. These will be annotated over the course of the semester.
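
To make the exon and intron terminology concrete, the short Python sketch below shows how a set of exon coordinates defines a spliced sequence, and how the sequence between two exons can be checked for the GT...AG pattern that most introns begin and end with. It is an illustration only and is not part of the Genomics Education Partnership workflow: the function names, the coordinate convention (0-based, end-exclusive), and the toy sequence are all assumptions made up for this example, not data from contig3.

```python
# Illustrative sketch only: toy data, hypothetical helper names.

def splice_exons(sequence, exons):
    """Join exon slices (0-based, end-exclusive) into the spliced sequence."""
    return "".join(sequence[start:end] for start, end in exons)


def introns_between(exons):
    """Return the (start, end) intervals lying between consecutive exons."""
    return [(exons[i][1], exons[i + 1][0]) for i in range(len(exons) - 1)]


def has_canonical_splice_sites(sequence, intron):
    """Check that an intron starts with GT and ends with AG."""
    start, end = intron
    intron_seq = sequence[start:end]
    return (
        len(intron_seq) >= 4
        and intron_seq.startswith("GT")
        and intron_seq.endswith("AG")
    )


# Invented toy gene: exon 1, one intron, exon 2.
toy_sequence = "ATGGCC" + "GTAAGTTTCAG" + "GATTAA"
toy_exons = [(0, 6), (17, 23)]

spliced = splice_exons(toy_sequence, toy_exons)
print("Spliced sequence:", spliced)

for intron in introns_between(toy_exons):
    ok = has_canonical_splice_sites(toy_sequence, intron)
    print("Intron", intron, "has GT...AG boundaries:", ok)
```

A check of this kind flags only where a proposed boundary departs from the common pattern. Deciding where a boundary actually lies still depends on the lines of evidence listed above, such as sequence similarity and gene expression data.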