ABSTRACT
Adapting research-driven routines to the classroom context can promote innovative
and motivational learning environments. Using a case-study approach, we propose
a set of bioinformatics-based activities, supported by a tutorial video, aimed at
identifying genes and disclosing their genomic context in different species. The
rationale is to strengthen teachers’ competencies to introduce bioinformatics
resources and tools (e.g., NCBI, ORFfinder, BLAST, and MaGe) in their teaching
practices. By doing so, teachers will ultimately enhance students’ understanding
of how genomic data mining and comparative genomics are instrumental for
biological research.
Key Words: bioinformatics; comparative genomics; evolution; high school;
motivation.
Introduction
Computers now play a central role in scientists’ daily routines. A personal
computer connected to the web is all that it takes to access a myriad of
bioinformatics resources capable of deconstructing genomic information into
biologically meaningful data. Bioinformatics provides tools to comprehensively
analyze and store large amounts of biological data that would be impossible to
investigate without informatics-based approaches (Bloom, 2001; Madigan et al.,
2018). Here, we present a series of bioinformatics activities that enable
students, under the guidance of their teachers, to query an unknown DNA
sequence, mimicking a real research scenario. Activities built around
research-driven problems appear to stimulate students’ interest in science,
technology, engineering, and mathematics (STEM) careers, since research-inspired
activities familiarize them with scientific professions and the academic
training required to pursue them (Kovarik et al., 2013).
To reconcile simple yet curriculum-oriented bioinformatics activities intended
for high school students (15–17 years old) with high learning impact and
didactic value, we designed an inquiry-based scenario structured in four
bioinformatics exercises. Besides having a positive impact on students’
engagement and motivation (Campbell, 2003), the educational value of these
activities extends to the multiple curricular exploration opportunities they
offer. For instance, simply by selecting the query DNA sequences to be used,
teachers can address a range of topics framed in the Next Generation Science
Standards, such as gene regulation, evolution, and drug resistance (Moss, 1997;
Brock, 1998; National Research Council, 2013; Taylor et al., 2014; Cooper, 2015;
Newman et al., 2016).
Online Resources
The bioinformatics applications used in the exercises detailed below are
open-access and web-based, with user-friendly interfaces that run in common web
browsers on PC and Mac computers. Although the applications chosen are hosted on
long-established web-based platforms that are widely used and currently
indispensable in daily research routines, it is important to tell students that
these applications keep evolving as more data are added, new resources are
developed, and increasingly intuitive interfaces are introduced. A pilot trial
of these bioinformatics activities was carried out in a classroom setting with
the collaboration of 14 teachers from six schools and involving a total of 387
high school students (15–18 years old).
The American Biology Teacher, Vol. 80, No. 8, pp. 619–624, ISSN 0002-7685, electronic ISSN 1938-4211. ©2018 National Association of Biology Teachers. All rights
reserved. Please direct all requests for permission to photocopy or reproduce article content through the University of California Press’s Reprints and Permissions web page,
www.ucpress.edu/journals.php?p=reprints. DOI: https://doi.org/10.1525/abt.2018.80.8.619.
TIPS, TRICKS & TECHNIQUES
Mining the Genome: Using Bioinformatics Tools in the Classroom to Support Student Discovery of Genes
ANA MARTINS, MARIA JOÃO FONSECA, FERNANDO TAVARES
Learning Objectives
Specific learning objectives are detailed after each exercise. Through
all these activities, students
• strengthen their knowledge of concepts such as genome, chromosomes, genes
(structural, operator, repressor, regulator, promoter), start and stop codons,
and operons;
• learn new concepts such as open reading frames, synteny, and comparative
genomics; and
• improve their computational skills and increase their digital literacy.
Class Workflow
To adapt the bioinformatics activities to a classroom context properly
integrated in the high school curricula, the exercises were designed in
collaboration with the teachers who took part in the pilot trial. Taking into
account teachers’ suggestions, we propose a class workflow comprising four parts
(I–IV), as schematically represented in Figure 1 and detailed below. To further
assist teachers in implementing the class workflow, a tutorial video detailing
the four parts was produced (see the online version of the journal to view the
supplemental video). The estimated times correspond to the average time required
by teachers to implement the full set of activities described below with their
students. Regardless of the suggested timeline, each teacher may easily adjust
the class workflow to fit their teaching agenda, either by cutting one or more
of the four parts or, alternatively, by encouraging student discussion after
each exercise.
Figure 1. Proposed class workflow and timeline, taking into account the feedback
of 14 in-service teachers who implemented the exercises in their high school
classes as a pilot trial.
I. Setting up the theoretical background (estimated time: 60 minutes): The
teacher emphasizes the importance of identifying genes from a genomic sequence.
Besides recalling basic concepts such as genome, chromosomes, genes (structural,
operator, repressor, regulator, promoter), and operons, students are introduced
to important notions, namely start and stop codons, open reading frames (ORFs),
synteny, and comparative genomics.
II. Introduction to bioinformatics databases and tools (estimated time: 30
minutes): The teacher highlights the importance of bioinformatics by explaining
the exercises and introducing students to the bioinformatics resources and tools
they will use, namely the NCBI database, NCBI ORFfinder, NCBI BLAST, and
MicroScope (MaGe). The tutorial video (Supplemental Material) should help
teachers in this task and assist students throughout the exercises.
III. Bioinformatics exercises (estimated time: 70 minutes): Students carry out
the exercises autonomously, while the teacher supervises to identify
difficulties and answer questions.
IV. Discussion of the results (estimated time: 20 minutes): The class discusses
the results obtained in each exercise and assay to draw conclusions. Ultimately,
the teacher might challenge the students to explore other case studies and study
different genomic regions. In addition, students’ own drive to explore the
bioinformatics resources autonomously should not be neglected, particularly
given the user-friendly and intuitive interfaces of these resources. In fact,
during the pilot trial, we observed that some students took the initiative to
extend their in silico experiments beyond the assigned activities by pursuing
their own research queries, for instance: “What is the size of the genome of a
spider?”; “Are virus genomes such as HIV also available at this database?”; or
“Let’s search for the gene coding for insulin.”
Bioinformatics Exercises
The bioinformatics-based activities described below are structured as four
distinct exercises (see the video tutorial): 1 – getting the target DNA
sequence; 2 – looking for ORFs; 3 – deciding which of the retrieved ORFs are
likely to be genes; and 4 – analyzing the gene(s) identified within their
expected genomic context. Because laboratory-based activities should meet the
curricular agenda, and because the lac operon is a common example for teaching
gene regulation, the query DNA sequence chosen to exemplify these exercises
corresponds to lacI and flanking regions. Furthermore, to frame the
bioinformatics-based activities in an inquiry-based approach, all exercises
start with a guiding question.
1. Getting the DNA Sequence
This initial exercise aims to answer the question “How does one
access a comprehensive gene bank database to obtain the specific
DNA sequence to be studied?”
1.1. Access the NCBI website: http://www.ncbi.nlm.nih.gov/.
1.2. Choose Genome in the menu next to the search box.
1.3. Search for “E. coli”.
1.4. At the top of the new page, select Reference Genome by clicking the
E. coli strain K12.
1.5. Scroll down and click on the accession number corresponding to E. coli
strain K12 in the Reference Sequence command to retrieve the full genome
sequence.
1.6. Choose the FASTA format.
1.7. Open the selection box Change region shown and type in the coordinates
366001–368041.
1.8. Copy, paste, and save the sequence in a Word or Notepad document.
Learning objectives. Through the exploration of the comprehensive
bioinformatics database NCBI, students learn
• how the database is organized and how complex it is, and
• how to search for DNA sequences and gene sequences for different organisms.
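For classes that want to see how the same region could be requested programmatically, steps 1.1 to 1.8 can be sketched in Python. This is a minimal sketch under stated assumptions, not part of the published activity: it assumes the NCBI E-utilities efetch service, and it assumes that the reference sequence reached in step 1.5 is the accession NC_000913.3 (E. coli K-12 MG1655), which the exercise itself does not name. The helper names build_efetch_url and fetch_region are hypothetical.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# NCBI E-utilities endpoint for retrieving records.
EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"

# ASSUMPTION: the exercise does not state an accession number; this is the
# RefSeq accession commonly used for E. coli K-12 MG1655.
ASSUMED_ACCESSION = "NC_000913.3"


def build_efetch_url(accession, start, stop):
    """Build a URL asking for one region of a nucleotide record as FASTA."""
    params = {
        "db": "nuccore",      # nucleotide database
        "id": accession,
        "rettype": "fasta",   # same choice as step 1.6
        "retmode": "text",
        "seq_start": start,   # same role as "Change region shown" in step 1.7
        "seq_stop": stop,
    }
    return EFETCH + "?" + urlencode(params)


def fetch_region(accession, start, stop):
    """Download the region as FASTA text (needs an Internet connection)."""
    with urlopen(build_efetch_url(accession, start, stop)) as response:
        return response.read().decode("utf-8")


# Building the URL needs no network access.
url = build_efetch_url(ASSUMED_ACCESSION, 366001, 368041)

# To download and save the sequence (step 1.8), uncomment:
# with open("query_region.fasta", "w") as handle:
#     handle.write(fetch_region(ASSUMED_ACCESSION, 366001, 368041))
```

Separating URL construction from the download keeps the first part usable offline, so students can inspect the request before sending it.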
2. Deconstructing the DNA Sequence
This exercise was planned to teach students how to go from an unknown DNA
sequence to the identification of hypothetical coding sequences. Students are
introduced to the notion of ORFs, which is frequently absent from the scientific
lexicon of elementary and high school biology curricula, but which is
instrumental for answering the question “How is a new DNA sequence
deconstructed?”
2.1. Access NCBI ORFfinder: http://www.ncbi.nlm.nih.gov/orffinder/.
2.2. Paste the sequence previously saved as a Word or Notepad document into the
text box provided.
2.3. Choose the genetic code: 11. Bacterial, Archaeal and Plant Plastid.
2.4. Choose the option “ATG” and alternative initiation codons.
2.5. Click Submit.
2.6. Analyze the obtained results (Figure 2).
Learning objectives. With this exercise, students
• recognize the six different reading frames in a DNA sequence,
• understand the meaning of ORF, and
• recognize the importance of start and stop codons for identifying all
possible ORFs.
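The idea behind this exercise can be illustrated with a toy six-frame scan written in Python. This is a simplified sketch, not NCBI's ORFfinder implementation: START_CODONS is an illustrative subset of the initiation codons allowed under genetic code 11, the function names are hypothetical, and the ORF numbering and coordinates it reports will not match ORFfinder's output.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
# ASSUMPTION: ATG plus two common alternative starts; genetic code 11
# allows more initiation codons than the three listed here.
START_CODONS = {"ATG", "GTG", "TTG"}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_orfs(seq, min_codons=2):
    """Scan all six reading frames for start-to-stop stretches.

    Returns tuples (strand, frame, start, end, orf_sequence). Coordinates
    are 0-based and end-exclusive; for the "-" strand they refer to the
    reverse-complemented sequence. Within a frame, an ORF runs from the
    first start codon after the previous stop to the next stop codon, and
    its length in codons includes the stop codon.
    """
    seq = seq.upper()
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon in START_CODONS:
                    start = i
                elif start is not None and codon in STOP_CODONS:
                    if (i + 3 - start) // 3 >= min_codons:
                        orfs.append((strand, frame + 1, start, i + 3, s[start:i + 3]))
                    start = None
    return orfs


# Made-up 13-base example: one ORF in forward frame 3.
example_orfs = find_orfs("CCATGAAATAGCC")
```

Running the scan on the sequence saved in Exercise 1 and comparing the count with ORFfinder's table is a useful prompt for discussing why the two differ (minimum length, allowed start codons, nested ORFs).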
3. Which ORFs Are Potential Genes?
Basic Local Alignment Search Tool (BLAST) is a powerful algorithm capable of
finding similarities between a query sequence (DNA or a protein sequence) and
the sequences available in databases (Altschul et al., 1990). Using this
application, the students can address the following questions: “Which of the
ORFs retrieved in the previous exercise are probable genes? Which ORFs are
unlikely to be functional coding sequences?”
3.1. Select one ORF to study (example: ORF 28).
3.2. Start BLAST of the selected ORF by clicking on BLAST ORF.
3.3. Click on BLAST in the new page opened.
3.4. Identify the gene (Figure 3).
3.5. Repeat the procedure for other ORFs and analyze the results obtained.
Learning objectives. Students learn that
• not all DNA sequences bracketed by a start and a stop codon (i.e., ORFs) are
coding sequences,
• ORFs can be located in different reading frames and oriented in either
direction, and
• scrutinizing gene banks by a BLAST search is an effective approach for
identifying putative genes among retrieved ORFs.
Students can discuss possible scenarios to explain a BLAST search in which no
similarities are found.
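To help students read a BLAST alignment, the comparison of a query with one database hit can be mimicked on a small scale. The Python sketch below is not the BLAST algorithm: it assumes the two sequences are already aligned, have the same length, and use "-" for gaps. The function name compare_aligned and the example sequences are invented for illustration.

```python
def compare_aligned(query, subject):
    """Compare two pre-aligned sequences position by position.

    Returns (percent_identity, differences), where differences is a list of
    (position, query_residue, subject_residue) with 1-based positions.
    """
    if len(query) != len(subject):
        raise ValueError("aligned sequences must have the same length")
    if not query:
        raise ValueError("sequences must not be empty")
    differences = [
        (position, q, s)
        for position, (q, s) in enumerate(zip(query, subject), start=1)
        if q != s
    ]
    identity = 100.0 * (len(query) - len(differences)) / len(query)
    return identity, differences


# Made-up protein fragments that differ at a single position.
identity, differences = compare_aligned("MKPVTLYD", "MKPATLYD")
```

A hit with few differences over most of the query supports the ORF being a real gene, whereas an ORF with no similar sequence in the database is a candidate for the discussion suggested above.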
Figure 2. ORFfinder output at NCBI, disclosing all possible open reading frames
(ORFs) and their direction within the query DNA sequence. In addition to the
graphic view, details such as ORF coordinates, length, strand, and frame are
highlighted in the table below. By selecting each ORF, it is possible to obtain
the translated amino acid sequence. An immediate BLAST search of each ORF may be
run with the command BLAST ORF.
Figure 3. BLAST output at NCBI, highlighting the similarity scores between the
query ORF and the 100 best BLAST hits retrieved in the database. Clicking on a
line displays the alignment between the query sequence and the subject sequence,
allowing identification of differences between the two sequences.
4. Comparative Genomics
To fully exploit the potential of this activity, the fourth exercise compares
the presence of the identified gene or genes, their genomic context, and their
occurrence across different taxa. Using MaGe, a robust comparative genomics
platform (Vallenet et al., 2006), the students further confirm the identity and
putative function of the gene(s) determined during the BLAST search. Students
might ask, “Is there any evolutionary relationship to explain the occurrence of
the studied genes across different taxa?”
4.1. Access the MicroScope website:
https://www.genoscope.cns.fr/agc/microscope/home/index.php.
4.2. Choose Escherichia coli K12 and select Load into genome browser.
4.3. To identify the gene, search for “lacI” and click Move to.
4.4. Identify the lacI gene by hovering the mouse over each red bar.
4.5. Select the options menu.
4.6. In the new window opened, look for the section Viewer Comparative Map
default and choose synteny.
4.7. In the section PkGDB Organism Synteny, hold down the CTRL key and choose
Bacillus anthracis, another Escherichia species, Salmonella bongori, Shigella
sonnei, and Vibrio cholerae.
4.8. Click Save options.
4.9. Compare the presence and the function of the gene in different taxa
(Figure 4).
Learning objectives. Through this simple comparative genomics analysis,
students learn
• to localize their target gene(s) within the chromosome,
• to identify the genomic features of the flanking regions,
• to determine gene homologies with selected taxa, and
• concepts such as synteny, homology, insertions, deletions, and horizontal
gene transfer.
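The notion of synteny explored in this exercise can be made concrete with a toy comparison of gene order. The Python sketch below is only an illustration of the concept and does not reproduce how MaGe computes synteny. The reference list uses the lacI gene followed by the lac operon genes lacZ, lacY, and lacA; the second gene order, including the placeholder geneX, is invented and does not describe any real genome. The function names are hypothetical.

```python
def adjacent_pairs(gene_order):
    """Return the set of neighboring gene pairs, ignoring orientation."""
    return {frozenset(pair) for pair in zip(gene_order, gene_order[1:])}


def conserved_adjacencies(reference, other):
    """Gene pairs that are neighbors in both genomes."""
    return adjacent_pairs(reference) & adjacent_pairs(other)


def shared_genes(reference, other):
    """Genes from the reference that are also present in the other genome."""
    present = set(other)
    return [gene for gene in reference if gene in present]


reference_order = ["lacI", "lacZ", "lacY", "lacA"]
# HYPOTHETICAL gene order, made up to show an insertion and a missing gene.
hypothetical_order = ["lacI", "geneX", "lacZ", "lacY"]

conserved = conserved_adjacencies(reference_order, hypothetical_order)
shared = shared_genes(reference_order, hypothetical_order)
```

In this invented case only the lacZ and lacY neighborhood is conserved: the insertion of geneX breaks the lacI and lacZ adjacency, and lacA is absent, which mirrors the insertions and deletions students look for in the MaGe synteny map.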
Additional Remarks
The pilot trial showed that Internet access was not a limitation when
implementing these activities at schools. Nevertheless, teachers
could easily choose to exclude one of the exercises or, alternatively,
to challenge the students to carry them out as homework, and later
resume the bioinformatics exercises in the classroom.
Acknowledgments
Ana Sofia Martins is supported by a fellowship from Fundação para a
Ciência e Tecnologia – FCT (SFRH/BD/112038/2015). The authors
are grateful to all the participating schools and school teachers for
the opportunity to implement the bioinformatics exercises detailed
in this work, which contributed to improving the described activity.
Figure 4. Comparative genomics analysis carried out using MaGe. The genes and
corresponding reading frames (+3, +2, +1, −1, −2, −3) of the query genes are
shown at the top. Below is an outline of other bacteria with which the query
gene(s) are being compared.
References
Altschul, S.F., Gish, W., Miller, W., Myers, E.W. & Lipman, D.J. (1990). Basic
local alignment search tool. Journal of Molecular Biology, 215, 403–410.
Bloom, M. (2001). Biology in silico: the bioinformatics revolution. The
American Biology Teacher, 63, 397–403.
Brock, D.L. (1998). Now you see it, now you don’t! Making regulation of
gene expression come alive for all students. The American Biology
Teacher, 60, 288–290.
Campbell, A.M. (2003). Public access for teaching genomics, proteomics,
and bioinformatics. Cell Biology Education, 2, 98–111.
Cooper, R.A. (2015). Teaching the big ideas of biology with operon models.
The American Biology Teacher, 77, 30–39.
Kovarik, D.N., Patterson, D.G., Cohen, C., Sanders, E.A., Peterson, K.A.,
Porter, S.G. & Chowning, J.T. (2013). Bioinformatics education in high
school: implications for promoting science, technology, engineering,
and mathematics careers. CBE Life Sciences Education, 12, 441–459.
Madigan, M., Bender, K., Buckley, D., Sattley, W. & Stahl, D. (2018). Brock:
Biology of Microorganisms, 15th Ed. San Francisco, CA: Pearson
Education/Benjamin Cummings.
Moss, R. (1997). A discovery lab for studying gene regulation. The
American Biology Teacher, 59, 522–526.
National Research Council (2013). Next Generation Science Standards: For
States, By States. Washington, DC: National Academies Press.
Newman, L., Duffus, A.L.J. & Lee, C. (2016). Using the free program MEGA to
build phylogenetic trees from molecular data. The American Biology
Teacher, 78, 608–612.
Taylor, J.M., Davidson, R.M. & Strong, M. (2014). Drug-resistant tuberculosis:
a genetic analysis using online bioinformatics tools. The American
Biology Teacher, 76, 386–394.
Vallenet, D., Labarre, L., Rouy, Z., Barbe, V., Bocs, S., Cruveiller, S. &
Médigue, C. (2006). MaGe: a microbial genome annotation system
supported by synteny results. Nucleic Acids Research, 34, 53–65.
ANA MARTINS is a doctoral student in the Department of Biology, Faculty
of Sciences, University of Porto, Porto, Portugal; and a doctoral student at
CIBIO-InBIO – Research Center in Biodiversity and Genetic Resources,
University of Porto, Vairão, Portugal; email: asmartins@cibio.up.pt. MARIA
JOÃO FONSECA is External Collaborator at CIBIO-InBIO – Research Center
in Biodiversity and Genetic Resources, University of Porto, Vairão,
Portugal; and Head of Communication at MHNC-UP – Natural History and
Science Museum of the University of Porto, Porto, Portugal; email:
mjfonseca@mhnc.up.pt. FERNANDO TAVARES is Assistant Professor in the
Department of Biology, Faculty of Sciences, University of Porto, Porto,
Portugal, and Group Leader at CIBIO-InBIO – Research Center in
Biodiversity and Genetic Resources, University of Porto, Vairão, Portugal;
email: ftavares@fc.up.pt.