Quantitative Comparisons of 16S rRNA Gene Sequence Libraries from Environmental Samples

American Society for Microbiology
Applied and Environmental Microbiology

To determine the significance of differences between clonal libraries of environmental rRNA gene sequences, differences between homologous coverage curves, $C_X(D)$, and heterologous coverage curves, $C_{XY}(D)$, were calculated by a Cramér-von Mises-type statistic and compared by a Monte Carlo test procedure. This method successfully distinguished rRNA gene sequence libraries from soil and bioreactors and correctly failed to find differences between libraries of the same composition.
APPLIED AND ENVIRONMENTAL MICROBIOLOGY, Sept. 2001, p. 4374–4376, Vol. 67, No. 9
0099-2240/01/$04.00+0 DOI: 10.1128/AEM.67.9.4374–4376.2001
Copyright © 2001, American Society for Microbiology. All Rights Reserved.
DAVID R. SINGLETON,¹ MICHELLE A. FURLONG,¹ STEPHEN L. RATHBUN,² AND WILLIAM B. WHITMAN¹*
Departments of Microbiology¹ and Statistics,² University of Georgia, Athens, Georgia 30602-2605
Received 8 March 2001/Accepted 11 June 2001
The sequencing of 16S rRNA genes from clone libraries of DNAs from environmental samples has led to a wealth of information concerning prokaryotic diversity. However, in addition to methodological problems in producing libraries representative of the environmental sample (for a review, see reference 8), this approach is also limited by the difficulty in comparing libraries and determining if they are significantly different.
This problem can be addressed quantitatively by application of the formula for coverage as described by Good (4). Let $X$ be a collection of sequences, such as a library of 16S rRNA genes. Define the “homologous” coverage of $X$ (or $C_X$) by a sample from $X$ to be $C_X = 1 - (N_X/n)$, where $N_X$ is the number of unique sequences in the sample (i.e., sequences without a replicate) and $n$ is the total number of sequences. In practice, the definition of $N_X$ depends upon the criteria used to define uniqueness. For instance, McCaig et al. (6) considered sequences without a homolog of 97% similarity to be unique. Other authors have used 99% sequence similarity as the criterion. In principle, uniqueness can be defined at any level of sequence similarity or evolutionary distance ($D$) and a “homologous coverage curve,” or $C_X(D)$, can be generated by plotting $C_X$ versus $D$ (Fig. 1). The coverage curve then describes how well the sample represents the entire library $X$ at various levels of relatedness. Typically, coverage might be low at high levels of relatedness (low values of $D$), indicating that only a small fraction of the sequences representing unique species are, in fact, sampled. In contrast, coverage might be much higher at low levels of relatedness, indicating that representatives of most of the deep phylogenetic groups present in $X$ are found in the sample.
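The coverage calculation described above can be sketched in a few lines of Python. This is an illustration only and not part of LIBSHUFF, which was written in Perl. The function name homologous_coverage_curve, the toy distance matrix, and the convention that a sequence counts as unique at a given D when its nearest neighbour lies farther away than D are all assumptions made for the example.

```python
import numpy as np

def homologous_coverage_curve(dist, d_values):
    """Return C_X(D) = 1 - N_X(D)/n for each D in d_values.

    dist is a symmetric matrix of pairwise evolutionary distances.
    Assumed convention (not taken from the paper): a sequence is
    "unique" at D if no other sequence lies within distance D of it.
    """
    dist = np.asarray(dist, dtype=float)
    n = dist.shape[0]
    # Mask self-distances so a sequence is never its own replicate.
    masked = dist + np.diag(np.full(n, np.inf))
    nearest = masked.min(axis=1)
    return np.array([1.0 - np.sum(nearest > d) / n for d in d_values])

# Hypothetical 4-sequence library: two tight pairs (0.02 and 0.04 apart)
# separated from each other by a distance of 0.30.
toy = np.array([
    [0.00, 0.02, 0.30, 0.30],
    [0.02, 0.00, 0.30, 0.30],
    [0.30, 0.30, 0.00, 0.04],
    [0.30, 0.30, 0.04, 0.00],
])

# D = 0.00, 0.01, ..., 0.50, matching the 0.01 increments used in the text.
d_grid = np.round(np.linspace(0.0, 0.5, 51), 2)
curve = homologous_coverage_curve(toy, d_grid)
```

For this toy matrix the curve rises from 0 at D = 0.00 to 0.5 at D = 0.02 and reaches 1.0 at D = 0.04, which mirrors the qualitative behavior described in the text: low coverage at high relatedness and high coverage at low relatedness.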
While $C_X$ is the “homologous coverage” of $X$ by a sample of $X$, it is also possible to calculate a “heterologous coverage” of $X$ (or $C_{XY}$) by a sample $Y$ from another collection of sequences by the following formula: $C_{XY} = 1 - (N_{XY}/n)$, where $N_{XY}$ is the number of sequences in a sample of $X$ that are not found in a sample of $Y$ and $n$ is the number of sequences in the sample of $X$. Similarly to $N_X$, $N_{XY}$ can also be defined at different levels of $D$ to generate a coverage curve, $C_{XY}(D)$. Moreover, if $X = Y$, one might expect the coverage curves $C_X(D)$ and $C_{XY}(D)$ [as well as $C_Y(D)$ and $C_{YX}(D)$] to be similar. Thus, a test for differences between these coverage curves is also a test for differences between $X$ and $Y$. To determine if the coverage curves $C_X(D)$ and $C_{XY}(D)$ are significantly different, the distance between the two curves is first calculated by using the Cramér-von Mises test statistic (7):
\[ \Delta C_{XY} = \sum_{D=0.0}^{0.5} \left| C_X(D) - C_{XY}(D) \right|^2 \]
where $D$ increases in increments of 0.01. If $X = Y$, then $\Delta C_{XY}$ should not be significantly different from a $\Delta C$ calculated after randomly shuffling sequences between the two samples, $X$ and $Y$. Typically, the sequences are randomly shuffled a large number ($N$) of times (e.g., $N = 999$) and $\Delta C_{XY}$ is calculated after each shuffling. The randomized values plus the empirical value of $\Delta C_{XY}$ are ranked from largest to smallest, and then the $P$ value is estimated to be $r/(N + 1)$, where $r$ denotes the rank of the empirical value of $\Delta C_{XY}$ (5). The two libraries are considered significantly different when $P < 0.05$. We have created a computer program (LIBSHUFF) that uses a sorted distance matrix containing both $X$ and $Y$ as input and returns the coverage curves $C_X(D)$, $C_Y(D)$, $C_{XY}(D)$, and $C_{YX}(D)$, as well as the $P$ values for both $\Delta C_{XY}$ and $\Delta C_{YX}$, from the distribution of $\Delta C$. In addition, the distribution of $(C_X - C_{XY})^2$ with $D$ appears to be informative and is given as well (see below). The computer program LIBSHUFF was written in Perl and can be downloaded along with more detailed instructions on its use at http://www.arches.uga.edu/whitman/libshuff.html.
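The test procedure described above (heterologous coverage, the summed squared difference between the two coverage curves, and the Monte Carlo ranking) can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the LIBSHUFF source. The names delta_c and shuffle_test, the toy data, the strict "farther than D" rule for uniqueness, and the choice to count ties against the empirical value when ranking are all hypothetical choices made for the example.

```python
import numpy as np

# D = 0.00, 0.01, ..., 0.50 (increments of 0.01, as in the text).
D_GRID = np.round(np.linspace(0.0, 0.5, 51), 2)

def delta_c(dist, is_x):
    """Sum over D of (C_X(D) - C_XY(D))^2 for the sequences flagged as X."""
    xx = dist[np.ix_(is_x, is_x)].copy()
    np.fill_diagonal(xx, np.inf)        # a sequence is not its own replicate
    xy = dist[np.ix_(is_x, ~is_x)]
    near_x = xx.min(axis=1)             # nearest other sequence within X
    near_y = xy.min(axis=1)             # nearest sequence in Y
    n = is_x.sum()
    # Assumed convention: "unique" / "not found in Y" means no match within D.
    c_x = 1.0 - (near_x[:, None] > D_GRID).sum(axis=0) / n
    c_xy = 1.0 - (near_y[:, None] > D_GRID).sum(axis=0) / n
    return float(np.sum((c_x - c_xy) ** 2))

def shuffle_test(dist, is_x, n_shuffles=999, seed=0):
    """Return (empirical delta C, Monte Carlo P = r / (N + 1))."""
    rng = np.random.default_rng(seed)
    observed = delta_c(dist, is_x)
    # Shuffle library membership while keeping both sample sizes fixed.
    randomized = np.array(
        [delta_c(dist, rng.permutation(is_x)) for _ in range(n_shuffles)]
    )
    # Rank from largest to smallest; ties are counted against the
    # empirical value (an assumption, chosen to be conservative).
    rank = 1 + int(np.sum(randomized >= observed))
    return observed, rank / (n_shuffles + 1)

# Hypothetical data: 20 "sequences" placed on a line so that library X
# (first 10) and library Y (last 10) form two well-separated clusters.
pos = np.concatenate([0.005 * np.arange(10), 0.4 + 0.005 * np.arange(10)])
dist = np.abs(pos[:, None] - pos[None, :])
is_x = np.arange(20) < 10

observed, p_value = shuffle_test(dist, is_x)
```

With two clusters this well separated, the empirical statistic should outrank essentially all of the randomized values, so p_value is expected to fall well below 0.05. Swapping the roles of the two libraries (passing ~is_x) gives the reciprocal comparison, which the text reports separately because the two directions need not agree.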
A first test of this method was done to ensure that samples from the same library were not shown to be different. Thus, a collection of clonal sequences ($n = 275$) from a soil community study (6) was divided into two samples based upon accession numbers (138 odds and 137 evens). Although the study contained sequences from two sample sites (SL and SAF clones), sequences from both sites were placed in each data set to form nearly equivalent samples. A comparison of $\Delta C_{odds/evens}$ to $\Delta C$ values resulted in $P = 0.871$, which indicated that the two samples were not significantly different (Fig. 1A). Similar results were obtained for $\Delta C_{evens/odds}$ and other arbitrarily divided sequence libraries (Table 1). Thus, as expected, samples taken from the same library were not found to be different.
* Corresponding author. Mailing address: Department of Microbiology, University of Georgia, 527 Biological Sciences Bldg., Athens, GA 30602-2605. Phone: (706) 542-4219. Fax: (706) 542-2674. E-mail: whitman@arches.uga.edu.
To demonstrate that this procedure could correctly differentiate samples from different libraries, sequences of clones obtained from an activated sludge (SBR1; $n = 97$; reference 1) were compared to grassland soil SL clones. The SBR1 clones were found to be significantly different from the SL clones ($P = 0.001$; Fig. 1B). More information on the nature of this difference was obtained by examination of the distribution of $(C_X - C_{XY})^2$ with $D$ (Fig. 1B). At low $D$, the actual $(C_X - C_{XY})^2$ exceeded the comparable values at $P = 0.05$ obtained during the calculation of $\Delta C$. This result suggested that the libraries differed greatly at $D < 0.10$ but shared many deep taxa. However, smaller differences at $D > 0.3$ suggested that not all deep phylogenetic groups were found in both libraries. Similar results were also obtained for comparisons of other soil and bioreactor libraries (Table 1 and data not shown).
Three sequence collections consisting of multiple samples were analyzed to determine if differences between the samples could be detected (Table 1). Clonal libraries derived from the microbial populations of phosphate-removing (SBR1) and non-phosphate-removing (SBR2) bioreactors differed in the abundance of certain taxa (1). However, these differences were not shown to be significant by our method (Table 1). The compositions of libraries from the microbial communities of improved (SL) and unimproved (SAF) upland grass pasture soils were not found to be significantly different (6). We also obtained the same conclusion by our method (Table 1). Finally, comparisons of restriction fragment length types from C0 and S0, two clonal libraries derived from arid soils, suggested that C0 was more diverse than S0 (2). Our analysis of the sequences obtained from this study was consistent with this conclusion and further suggested that S0 was a subset of C0. $\Delta C_{S0/C0}$ was not significant, which suggested that all of the taxa present in S0 were also present in C0 (Table 1). However, the reciprocal value $\Delta C_{C0/S0}$ was significant; therefore, C0 also contained sequences of one or more taxa not found in S0. The distribution of $(C_X - C_{XY})^2$ with $D$ further indicated that the additional taxa in C0 represented moderately deep phylogenetic groups, $0.15 < D < 0.25$ (Fig. 1C).
FIG. 1. Results of selected LIBSHUFF comparisons. Homologous (○) and heterologous (●) coverage curves for 16S rRNA gene sequence libraries from environmental samples are shown. Solid lines indicate the value of $(C_X - C_{XY})^2$ for the original samples at each value of $D$. $D$ is equal to the Jukes-Cantor evolutionary distance determined by the DNADIST program of PHYLIP (3). Broken lines indicate the 950th value (or $P = 0.05$) of $(C_X - C_{XY})^2$ for the randomized samples. (A) Comparison of clones from grassland soils with odd (X) and even (Y) accession numbers. (B) Comparison of bioreactor clones SBR1 (X) and grassland soil SL clones (Y). (C) Comparison of C0 (X) and S0 (Y) clones from arid soils.
TABLE 1. Comparisons of environmental clone libraries

Site (reference)             Homologous (X)      Heterologous (Y)   P^b
                             Clones        n     Clones
Grassland soils (6)          Odds^a        138   Evens^a            0.871
                             Evens^a       137   Odds^a             0.933
                             SAF           138   SL                 0.120
                             SL            137   SAF                0.135
Bioreactors (1)              Odds^a        95    Evens^a            0.853
                             Evens^a       94    Odds^a             0.623
                             SBR1          97    SBR2               0.308
                             SBR2          92    SBR1               0.824
Arid soils (2)               Odds^a,c      56    Evens^a            0.251
                             Evens^a       56    Odds^a,c           0.516
                             C0            59    S0                 0.042
                             S0            53    C0                 0.398
Grassland soil/bioreactor    SAF           138   SBR1               0.001
                             SBR1          97    SAF                0.002
                             SL            137   SBR1               0.001
                             SBR1          97    SL                 0.001

^a Sequences with odd or even accession numbers. Contains mixtures of both libraries described in the reference, and they are not expected to be different.
^b Value of r/(N + 1) as described in the text.
^c Accession number AF128647 could not be found and was not included.
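
As a worked illustration of footnote b, the following sketch assumes that the N = 999 randomizations given as an example in the text were used for every comparison; the table itself does not state this.

\[ P = \frac{r}{N + 1}, \qquad P_{\min} = \frac{1}{999 + 1} = 0.001, \qquad r = 42 \;\Rightarrow\; P = \frac{42}{1000} = 0.042 \]

Under that assumption, an entry of 0.001 corresponds to an empirical value that outranked every randomized value, and no smaller P value could be reported without running more randomizations.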
Sample size should have a major effect on comparisons of libraries. The minimum number of sequences necessary to distinguish two dissimilar libraries was expected to increase with the complexity of the libraries and decrease with the magnitude of the dissimilarity. This point was examined in detail by using two libraries of high diversity and dissimilarity. Variable numbers of clonal sequences were randomly selected from either library SBR1 or SL (Y) and compared to the opposite library (X), and $P$ values were determined for 10 replicates. Approximately 20 and 25 sequences from SBR1 and SL, respectively, were required to differentiate the two libraries ($P < 0.05$) when X was represented by 97 and 137 sequences, respectively (Fig. 2). Tests were also performed to investigate the required sample size of X (SBR1) when the size of Y (SL) was small. It was found that nearly all (90) of the sequences from the SBR1 library were required to distinguish these libraries when the SL library (Y) was represented by 20 sequences (data not shown). When the sizes of both libraries were varied, they were consistently detected as different when the SBR1 (X) and SL (Y) libraries were represented by 40 and 30 sequences, respectively (data not shown). While these results may not generalize to all environmental samples, they should be representative of comparisons of libraries from diverse communities, such as those found in soil and bioreactors. Importantly, these results suggest that modestly sized libraries from microbial communities similar in complexity to those used in this study will be distinguished by this method.
We thank Kamyar Farahi and Rob Waldo for help with programming in Perl. We also thank Lihua Wang of the Statistical Consulting Office at the University of Georgia for help.
This work was supported in part by an award from the Division of Molecular and Cellular Biosciences at NSF (MCB-0084164).
REFERENCES
1. Bond, P. L., P. Hugenholtz, J. Keller, and L. L. Blackall. 1995. Bacterial community structures of phosphate-removing and non-phosphate-removing activated sludges from sequencing batch reactors. Appl. Environ. Microbiol. 61:1910–1916.
2. Dunbar, J., S. Takala, S. M. Barns, J. A. Davis, and C. R. Kuske. 1999. Levels of bacterial community diversity in four arid soils compared by cultivation and 16S rRNA gene cloning. Appl. Environ. Microbiol. 65:1662–1669.
3. Felsenstein, J. 1993. PHYLIP (phylogenetic inference package) version 3.5c. University of Washington, Seattle.
4. Good, I. J. 1953. The population frequencies of species and the estimation of population parameters. Biometrika 40:237–264.
5. Hope, A. C. A. 1968. A simplified Monte Carlo significance test procedure. J. Royal Statist. Soc. B 30:582–598.
6. McCaig, A. E., L. A. Glover, and J. I. Prosser. 1999. Molecular analysis of bacterial community structure and diversity in unimproved and improved upland grass pastures. Appl. Environ. Microbiol. 65:1721–1730.
7. Pettitt, A. N. 1982. Cramer-von Mises statistic, p. 220–221. In S. Kotz and N. L. Johnson (ed.), Encyclopedia of statistical sciences. Wiley-Interscience, New York, N.Y.
8. von Wintzingerode, F., U. B. Göbel, and E. Stackebrandt. 1997. Determination of microbial diversity in environmental samples: pitfalls of PCR-based rRNA analysis. FEMS Microbiol. Rev. 21:213–229.
FIG. 2. Effect of sample size on the discrimination of libraries. A comparison of the SL library from grassland soil (Y; n = variable) to the bioreactor library SBR1 (X; n = 97) (●) and a comparison of the SBR1 (Y; n = variable) library to the SL (X; n = 137) library (○) are shown. Each point represents an average of 10 replicates, and the error bars are 1 standard deviation. The broken line indicates $P = 0.05$.