Quantifying Life
Ioannis Tamvakis
April 5, 2022
Sainsbury Laboratory, University of Cambridge, United Kingdom
Abstract
Defining Life essentially creates a dichotomy between living matter
and non-living. Here we propose how the existence of life in a given
space might be quantified using appropriate metrics, generalizations
of the concept of biodiversity. We will treat life as a phenomenon
of information accumulation and attempt to explore generic mutual
information-complexity density metrics that could correlate spatially
with known-as-living objects. One such “life density” metric is, put
simply, the number of different repeated structures found in a space,
in all orders of scale, divided by that space.
1 Introduction
Rutherford famously said: “All science is either physics or stamp collecting”.
Indeed, biology seems to be the latter. Over the last century we unraveled a
world of byzantine complexity, where in every order of scale between the
atomic scale and the scale of the whole organism there are interesting objects
and processes to characterize. If Plato could have looked at life through a
microscope, his theory of Forms [1] would have to be greatly expanded, and
so would the number of Forms considered by d’Arcy Thompson, and the number
of transformations by which a form becomes another [2]. Living systems
are surprisingly ordered as well as complex [3], and one has to understand
them in depth to reconcile their existence with the laws of thermodynamics.
Specifically, by feeding on negative entropy [4], organisms thrive as energy
transfer systems [5]. Living entities, especially the ones we are familiar with, like
a horse or an olive tree, can be thought of as replicating objects in space
that have gathered, over considerable evolutionary time, immense amounts
of functional information on how to solve a multitude of problems they
might face in their lifetime [6]. As such it might be invaluable to understand
them in information-theoretic terms [7]. However, although attempts to
find quantifiable properties of life, using complexity theory, have been made
[7, 8], a quantification relating the perceived information complexity and
how much space it occupies has not been explored. On the other hand,
an important concept in the public discourse about biology is biodiversity,
normally referring to the number of species in an ecosystem. We will argue
here that biodiversity can be generalized as a metric for the magnitude of
biology, and that if we could conjure reliable and generic complexity density
metrics, we might be able to directly measure the effects of emergence of
Darwinian evolution in chemical (or other) systems. If life correlates with
such metrics, we could provide a biophysical and bioinformatic understanding
of what life is, invaluable for astrobiology.
2 Argument - Results
We would like to assume from here on that an intrinsic property of anything we
might call “information” persisting in a space is that it is copied at some
point in time. In contrast to Shannon [9] who considers information transfer
through a medium, here we consider “replicating information”. We argue
that if there is no replicating process, any information existing in a space
where random processes operate would be just subject to decay. A book
that makes no copies of itself will only have a limited lifetime. Under this
assumption, we can find what constitutes possible information in a space by
looking for structures that exist more than once. This is analogous to the
search for repetitions that humans use when they are looking at a system
they know nothing about: the first thing we do is go stamp collecting, and
what counts as a stamp can be defined self-referentially, by using one
instance of a particular stamp as the definition for the next one that
looks exactly like it. It is by this method that we have invented the words
“proteins”, “mitochondria”, “cells”; the list goes on and on. This is a crucial
first step in quantifying complexity, as without it we might end up trying to
quantify the complexity of random sequences existing once in our sample,
which are computationally irreducible and score high in Shannon entropy [7].
A second assumption that we would like to use hereon is the existence of an
observer “Π” [1] able to identify repeated structures in any arbitrary senso-
rial setting. When such a structure is found, we would like to be capable of
understanding how complex this structure is, as different structures might
contain, in a continuous linked fashion, different numbers of complex sub-
structures. To do so we would like to use our third assumption, the existence
of a universal descriptive language, capable of producing descriptive linear
strings that correspond precisely to the structure we are trying to describe.
One example of such a language is the nucleic acid nomenclature (luckily
DNA and RNA, under a certain abstraction, are linear as well). Another one,
again from the molecular scale, is the Simplified molecular-input line-entry
system (SMILES) notation for producing descriptive linear strings for chem-
ical substances. When we have descriptive linear strings it is possible to use
Kolmogorov-type complexity metrics, like the one proposed by Chaitin [10]
to get an understanding of how complex this repeated structure is. For the
next section however, we will not use our third assumption.
2.1 A thought experiment
Here I would like to propose the following thought experiment. Assume
that you could make yourself arbitrarily small or large, able to observe your
surroundings with a precision related exactly to your size, disregarding for a
moment (or using) how physical laws change in the magnitude of their effects
in different spatial scales. Now imagine that you fix your attention on
a specific region of space during a specific time, for example a cubic meter
during a minute. Imagine that you can look at this piece of space-time “Υ”
for as long as you like, in order to find any structures that you can easily
identify in themselves, and you do this in all orders of scale smaller than
a meter and a minute. Whenever you find one, you add it as an element
in the set “Ο”, possibly recording as metadata the scale you found it in
and a concise description. When confronted with a clear pattern, like waves
or a fractal construction, you treat this self-similarity as a type of distinct
structure that you found in abundance, and add it to Ο. When you think
you are done (you don’t find new ones), you divide by the space-time Υ you
found them in (1 m³ · 1 min) to get a density metric
$$T = \frac{|O|}{\Upsilon} \tag{1}$$
Now I can present some thought results to this thought experiment.
When I consider an Υ of 1 m³ · 1 min that has just air in it, then little
more can be found than the different classes of air molecules, dust molecules,
photons, subatomic objects. The absolute position and orientation of the
molecules, as well as the relative position of groups of air molecules, does
not seem to follow specific patterns as it can be approximated by a ran-
dom positioning model, with the exception of some pressure waves, a sound
passing by. All in all, I can identify a number of repeated structures on the
molecular and atomic scale, let’s say 20, relating to the types of molecules,
and another one in the scale of centimeters (the sound waves). That would
mean that the metric T would give 21 repeated structures / (1 m³ · 1 min).
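To make the bookkeeping concrete, here is a minimal sketch of how T could be computed once the observer has produced the set Ο. The `Structure` class and the air-sample entries are illustrative assumptions, not a prescribed implementation.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Structure:
    description: str   # concise description, e.g. "N2 molecule"
    scale_m: float     # spatial scale at which it was identified

def T(structures: set, volume_m3: float, duration_min: float) -> float:
    """Life-density metric of equation (1): T = |O| / Y."""
    return len(structures) / (volume_m3 * duration_min)

# the 'empty air' thought sample: ~20 molecular species plus one sound wave
air = {Structure(f"molecule species {i}", 1e-10) for i in range(20)}
air.add(Structure("pressure wave", 1e-2))
print(T(air, volume_m3=1.0, duration_min=1.0))  # 21.0 structures per m^3 * min
```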
When I take a second sample of air, it happens to have some pollen
from a tree nearby, that carries with it a few cells of a certain bacterium
species. A great deal of new structures can be found. These new structures
are found in all scales from the atomic to the one of the size of the pollen, in
varying time-scales. In terms of molecules, we can identify through their rep-
etition a great number of new molecular species (amino-acids, metabolites,
nucleotides, lipids, sugars etc.) In the few orders of scale above the molecu-
lar, there is a deluge of new identifiable objects through their repetition, like
proteins, mRNAs, DNAs, protein complexes, nuclear pores, mitochondria,
cell nuclei, flagella. All these are easily identified by an observer that has
visual sensory input at the relevant scale of each object. The number
of these new repeated structures is mind-bogglingly high, as these few cells
can hold a fair fraction of what biologists have characterized until now. It
is fair to say that just the number of proteins one can find through their
repetition should be in the order of thousands. All these molecular assem-
blies are a medium for the existence of more identifiable patterns. As in
the case of the sound wave in the previous paragraph, here we can find, for
example, waves of calcium on the membranes of the nuclear envelopes [11].
Many other spatio-temporal processes can be found, although the observer
might not be able to find the right sensorial subspace where these are evi-
dent through their repetition. All in all, the metric T would give a number
for this space-time that is orders of magnitude higher than 21, the result of
the empty air sample. Note that if we were to take the new sample, heat
it up until it becomes plasma and cool it down to room temperature again,
this high density of highly complex repeated structures would disappear.
2.2 A more elaborate metric
One interesting problem that arises when we try to characterize repeated
structures, as observers, is to define when one stops and another starts.
If you consider a protein, what stops us from characterizing two half proteins?
Would you characterize a protein as a separate entity, and then go on to
characterize the protein complex it takes part in as well? This can bring great
subjectivity to the metric T results. Another problem is, if you encounter a
large crystal, how do you tackle its regularity, which gives opportunities to
characterize so many different molecular assemblies? To circumvent
these problems, we have already introduced the assumption that there is a
universal descriptive language. This language could possibly help us tackle
the problems above. For one, by describing a repeated structure in a lin-
ear way, we can apply notions like Kolmogorov complexity to tackle the
regularity of crystals. Although there are many possible configurations for
a crystal, and a great many of them can be seen to be duplicated in
space, under the algorithmic compression of Kolmogorov complexity all lin-
ear descriptive strings gathered will have one characteristic that is common
between them, which is their production rules. Using this process of the
development of a descriptive linear string and algorithmic compression, it
could be possible to circumvent the perceived complexity of crystal struc-
tures, and understand how much possible repeated information they hold.
We should note at this point that a general function that can calculate the
Kolmogorov complexity of an arbitrary string does not exist, as it is itself
an uncomputable function [12]. This makes any effort to be precise about
the information complexity density of a sample inherently impossible. This
does not mean, though, that the observations herein do not hold true; in other
words, the fact that we cannot be precise does not mean that the effects of
life on matter we try to describe here do not exist or are not apparent when
using approximations of the idea of our metric. One can use sufficiently general
compression algorithms and in most cases draw the same conclusions. An-
other point to make is that Chaitin, in a profoundly philosophical piece, has
previously attempted to formalize the idea of mutual information between
mathematical objects, where the information he measures is the one shared
by these objects [8]. We believe his work to be in the same vein of inquiry,
so we can also describe the metrics herein as mutual information density
metrics.
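As an illustration of this point, a general-purpose compressor already behaves as a crude stand-in for Kolmogorov-style description length: the linear description of a perfectly regular crystal collapses to little more than its production rule, while a structureless random string barely compresses at all. The sketch below uses zlib purely as an example compressor; the strings are hypothetical.

```python
import random
import zlib

def compressed_bits(s: str) -> int:
    """Length in bits of the zlib-compressed description: a computable
    upper-bound proxy for Kolmogorov complexity, which itself is
    uncomputable [12]."""
    return 8 * len(zlib.compress(s.encode(), level=9))

random.seed(1)
crystal = "NaCl" * 2500                                     # perfect periodicity
gas = "".join(random.choice("ACGT") for _ in range(10000))  # no repeated structure

print(compressed_bits(crystal))  # tiny: one unit cell plus a repetition rule
print(compressed_bits(gas))      # near-incompressible, like a random sequence
```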
Using the capability of measuring how complex each repeated structure
is, we can define a more elaborate metric T2, which is the density of the
sum of the complexities C of algorithmically compressed descriptions K of
the repeated structures O found in all orders of scale of a space-time Υ:
$$T_2 = \frac{\sum_{O} C(K(O))}{\Upsilon} \tag{2}$$
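Here is a minimal sketch of T2 under our assumptions: each repeated structure is given by a linear descriptive string (our third assumption), K is approximated by zlib compression, and C by the bit length of the compressed description. The function names and example strings are illustrative, not part of the method above.

```python
import zlib

def C_of_K(description: str) -> int:
    """Bits in the compressed linear description: a computable proxy
    for C(K(O))."""
    return 8 * len(zlib.compress(description.encode()))

def T2(descriptions: list[str], volume_m3: float, duration_min: float) -> float:
    """Equation (2): sum of complexities of the repeated structures O,
    divided by the space-time they were found in."""
    return sum(C_of_K(d) for d in descriptions) / (volume_m3 * duration_min)

# hypothetical repeated structures described as linear strings
# (e.g. SMILES for small molecules, nucleotide strings for sequences)
found = ["O=C=O", "CCO", "ATGGCGTACGT" * 3]
print(T2(found, volume_m3=1.0, duration_min=1.0))
```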
This metric can be applied readily to situations where we already have a
conveniently descriptive linear string, using techniques devised for bioinfor-
matics [13]. Let’s imagine for a moment a region of space that holds
chromosome 8 of a pollen grain of the plant species Medicago truncatula
(as described in the latest genome version 4.01, provided by the Phytozome
online database). We used the bioinformatic definitions for repeated elements
and discovery tools provided by Becher et al [13] to quantify how many re-
peated elements exist in the DNA sequence of the chromosome, which has
a length of 46329445 nucleotides, and compare it with the same se-
quence appropriately randomized (using the fasta-shuffle-letters tool of the
MEME suite [14]). We discovered 20853361 repeated elements with a mean
length of 13.45. Using the same discovery method on the randomized sequence
gave 23403763 repeated elements with a mean length of 11.67. Although the num-
bers look comparable, in the randomized case the repeats do not exceed the
length of 28 nucleotides, whereas the biggest motif in the natural chromo-
some is of length 76132, giving a long-tailed distribution of lengths. If we for
a moment assume that our sequences are not algorithmically compressible,
so their descriptions stay the same, and naively define the complexity C
of each repeat as the inverse of the chance of finding two of the same at random,
i.e. its alphabet size A to the power of the sequence length l, $C(K(\text{repeat})) = A^l$,
then the sum of complexities of the repeats found in the natural chromo-
some was calculated to be in the order of $10^{45837}$. The sum of complexities C
for the randomized chromosome was calculated to be in the order of $10^{16}$.
Since the space Υ occupied by both natural and randomized chromosomes
is the same, under these assumptions the metric T2 gives a value many
orders of magnitude higher for the natural chromosome.
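The computation just described can be mimicked on toy data. The sketch below finds exactly repeated k-mers with a naive dictionary scan (the actual analysis used the efficient tool of Becher et al [13]), assigns each the naive complexity $A^l$, and compares a repeat-rich string against its shuffled control. The toy strings are hypothetical and far shorter than the chromosome.

```python
import random
from math import log10

def repeated_kmers(s: str, k: int) -> int:
    """Number of distinct k-mers occurring at least twice in s (naive scan)."""
    counts: dict[str, int] = {}
    for i in range(len(s) - k + 1):
        kmer = s[i:i + k]
        counts[kmer] = counts.get(kmer, 0) + 1
    return sum(1 for c in counts.values() if c >= 2)

def log10_sum_complexities(s: str, k_max: int, A: int = 4) -> float:
    """log10 of the sum over repeats of A**l, with C(K(repeat)) = A^l."""
    total = sum(repeated_kmers(s, k) * A ** k for k in range(2, k_max + 1))
    return log10(total)

random.seed(0)
core = "".join(random.choice("ACGT") for _ in range(5000))
natural = core + core[:400] + core  # long duplicated blocks, as replication would leave
shuffled = "".join(random.sample(natural, len(natural)))

print("natural :", log10_sum_complexities(natural, 60))   # dominated by the longest repeats
print("shuffled:", log10_sum_complexities(shuffled, 60))  # short chance repeats only
```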
If we assume instead that the complexity of each repeat is its length l,
$C(K(\text{repeat})) = l$, the metric T2 is only marginally higher in the case of
the natural chromosome than in the randomized one. To give an idea of what
the long-tailed distribution of motif lengths looks like, we can calculate the
largest motif we expect in the random case by using a commonly used ap-
proximation for the probability of a given pattern of length k appearing at
least t times in a random string of length N over an alphabet of size A:
$\Pr(N, A, k, t) \approx \binom{N - t(k-1)}{t} / A^{tk}$,
where $\binom{m}{n} = m!/[n!(m-n)!]$ is a binomial coefficient. To
ask at which length k all k-mers together have a probability of
less than 0.05 of appearing (assuming equal probabilities of A, G, C, T to be
in the sequence), we calculate when $\Pr(46329445, 4, k, 2) \cdot 4^k = 0.05$, and we
get $k \approx 27$. So, by finding the largest repeated structure length
we expect by chance, and applying this threshold in both cases for the nat-
ural and randomized chromosome, we do not get any large repeats in the
randomized one, but we get 192882 repeats of total length 10695588 in the
natural chromosome case. This indicates that there are replicative processes
acting on large-scale structures in our medium (the DNA string), which is
the signature of life we were trying to quantify. It would be interesting to
compare the natural chromosome repeat properties to the ones created in
an artificial chromosome made by a simple model of replicative processes
acting on a random string. All the above are a weak argument for how a
single chromosome might contribute to an increase of information density
measured by T2, but one has to consider that there is a high probability of
finding the same chromosome duplicated nearby in the space-time Υ, mak-
ing the entire chromosome a repeated structure.
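The chance-repeat threshold derived above can be checked numerically. The sketch below evaluates the approximation $\Pr(N, A, k, t) \cdot A^k$ in log space for the chromosome’s parameters; the function name is ours.

```python
from math import comb, log10

N = 46329445  # chromosome length in nucleotides
A = 4         # alphabet size (A, C, G, T)
t = 2         # a repeat = at least two occurrences

def log10_pr_any_kmer(k: int) -> float:
    """log10 of Pr(N, A, k, t) * A**k: the approximate probability that
    some k-mer of length k appears at least t times by chance."""
    return log10(comb(N - t * (k - 1), t)) - t * k * log10(A) + k * log10(A)

for k in range(24, 31):
    print(k, round(10 ** log10_pr_any_kmer(k), 4))
# the probability falls below 0.05 around k = 27-28, so repeats much longer
# than this are not expected by chance in the randomized chromosome
```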
Another type of system where the metric T2 can be readily applied is 1D cel-
lular automata. In general, cellular automata are known for the emergent
mesoscale patterns that can be found or constructed, which look remark-
ably like elementary particle physics [15] as well as life-like. We pondered
if applying the metric T2 to cellular automata simulation results could be
used to distinguish which update rules gave the most interesting behavior.
Our motivation was to compare our results to other metrics used to quantify
the emergent properties of this system, for different rules. To this end, we
simulated the evolution of the full set of 1D cellular automata with rules
acting on the immediate neighborhood, with the size of each automaton to
be 119 cells, the starting configuration to be all cell states to 0 and a small
population of cells in the middle having all possible triplets, next to each
other. We then applied the same bioinformatic pipeline used above, for the
strings produced after 10000 simulation steps. In this setup, our results
(not shown here) indicated that using the appropriate functions C and K
was crucial to distinguish between the different automata behaviors. Due
to the resulting strings sometimes being highly repetitive and compressible,
different assumptions gave radically different results, rendering our naive
approach inconclusive. We invite researchers with proper understanding of
compression algorithms and complexity metrics to investigate further.
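For concreteness, here is a minimal sketch of the simulation setup described above, with zlib compression of the final row standing in for the C and K functions. The periodic boundary condition and the choice of zlib are our assumptions, and, as noted, conclusions depend strongly on these choices; only three example rules are shown rather than the full set of 256.

```python
import zlib
from itertools import product

def step(cells: list[int], rule: int) -> list[int]:
    """One synchronous update of an elementary (radius-1) 1D cellular
    automaton, assuming periodic boundaries."""
    n = len(cells)
    return [(rule >> (4 * cells[i - 1] + 2 * cells[i] + cells[(i + 1) % n])) & 1
            for i in range(n)]

def final_row(rule: int, width: int = 119, steps: int = 10000) -> str:
    # all cells 0, with the 8 possible triplets laid side by side in the middle
    cells = [0] * width
    seed = [b for triplet in product((0, 1), repeat=3) for b in triplet]
    start = (width - len(seed)) // 2
    cells[start:start + len(seed)] = seed
    for _ in range(steps):
        cells = step(cells, rule)
    return "".join(map(str, cells))

# compressed size of the final string as a crude stand-in for C(K(.))
for rule in (0, 30, 110):  # trivial, chaotic, and complex example rules
    print(rule, 8 * len(zlib.compress(final_row(rule).encode())))
```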
3 Discussion
In this paper we propose that a scientifically-philosophically productive ap-
proach to understanding life might be to attempt to quantify it, rather than
define it. Mathematical models that give complex emergent dynamics rem-
iniscent of many properties of living systems [16] blur our preconceptions
of the medium where life exists and the chemistry it might use. Trying to
define life in terms of replication stumbles on the existence of replicating
solitons in reaction-diffusion systems [17], and adding more properties to
any life definition devolves it into a precariously fragile checklist (especially
for astrobiology). We believe that the notion of life density is important in
the discussion of what life is. Equally important is finding commonalities in
the different levels of organization of life. For example, an observer should be
able to distinguish anything resembling life at the level of the human brain
and its processes, as well as in the micro-scale of the cellular organelles. If
this can be done in a generic way, quantifying the life density of a dying
human could be shown to follow an exponential decay, due to the ability to
capture the existence of, and enumerate, higher level as well as lower level
functions. It is interesting to note at this point that making sense of and
enumerating different object classes in input data has recently proven feasible
using large-scale deep-learning neural network implementations [18]. Finding
commonalities across scales and contexts might lead us to a generalized un-
derstanding of the effect of Darwinian evolution on physical matter, and the
capability of quantifying, thus identifying, life in systems as exotic as neu-
tron stars, where the underlying chemistry might be vastly different from
the homely warm pond of Earth, but capable of information persistence,
propagation and evolution, nevertheless.
References
[1] RM Dancy. Plato’s Introduction of Forms. Cambridge: Cambridge University Press, 2004.
[2] D’Arcy W Thompson. On Growth and Form. Cambridge University Press,
1942.
[3] James Ladyman, James Lambert, and Karoline Wiesner. What is
a complex system? European Journal for Philosophy of Science,
3(1):33–67, 2013.
[4] Erwin Schrödinger. What is Life? Cambridge: Cambridge University
Press, 1944.
[5] Ville RI Kaila and Arto Annila. Natural selection for least action.
Proceedings of the Royal Society A: Mathematical, Physical and Engi-
neering Sciences, 464(2099):3055–3070, 2008.
[6] Martijn A Huynen and Paulien Hogeweg. Pattern generation in molec-
ular evolution: exploitation of the variation in rna landscapes. Journal
of Molecular Evolution, 39(1):71–79, 1994.
[7] David L Abel. Is life reducible to complexity? Fundamentals of Life,
pages 57–72, 2002.
[8] Gregory J Chaitin. Toward a mathematical definition of life. Informa-
tion, Randomness & Incompleteness: Papers on Algorithmic Informa-
tion Theory, pages 86–104, 1979.
[9] Claude E Shannon. A mathematical theory of communication. The
Bell system technical journal, 27(3):379–423, 1948.
[10] Gregory J Chaitin. Algorithmic information theory. IBM journal of
research and development, 21(4):350–359, 1977.
[11] Giles ED Oldroyd and J Allan Downie. Nuclear calcium changes at
the core of symbiosis signalling. Current opinion in plant biology,
9(4):351–357, 2006.
[12] Michael Sipser. Introduction to the Theory of Computation. Course
Technology, 2005.
[13] Verónica Becher, Alejandro Deymonnaz, and Pablo Heiber. Efficient
computation of all perfect repeats in genomic sequences of up to half
a gigabyte, with a case study on the human genome. Bioinformatics,
25(14):1746–1753, 2009.
[14] Timothy L Bailey, James Johnson, Charles E Grant, and William S
Noble. The MEME suite. Nucleic Acids Research, 43(W1):W39–W49,
2015.
[15] James E Hanson. Cellular automata, emergent phenomena in. In Encyclopedia of Complexity and Systems Science. Springer, 2009.
[16] Enrico Sandro Colizzi and Paulien Hogeweg. Evolution of functional
diversification within quasispecies. Genome biology and evolution,
6(8):1990–2007, 2014.
[17] Kyoung-Jin Lee, William D McCormick, John E Pearson, and Harry L
Swinney. Experimental observation of self-replicating spots in a
reaction–diffusion system. Nature, 369(6477):215–218, 1994.
[18] Quoc V Le. Building high-level features using large scale unsupervised
learning. In 2013 IEEE international conference on acoustics, speech
and signal processing, pages 8595–8598. IEEE, 2013.