All content in this area was uploaded by Herve Seligmann on Jul 04, 2023
More missing age data in VAERS COVID19 injection reports for severe than mild adverse events in children and women at peak fertility ages
Albert Benavides and Hervé Seligmann
Summary
Numerous VAERS reports for COVID19 injections lack information (sex, age, state, death, etc.) in the dedicated fields of the VAERS forms. That information often exists within the form's write-up text describing case specifics (example in Figure 1). VaersAware.com screens report texts for missing information and compares statistics in the original database with those in the database cleansed after screening for "missing" information (Figure 2). Biases in missing information increase as a function of event severity and decrease with age. The tendency for the bias in missing information to increase with event severity is greatest in children and decreases with age. Pre- and post-screening biases in female/male ratios increase with event severity for ages above 17; the pattern is strongest at peak female fertility ages, 18-29. No form adapted to include data of both pregnant women and their un- or newborn child exists, suggesting a systemic, in addition to systematic, maladministration of the VAERS information database. Could mechanisms lacking intent explain these observations?
Introduction
In the VAERS report in Figure 1, the patient age is missing at the form location dedicated to this information (see arrow). The information exists elsewhere in the form (highlighted within the case description). Missing age information occurs in about 28% of reports on adverse events associated with COVID19 injections. The CDC search engine places such reports in a category where age is unknown. Hence these reports are not included in descriptive statistics exploring, for example, associations between event frequencies and age. Analyses of the original VAERS reports implicitly assume that such missing information and possible errors are randomly distributed and hence do not alter overall patterns. Here, we test this assumption for age and sex ratios using data cleansed by VaersAware report screening. Similar bias analyses for other information (delays in report publication) and other vaccine types (influenza, HPV, etc.) are planned.
Figure 1. Report in VAERS on a 2-year-old male who died after a COVID19-Janssen injection, with age missing at the form location dedicated to this information (indicated by an arrow) but included in the write-up describing case specifics (highlighted).
Materials and Methods
VaersAware.com is updated each Friday, following VAERS updates. It compares each new download with previous ones and adds only new reports; it also records changes in the information of the same report identity across successive downloads. VaersAware systematically screens reports for COVID19 injections for missing information such as in the Figure 1 example, using keywords that appear in the text in relation to the missing information, such as "year old" for age. This method might not catch all cases but improves the completeness of the data.
VaersAware ranks event severity as follows (from lowest to highest severity rank): office visit, emergency, hospital, life threat, birth defect, permanent disability, and death. Analyses here also consider another event severity ranking, in which event severity is proportional to event rarity and birth defect gets the same rank as permanent disability (from lowest to highest severity rank): office visit, hospital, emergency, permanent disability + birth defect, life threat, and death.
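The two rankings above can be written as lookup tables that supply the x-values of severity correlations. This is a sketch; the dictionary names and the helper `ordered` are illustrative, while the ranks themselves are those listed above.

```python
# Severity ranks from the Methods (1 = lowest severity).
VAERSAWARE_RANK = {
    "office visit": 1, "emergency": 2, "hospital": 3, "life threat": 4,
    "birth defect": 5, "permanent disability": 6, "death": 7,
}

# Rarity-based ranking: birth defect shares the rank of permanent disability.
RARITY_RANK = {
    "office visit": 1, "hospital": 2, "emergency": 3, "permanent disability": 4,
    "birth defect": 4, "life threat": 5, "death": 6,
}


def ordered(ranking: dict) -> list:
    """Event names sorted from lowest to highest severity rank (ties keep input order)."""
    return sorted(ranking, key=ranking.get)


print(ordered(VAERSAWARE_RANK))
print(ordered(RARITY_RANK))
```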
Results
Missing data by event category
Figure 2a-h compares counts of events by age category according to the original CDC reports and after screening for missing age (cleansed). Reports with unknown age, all events combined, drop from 472827 before screening to 132305 after screening. Considering only deaths, unknown ages drop after cleansing from 12171 to 3196.
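The drops quoted above correspond to recovery rates that follow from simple arithmetic: the share of unknown-age reports for which screening found an age in the write-up. The function name `recovered_fraction` is illustrative; the counts are those given in the text.

```python
def recovered_fraction(unknown_before: int, unknown_after: int) -> float:
    """Share of unknown-age reports whose age was recovered by screening."""
    return (unknown_before - unknown_after) / unknown_before


# Counts quoted in the text (VaersAware.com, accessed July 1st 2023).
print(f"all events: {recovered_fraction(472827, 132305):.1%}")  # 72.0%
print(f"deaths:     {recovered_fraction(12171, 3196):.1%}")     # 73.7%
```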
Figure 2. Counts of reports by event type (a-all, b-death, c-life threat, d-permanent disability, e-emergency, f-hospital, g-office visit, h-birth defect) from original CDC reports at the form's dedicated location and after screening for missing age information in the form's case description write-up by VaersAware (cleansed). VaersAware.com was accessed July 1st 2023.
Cleansed/original bias in missing age vs age
For deaths (data from Figure 2b), the bias in missing age information, estimated by the ratio between cleansed and original death counts, is highest for younger age categories and decreases with age (Pearson correlation coefficient r = -0.843; after excluding age category 0-6 years old, r = -0.914; two-tailed P < 0.01 in both cases; Figure 3). The high ratio for age category 0-6 is probably inflated by prenatal deaths, for which age might not be indicated in original reports.
Figure 3. Ratio between counts of deaths according to VAERS reports cleansed for missing age and original reports, as a function of the mean of the age range of each age category. Numbers near data points indicate cleansed/original counts for each age category; data from Figure 2b.
This analysis, repeated for each event type (birth defects excluded), produces three additional negative correlations, two of them statistically significant (hospital, r = -0.681; office visit, r = -0.763; two-tailed P < 0.01 in both cases; permanent disability, r = -0.119, P > 0.05). For two events, this analysis produces positive correlations that are not statistically significant (life threat, r = 0.307; emergency, r = 0.509; P > 0.05 in both cases). Overall, missing age information biases are largest for the young and decrease with age.
Cleansed/original bias in missing age vs event severity
Analyses such as in Figure 3 consider each event type separately and explore variation within each event type in the cleansed/original counts across different ages. The analyses below consider each age category separately and explore variation within each age category in the cleansed/original counts across different events. Results from the former analysis suggest that positive correlations are to be expected (the more severe the event, the greater the bias), justifying the use of one-tailed statistical tests. Figure 4 plots cleansed/original ratios of event counts for age category 6-11 years old as a function of event severity ranked according to VaersAware (filled symbols) and according to event rarity in the cleansed COVID19-injection report database (hollow symbols). Missing age data biases overall increase with event severity (r = 0.749, P = 0.043 and r = 0.793, P = 0.0299, respectively).
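The one-tailed P values above are consistent with the standard t-test of a Pearson correlation, t = r·sqrt(n - 2)/sqrt(1 - r²), if one assumes n = 6 event types per age category (birth defects excluded); that n is an assumption, not stated in the text. The sketch below integrates the Student t density with Simpson's rule so that it needs only the standard library; the function names are hypothetical.

```python
import math


def t_pdf(t: float, df: int) -> float:
    """Student t probability density with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + t * t / df) ** (-(df + 1) / 2)


def one_tailed_p(r: float, n: int, steps: int = 2000) -> float:
    """P(T >= t) for the t-test of a Pearson r from n pairs (requires |r| < 1)."""
    df = n - 2
    t = r * math.sqrt(df) / math.sqrt(1 - r * r)
    # Simpson's rule on [0, t] (steps must be even); by symmetry P(T >= t) = 0.5 - area.
    h = t / steps
    area = t_pdf(0, df) + t_pdf(t, df)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - area * h / 3


# Assumption: six event types per age category, so n = 6 (df = 4).
print(round(one_tailed_p(0.749, 6), 3))  # 0.043
print(round(one_tailed_p(0.793, 6), 4))  # 0.0299
```

With SciPy available, `scipy.stats.t.sf(t, df)` gives the same tail probability without the numerical integration.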
[Figure 3 plot: ratio between cleansed/original CDC death report counts (log scale) versus age. Data point labels (cleansed/original): 54/30, 136/79, 104/65, 669/440, 966/627, 1392/963, 2787/2066, 5362/4241, 8047/6178, 8760/6089, 4017/2655, 206/153, and 92/20 (0-6 years old). Power fit excluding 0-6-year-olds: y = 4.6145x^(-0.288), R² = 0.8347.]
Figure 4. Ratio between event counts according to VAERS reports cleansed for missing age and original report counts, as a function of event severity for age category 6-11 years old. Filled (interrupted line) and hollow (dotted line) circles are for event severity ranked according to VaersAware and to event rarity, respectively (office visits and deaths have the lowest and highest severity ranks in both rankings). Data are from Figure 2b-h and were cleansed by VaersAware.
An overall tendency for the bias in missing data to increase with event severity exists across all age categories except the oldest, 100-119 years old (Table 1). Obtaining positive correlations in 12 of 13 tests has a one-tailed statistical significance of P = 0.0017 according to a sign test using the binomial distribution.
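The sign test used here treats each age category as a coin flip under the null hypothesis (positive or negative r with probability 0.5) and sums the upper binomial tail. A minimal sketch (the function name is illustrative) reproduces the P values reported for Tables 1 and 2:

```python
from math import comb


def sign_test_p(positives: int, n: int) -> float:
    """One-tailed sign test: P(X >= positives) for X ~ Binomial(n, 0.5)."""
    return sum(comb(n, k) for k in range(positives, n + 1)) / 2 ** n


print(round(sign_test_p(12, 13), 4))  # 0.0017  (Table 1: 12 of 13 positive)
print(round(sign_test_p(10, 13), 4))  # 0.0461  (Table 2, VaersAware ranks)
print(round(sign_test_p(11, 13), 4))  # 0.0112  (Table 2, rarity ranks)
```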
Age cat.   r rarity   r VaersAware
0-5        0.78       0.81
6-11       0.79       0.75
12-15      0.17       0.61
16-17      0.55       0.81
18-29      0.40       0.62
30-39      0.48       0.68
40-49      0.35       0.59
50-59      0.31       0.53
60-69      0.36       0.55
70-79      0.49       0.59
80-89      0.51       0.56
90-99      0.45       0.55
100-119    -0.21      -0.40
Table 1. Pearson correlation coefficient r between the ratio of cleansed to original VAERS event counts and event severity for 13 age categories, as in the example in Figure 4. Correlations were computed using the VaersAware event severity ranks and using events ranked by increasing rarity. Data are from Figure 2b-h.
Age, event severity and cleansed/original bias in missing age
Results in Table 1 show that the increase in missing age data biases with event severity is highest for the young and decreases with age (Figure 5). This overall decrease is statistically significant at P < 0.05 both for event severity ranked as in VaersAware and for event severity ranked by event rarity (r = -0.698 and r = -0.558, respectively).
[Figure 4 plot: ratio between cleansed/original CDC report counts, age category 6-11 years old, versus event severity rank from low to high severity; linear fits R² = 0.5614 and R² = 0.6291.]
Figure 5. Pearson correlation coefficients r (Table 1) as a function of the mean of the age range (filled circles: event severity ranked according to VaersAware; hollow circles: ranked according to event rarity).
Overall, results indicate that missing age data in VAERS are most frequent for more severe events and in the youngest age groups, and that this nonrandom bias in missing data is ubiquitous across events and ages.
Biases in missing information affect sex ratios in relation to event severity
These results on biases in missing age data increasing with event severity, especially in children, suggest that missing information biases might also occur specifically for women, especially at peak female fertility ages. Therefore, data in Figure 2 are used to calculate pre- and post-screening female/male ratios for each age and event category, including birth defect (Figure 2h). The bias in missing age information for sex ratios is calculated by dividing the post-screening sex ratio by the pre-screening sex ratio (Table 2).
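The sex-ratio bias defined above is a ratio between two ratios. A minimal sketch for one age/event cell follows; the function names and counts are hypothetical, for illustration only.

```python
def sex_ratio(females: int, males: int) -> float:
    """Female/male ratio of report counts."""
    return females / males


def sex_ratio_bias(f_pre: int, m_pre: int, f_post: int, m_post: int) -> float:
    """Post-screening female/male ratio divided by the pre-screening one.

    Values above 1 mean screening recovered proportionally more female reports.
    """
    return sex_ratio(f_post, m_post) / sex_ratio(f_pre, m_pre)


# Hypothetical counts for one age/event cell (illustration only).
print(round(sex_ratio_bias(f_pre=400, m_pre=200, f_post=630, m_post=300), 3))  # 1.05
```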
Event                Office visit  Hosp.  Emerg.  Perm. Dis.  Life threat  Death  Birth defect  r-VaersAw.  r-rarity
Rank, VaersAware     1             3      2       6           4            7      5
Rank, rarity         1             2      3       4           5            6      4
Age
3                    1.004         0.952  0.983   1.097       1.043        0.944  2.577         0.265       0.189
8.5                  0.998         0.968  0.992   0.965       1.075        0.977  1.000         -0.176      0.247
13                   1.010         0.984  0.973   0.983       0.997        0.868  n/a           -0.711      -0.690
16.5                 1.007         0.976  0.991   0.994       1.015        0.907  1.071         -0.314      -0.302
23.5                 0.972         0.978  0.994   1.012       0.992        1.052  1.005         0.884       0.851
34.5                 0.975         0.982  0.989   1.008       0.973        1.036  1.001         0.833       0.665
44.5                 0.966         0.980  0.991   1.010       0.961        1.006  1.004         0.714       0.516
55.5                 0.963         0.989  0.998   1.006       0.975        0.971  1.006         0.391       0.302
65.5                 0.970         0.991  0.990   1.000       0.997        0.983  0.994         0.669       0.662
75.5                 0.977         0.998  1.000   1.008       0.995        1.020  0.905         0.105       0.167
85.5                 0.979         1.020  1.013   1.013       0.980        1.056  1.050         0.665       0.515
95.5                 1.014         1.042  1.050   1.013       1.032        1.065  n/a           0.351       0.494
109.5                1.004         1.014  1.037   0.963       1.142        1.077  n/a           0.212       0.578
Table 2. Ratio between post- and pre-screening female/male ratios for each age category and event in Figure 2. The two last columns indicate Pearson correlation coefficients r between ranked event severity and ratios for a given age, according to VaersAware severity ranks and with events ranked according to event rarity. Statistically significant Pearson correlation coefficients r are underlined in bold.
[Figure 5 plot: Pearson correlation coefficient r between bias in missing data and event severity versus mean of age range; linear fits R² = 0.4871 and R² = 0.3108.]
The bias in missing age information, measured as the ratio between post- and pre-screening sex ratios, increases with event severity in 10 and 11 of 13 age categories (positive r in the two last columns of Table 2) for event severity ranked according to VaersAware and to event rarity, respectively (P = 0.0461 and P = 0.0112, respectively; one-tailed sign tests using the binomial distribution). Figure 6 plots this bias in missing age information for sex ratios as a function of event severity for age categories 18-29 and 30-39 years old.
[Figure 6, panel A plot: ratio between post- and pre-cleansed female/male ratios versus events ranked according to severity, from low to high severity; linear fits R² = 0.7759 and R² = 0.6879.]
Figure 6. Ratio between post- and pre-cleansed female/male ratios for women 18-29 and 30-39 years old (filled and hollow symbols, with interrupted and dotted lines, respectively) versus event severity rank. A: event severity ranks according to VaersAware; B: event severity ranks according to event rarity.
General discussion
There is a surprising amount of structure in the distribution of missing age data in VAERS reports in relation to age, event severity, and sex. Such biases could not be detected if the CDC administration did not make the data public, as is the case for most other administrations. Nevertheless, the very fact that no form adapted to include data of both pregnant women and their un- or newborn child exists shows a systemic, in addition to systematic, maladministration of the VAERS information database. We hope that explanations implying unintentional mechanisms for these nonrandom biases will be proposed. Below, we discuss the possibility that this structure results from simple mathematical effects.
Considering age categories, one could suggest that the number of recovered missing ages among death reports is approximately constant. Considering deaths, let us assume for the sake of the example that the cleansing process at VaersAware finds one missing age in every age category. In that case, the proportion that this single case represents decreases as the number of deaths in that age category increases. As deaths typically increase with age, this would produce negative associations between bias and age. A similar rationale could also explain the positive correlations observed between bias in missing data and event severity for rankings in which event numbers decrease with severity. However, the data in Figure 2 clearly show that the numbers of reports for which age is recovered are not constant at all. In addition, in most analyses the patterns are stronger when using the VaersAware event severity ranking, which is not proportional to event rarity. This invalidates the proposed trivial mathematical explanation.
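The hypothetical scenario discussed above, in which screening recovers the same number of ages in every category, can be made concrete: the cleansed/original ratio (n + k)/n then shrinks automatically as the original count n grows. The function name and counts below are illustrative only and are not VAERS data.

```python
def ratio_with_constant_recovery(original_counts, recovered_per_category=1):
    """Cleansed/original ratio if screening recovers the same number k of ages everywhere."""
    return [(n + recovered_per_category) / n for n in original_counts]


# Hypothetical death counts rising with age (illustration only).
counts = [10, 50, 200, 1000]
print(ratio_with_constant_recovery(counts))  # [1.1, 1.02, 1.005, 1.001]
```

This artifact requires a constant k; when the recovered counts vary, as in Figure 2, the ratio no longer depends on n alone.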
This mathematical explanation is even less relevant for the ratio between post- and pre-cleansed sex ratios, because it is a ratio between two ratios. Such ratios between ratios are unaffected by the proposed scenario in which the number of recovered missing age data is approximately constant. In other words, at this point, and until new explanations and analyses are proposed, the patterns in the distribution of missing age data are considered to reflect a real process.
[Figure 6, panel B plot: ratio between post- and pre-cleansed female/male ratios versus events ranked according to severity, from low to high severity; linear fits R² = 0.7213 and R² = 0.441.]