Physica D 167 (2002) 72–85
On the log-normal distribution of network traffic
I. Antoniou a,b,*, V.V. Ivanov a,c, Valery V. Ivanov c,d, P.V. Zrelov c
a International Solvay Institutes for Physics and Chemistry, CP-231, ULB, Bd. du Triomphe, 1050 Brussels, Belgium
b Department of Mathematics, Aristotle University of Thessaloniki, 54006 Thessaloniki, Greece
c Laboratory of Information Technologies, Joint Institute for Nuclear Research, 141980 Dubna, Russia
d University Scientific Center, Joint Institute for Nuclear Research, 141980 Dubna, Russia
Received 19 November 2001; accepted 14 March 2002
Communicated by H. Müller-Krumbhaar
Abstract
A detailed analysis of traffic measurements shows that the aggregation of these measurements forms a statistical distribution, which is approximated with high accuracy by the log-normal distribution. The inter-arrival times and packet sizes, contributing to the formation of network traffic, can be considered as independent. Applying the wavelet transform to traffic measurements, we demonstrate the multiplicative character of traffic series. This result confirms that the scheme, developed by Kolmogorov [Dokl. Akad. Nauk SSSR 31 (1941) 99] for the homogeneous fragmentation of grains, applies also to network traffic.
© 2002 Elsevier Science B.V. All rights reserved.
PACS: 47.20.Ky; 12.40.Ee
Keywords: Internet; Traffic; Flow; Log-normal; Self-similarity; Wavelet
* Corresponding author. Fax: +32-2-650-50-28.
E-mail address: iantonio@vub.ac.be (I. Antoniou).
1. Introduction
Within the global framework of the information society, fast, reliable and safe data exchange between local and terrestrial wide-area computer networks has become a priority issue. Evidence of traffic complexity appears in many forms, such as heavy-tailed distributions, long-range correlations and self-similarity found in traffic measurements at different time scales [2–7]. The complexity revealed by the traffic measurements has led to the suggestion that information traffic cannot be analyzed within the framework of available models [8,9]. Moreover, the performance of computer networks crucially depends on the traffic assessment.
In this context, a major challenge for the emerging high-speed integrated-services communication networks is to elaborate a reliable model that can realistically capture the salient features of network traffic. Such a model may serve as the basis for the development of methods and tools for quality assessment, providing more efficient control and management of information flow in the Internet [10,11].
We develop here a background traffic model, which can be used for the analysis and control of network traffic. In our study we use the traffic measurements obtained at the input of the Dubna University [12] local area network (LAN), which includes approximately 200–250 interconnected computers. We describe in Section 2 the data acquisition system of this LAN, realized on the basis of a standard IBM PC. In Section 3 we analyze the influence of the aggregation process on the form of the statistical distribution of traffic flows. In Section 4 we analyze the reasons for traffic log-normality using real traffic measurements and model data.
In Section 5, after recalling a “small paper” by Kolmogorov [1] on the log-normal distribution, we demonstrate, using wavelets, the multiplicative structure of traffic measurements, which confirms the applicability of Kolmogorov’s scheme [1] to network traffic.
0167-2789/02/$ – see front matter © 2002 Elsevier Science B.V. All rights reserved.
PII: S0167-2789(02)00431-1
Fig. 1. Scheme of a data acquisition system.
2. Data acquisition system
Two protocols are used in the “Dubna” LAN. The NetBEUI protocol is applied only for internal exchanges, and TCP/IP for external communications. The measurements of network traffic were made at the external side of the input gateway of the LAN. The operation of the data acquisition system is based on an open mode (promiscuous mode) driver [13] (see Fig. 1).
Under standard conditions the network adapter of a computer is in a mode of detecting a carrier signal (main harmonic 4–6 MHz). When the bits of the packet preamble appear on the cable, the network adapter switches to a mode of bit and byte synchronization with the transmitter and starts receiving the first bytes of the packet header. As soon as the MAC address of the frame's receiver has been extracted from the first bytes taken by the adapter, the network adapter compares it to its own. If the comparison is negative, the network adapter stops recording the frame's bytes into its internal buffer, clears the buffer contents and waits until the next packet appears.
In order to receive and analyze all the packets transmitted over the network, it is necessary to switch the adapter to a free (promiscuous) mode, in which all frames are recorded in the buffer. This operation is executed through the instructions of the NDIS driver.
The free mode driver records the accepted packets in the preliminary capture buffer and sets the flag indicating that a packet has been received. Then the packet-receiving module is activated, and the packet type field is analyzed in order to extract TCP/IP packets from the whole stream.
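The filtering step described above can be illustrated with a minimal sketch. It assumes Ethernet II framing, in which bytes 12–13 of the frame carry the EtherType and the value 0x0800 denotes IPv4; the function names are hypothetical, and the sketch is not the NDIS driver code of the acquisition system.

```python
import struct

ETHERTYPE_IPV4 = 0x0800  # standard EtherType value for IPv4


def parse_ethernet_header(frame: bytes):
    """Return (dst_mac, src_mac, ethertype) from the first 14 bytes of a frame."""
    if len(frame) < 14:
        raise ValueError("frame shorter than an Ethernet header")
    # 6-byte destination MAC, 6-byte source MAC, 2-byte type field (network byte order)
    dst, src, ethertype = struct.unpack("!6s6sH", frame[:14])
    return dst, src, ethertype


def is_ip_frame(frame: bytes) -> bool:
    """True if the type field marks the payload as IPv4 (assumed Ethernet II framing)."""
    _, _, ethertype = parse_ethernet_header(frame)
    return ethertype == ETHERTYPE_IPV4
```

In promiscuous mode every frame reaches such a filter regardless of its destination MAC address; frames whose type field is not 0x0800 (for example, non-IP LAN traffic) would simply be skipped.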
After identification it is possible to separate and discard the data block and to record the headers in the SQL server database. The recording is performed together with the time data at a frequency of up to 10 kHz. Although the recording is buffered, saving the packet headers requires enormous server resources, since data are continuously written to the hard disk in small portions. That is why this mode is switched on only when required, at the instruction of the management system.
The system also provides control over the external traffic of the LAN by controlling the records in the router table. Initial information on the legal IP addresses is stored in the database of the LAN computers, from which the data on legal addresses are loaded into an array in main memory. Users who do not participate in forming the external traffic are not taken into account when calculating the number of transferred and received bytes. In order to decrease the number of write sessions that store information on the external traffic in the database, a timer for flushing the buffer and a timer for changing the current date have been introduced into the system.
The recorded traffic data correspond to approximately 20 h of measurements (1 600 000 records with a frequency up to 10 kHz, which corresponds to 1 ms bin size). The part of this series corresponding to approximately 1 h of measurements, aggregated with different bin sizes, is presented in Fig. 2. The contribution of the NetBEUI traffic has been estimated at around one to six packets per second during daily working hours. This is negligibly small compared to the TCP/IP traffic, so we may neglect the influence of non-IP traffic on the TCP/IP traffic.
3. Aggregation of traffic measurements
In order to reconstruct and identify the dynamical system underlying the traffic measurements, we applied [14] a nonlinear time series analysis and a feed-forward layered neural network to traffic data. We demonstrated that the corresponding dynamical system has reliable values for the time lag and the embedding dimension. This permitted us to successfully apply a feed-forward layered neural network for the identification and reconstruction of the dynamical system. We found that the trained neural network reproduces the statistical distribution of packet sizes of traffic measurements aggregated with 1 s bin size. This distribution resembles the log-normal distribution.
Having available traffic data measured at high frequency (each arriving packet has been recorded independently, see Section 2), we were able to analyze the influence of the aggregation bin on the form of the packet size distribution. Fig. 3 shows the packet size distribution for raw traffic measurements, while Figs. 4–6 present the distributions for measurements aggregated with bin sizes 10 ms, 100 ms and 1 s, respectively.
One can clearly see that for aggregation with small bin sizes the packet size distributions have a rather chaotic and nonsystematic character. However, when the aggregation bin size approaches 1 s (see Fig. 6) the distribution assumes a stable form that does not change with further increase of the aggregation bin (see, for example, Fig. 7, which corresponds to aggregation with the bin size 10 s).
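The aggregation step used throughout this section can be sketched as follows. This is a minimal illustration that assumes each record is a pair (arrival time in seconds, packet size in bytes) with non-negative times, and that empty bins count as zero; the function name and the record layout are hypothetical and are not taken from the acquisition system.

```python
from collections import defaultdict


def aggregate(records, bin_size):
    """Sum packet sizes over consecutive time bins of width bin_size (seconds).

    records: iterable of (arrival_time, packet_size), arrival_time >= 0.
    Returns a list whose i-th entry is the total size in [i*bin_size, (i+1)*bin_size).
    Bins without packets are reported as 0 (an assumption of this sketch).
    """
    totals = defaultdict(int)
    for arrival_time, packet_size in records:
        totals[int(arrival_time // bin_size)] += packet_size
    if not totals:
        return []
    last_bin = max(totals)
    return [totals.get(i, 0) for i in range(last_bin + 1)]
```

Calling the function with bin sizes of 0.01, 0.1, 1 and 10 on the same record list would give the series whose histograms are compared in this section.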
Fig. 2. Traffic measurements aggregated with different bin sizes: 0.1, 1 and 10 s.
Fig. 3. Packet size distribution for traffic measurements.
The distributions in Figs. 6 and 7 are well approximated by the log-normal function [15]:
$$ f(x) = \frac{A}{\sqrt{2\pi}\,\sigma}\,\frac{1}{x}\,\exp\left[-\frac{1}{2\sigma^{2}}\left(\ln x - \mu\right)^{2}\right], \tag{1} $$
where x is the variable, σ and µ are the parameters of the log-normal distribution and A is the normalizing multiplier.
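For readers who want to evaluate function (1) numerically, a minimal sketch is given below. The function name is hypothetical; the argument names follow the symbols A, σ and µ of Eq. (1). With this form the integral of f over x > 0 equals A.

```python
import math


def lognormal(x, A, sigma, mu):
    """Evaluate the log-normal function of Eq. (1) at x; returns 0 for x <= 0."""
    if x <= 0.0:
        return 0.0
    prefactor = A / (math.sqrt(2.0 * math.pi) * sigma * x)
    return prefactor * math.exp(-((math.log(x) - mu) ** 2) / (2.0 * sigma ** 2))
```

This is the kind of model function that would be passed to a minimizer when fitting a histogram of aggregated packet sizes.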
The fitting procedure was carried out with the help of the MINUIT package [16] within the well-known Physics Analysis Workstation (PAW); see details in [17]. The MINUIT package is conceived as a tool to find the minimum value of a multi-parameter function and to analyze the shape of the function around the minimum [16]. Its principal application is statistical analysis, working on chi-square or log-likelihood functions, to compute the best-fit parameter values and uncertainties, including correlations between the parameters. It is especially suited to handle difficult problems.
Fig. 4. Packet size distribution for traffic measurements aggregated with bin size 10 ms.
Fig. 5. Packet size distribution for traffic measurements aggregated with bin size 100 ms.
In Table 1 we present the results of fitting the packet size distributions aggregated with different bin sizes by the log-normal distribution (1). Here χ² denotes the calculated value of the minimized function FCN, which is usually defined as [16]:
$$ \chi^{2}(a) = \sum_{i=1}^{N} \frac{\left[x_{i} - y_{i}(a)\right]^{2}}{e_{i}^{2}}, \tag{2} $$
where a is the parameter vector, e_i² = x_i is the square of the error on the individual observations, and N is the number of channels in the fitting histogram.
Fig. 6. Packet size distribution for traffic measurements aggregated with bin size 1 s: fitting curve corresponds to the function (1).
Fig. 7. Packet size distribution for traffic measurements aggregated with bin size 10 s: fitting curve corresponds to the function (1).
Table 1
Results of fitting of the packet size distributions aggregated with different bin sizes by the function (1)
Bin (s)   σ                µ               A (×10^8)      ν    χ²
1         0.927 ± 0.003    8.85 ± 0.01     1.04 ± 0.01    47   1447
2         0.896 ± 0.004    9.62 ± 0.01     1.25 ± 0.01    47   987
3         0.906 ± 0.005    10.06 ± 0.01    1.14 ± 0.01    47   667
4         0.891 ± 0.006    10.40 ± 0.01    1.15 ± 0.01    47   667
5         0.888 ± 0.006    10.62 ± 0.01    1.19 ± 0.01    47   348
10        0.843 ± 0.007    11.36 ± 0.01    1.33 ± 0.02    46   225
Table 2
Results of fitting of the packet size distributions aggregated with different bin sizes by the function (3)
Bin (s)   σ (×10^4)       η                B (×10^8)      ν    χ²
1         0.92 ± 0.01     1.476 ± 0.007    1.13 ± 0.01    47   8242
2         1.89 ± 0.01     1.489 ± 0.010    1.09 ± 0.01    47   5085
3         2.90 ± 0.03     1.500 ± 0.013    0.99 ± 0.01    47   3459
4         4.32 ± 0.06     1.432 ± 0.018    0.99 ± 0.01    47   2771
5         5.65 ± 0.10     1.400 ± 0.019    1.02 ± 0.01    47   2167
10        13.01 ± 0.25    1.390 ± 0.020    1.13 ± 0.02    46   1151
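The χ² of Eq. (2), with the error choice e_i² = x_i, can be computed directly from a histogram and the fitted curve evaluated at the bin centres. The sketch below is only an illustration of that formula, not the MINUIT implementation; the function name is hypothetical, and skipping empty channels (where e_i² = x_i would be zero) is an assumption of this sketch that the text does not state.

```python
def chi_square(observed, expected):
    """Chi-square of Eq. (2) with e_i^2 = x_i.

    observed: histogram contents x_i; expected: model values y_i(a) for the same bins.
    Channels with x_i <= 0 are skipped (assumption), since e_i^2 would vanish there.
    Returns (chi2, number_of_channels_used).
    """
    total = 0.0
    used = 0
    for x, y in zip(observed, expected):
        if x <= 0:
            continue
        total += (x - y) ** 2 / x
        used += 1
    return total, used
```

The number of degrees of freedom would then be the number of channels used minus the number of fitted parameters.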
One can see that the fitting curves corresponding to the log-normal distribution approximate the experimental distributions with good accuracy over all regions of the analyzed distributions.
Feldmann and Whitt [18] noticed that Internet traffic measurements can be well described by the Pareto and Weibull distributions. We checked, therefore, the correspondence of our experimentally obtained distributions to the Weibull distribution [15]:
$$ f(x) = B\,\frac{\eta}{\sigma}\left(\frac{x}{\sigma}\right)^{\eta-1}\exp\left[-\left(\frac{x}{\sigma}\right)^{\eta}\right], \quad x \ge 0, \tag{3} $$
which, for suitably chosen parameters σ and η, has a form similar to the analyzed real distributions. Here B is the normalizing multiplier.
In Table 2 we present the results of fitting the packet size distributions aggregated with different bin sizes by the Weibull distribution (3). One can see that the corresponding fitting curves approximate the experimental distributions significantly worse than the log-normal distribution does (see Figs. 8 and 9).
As mentioned above, the fitting curves corresponding to the log-normal distribution approximate the experimental distributions with good accuracy over all regions of the analyzed distributions. However, as can be seen from Table 1, they did not pass the χ²-test.
Fig. 8. Packet size distribution for traffic measurements aggregated with bin size 1 s: fitting curve corresponds to the function (3).
Fig. 9. Packet size distribution for traffic measurements aggregated with bin size 10 s: fitting curve corresponds to the function (3).
Table 3
Results of fitting of daily part of packet size distributions aggregated with different bin sizes by the function (1)
Bin (s)   ν    χ²      α (%)
1         47   49.84   32.30
2         47   44.76   52.51
3         47   41.53   65.98
The main reason is that the distributions presented in Figs. 6–9 are based on the whole set of data, which corresponds to approximately 20 h of measurements. But the traffic series, as well as the corresponding statistical distributions, behave differently depending on whether the measurements were taken during working hours or not. Therefore, we tested the correspondence of the experimental distributions to the null hypothesis (1) by applying the χ² goodness-of-fit criterion to the part of the daily traffic shown in Fig. 2. The results of this analysis are presented in Table 3.
Here α is the probability (%) that the observed chi-square will exceed the value χ² by chance even for a correct model; see, for instance, [15,19]. These results show that the hypothesis (1) can be accepted with a high probability (see also Fig. 10). At the same time it must be noted (see Figs. 6 and 7) that the inactive period of the LAN does not significantly change the fundamental form of the statistical distribution of traffic data.
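The probability α is the upper tail of the chi-square distribution, which equals the regularized upper incomplete gamma function Q(ν/2, χ²/2). A minimal standard-library sketch of this computation follows, using a power series for small arguments and a continued fraction otherwise (the approach described in [19]). The function names are hypothetical, and the sketch is not claimed to be the routine used to produce Table 3.

```python
import math


def gamma_q(a, x, eps=1e-12, max_iter=10000):
    """Regularized upper incomplete gamma function Q(a, x) for a > 0, x >= 0."""
    if x <= 0.0:
        return 1.0
    log_prefactor = -x + a * math.log(x) - math.lgamma(a)
    if x < a + 1.0:
        # Series expansion of the lower function P(a, x); return Q = 1 - P.
        term = 1.0 / a
        total = term
        n = a
        for _ in range(max_iter):
            n += 1.0
            term *= x / n
            total += term
            if abs(term) < abs(total) * eps:
                break
        return 1.0 - total * math.exp(log_prefactor)
    # Continued fraction for Q(a, x), evaluated with the modified Lentz method.
    tiny = 1e-300
    b = x + 1.0 - a
    c = 1.0 / tiny
    d = 1.0 / b
    h = d
    for i in range(1, max_iter):
        an = -i * (i - a)
        b += 2.0
        d = an * d + b
        if abs(d) < tiny:
            d = tiny
        c = b + an / c
        if abs(c) < tiny:
            c = tiny
        d = 1.0 / d
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h * math.exp(log_prefactor)


def chi2_tail_probability(chi2, nu):
    """Probability that a chi-square variable with nu degrees of freedom exceeds chi2."""
    return gamma_q(0.5 * nu, 0.5 * chi2)
```

Multiplying the returned value by 100 gives a percentage of the kind tabulated as α.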
We conclude, therefore, that
• the aggregation of traffic measurements forms (starting from some threshold value of the aggregation window) a statistical distribution, which does not change its form with further increase of the aggregation window;
• this distribution is approximated with high accuracy by the log-normal distribution.
4. What are the reasons for the log-normality of network traffic?
Let us try to understand the reasons which may cause the statistical distribution observed above. We know that this distribution is formed by two dynamical processes, related to (1) the inter-arrival time intervals, Tint, and (2) the packet sizes, Ps.
Fig. 10. Packet size distribution for daily traffic measurements aggregated with bin size 1 s: fitting curve corresponds to the function (1).
The statistical distributions corresponding to each
of these random variables are presented in Figs. 3
and 11.
The distribution of inter-arrival time intervals versus the packet sizes is presented in Fig. 12. This distribution shows that the variables Tint and Ps can be considered to be independent. In order to check the correlation between the variables Tint and Ps statistically, we applied Pearson’s correlation coefficient and Fisher’s z-transformation; see details in [19, p. 636]. Both approaches gave a result near zero, which shows that the variables Tint and Ps can be considered as uncorrelated.
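The two statistics mentioned above can be sketched with the standard library alone. The function names are hypothetical. Fisher's z is the inverse hyperbolic tangent of r; under the hypothesis of zero correlation it is approximately normal with standard deviation 1/√(n − 3), so the second returned value below is z expressed in units of that standard deviation.

```python
import math


def pearson_r(xs, ys):
    """Pearson's linear correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def fisher_z(r, n):
    """Return (z, z in standard-deviation units) for correlation r from n pairs, n > 3."""
    z = math.atanh(r)
    return z, z * math.sqrt(n - 3)
```

Applied to the paired series of inter-arrival times and packet sizes, values of r and z close to zero would support treating the two variables as uncorrelated.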
The part of the distribution of inter-arrival time intervals with Tint < 150 ms (around 92% of the intervals), together with the superimposed fitting curve corresponding to the exponential distribution, is presented in Fig. 13.
At present there are various traffic models based on Poisson and Poisson-like processes (see [9,20] and references therein). We decided, therefore, to verify the reliability of these models using a model for the generation of traffic measurements based on the following assumptions (Model 1):
1. The inter-arrival times Tint follow the exponential distribution with parameters shown in Fig. 13.
2. The corresponding packet sizes Ps follow the statistical distribution presented in Fig. 3.
Fig. 11. Distribution of inter-arrival time intervals for traffic measurements.
Fig. 12. Distribution of inter-arrival time intervals vs. packet sizes.
Fig. 13. Distribution of inter-arrival time intervals for intervals <150 ms.
In order to generate the random variables Tint and Ps, distributed according to any function FUNC (in our case, the exponential distribution) or a one-dimensional distribution (histogram), we used the following subroutines from CERNLIB [21]:
1. FUNPRE, which calculates the percentiles of FUNC(X) between Xlow and Xhigh using a combination of trapezoidal and Gaussian integration, and FUNRAN, which generates random variables between Xlow and Xhigh with probabilities according to FUNC(X).
2. HISRAN, which generates random variables between Xlow and Xhigh according to a one-dimensional distribution, supplied by HISPRE in the form of a histogram; a uniform random number generated by RNDM is transformed to the user’s distribution using the cumulative distribution constructed from the histogram.
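The idea behind histogram-based generation, transforming a uniform random number through the cumulative distribution of a histogram, can be sketched in a few lines. This is an illustrative analogue written with the Python standard library, not the CERNLIB code; the function name is hypothetical, and the linear interpolation inside a bin (uniform density within each bin) is an assumption of this sketch.

```python
import bisect
import random


def make_histogram_sampler(edges, counts, rng=random):
    """Return a function that draws values distributed like the given histogram.

    edges: bin edges (len(counts) + 1 increasing values); counts: non-negative contents.
    rng: object with a random() method returning uniform numbers in [0, 1).
    """
    cumulative = []
    running = 0.0
    for c in counts:
        running += c
        cumulative.append(running)
    total = cumulative[-1]

    def sample():
        u = rng.random() * total
        # First bin whose cumulative content exceeds u; such a bin is never empty.
        i = min(bisect.bisect_right(cumulative, u), len(counts) - 1)
        below = cumulative[i - 1] if i > 0 else 0.0
        fraction = (u - below) / counts[i]
        return edges[i] + fraction * (edges[i + 1] - edges[i])

    return sample
```

For the exponential inter-arrival times of Model 1, the standard-library call `random.expovariate(rate)` could play the role of a function-based generator, with `rate` taken from the fit.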
According to the algorithm described above we generated a file of model data, which was then aggregated with different sizes of the aggregation window, starting from the window of 1 s. Fig. 14 shows the distribution of packet sizes aggregated with 1 s.
Fig. 14. Distribution of packet sizes aggregated with 1 s bin: Model 1.
We see that this distribution is very far from the distribution presented in Fig. 6. Moreover, it can be approximated with good accuracy by a Gaussian distribution (see Fig. 14). Thus, our result gives additional confirmation of the inadequacy of Poisson-like models. Taking into account large inter-arrival values, Tint ≥ 150 ms, and generating the arrival times (using HISRAN) according to the one-dimensional histogram presented in Fig. 13 does not change the shape of Fig. 14.
Thus, if we model the traffic measurements by superposing the dynamical processes related to the inter-arrival times Tint and the packet sizes Ps, we do not obtain the desired result.
In order to improve our model, we prepared a special data file based on real traffic measurements under the following assumptions (Model 2):
1. The inter-arrival times are the same as in real traffic data.
2. All packet sizes are set equal to 1.
These data were again aggregated with different sizes of the aggregation window. The aggregation of this data file gives the values of packet rates. Figs. 15 and 16 show the distributions of packet rates for the aggregation windows 1 and 10 s, respectively, and they clearly follow the log-normal form (1).
Although Figs. 15 and 16 are qualitatively satisfactory, they do not match precisely the real distributions in Figs. 6 and 7. In order to improve our model further, we generated a new data file with the following assumptions (Model 3):
1. The inter-arrival times are the same as in real traffic data.
2. The corresponding packet sizes follow the statistical distribution shown in Fig. 3.
Fig. 15. Distribution of packet rates aggregated with 1 s bin: Model 2.
Fig. 16. Distribution of packet rates aggregated with 10 s bin: Model 2.
Fig. 17. Distribution of packet sizes aggregated with 1 s bin: Model 3.
These data were again aggregated with different sizes of the aggregation window. Figs. 17 and 18 show the packet size distributions, with the superimposed fitting curve corresponding to the function (1), for the aggregation windows 1 and 10 s, respectively. One can clearly see that these distributions are quite close to the distributions of real traffic measurements presented in Figs. 6 and 7.
The results of this section are summarized as follows:
• The variables Tint and Ps can be considered as independent.
• The dynamical process related to the inter-arrival times plays the key role in the formation of the observed packet size distribution.
• The independence of the variables Tint and Ps and the form of the packet size distribution permit us to consider the dynamical process corresponding to traffic series as the superposition of independent processes corresponding to the packets contributing to the network traffic (see, for example, Fig. 3).
Fig. 18. Distribution of packet sizes aggregated with 10 s bin: Model 3.
5. Self-similarity and power law of traffic measurements
The log-normal distribution was first observed, to our knowledge, by Lucas et al. [4] for the empirical probability distributions of packet arrivals aggregated at 100 ms. Later they developed the background traffic model, or (M, P, S) model [22], which realistically generated the aggregated traffic flows for a large campus network. The log-normal distributions for packet arrivals have been observed at different stream scales [22]. Similar inter-arrival time distributions for channel arrivals have been observed in cellular telephony [23].
It has long been observed that in a large variety of physical phenomena, where self-similar processes take place, the logarithms of dynamical variables are normally distributed. This holds for grain sizes in crust fragmentation [24], for energy released in seismic events [25,26], and for the distribution of topographic contours, tree rings, leaves and rivers; see, for example, [27].
The theoretical explanation of the appearance of the log-normal distribution in nature was first given, to our knowledge, by Kolmogorov in 1941 in a “small paper” [1] not well known in the Western literature. Kolmogorov proposed a general scheme of a random process of the homogeneous fragmentation or subdivision of grains, for which the distribution of the logarithms of the grain sizes comes arbitrarily close to the Gaussian distribution as the fragmentation process is continued indefinitely.
A simplified explanation of Kolmogorov’s result [25, p. 206] is the following. Suppose that we have a big rock which crumbles into sand. If the environmental stresses are the same whatever the size of the rock, the probability that a given piece of rock is fragmented into n_i smaller rocks is independent of the stage i of the fragmentation process. Therefore, if we start out with a single rock (n_0 = 1), in the next stage we have n_1 smaller rocks, in the next stage each of these smaller rocks is fragmented into n_2 still-smaller rocks, and so on. As the n_i are independent random variables, the number of grains at the kth stage of fragmentation must be
$$ N_{k} = \prod_{i=1}^{k} n_{i} = n_{1} n_{2} \cdots n_{k}, \tag{4} $$
or
$$ \ln N_{k} = \sum_{i=1}^{k} \ln n_{i}. \tag{5} $$
The grain sizes S_k are inversely proportional to the number of grains N_k. Applying a variant of the central limit theorem, Kolmogorov found that the logarithms of the grain sizes are normally distributed [1], i.e. the distribution of grain sizes is log-normal.
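The multiplicative scheme of Eqs. (4) and (5) is easy to explore numerically. The sketch below is a toy illustration: the choice of fragment numbers drawn uniformly from {2, 3, 4} is a hypothetical assumption made only for this example and does not come from the text. It accumulates ln N_k as a sum of independent terms, so for large k the values cluster around k·E[ln n] with an approximately Gaussian spread.

```python
import math
import random


def grain_count(factors):
    """N_k of Eq. (4): the product of the per-stage fragment numbers n_1 ... n_k."""
    return math.prod(factors)


def fragmentation_log_counts(k, trials, rng, choices=(2, 3, 4)):
    """Simulate ln N_k (Eq. (5)) for several independent fragmentation histories.

    Each stage multiplies the grain count by n_i drawn uniformly from `choices`
    (an illustrative assumption of this sketch).
    """
    logs = []
    for _ in range(trials):
        log_n = 0.0
        for _ in range(k):
            log_n += math.log(rng.choice(choices))
        logs.append(log_n)
    return logs
```

A histogram of the returned values for, say, k = 50 and a few thousand trials would show the bell shape expected from the central limit theorem, which corresponds to a log-normal distribution of N_k itself.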
The basic feature of log-normality is the power law or self-similarity. Let X and Y be two random variables. Then if X is log-normal and if
$$ Y = aX^{d}, \tag{6} $$
Y is also log-normal. The parameter a is called the scale factor and the exponent d is the fractal dimension. Power laws such as (6) are known as self-similarity relations. Conversely, if both X and Y are known to be log-normal, there must exist a self-similarity relation, such as (6), between them. Kolmogorov invoked this property to deduce that, if the distribution of grain sizes of sand is log-normal, so are the grain volumes and the fractions by weight retained in sieves of different mesh size.
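The first statement follows from taking logarithms in (6): ln Y = ln a + d ln X, so if ln X is normal with mean µ and standard deviation σ, then ln Y is normal with mean ln a + dµ and standard deviation |d|σ. A small sketch of this parameter mapping is given below; the function name is hypothetical and a > 0, d ≠ 0 are assumed.

```python
import math


def power_law_lognormal_params(mu, sigma, a, d):
    """Parameters (mu_Y, sigma_Y) of Y = a * X**d when ln X ~ Normal(mu, sigma).

    Uses ln Y = ln a + d * ln X; requires a > 0 and d != 0.
    """
    if a <= 0.0 or d == 0.0:
        raise ValueError("requires a > 0 and d != 0")
    return math.log(a) + d * mu, abs(d) * sigma
```

For instance, with d = 3 the median of Y equals a times the cube of the median of X, in line with the remark about grain sizes and grain volumes.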
In [35] the wavelet transform has been applied to the self-similar stochastic processes which Kolmogorov used in his theory of turbulence [28]. For such processes, after suitable re-scaling, the wavelet transform at a given position becomes a stationary random function of the logarithm of the scale argument in the transform [35]. The re-scaling depends on the scaling component.
Unfortunately, the approach of Vergassola and Frisch [35] cannot be directly applied to network traffic measurements, because they have a significantly more complex, multi-fractal structure [29–31]. However, the wavelet transform, being a very powerful technique for extracting specific information from given data [19,32,33], may provide additional information necessary for understanding the reasons for traffic log-normality.
5.1. Wavelet analysis of traffic measurements
The signal f representing the network traffic data is examined by the wavelet transform with the help of the so-called wavelet ψ with zero average:
$$ \int_{-\infty}^{\infty} \psi(t)\,dt = 0. $$
The corresponding wavelet basis is
$$ \psi_{a,b}(t) = \frac{1}{\sqrt{a}}\,\psi\left(\frac{t-b}{a}\right), $$
where a ∈ (0, ∞) and b ∈ (−∞, ∞) are the scale and translation parameters, respectively. The wavelet transform W(a, b) of f is (see, for example, [34]):
$$ W(a,b) = \frac{1}{\sqrt{a}} \int_{-\infty}^{\infty} f(t)\,\psi\left(\frac{t-b}{a}\right) dt. \tag{7} $$
The parameter b shifts the wavelet so that local information about f at time t = b is contained in W(a, b). The scale parameter a controls the size of the region of influence; for a → 0 the wavelet transform ‘zooms’ in on t = b.
Fig. 19. Shade plot of the CWT coefficients for traffic measurements aggregated with 1 s window.
Fig. 20. Shade plot of the CWT coefficients for model traffic series (Model 2) aggregated with 1 s window.
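A direct numerical evaluation of the transform (7) can be sketched as a Riemann sum over a uniformly sampled signal. The sketch below makes several assumptions that are not in the text: it uses the Mexican hat wavelet as a simple zero-mean real wavelet instead of the biorthogonal spline wavelets used for the shade plots, it takes the 1/√a normalization, and it evaluates the transform only at the sample times b = k·dt. The function names are hypothetical.

```python
import math


def mexican_hat(t):
    """Zero-mean real wavelet proportional to the second derivative of a Gaussian."""
    return (1.0 - t * t) * math.exp(-0.5 * t * t)


def cwt(signal, dt, scales, wavelet=mexican_hat):
    """Riemann-sum approximation of W(a, b) from Eq. (7).

    signal: samples f(m*dt); scales: iterable of positive scale values a.
    Returns a list of rows, one per scale, with row[k] approximating W(a, k*dt).
    The cost is O(len(signal)**2) per scale, which is fine for short series.
    """
    n = len(signal)
    rows = []
    for a in scales:
        norm = dt / math.sqrt(a)
        row = []
        for k in range(n):
            b = k * dt
            acc = 0.0
            for m in range(n):
                acc += signal[m] * wavelet((m * dt - b) / a)
            row.append(norm * acc)
        rows.append(row)
    return rows
```

Plotting the magnitude of the returned rows against b (horizontal axis) and a (vertical axis) gives a shade plot of the same kind as the figures in this section, although the details depend on the wavelet chosen.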
The wavelet transform makes it possible to focus on localized signal structures with a zooming procedure that progressively reduces the scale parameter. It has been shown (see, for instance, [36]) that the local signal regularity is characterized by the decay of the wavelet transform amplitude across scales. Singularities and edges are identified by following the wavelet transform local maxima at fine scales.
All these features appear in complex signals such as multi-fractals. The wavelet transform takes advantage of multi-fractal self-similarities in order to compute the distribution of the singularities of the signals.
In order to reveal the self-similarity of traffic measurements at different scales, we applied the continuous wavelet transform (CWT) to traffic measurements aggregated with a 1 s window. Fig. 19 shows the shade plot of the CWT, based on the biorthogonal spline wavelets, of the time series analyzed. The self-similar, multi-fractal character of traffic measurements is clearly shown in the tree-like fragmentation structure. The shade plot corresponding to data generated in accordance with Model 2 demonstrates similar behavior (see Fig. 20).
Figs. 19 and 20 clearly demonstrate the multiplicative character of traffic measurements. This result is in agreement with formula (4) and confirms the applicability of Kolmogorov’s scheme to the description of network traffic.
6. Conclusion
Self-similarity, multi-fractal behavior and long-range dependence have been observed in a large number of Internet traffic measurements. Our work, based on the detailed analysis of traffic measurements obtained at the input of a medium-size LAN, demonstrates that the reason for these effects may be a simple aggregation of real data. In fact we show that the aggregation of traffic measurements forms (starting from some threshold value of the aggregation window, which provides the necessary conditions for fulfillment of the central limit theorem requirements) a statistical distribution, which does not change its form with further increase of the aggregation window. This distribution is approximated with high accuracy by the log-normal distribution. We found that the two stochastic variables contributing to network traffic, namely the inter-arrival times Tint and packet sizes Ps, can be considered as independent. In addition, the form of the packet size distribution suggests that the dynamical process underlying the traffic series is the superposition of a few independent processes corresponding to the packets contributing to the network traffic. Applying the CWT to traffic measurements, we show that the information traffic has a multiplicative character. This confirms the applicability of Kolmogorov’s scheme [1] to the description of network traffic. This problem is under study, but it is beyond the scope of the current paper.
The results obtained above highlight the statistical character of the network traffic. However, the question concerning the log-normality of the traffic measurements is still open. The dynamical processes underlying the inter-arrival time series need further study.
Acknowledgements
We thank Yu. Kryukov for the preparation of traffic data. We are grateful to Prof. I. Prigogine and Prof. V.G. Kadyshevsky for encouragement and support. This work has been done within the framework of the study contract no. 16263-2000-06 F1SC ISP BE between the International Solvay Institutes for Physics and Chemistry (Brussels, Belgium) and the Joint Research Center of the European Commission (Ispra, Italy).
References
[1] A.N. Kolmogorov, Über das logarithmisch normale
Verteilungsgesetz der Dimensionen der Teilchen bei
Zerstückelung, Dokl. Akad. Nauk SSSR 31 (1941) 99–101.
[2] W.E. Leland, M.S. Taqqu, W. Willinger, D.V. Wilson, On
the self-similar nature of ethernet traffic (extended version),
IEEE/ACM Trans. Network. 2 (1) (1994) 1–15.
[3] W. Willinger, M.S. Taqqu, W.E. Leland, D.V. Wilson,
Self-similarity in high-speed packet traffic: analysis and
modeling of ethernet traffic measurements, Stat. Sci. 10 (1)
(1995) 67–85.
[4] M.T. Lucas, D.E. Wrege, B.J. Dempsey, A.C. Weaver,
Statistical characterization of wide-area self-similar network
traffic, University of Virginia Technical Report CS97-04, 9
October 1996.
[5] M.E. Crovella, A. Bestavros, Self-similarity in world wide web traffic: evidence and possible causes, IEEE/ACM Trans. Network. 5 (6) (1997) 835–846.
[6] V. Misra, W.-B. Gong, A hierarchical model for teletraffic,
Department of Electrical and Computer Engineering,
University of Massachusetts, Amherst, MA, 1998.
[7] J.M. Peha, Protocols can make traffic appear self-similar, in: Proceedings of the 1997 IEEE/ACM/SCS Communication Networks and Distributed Systems Modeling and Simulation Conference, 1997.
[8] A. Erramilli, P. Pruthi, W. Willinger, Recent developments in
fractal traffic modelling, in: Proceedings of the International
Teletraffic Seminar, St. Petersburg, 26 June–2 July 1995.
[9] D.L. Jagerman, B. Melamed, W. Willinger, Stochastic
modeling of traffic processes, Technical Report, 1996.
[10] T. Tuan, K. Park, Multiple time scale congestion control
for self-similar network traffic, Network Systems Lab,
Department of Computer Sciences, Purdue University, West
Lafayette, IN, USA, Elsevier, Amsterdam, Preprint, submitted
for publication.
[11] S. Ata, M. Murata, H. Miyahara, Analysis of network traffic
and its application to design of high-speed routers, IEICE
Trans. Inform. Syst. E83-D (2000) 988–995.
[12] The State University, Dubna, http://www.uni-dubna.ru.
[13] P.V. Vasiliev, V.V. Ivanov, V.V. Korenkov, Yu.A. Kryukov,
S.I. Kuptsov, System for acquisition, analysis and control of
network traffic for the JINR local network segment: the Dubna
University example, JINR Communications, D11-2001-266,
JINR, Dubna, Russia, 2001.
[14] P. Akritas, P.G. Akishin, I. Antoniou, A.Yu. Bonushkina, I.
Drossinos, V.V. Ivanov, Yu.L. Kalinovsky, V.V. Korenkov, P.V.
Zrelov, Nonlinear analysis of network traffic, Chaos, Solitons
and Fractals 14 (2002) 595–606.
[15] W.T. Eadie, D. Drijard, F.E. James, M. Roos, B. Sadoulet,
Statistical Methods in Experimental Physics, North-Holland,
Amsterdam, 1971.
[16] F. James, M. Roos, MINUIT—function minimization and
error analysis, CERN Program Library D506, 1988.
[17] R. Brun, O. Couet, C. Vandoni, P. Zanarini, PAW—Physics
Analysis Workstation, CERN Program Library Q121, 1989.
[18] A. Feldmann, W. Whitt, Fitting mixtures of exponentials to
long-tail distributions to analyze network performance models
(AT&T laboratory research), in: Proceedings of the IEEE
INFOCOM’97, Kobe, Japan, April 1997.
[19] W.H. Press, S.A. Teukolsky, W.T. Vetterling, B.P. Flannery,
Numerical Recipes in C: The Art of Scientific Computing,
2nd Edition, Cambridge University Press, Cambridge, 1992.
[20] V. Paxson, S. Floyd, Wide-area traffic: the failure of Poisson
modeling, IEEE/ACM Trans. Network. 3 (3) (1995) 226–244.
[21] General Information, Program Library, CERN Computer
Centre, 1989.
[22] M.T. Lucas, B.J. Dempsey, D.E. Wrege, A.C. Weaver,
(M, P, S)—an efficient background traffic model for
wide-area network simulation, Technical Report, Department
of Computer Science, University of Virginia, 1997.
[23] J.I. Sánchez, F. Barceló, J. Jordán, Inter-arrival time
distribution for channel arrivals in cellular telephony, in:
Proceedings of the Fifth International Workshop on Mobile
Multimedia Communication (MoMuc’98), 12–14 October
1998, Berlin.
[24] N.K. Razumovsky, On a distribution character of metals
contents in ore fields, Dokl. Akad. Nauk SSSR 28 (1940)
815–817 (in Russian).
[25] C. Lomnitz, Fundamentals of Earthquake Prediction, Wiley,
New York, 1994.
[26] V.I. Keilis-Borok, Symptoms of instability in a system of
earthquake-prone faults, Physica D 77 (1994) 193–199.
[27] J. Aitchison, J.A.C. Brown, The Lognormal Distribution,
Cambridge University Press, Cambridge, 1957, 176 pp.
[28] A.N. Kolmogorov, The local structure of the turbulence in
incompressible viscous fluid for very large Reynolds numbers,
Dokl. Akad. Nauk SSSR 30 (1941) 301.
[29] M.S. Taqqu, V. Teverovsky, W. Willinger, Is network traffic
self-similar or multifractal?, Fractals 5 (1997) 63–73.
[30] G. Taubes, Fractals reemerge in the new math of the Internet,
Science 281 (1998) 1947–1948.
[31] R.H. Riedi, M.S. Crouse, V.J. Ribeiro, R.G. Baraniuk, A
multifractal wavelet model with application to network traffic,
IEEE Trans. Inform. Theory 45 (1999) 992–1018.
[32] C.K. Chui, An Introduction to Wavelets, Academic Press,
New York, 1992, pp. 1–18.
[33] I. Daubechies, Wavelets, SIAM, Philadelphia, 1992.
[34] A.K. Louis, P. Maaß, A. Rieder, Wavelets: Theory and
Applications, Wiley, 1997.
[35] M. Vergassola, U. Frisch, Wavelet transforms of self-similar
processes, Physica D 54 (1991) 58–64.
[36] S. Mallat, A Wavelet Tour of Signal Processing, Academic
Press, New York, 1999.