Non-Invasive Soccer Goal Line Technology: A Real Case Study
Paolo Spagnolo, Marco Leo, Pier Luigi Mazzeo, Massimiliano Nitti, Ettore Stella, Arcangelo Distante
National Research Council of Italy
spagnolo@ba.issia.cnr.it
Abstract
In this paper, a real case study on a Goal Line Moni-
toring system is presented. The core of the paper is a re-
fined ball detection algorithm that analyzes candidate ball
regions to detect the ball. A decision making approach, by
means of camera calibration, decides about the goal event
occurrence. Differently from other similar approaches, the
proposed one provides, as unquestionable proof, the image
sequence that records the goal event under consideration.
Moreover, it is non-invasive: it does not require any change
in the typical football devices (ball, goal posts, and so on).
Extensive experiments were performed on both real matches
acquired during the Italian Serie A championship, and spe-
cific evaluation tests by means of an artificial impact wall
and a shooting machine for shot simulation. The encour-
aging experimental results confirmed that the system could
help humans in ambiguous goal line event detection.
1. Introduction
Soccer is the world’s most popular sport and an enormous
business, yet every match is refereed by a single person who
“has full authority to enforce the Laws of the Game”.
Controversies are therefore inevitable, and the most glaring
of them usually concern calls that require no interpretation,
namely whether or not the ball has completely crossed the
goal line. Famous recent ‘bad calls’ occurred during Euro 2012
(Ukraine scored a goal against England that clearly went over
the line but was disallowed by the referee, see fig. 1) and
the 2010 World Cup (England scored a goal against Germany
that was disallowed by the referee).
In cases like these, the referee’s call is influenced by,
among other things, three unavoidable factors:
• the referee’s position on the field: he is not aligned
with the goal line, so a parallax error affects his decision;
• the high speed of the ball, which can reach 120 km/h: it is
impossible for the human visual and cognitive systems (as
well as for standard broadcast images, at 25 fps) to estimate
the position of such a moving object continuously;
• the considerable distance (about 35-40 m) between the
linesmen and the goal post: this makes it very hard to
evaluate goal events with a resolution of about 1-2 cm.
Figure 1. During the 2012 Euro Competition, England’s defender
John Terry lunges for the ball, which appears to be over the line.
The only way to definitively avoid these kinds of controversies
is to introduce a “goal line technology”, i.e. an automatic
system to assist the referee in decisions concerning
goal events.
For this purpose different technologies have been pro-
posed. The earliest ones were based on instant replay: in
case of a controversial call about a goal event the referee (or
an assistant) could stop the game and watch the images (ac-
quired from broadcast or dedicated cameras). This would
slow down the game taking away possible plays and an-
noying the audience. Thus attention has recently turned to
technologies able to decide autonomously whether or not
the ball has crossed the goal line. One of the most promis-
ing approaches uses a magnetic field to track a ball with
a sensor suspended inside [3]. Thin cables with electrical
current running through them are buried in the penalty box
and behind the goal line to make a grid. The sensor in the
2013 IEEE Conference on Computer Vision and Pattern Recognition Workshops
978-0-7695-4990-3/13 $26.00 © 2013 IEEE
DOI 10.1109/CVPRW.2013.147
ball measures the magnetic grids and relays the data to a
computer, which determines whether or not the ball has crossed
the line. However, this kind of technology cannot provide
unquestionable proof of detected events, and it requires
substantial modifications to the infrastructure of the stadium
and to game components (ball, playing field, goalposts, ...).
For these reasons, the efforts of several companies and
research institutes are currently focused on the development
of non-invasive goal line technologies. In particular, vision-
based systems appear to be very promising considering their
capability to provide a posteriori verification of the system’s
operations [1, 2].
The main issue of an automatic system is the detection of
the ball; it is very difficult when images are taken from fixed
or broadcast cameras with a wide camera view since the
ball is represented by a small number of pixels and more-
over it can have different scales, textures and colors. For
this reason, most ball detection approaches are based on an
evaluation of the ball trajectory. The underlying idea is that
the analysis of kinematic parameters can point out the ball
among a set of ball candidates [13, 15, 11, 10].
However, trajectory based approaches are generally off-
line since the evaluation of the kinematic parameters for
all ball candidates requires a long period of observation;
so they are not suitable to be used in a real time goal line
monitoring system.
In recent years, a few research groups have started working
on visual frameworks with the aim of recognizing events in
real time. These systems also have to address problems
associated with the time spent on image acquisition,
transmission and processing (often the frame rate is even higher
than for standard TV cameras). Furthermore, the ability to
work autonomously for several hours and in all environmental
conditions is an additional requirement for this kind of
system. In [5] the authors present a real time visual system
for goal detection which can be used as decision support by
the referee committee. A system for automatic
judgment of offside events is presented in [7]. The authors
propose the use of 16 cameras located along both sides of
the soccer field to cover the whole area. The integration of
results from multiple cameras is used for offside judgment.
Six fixed cameras were used in [4] to cover the whole field
and to acquire image sequences from both sides of the sta-
dium. Player and ball tracking processes run parallel on the
six image sequences and extract the player and ball posi-
tions in real time.
However, the ball detection approaches proposed in these
works are designed to operate mainly on single images; they
do not use temporal consistency to reinforce the detection,
and the integration between different views is quite
superficial. In our work we integrate all this information
to build a system able to work consistently for long periods
of time.
This paper presents a visual system able to detect the goal
event through real time processing of the acquired images and
to immediately provide the image sequence that records the
goal event under consideration. The system has been installed
at the Friuli Stadium in Udine. It has been tested both during
real matches of the Italian Serie A championship and in
specific simulation sessions, in which the ball was shot by a
shooting machine in different contexts (as explained in detail
in the experimental results section) in order to validate the
system in terms of both spatial and temporal accuracy.
2. Overview of the System
In figure 2(a) the visual system is outlined. Six cameras
are placed on the stands of the stadium. For each side of
the pitch, two of the three cameras have their optical axes
parallel to the goal frame, the remaining one is placed be-
hind the goal with its optical axis perpendicular to the goal
frame. Each camera is connected to a processor (node) that
records and analyzes the acquired images. In figure 2(b) a
schematic diagram of the processing steps executed by each
node is shown.
Figure 2. (a) The scheme of the visual system; (b) a schematic
diagram of the processing steps executed by each node.
The six processors are connected to a main node, which acts
as supervisor. The supervisor node makes the final decision
by combining the processing results coming from the cameras.
The strategy is based on heuristics that perform data fusion
by evaluating the spatio-temporal coherence of the ball’s 3D
trajectory. The processing results of the three corresponding
nodes are compared and a goal
event probability function is evaluated.
3. Preliminary Steps: Calibration and Moving
Object Segmentation
First of all, a calibration step is required for each node,
in which the correspondences between the image plane and a
plane in the 3D world are assessed. This step is fundamental
in determining the 3D position of the ball. In this step, the
homography transformation matrix of each node is estimated by
using Random Sample Consensus [6]. Each homography
transformation matrix M_i relates the points on the image plane
to the corresponding points on the plane in the 3D world. The
only constraint to be considered when choosing the planes is
that they must not be perpendicular to the image plane of the
associated camera. This calibration needs to be done only once,
after camera installation; if the cameras remain in place, it
remains valid for any subsequent matches. For the experiments
reported in this paper, the calibration phase was carried out
using non-deformable steel structures: each structure defines
a plane in the 3D world and specific markers were used for the
identification of the control points.
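As a minimal sketch of how a homography matrix M_i is used once it has been estimated, the snippet below maps an image point to plane coordinates through homogeneous multiplication followed by the projective division. The function name `apply_homography` and the matrix `M_example` are hypothetical and chosen only for illustration; the RANSAC-based estimation itself is not shown.

```python
def apply_homography(M, point):
    """Map an image point (u, v) to plane coordinates with a 3x3 homography M."""
    u, v = point
    # Homogeneous multiplication: [X, Y, W]^T = M [u, v, 1]^T
    X = M[0][0] * u + M[0][1] * v + M[0][2]
    Y = M[1][0] * u + M[1][1] * v + M[1][2]
    W = M[2][0] * u + M[2][1] * v + M[2][2]
    if abs(W) < 1e-12:
        raise ValueError("point maps to infinity under this homography")
    # Projective division brings the result back to Cartesian coordinates.
    return (X / W, Y / W)


# Hypothetical matrix (scale by 2, then translate), for illustration only.
M_example = [[2.0, 0.0, 1.0],
             [0.0, 2.0, 3.0],
             [0.0, 0.0, 1.0]]
print(apply_homography(M_example, (1.0, 1.0)))  # (3.0, 5.0)
```

A matrix whose last row differs from [0, 0, 1] introduces the perspective effect, which is why the division by W cannot be skipped.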
The segmentation of the image to extract moving ob-
jects is the first processing step executed by each node. It is
fundamental as it limits the ball detection to moving areas
and reduces computational time. For this purpose a back-
ground subtraction-based segmentation algorithm was im-
plemented. Firstly, a background model has to be generated
and then continuously updated to include lighting variations
in the model. The implemented algorithm uses the mean
and standard deviation to provide a statistical model of the
background. Detection is then performed by comparing the
pixel current intensity value with its statistical parameters,
as explained in several works on this topic (a good review
can be found in [12]). Details about the implemented approach
can be found in [14]. Finally, after the detection of
moving points, a connected components analysis detects the
blobs in the image by grouping neighboring pixels. After
this step, regions with an area less than a given threshold
are considered as noise and removed, whilst remaining re-
gions are evaluated in the following steps.
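The segmentation steps described above can be sketched as follows. This is an illustrative approximation, not the implementation of [14]: the learning rate `alpha`, the threshold factor `k`, the floor `min_std` and the 8-connectivity are all assumptions, and images are plain lists of lists of grey levels.

```python
from collections import deque


class BackgroundModel:
    """Per-pixel running mean/variance background model (illustrative)."""

    def __init__(self, first_frame, alpha=0.05, k=2.5, min_std=2.0):
        self.alpha, self.k, self.min_std = alpha, k, min_std
        self.mean = [[float(p) for p in row] for row in first_frame]
        self.var = [[min_std ** 2 for _ in row] for row in first_frame]

    def apply(self, frame):
        """Return a binary foreground mask; update the model on background pixels."""
        mask = []
        for y, row in enumerate(frame):
            mask_row = []
            for x, p in enumerate(row):
                diff = p - self.mean[y][x]
                std = max(self.var[y][x] ** 0.5, self.min_std)
                moving = abs(diff) > self.k * std
                if not moving:
                    # Slow update so gradual lighting changes are absorbed.
                    self.mean[y][x] += self.alpha * diff
                    self.var[y][x] += self.alpha * (diff * diff - self.var[y][x])
                mask_row.append(moving)
            mask.append(mask_row)
        return mask


def blobs(mask, min_area):
    """Group 8-connected foreground pixels; drop regions smaller than min_area."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    regions = []
    for y0 in range(h):
        for x0 in range(w):
            if not mask[y0][x0] or seen[y0][x0]:
                continue
            queue, region = deque([(x0, y0)]), []
            seen[y0][x0] = True
            while queue:
                x, y = queue.popleft()
                region.append((x, y))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        nx, ny = x + dx, y + dy
                        if (0 <= nx < w and 0 <= ny < h
                                and mask[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((nx, ny))
            if len(region) >= min_area:
                regions.append(region)  # small regions are treated as noise
    return regions
```

Updating the statistics only on background pixels is one common design choice; it prevents a slowly moving object from being absorbed into the model.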
4. Ball Detection and Tracking
An automatic method that detects the ball position in each
image is the central step in building the vision system. In
the soccer world, a great number of problems have to be
managed, including occlusions, shadowing, misdetection (the
incorrect detection of objects similar to the ball) and, last
but not least, real time processing constraints. The ball
detection method has to be simple, fast and effective, as a
great number of images per second must be processed. This
kind of problem can be addressed with two different classes
of detection technique: geometric approaches, which match a
model of the object of interest to different parts of the
image in order to find the best fit; and example-based
techniques, which learn the salient features of a class of
objects from sets of positive and negative examples.
The proposed method uses the two techniques together in order
to take advantage of their respective strengths: first, a fast
circle detection (and/or circle portion detection) algorithm,
based only on edge information, is applied to the whole image
to limit the image area to the best candidate containing the
ball; second, an appearance-based distance measure is used to
validate the ball hypothesis.
The Circle Hough Transform (CHT) aims to find circu-
lar patterns of a given radius R within an image. Each edge
point contributes a circle of radius R to an output accumu-
lator space. The peak in the output accumulator space is
detected where these contributed circles overlap at the cen-
ter of the original circle. In order to reduce the computa-
tional burden and the number of false positives typical of
the CHT, a number of modifications have been widely im-
plemented in the last decade. The use of edge orientation
information limits the possible positions of the center for
each edge point. This way only an arc perpendicular to the
edge orientation at a distance R from the edge point needs
to be plotted. The CHT, as well as its modifications, can be
formulated as convolutions applied to an edge magnitude
image (after suitable edge detection). We have defined a
circle detection operator that is applied over all the image
pixels, which produces a maximal value when a circle is
detected with a radius in the range [Rmin,Rmax]:
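\[
u(x, y) = \frac{\iint_{D(x,y)} \vec{e}(\alpha, \beta) \cdot \vec{O}(\alpha - x, \beta - y)\, d\alpha\, d\beta}{2\pi (R_{max} - R_{min})} \tag{1}
\]
where the domain D(x, y) is defined as:
\[
D(x, y) = \{ (\alpha, \beta) \in \mathbb{R}^2 \mid R_{min}^2 \le (\alpha - x)^2 + (\beta - y)^2 \le R_{max}^2 \} \tag{2}
\]
$\vec{e}$ is the normalized gradient vector:
\[
\vec{e}(x, y) = \left[ \frac{E_x(x, y)}{|E|}, \frac{E_y(x, y)}{|E|} \right]^T \tag{3}
\]
and $\vec{O}$ is the kernel vector:
\[
\vec{O}(x, y) = \left[ \frac{\cos(\tan^{-1}(y/x))}{\sqrt{x^2 + y^2}}, \frac{\sin(\tan^{-1}(y/x))}{\sqrt{x^2 + y^2}} \right]^T \tag{4}
\]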
The use of the normalized gradient vector in (1) is nec-
essary in order to have an operator whose results are inde-
pendent from the intensity of the gradient in each point: we
want to be sure that the circle detected in the image is the
most complete in terms of contours and not the most con-
trasted in the image. Indeed, it is possible that a circle that is
not well contrasted in the image gives a convolution result
lower than another object that is not exactly circular but has
a greater gradient. The kernel vector contains a normaliza-
tion factor (the division by the distance of each point from
the center of the kernel) which is fundamental to ensuring
that we have the same values in the accumulation space
when circles with different radii in the admissible range are
found. Moreover, normalization ensures that the peak in the
convolution result is obtained for the most complete circle
and not for the greatest in the annulus. As a final
consideration, in equation (1) the division by 2π(Rmax − Rmin)
guarantees that the final result of our operator lies in the
range [-1, 1] regardless of the radius value considered in the
procedure. The masks implementing the kernel vector have a
dimension of (2·Rmax + 1) × (2·Rmax + 1) and they represent
the direction of the radial vector scaled by the distance from
the center in each point. The convolution between the gradient
unit-vector images and these masks evaluates how many points
in the image have a gradient direction concordant with the
gradient direction of a range of circles. Then the peak in
the accumulator array provides the center of the sub-image
with the highest circularity, which is finally passed to the
validation step. Examples of sub-images given as input to the
ball recognition process are shown in figure 3.
Figure 3. Some images of the training set: in the first row there
are some negative examples of the ball, in the second row some
positive examples
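The circle detection operator of equation (1) can be sketched in discrete form as below. This is an illustration under stated assumptions rather than the authors' implementation: the gradient is a plain central difference, the arctangent of the kernel is taken as four-quadrant so that the kernel is the radial unit vector divided by the distance, and the operator is evaluated by direct summation over the annulus instead of by convolution masks. All function names are hypothetical.

```python
import math


def gradient_unit_vectors(img):
    """Central-difference gradient, normalised to unit length (zero where flat)."""
    h, w = len(img), len(img[0])
    ex = [[0.0] * w for _ in range(h)]
    ey = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = (img[y][x + 1] - img[y][x - 1]) / 2.0
            gy = (img[y + 1][x] - img[y - 1][x]) / 2.0
            mag = math.hypot(gx, gy)
            if mag > 0:
                ex[y][x], ey[y][x] = gx / mag, gy / mag
    return ex, ey


def circle_response(ex, ey, x, y, r_min, r_max):
    """Discrete u(x, y): sum of e . O over the annulus D(x, y), then normalised."""
    h, w = len(ex), len(ex[0])
    total = 0.0
    for b in range(max(0, y - r_max), min(h, y + r_max + 1)):
        for a in range(max(0, x - r_max), min(w, x + r_max + 1)):
            dx, dy = a - x, b - y
            d2 = dx * dx + dy * dy
            if d2 > 0 and r_min * r_min <= d2 <= r_max * r_max:
                # Kernel O = (radial unit vector) / distance = (dx, dy) / d2
                total += (ex[b][a] * dx + ey[b][a] * dy) / d2
    return total / (2.0 * math.pi * (r_max - r_min))


# Synthetic test image: a bright disc of radius 5 on a dark background.
size, cx, cy, radius = 25, 12, 12, 5
disc = [[1.0 if (x - cx) ** 2 + (y - cy) ** 2 <= radius ** 2 else 0.0
         for x in range(size)] for y in range(size)]
ex, ey = gradient_unit_vectors(disc)
u_center = circle_response(ex, ey, cx, cy, 3, 7)
u_corner = circle_response(ex, ey, 1, 1, 3, 7)
```

For a bright disc the gradient points inwards, so the response at the disc center is negative; the magnitude of the response is what indicates circularity, while a region with no edges inside the annulus gives exactly zero.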
The validation step assesses the similarity of appearance
between the candidate region and a set of positive examples
stored previously. The similarity is evaluated by comput-
ing the Bhattacharyya distance among histograms (reduced
to 64 bins): this measure is not computationally time con-
suming but at the same time it is sufficiently robust as it
is invariant to the rotation of the target (textured balls are
considered) and also to slight changes in scale. One of the
strengths of the proposed system is that the construction
and updating of the set of reference examples for validation
takes place automatically.
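A minimal sketch of the histogram comparison is given below. The paper does not state which form of the Bhattacharyya distance is used; the version here, sqrt(1 − BC) with BC the Bhattacharyya coefficient, is an assumption, as are the function names and the 8-bit grey-level input.

```python
import math


def histogram(pixels, bins=64, max_value=256):
    """Normalised histogram of 8-bit intensities, reduced to `bins` bins."""
    counts = [0] * bins
    for p in pixels:
        counts[min(int(p) * bins // max_value, bins - 1)] += 1
    total = float(len(pixels))
    return [c / total for c in counts]


def bhattacharyya_distance(p, q):
    """Distance in [0, 1]: 0 for identical histograms, 1 for disjoint ones."""
    bc = sum(math.sqrt(a * b) for a, b in zip(p, q))  # Bhattacharyya coefficient
    return math.sqrt(max(0.0, 1.0 - bc))  # clamp guards against rounding


candidate = histogram([10, 10, 200, 200])
reference = histogram([100, 100, 100, 100])
print(bhattacharyya_distance(candidate, candidate))  # 0.0
print(bhattacharyya_distance(candidate, reference))  # 1.0
```

Because the histogram discards pixel positions, the measure does not change when the ball rotates, which matches the invariance claimed in the text.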
Initially, the set of reference examples is empty and all
the moving objects with the highest value of circularity
(greater than a weak threshold) and with an area compatible
with that of the ball are taken into account. Their displace-
ment on the image plane frame after frame is then evaluated
in order to estimate some motion parameters e.g. direction
and velocity. The candidate regions are then included in the
reference set if the associated motion parameters are com-
patible with those that only a ball can have in case of a shot
on goal (direction towards the goal and plausible number
of pixel displacement between frames). At the same time
the relative distance in the image plane between the
candidate regions and the other moving objects in the scene is
evaluated: if the relative distance is low and almost
constant, the ball candidate is discarded since it has likely
been produced by incorrect segmentation of players’ bodies.
The same criteria are used to add new examples in the
reference set, additionally considering the value of the mea-
surement of similarity with the pre-existing examples. The
maximum number of examples in the reference set is 100
and it is managed as a circular buffer.
The reference set is re-initialized when circular objects
with peculiar motion parameters and low similarity (i.e.
higher distances in the space of histograms) to the exam-
ples in the reference set appear in a number of consecutive
frames. This way the introduction of a new type of ball or
sudden and substantial changes in lighting conditions (due
to clouds or floodlights) can be handled automatically.
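The management of the reference set can be sketched as follows. The capacity of 100 and the circular-buffer behaviour come from the text; the similarity threshold `max_distance`, the number of consecutive frames `reset_after`, the class name and the way motion plausibility is passed in as a boolean are hypothetical choices made only for illustration.

```python
from collections import deque


class ReferenceSet:
    """Circular buffer of reference histograms with automatic re-initialisation."""

    def __init__(self, capacity=100, max_distance=0.3, reset_after=5):
        self.examples = deque(maxlen=capacity)  # oldest example is overwritten
        self.max_distance = max_distance
        self.reset_after = reset_after
        self.dissimilar_streak = 0

    def offer(self, hist, distance_fn, motion_ok):
        """Offer a candidate; return True if it was stored as a reference."""
        if not motion_ok:
            # No ball-like motion in this frame: the streak is interrupted.
            self.dissimilar_streak = 0
            return False
        if not self.examples:
            self.examples.append(hist)
            return True
        best = min(distance_fn(hist, ref) for ref in self.examples)
        if best <= self.max_distance:
            self.dissimilar_streak = 0
            self.examples.append(hist)
            return True
        # Ball-like motion but unfamiliar appearance (new ball, new lighting).
        self.dissimilar_streak += 1
        if self.dissimilar_streak >= self.reset_after:
            self.examples.clear()
            self.examples.append(hist)
            self.dissimilar_streak = 0
            return True
        return False
```

In use, `distance_fn` would be the histogram distance adopted for validation; any function returning a non-negative dissimilarity works with this sketch.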
The ball has to be detected in several consecutive images
to be sure that a true positive has been found. Once this
happens, a different and more reliable procedure for selecting
candidate moving regions is used (tracking phase). A ball
position probability map, covering all the points of the
processing image, is created as follows:
\[
P(x, y) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left( -\frac{\left| (x - |\tilde{x} + V_x\,\mathrm{sign}(\cos\theta)|) + (y - |\tilde{y} + V_y\,\mathrm{sign}(\sin\theta)|) \right|^2}{2\sigma^2} \right) \tag{5}
\]
where $(\tilde{x}, \tilde{y})$ is the last known ball position and
\[
\sigma = \frac{R_p V_{max} n}{R_{cm} T} \tag{6}
\]
where V and θ are the local velocity and the direction of the
ball in the image plane respectively, R_p is the ball radius
in pixels, R_cm is the ball radius in centimeters, V_max is
the maximum admissible velocity of the ball (in cm/sec), T is
the camera frame rate and n is the number of frames between
the past and actual ball detection (1 if the two frames are
consecutive). This way the maximum probability value is
related to the point where, on the basis of past information
about the ball’s movement, the ball should be found (predicted
point). The probability value decreases exponentially as the
distance from the predicted point increases, and becomes close
to 0 for points far from the last known ball position that
cannot be reached considering the maximum speed limits
(usually 120 km/h). In the following frames, the probability
map is used to select candidate
moving regions (such as those with a probability greater than
zero). This way, the ball can be detected both when it merges
with players and when it is partially occluded. The ball
velocity, direction and probability map are always updated
using the proper value of n (i.e. the number of frames between
the current frame and the last ball detection). If the ball is
not detected for three consecutive seconds (i.e. n becomes
greater than 3T), the past information is considered outdated
and the ball detection procedure starts again, considering all
the candidate ball regions in the whole image.
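The tracking quantities of equations (5) and (6) can be sketched as follows. The code follows the equations as printed; treating V_x and V_y as the magnitudes of the per-frame displacement in pixels is an assumption, and the numeric values (a 10-pixel ball radius, an 11 cm physical radius derived from the 22 cm diameter mentioned later, 200 fps) are hypothetical inputs chosen only to give a sense of scale.

```python
import math


def sigma_pixels(r_px, r_cm, v_max_cm_s, frame_rate, n=1):
    """Equation (6): largest plausible displacement, in pixels, after n frames."""
    return (r_px * v_max_cm_s * n) / (r_cm * frame_rate)


def position_probability(x, y, last_xy, v_xy, theta, sigma):
    """Equation (5), evaluated at image point (x, y)."""
    x_last, y_last = last_xy
    vx, vy = v_xy

    def sign(value):
        return (value > 0) - (value < 0)

    # Predicted point: last position shifted along the direction of motion.
    x_pred = abs(x_last + vx * sign(math.cos(theta)))
    y_pred = abs(y_last + vy * sign(math.sin(theta)))
    d = abs((x - x_pred) + (y - y_pred))
    return math.exp(-d * d / (2.0 * sigma ** 2)) / (sigma * math.sqrt(2.0 * math.pi))


v_max = 120 * 100000.0 / 3600.0          # 120 km/h expressed in cm/s
sigma = sigma_pixels(r_px=10, r_cm=11, v_max_cm_s=v_max, frame_rate=200, n=1)
p_peak = position_probability(104, 53, (100, 50), (4, 3), 0.5, sigma)
p_far = position_probability(124, 53, (100, 50), (4, 3), 0.5, sigma)
```

With these inputs sigma is about 15 pixels for consecutive frames, and it grows linearly with n, so the search region widens the longer the ball has been lost.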
5. Supervisor node
The supervisor node has a decision-making function ac-
cording to the processing results coming from the nodes.
For each frame the processing units send several items of
information to the supervisor, including the last frame num-
ber processed, the position of the ball (if detected), and the
number of consecutive frames in which the ball has been
correctly tracked. It should be noted that even if the images
are synchronized in the acquisition process, the processing
results are not necessarily synchronized, since each node
works independently from the others. Moreover, a node
may jump some frames having accumulated a significant
delay during the processing. When the supervisor receives
the results of three nodes for a given frame, or when it de-
tects that synchronized data obtained cannot be retrieved for
a given frame, the supervisor processes the obtained infor-
mation to evaluate the occurrence of a goal event. This is
done by evaluating the goal line crossing in the available
2D images. However, this way it is not possible to evaluate
if the ball crossed the goal line inside the goal posts or not.
For this reason, the 3D ball position and its trajectory before
crossing the goal line are evaluated. This requires a calibration
procedure (described in section 3), and an accurate
evaluation of the 3D position of the ball.
If the ball position is evaluated in the image plane, it is
possible to estimate the corresponding projection line. The
intersection of the three projection lines provides the esti-
mate of the ball position in the real world coordinate sys-
tem as shown in figure 4. In practice, this process entails
uncertainty, so corresponding lines of sight may not meet in
the scene. Furthermore, it is likely that at certain moments
the ball cannot be seen by one or more cameras because of
occlusions, created for example by the players, the goalkeeper
or the goalposts. For these reasons a special procedure for
estimating the 3D position of the ball was introduced. If the
ball is visible in only two cameras, the 3D distance between
the two projection lines is first computed. Then, if this
distance is smaller than a selected threshold (typically about
the diameter of the ball, i.e. 22 cm), the two projection
lines are considered as referring to the same object (thus
coping with possible errors of the ball detection algorithms
described in section 4)
and then the mid-point of the common perpendicular to the
two projection lines is chosen as an estimate of the 3D po-
sition of the ball. If the ball is visible in all three cameras,
the mutual 3D distance among the three projection lines is
calculated. The two projection lines with shorter distance
are then considered the most reliable and this leads the cal-
culation to the previous case. We are aware that different
approaches have been proposed in literature to handle the
3D position estimation issue by triangulation ([9], [8]), but
we have not considered using them because of the difficul-
ties of their implementation and their high computational
costs that make them unsuitable for a real-time system.
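The two-line and three-line cases described above can be sketched as follows. Each projection line is represented by a point and a direction vector in world coordinates (in cm); the function names are hypothetical, and the 22 cm default threshold is the typical value quoted in the text.

```python
import math
from itertools import combinations


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _along(p, d, s):
    return tuple(x + s * y for x, y in zip(p, d))


def closest_approach(p1, d1, p2, d2):
    """Return (distance, midpoint) of the common perpendicular of two 3D lines."""
    w = _sub(p1, p2)
    a, b, c = _dot(d1, d1), _dot(d1, d2), _dot(d2, d2)
    d, e = _dot(d1, w), _dot(d2, w)
    denom = a * c - b * b
    if abs(denom) < 1e-12:
        return None  # parallel lines: no unique common perpendicular
    s = (b * e - c * d) / denom
    t = (a * e - b * d) / denom
    q1, q2 = _along(p1, d1, s), _along(p2, d2, t)
    mid = tuple((x + y) / 2.0 for x, y in zip(q1, q2))
    return math.dist(q1, q2), mid


def estimate_ball_3d(lines, max_gap_cm=22.0):
    """Pick the closest pair of projection lines; accept it if the gap is small."""
    best = None
    for (p1, d1), (p2, d2) in combinations(lines, 2):
        result = closest_approach(p1, d1, p2, d2)
        if result is not None and (best is None or result[0] < best[0]):
            best = result
    if best is None or best[0] > max_gap_cm:
        return None
    return best[1]


lines = [((0, 0, 0), (1, 1, 0)), ((2, 0, 4), (0, 1, 0)), ((0, 0, 50), (1, 0, 0))]
print(estimate_ball_3d(lines))  # (2.0, 2.0, 2.0)
```

With three lines the sketch reduces to the two-line case by keeping the pair with the shortest mutual distance, as the text describes; the whole computation is a handful of dot products, which is consistent with the real-time motivation given above.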
Finally, if the ball is visible in only a single camera, its
3D position can be estimated if some previous, temporally
close 3D positions are available. In this case, a linear
filter is used to predict the next 3D position and then to
estimate the projection lines of the missing views.
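The text does not specify the linear filter. One of the simplest possibilities, shown here purely as an assumption, is a constant-velocity extrapolation from the last two 3D positions, taken at equally spaced frames. The function name is hypothetical.

```python
def predict_next_position(history):
    """Constant-velocity extrapolation from the last two 3D positions."""
    if len(history) < 2:
        return None  # not enough recent positions to estimate a velocity
    prev, last = history[-2], history[-1]
    # next = last + (last - prev), applied to each coordinate
    return tuple(l + (l - p) for p, l in zip(prev, last))


print(predict_next_position([(0.0, 0.0, 0.0), (10.0, 2.0, 1.0)]))  # (20.0, 4.0, 2.0)
```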
Figure 4. The intersection of the three projection lines produces
the estimated ball position.
6. Experimental Results
A prototype system was installed at the Friuli Stadium in
Udine. The system uses Mikrotron MC1362 cameras with a
spatial resolution of 1024x768 pixels at 504 fps, and each
processing node is a Xeon QuadCore E5606 at 2.13 GHz. Each
node is equipped with an X64-CL iPro PCI frame grabber capable
of acquiring images from one Medium Camera Link™ camera and
performing image transfers at rates of up to 528 MB/s. The
system was extensively tested during real “Serie A”
championship matches, and a specific experimental session
(making use of the impact wall, sled, ball shooting machine,
etc.) was conceived to evaluate goal event detection accuracy.
Thus, both the system’s reliability in the real context of a
soccer match and its performance in very stressful sessions of
shots executed using a ball shooting machine (which also
allows repeatable results to be obtained) were tested.
One remark about the experimental tests is necessary: a
comparison with other approaches is unfeasible, due to the
complexity of the whole system. A comparison with
commercial/industrial systems ([1], [2], [3]) would be
interesting, but the companies do not release the technical
information and data needed to make such comparisons possible.
6.1. Benchmark Results
Here we focus our attention on benchmark sessions per-
formed outside of the match. In detail, four different exper-
iments were carried out:
• Impact Wall shots (fig. 5(a) - 5(b)): during this test, an
artificial wall was placed behind the goal line in two
different positions: in the first, the ball position at the
moment of impact is “No Goal”, while in the second it is
“Goal” (ground truth). The ball was shot by a shooting
machine at different speeds. This tests the system’s accuracy
in detecting the goal event in terms of both spatial
resolution and, above all, temporal resolution (at high ball
speed, even at 200 fps, there are just 1-2 frames in which to
correctly detect the goal event).
• Sled (fig. 5(c)): in this test, the ball is positioned on
a mobile sled and slowly moved from a non-goal to a goal
position. The system’s precision in detecting the exact ball
position (in terms of cm over the goal line) was tested.
• Free Shots: during these experiments, several shots were
performed by means of the shooting machine, in different
positions with respect to the goal: left, right, middle, just
under the horizontal crossbar, and just over the ground; each
of them at different ball speeds. We tested whether the system
fails to detect events in a specific portion of the goal area.
Moreover, the reliability of the 3D reconstruction procedure
(to separate shots inside and outside the goal posts) was
tested.
• Outer Net (fig. 5(d)): in this session, specific shots on
the outer net were performed, with the final position of the
ball inside the volumetric area of the goal but arriving from
outside the goal posts. We tested the system’s capability of
tracking the ball accurately (even in 3D), separating balls
that arrived in the goal volumetric area along a ‘goal
trajectory’ (inside the goal posts) from balls that arrived
along an external trajectory.
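The remark on temporal resolution in the Impact Wall test can be checked with a short calculation. The sketch below converts a ball speed into the distance covered between two consecutive frames; the function name is hypothetical, while the 120 km/h speed, the 200 fps and 25 fps frame rates and the 22 cm ball diameter are figures quoted elsewhere in the paper.

```python
def cm_per_frame(speed_kmh, fps):
    """Distance in centimeters travelled by the ball between consecutive frames."""
    cm_per_second = speed_kmh * 100000.0 / 3600.0
    return cm_per_second / fps


fast = cm_per_frame(120, 200)      # about 16.7 cm per frame
broadcast = cm_per_frame(120, 25)  # about 133.3 cm per frame
```

At 200 fps the ball moves roughly three quarters of its own diameter per frame, so the crossing of the line is captured in only one or two frames; at the 25 fps of broadcast video it moves well over a meter per frame.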
In order to show the weak impact of light conditions,
some tests were also performed at night, in the presence
of artificial stadium light. In table 1 the final results are
reported. The Impact Wall sessions gave good results, with
an overall success rate of more than 92%. The experiments
were carried out by shooting the ball at different speeds,
impacting the wall in different positions.
All shots in the Outer Net session were correctly de-
tected; for the Sled session, we reported the mean distance
over the goal line detected by the system, while a more de-
tailed analysis of this data, to emphasize that the system
mostly detected the event within 2 cm, is reported in table
2.
Figure 5. Examples of the different tests: (a) Impact Wall
position for No Goal simulation; (b) Impact Wall position for
Goal simulation; (c) Sled; (d) Outer Net (note that the ball
is over the net).
The Free Shots session gave different results according to
the lighting conditions: in the daylight test a success rate
of over 98% was obtained, whereas in the night test the
success rate was 88.54%. The difference is due to the
behaviour of the algorithms, in particular to background
subtraction and its computational consequences: if the
segmentation algorithm, due to artificial light flickering,
detects more moving points than are actually present, the
subsequent algorithms have to process more points. The growing
computational load leads to problems with memory buffers, and
consequently some frames are discarded (it should be noted
that all our experiments were performed in real time). To
confirm this, the test session was processed again off-line,
obtaining results comparable to those in daylight. It can be
concluded that this drawback can easily be overcome by
optimizing the code and/or using more powerful hardware. The
same observations are valid for the night session of the
Impact Wall test.
Figure 6 reports images acquired during the experimen-
tal phase and corresponding to two goal events; the first row
refers to a goal event during the Free Shot session, while in
the second row images from the Impact Wall session are
shown (just 2 cameras are reported for this experiment, the
third one is evidently occluded and cannot help in any way).
6.2. Real match results
Table 1. Overall performance of the GLT system during extensive
experimental sessions.
Test                      Results
Impact Wall - Daylight    175/186 (94.09%)
Outer Net - Daylight      70/70 (100%)
Free Shots - Daylight     165/168 (98.21%)
Impact Wall - Night       84/93 (90.32%)
Free Shots - Night        85/96 (88.54%)
Sled - Daylight           average of 3.8 cm

Table 2. Sensitivity evaluation of the system in the sled test.
Distance    Results
0-2 cm      17/32 (53.125%)
2-3 cm      8/32 (25.00%)
3-5 cm      4/32 (12.50%)
>5 cm       3/32 (9.375%)

Figure 6. Some images acquired during the experimental phase:
(a)-(c) cameras 1, 2 and 3; (d)-(e) cameras 1 and 2.

In order to evaluate the system’s robustness in real
uncontrolled conditions, tests were conducted during real
matches of the Italian Serie A Championship; in particular,
the system was tested during 19 matches played at the Friuli
Stadium in Udine (Italy). Table 3 reports the goal detection
results. In a real context, the important events (goals) are
limited in number so the benchmark sessions reported in
the previous section are mandatory in order to test the sys-
tem exhaustively. On the other hand, in a benchmark ses-
sion, it is really hard to introduce and simulate all possible
events that could happen during a real match: the presence
of players in the scene that could alter the detection of the
ball (some body parts, like legs and shoulders, could be er-
roneously detected as the ball); the presence of logos and/or
particular combinations of colors on the players’ uniforms
that can influence the ball detection procedure; the possibil-
ity that players randomly cross the goal line (goalkeepers
often do); the presence of objects in the goal area (towels,
bottles, etc.) that could lead to misclassifications.
As can be noted, during the 19 matches there were 33 goal
events, all of which were correctly detected (no missed
detections), and just 1 false positive occurrence.
In figure 7, one of the goal events correctly detected
(even though the ball was occluded in one camera view and the
ball appearance is not very different from the player’s
jersey) is shown. During this experimental phase, in addition
to goal events, a very controversial situation occurred: the
goalkeeper saved a shot when the ball was on the goal line
(see fig. 8). The system evaluated that situation as No-goal
and the image clearly shows that it was right.
A false positive also occurred during a complex defen-
sive action: four defenders were close to the goal line, try-
ing to protect the goal and one of them kicked the ball away
clearly before it crossed the line (see fig. 9). Afterwards, a
defender crossed the goal line and, unfortunately, the sys-
tem recognized the pattern of the ball on his shorts (whose
position was also consistent with the trajectory predicted by
the ball tracking procedure). Cases like this, although rare,
could happen again and certainly need further investigation.
Considering that the system processed a huge amount
of data, i.e., a total of over 1,700 minutes of play, corresponding
to about 20 million images, the percentage of errors
can be considered acceptable.
Finally, regarding computational load: a fast response is
mandatory for the system to be usable in practice.
For this reason we evaluated the response delay for each
test; fig. 10 summarizes the response times.
Taking 2 seconds as a realistic threshold for the system’s
response, the response time was acceptable in about
80% of the total number of experiments.
Considering that the algorithms can be further
improved and optimized, it is reasonable to expect that the
real-time constraint can be met.
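The acceptance criterion above (share of experiments whose response time stays within a 2-second threshold) can be sketched in a few lines of Python. The function name `fraction_within` and the sample values are hypothetical: the list holds placeholder numbers, not measurements from the paper, and treating the threshold as inclusive is an assumption.

```python
def fraction_within(response_times_s, threshold_s=2.0):
    """Return the fraction of response times at or below the threshold."""
    if not response_times_s:
        raise ValueError("at least one response time is required")
    within = sum(1 for t in response_times_s if t <= threshold_s)
    return within / len(response_times_s)


# Placeholder values in seconds, NOT measurements from the paper.
SAMPLE_TIMES_S = [0.8, 1.2, 1.5, 1.9, 2.4, 1.1, 0.9, 3.0, 1.7, 1.3]

share = fraction_within(SAMPLE_TIMES_S, threshold_s=2.0)
print(f"{share:.0%} of responses within 2 s")
```

Applied to the real per-test delays, the same computation would yield the roughly 80% figure quoted in the text.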
Table 3. Performance in real conditions.
Goal Events    True Positives    False Negatives    False Positives
33 Goals       33                0                  1
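The counts in Table 3 can be summarized with standard detection metrics. The sketch below derives precision, recall, and F1 from them; these derived values are not reported in the paper, and the function name `detection_metrics` is hypothetical.

```python
def detection_metrics(true_positives, false_negatives, false_positives):
    """Return (precision, recall, F1) computed from confusion counts."""
    precision = true_positives / (true_positives + false_positives)
    recall = true_positives / (true_positives + false_negatives)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Counts from Table 3: 33 goals detected, 0 missed, 1 false alarm.
precision, recall, f1 = detection_metrics(33, 0, 1)
print(f"precision={precision:.4f} recall={recall:.4f} F1={f1:.4f}")
```

With only 33 positive events, these figures carry wide uncertainty and should be read alongside the benchmark sessions rather than in isolation.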
7. Acknowledgments
The authors thank Liborio Capozzo and Arturo Argen-
tieri for technical support in the setup of the hardware used
for data acquisition and processing.
References
[1] http://en.wikipedia.org/wiki/Hawk-Eye.
[2] http://goalcontrol.de/gc4d.html.
[3] www.iis.fraunhofer.de/en/bf/ln/referenzprojekte/goalref.html.
Figure 7. One of the 33 goal events that occurred during the tests on real
matches.
Figure 8. A controversial situation that occurred during a real match
and was correctly evaluated by the system as no-goal.
Figure 9. A situation in which the system erroneously detected a
goal.
[4] T. D’Orazio, M. Leo, P. Spagnolo, P. Mazzeo, M. Nitti,
N. Mosca, and A. Distante. An investigation into the feasibility
of real-time soccer offside detection from a multiple
camera system. IEEE Transactions on Circuits and Systems
for Video Technology, 19(12):1804–1817, 2009.
[5] T. D’Orazio, M. Leo, P. Spagnolo, M. Nitti, N. Mosca, and
A. Distante. A visual system for real time detection of goal
events during soccer matches. Computer Vision and Image
Understanding, 113(5):622–632, 2009.
Figure 10. Plot of the response time
[6] M. A. Fischler and R. C. Bolles. Random sample consensus:
a paradigm for model fitting with applications to image anal-
ysis and automated cartography. Commun. ACM, 24(6):381–
395, 1981.
[7] S. Hashimoto and S. Ozawa. A system for automatic judg-
ment of offsides in soccer games. In IEEE International
Conference on Multimedia and Expo, pages 1889–1892,
2006.
[8] K. Kanatani, Y. Sugaya, and H. Niitsuma. Triangulation
from two views revisited: Hartley-Sturm vs. optimal correction.
In BMVC, 2008.
[9] P. Lindstrom. Triangulation made easy. In Computer Vision
and Pattern Recognition (CVPR), 2010 IEEE Conference on,
pages 1554–1561, June 2010.
[10] Y. Liu, D. Liang, Q. Huang, and W. Gao. Extracting 3d information
from broadcast soccer video. Image and Vision Computing,
24(10):1146–1162, 2006.
[11] V. Pallavi, J. Mukherjee, A. Majumdar, and S. Sural. Ball
detection from broadcast soccer videos using static and dynamic
features. Journal of Visual Communication and Image
Representation, 19(7):426–436, 2008.
[12] M. Piccardi. Background subtraction techniques: a review.
In IEEE SMC 2004 International Conference on Systems,
Man and Cybernetics, 2004.
[13] J. Ren, J. Orwell, G. Jones, and M. Xu. Real-time modeling
of 3-d soccer ball trajectories from multiple fixed cameras.
IEEE Transactions on Circuits and Systems for Video Technology,
18(3):350–362, 2008.
[14] P. Spagnolo, N. Mosca, M. Nitti, and A. Distante. An unsu-
pervised approach for segmentation and clustering of soccer
players. In IMVIP Conference, pages 133–142, 2007.
[15] X. Yu, H. Leong, C. Xu, and Q. Tian. Trajectory-based
ball detection and tracking in broadcast soccer video. IEEE
Transactions on Multimedia, 8(6):1164–1178, 2006.