
Non-invasive Soccer Goal Line Technology: A Real Case Study

June 2013

DOI:10.1109/CVPRW.2013.147

Conference: Computer Vision and Pattern Recognition Workshops (CVPRW), 2013 IEEE Conference on

Authors:

Paolo Spagnolo

Italian National Research Council

Marco Leo

Italian National Research Council

Pier Luigi Mazzeo

Italian National Research Council

Massimiliano Nitti

Italian National Research Council


Non-Invasive Soccer Goal Line Technology: A Real Case Study

Paolo Spagnolo, Marco Leo, Pier Luigi Mazzeo, Massimiliano Nitti, Ettore Stella, Arcangelo Distante

National Research Council of Italy

spagnolo@ba.issia.cnr.it

Abstract

In this paper, a real case study on a Goal Line Monitoring system is presented. The core of the paper is a refined ball detection algorithm that analyzes candidate ball regions to detect the ball. A decision-making approach, by means of camera calibration, decides whether a goal event has occurred. Unlike other similar approaches, the proposed one provides, as unquestionable proof, the image sequence that records the goal event under consideration. Moreover, it is non-invasive: it does not require any change to the typical football equipment (ball, goal posts, and so on). Extensive experiments were performed both on real matches acquired during the Italian Serie A championship and in specific evaluation tests using an artificial impact wall and a shooting machine for shot simulation. The encouraging experimental results confirmed that the system could help humans in ambiguous goal line event detection.

1. Introduction

Soccer is the world's most popular sport and an enormous business, and every match is currently refereed by a single person who "has full authority to enforce the Laws of the Game". Controversies are therefore inevitable, and the most glaring of them usually concern referee calls for which no interpretation is required, namely whether or not the ball has completely crossed the goal line. Recent famous 'bad calls' happened during Euro 2012 (Ukraine scored a goal against England that clearly went over the line but was disallowed by the referee, see fig. 1) and the 2010 World Cup (England scored a goal against Germany that was disallowed by the referee).

In cases like these, the referee's call is influenced by, among other things, three unavoidable factors:

• the referee's position on the field: he is not aligned with the goal line, so a parallax error affects his decision;

• the high speed of the ball, which can reach up to 120 km/h: it is impossible for human visual and cognitive systems (as well as for standard broadcast images, at 25 fps) to estimate the position of such a moving object continuously;

• the considerable distance (about 35-40 m) between the linesmen and the goal post: this makes it very hard to evaluate goal events with a resolution of about 1-2 cm.

Figure 1. During the 2012 Euro Competition, England's defender John Terry lunges for the ball, which appears to be over the line

The only way to definitively avoid these kinds of controversies is to introduce a "goal line technology", i.e., an automatic system to assist the referee in decisions concerning goal events.

For this purpose different technologies have been proposed. The earliest ones were based on instant replay: in case of a controversial call about a goal event, the referee (or an assistant) could stop the game and watch the images (acquired from broadcast or dedicated cameras). This would slow down the game, taking away possible plays and annoying the audience. Attention has therefore recently turned to technologies able to decide autonomously whether or not the ball has crossed the goal line. One of the most promising approaches uses a magnetic field to track a ball with a sensor suspended inside [3]. Thin cables with electrical current running through them are buried in the penalty box and behind the goal line to make a grid. The sensor in the ball measures the magnetic grids and relays the data to a computer, which determines whether the ball has crossed the line or not. However, this kind of technology cannot provide unquestionable proof of detected events, and it requires substantial modifications to the infrastructure of the stadium and to the game components (ball, playing field, goalposts, ...).

For these reasons, the efforts of several companies and research institutes are currently focused on the development of non-invasive goal line technologies. In particular, vision-based systems appear to be very promising, considering their capability to provide a posteriori verification of the system's operations [1, 2].

The main issue for an automatic system is the detection of the ball. It is very difficult when images are taken from fixed or broadcast cameras with a wide field of view, since the ball is represented by a small number of pixels and, moreover, it can have different scales, textures and colors. For this reason, most ball detection approaches are based on an evaluation of the ball trajectory. The underlying idea is that the analysis of kinematic parameters can single out the ball among a set of ball candidates [13, 15, 11, 10].

However, trajectory-based approaches generally work off-line, since the evaluation of the kinematic parameters for all ball candidates requires a long period of observation; they are therefore not suitable for use in a real time goal line monitoring system.

In recent years, a few research groups have started working on visual frameworks with the aim of recognizing events in real time. These systems also have to address problems associated with the time spent on image acquisition, transmission and processing (often the frame rate is even higher than for standard TV cameras). Furthermore, the ability to work autonomously for several hours and in all environmental conditions is an additional requirement for this kind of system. In [5] the authors present a real time visual system for goal detection which can be used as decision support by the referee committee. A system for automatic judgment of offside events is presented in [7]. The authors propose the use of 16 cameras located along both sides of the soccer field to cover the whole area. The integration of results from multiple cameras is used for offside judgment. Six fixed cameras were used in [4] to cover the whole field and to acquire image sequences from both sides of the stadium. Player and ball tracking processes run in parallel on the six image sequences and extract the player and ball positions in real time.

However, the ball detection approaches proposed in these works are designed to operate mainly on single images; they do not use temporal consistency to reinforce the detection, and the integration between different views is quite superficial. In our work we integrate all this information to realize a system able to work consistently for long periods of time.

In this paper we present a visual system able to detect the goal event through real time processing of the acquired images and to immediately provide the image sequence that records the goal event under consideration. The system has been installed at the Friuli Stadium in Udine. It has been tested both during real matches of the Italian Serie A championship and in specific simulation sessions. In the latter, the ball was shot by a shooting machine in different contexts, as explained in detail in the experimental results section, in order to validate the system in terms of both spatial and temporal accuracy.

2. Overview of the System

In figure 2(a) the visual system is outlined. Six cameras are placed on the stands of the stadium. For each side of the pitch, two of the three cameras have their optical axes parallel to the goal frame, while the remaining one is placed behind the goal with its optical axis perpendicular to the goal frame. Each camera is connected to a processor (node) that records and analyzes the acquired images. In figure 2(b) a schematic diagram of the processing steps executed by each node is shown.

Figure 2. The scheme of the visual system (a), and a schematic diagram of the processing steps executed by each node (b).

The six processors are connected to a main node, which acts as supervisor. The supervisor node makes decisions by combining the processing results coming from the cameras. The strategy is based on some heuristics that perform data fusion by evaluating the space-time coherence of the ball's 3D trajectory. The processing results of the three corresponding nodes are compared and a goal event probability function is evaluated.

3. Preliminary Steps: Calibration and Moving Object Segmentation

First of all, a calibration step is necessary for each node, in which the correspondences between the image plane and a plane in the 3D world are assessed. This step is fundamental in determining the 3D position of the ball. In this step, the homography transformation matrix of each node is estimated by using Random Sample Consensus [6]. Each homography transformation matrix M_i relates the points on the image plane to the corresponding points on the 3D plane. The only constraint to be considered when choosing the planes is that they must not be perpendicular to the image plane of the associated camera. This calibration needs to be done only once, after camera installation; if the cameras remain in place, these measures remain valid for any subsequent matches. For the experiments reported in this paper, the calibration phase was carried out using non-deformable steel structures: each structure defines a plane in the 3D world, and specific markers were used for the identification of the control points.
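As a minimal sketch of how a homography matrix M_i is used once it has been estimated, the following Python fragment maps an image point to the coordinates of the calibrated plane. The function name `image_to_plane` and the matrix values are hypothetical and chosen only for illustration; the RANSAC estimation itself is not shown.

```python
from typing import Sequence, Tuple

Matrix3 = Sequence[Sequence[float]]


def image_to_plane(M: Matrix3, u: float, v: float) -> Tuple[float, float]:
    """Map pixel (u, v) to plane coordinates using a 3x3 homography M."""
    # Multiply the homogeneous image point (u, v, 1) by M.
    X = M[0][0] * u + M[0][1] * v + M[0][2]
    Y = M[1][0] * u + M[1][1] * v + M[1][2]
    W = M[2][0] * u + M[2][1] * v + M[2][2]
    if abs(W) < 1e-12:
        raise ValueError("point maps to infinity on the plane")
    # Divide by the homogeneous coordinate to obtain plane coordinates.
    return X / W, Y / W


# Hypothetical matrix: scales pixels by 0.5 and shifts the origin.
M_example = [
    [0.5, 0.0, -100.0],
    [0.0, 0.5, -50.0],
    [0.0, 0.0, 1.0],
]
```

With this illustrative matrix, the pixel (200, 100) maps to the origin of the plane and (400, 300) maps to (100, 100).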

The segmentation of the image to extract moving objects is the first processing step executed by each node. It is fundamental as it limits the ball detection to moving areas and reduces computational time. For this purpose a background subtraction-based segmentation algorithm was implemented. Firstly, a background model has to be generated and then continuously updated to include lighting variations in the model. The implemented algorithm uses the mean and standard deviation to provide a statistical model of the background. Detection is then performed by comparing the current intensity value of each pixel with its statistical parameters, as explained in several works on this topic (a good review can be found in [12]). Details about the implemented approach can be found in [14]. Finally, after the detection of moving points, a connected components analysis detects the blobs in the image by grouping neighboring pixels. After this step, regions with an area smaller than a given threshold are considered as noise and removed, whilst the remaining regions are evaluated in the following steps.
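The segmentation step described above can be sketched as follows. This is a simplified illustration, not the implementation of [14]: the update rule, the learning rate `alpha`, the threshold `k` and the variance floor `min_std` are assumptions introduced here, and images are plain lists of rows.

```python
from collections import deque
from typing import List, Tuple

Image = List[List[float]]


class BackgroundModel:
    """Per-pixel running mean and variance of the background intensity."""

    def __init__(self, first_frame: Image, alpha: float = 0.05,
                 k: float = 2.5, min_std: float = 2.0) -> None:
        self.alpha = alpha      # learning rate (assumed value)
        self.k = k              # threshold in standard deviations (assumed value)
        self.min_std = min_std  # floor that avoids a zero threshold
        self.mean = [row[:] for row in first_frame]
        self.var = [[min_std ** 2 for _ in row] for row in first_frame]

    def apply(self, frame: Image) -> List[List[bool]]:
        """Return the moving-pixel mask and update the model on background pixels."""
        mask = []
        for y, row in enumerate(frame):
            mask_row = []
            for x, value in enumerate(row):
                diff = value - self.mean[y][x]
                std = max(self.var[y][x] ** 0.5, self.min_std)
                moving = abs(diff) > self.k * std
                if not moving:
                    # Slowly absorb lighting variations into the model.
                    self.mean[y][x] += self.alpha * diff
                    self.var[y][x] += self.alpha * (diff * diff - self.var[y][x])
                mask_row.append(moving)
            mask.append(mask_row)
        return mask


def connected_blobs(mask: List[List[bool]], min_area: int) -> List[List[Tuple[int, int]]]:
    """Group 8-connected moving pixels and drop blobs smaller than min_area."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    blobs = []
    for y0 in range(h):
        for x0 in range(w):
            if not mask[y0][x0] or seen[y0][x0]:
                continue
            seen[y0][x0] = True
            queue, blob = deque([(y0, x0)]), []
            while queue:
                y, x = queue.popleft()
                blob.append((y, x))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and mask[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            if len(blob) >= min_area:
                blobs.append(blob)  # blobs below the area threshold are treated as noise
    return blobs
```

Pixels flagged as moving are excluded from the model update in this sketch, so that a ball resting briefly in front of the camera is not absorbed into the background; this is a design choice of the example rather than a documented property of the system.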

4. Ball Detection and Tracking

An automatic method that detects the ball position in each image is the central step in building the vision system. In the soccer world, a great number of problems have to be managed, including occlusions, shadowing, mis-detection (the incorrect detection of objects similar to the ball), and last but not least, real time processing constraints. The ball detection method has to be very simple, fast and effective, as a great number of images per second must be processed. This kind of problem can be addressed with two different kinds of detection technique: geometric approaches, which match a model of the object of interest to different parts of the image in order to find the best fit; or example-based techniques, which learn the salient features of a class of objects from sets of positive and negative examples.

Our method uses the two techniques together in order to take advantage of their respective strengths: first, a fast circle detection (and/or circle portion detection) algorithm, based only on edge information, is applied to the whole image to limit the image area to the best candidate containing the ball; second, an appearance-based distance measure is used to validate the ball hypothesis.

The Circle Hough Transform (CHT) aims to find circular patterns of a given radius R within an image. Each edge point contributes a circle of radius R to an output accumulator space. The peak in the output accumulator space is detected where these contributed circles overlap at the center of the original circle. In order to reduce the computational burden and the number of false positives typical of the CHT, a number of modifications have been widely implemented in the last decade. The use of edge orientation information limits the possible positions of the center for each edge point. This way only an arc perpendicular to the edge orientation at a distance R from the edge point needs to be plotted. The CHT, as well as its modifications, can be formulated as convolutions applied to an edge magnitude image (after suitable edge detection). We have defined a circle detection operator that is applied over all the image pixels, which produces a maximal value when a circle is detected with a radius in the range [R_min, R_max]:

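$$u(x, y) = \frac{\iint_{D(x,y)} \vec{e}(\alpha, \beta) \cdot \vec{O}(\alpha - x, \beta - y)\, d\alpha\, d\beta}{2\pi (R_{max} - R_{min})} \tag{1}$$

where the domain D(x, y) is defined as:

$$D(x, y) = \{ (\alpha, \beta) \in \mathbb{R}^2 \mid R_{min}^2 \le (\alpha - x)^2 + (\beta - y)^2 \le R_{max}^2 \} \tag{2}$$

$\vec{e}$ is the normalized gradient vector:

$$\vec{e}(x, y) = \left[ \frac{E_x(x, y)}{|E|}, \frac{E_y(x, y)}{|E|} \right]^T \tag{3}$$

and $\vec{O}$ is the kernel vector

$$\vec{O}(x, y) = \left[ \frac{\cos(\tan^{-1}(y/x))}{\sqrt{x^2 + y^2}}, \frac{\sin(\tan^{-1}(y/x))}{\sqrt{x^2 + y^2}} \right]^T \tag{4}$$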

The use of the normalized gradient vector in (1) is necessary in order to have an operator whose results are independent of the intensity of the gradient at each point: we want to be sure that the circle detected in the image is the most complete in terms of contours and not the most contrasted in the image. Indeed, a circle that is not well contrasted in the image could otherwise give a convolution result lower than another object that is not exactly circular but has a greater gradient. The kernel vector contains a normalization factor (the division by the distance of each point from the center of the kernel), which is fundamental to ensuring that we have the same values in the accumulation space when circles with different radii in the admissible range are found. Moreover, normalization ensures that the peak in the convolution result is obtained for the most complete circle and not for the greatest in the annulus. As a final consideration, in equation (1) the division by 2π(R_max − R_min) guarantees that the final result of our operator lies in the range [-1, 1] regardless of the radius value considered in the procedure. The masks implementing the kernel vector have a dimension of (2·R_max + 1) × (2·R_max + 1) and they represent the direction of the radial vector scaled by the distance from the center at each point. The convolution between the unit gradient vector images and these masks evaluates how many points in the image have a gradient direction concordant with the gradient direction of a range of circles. Then the peak in the accumulator array provides the center of the sub-image with the highest circularity, which is finally passed to the validation step. Examples of sub-images given as input to the ball recognition process are shown in figure 3.

Figure 3. Some images of the training set: in the first row there are some negative examples of the ball, in the second row some positive examples.
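A discrete sketch of the circle detection operator of equation (1) is given below, under stated assumptions: the unit gradient images `ex` and `ey` are already available (zero where there is no edge), and the kernel vector is implemented as the radial unit vector divided by the distance from the kernel center, which is our reading of equation (4). The function names and the synthetic ring are introduced here for illustration only.

```python
import math
from typing import List, Tuple

Field = List[List[float]]


def circle_operator(ex: Field, ey: Field, r_min: float, r_max: float) -> Field:
    """Discrete version of u(x, y) computed on unit-gradient images ex, ey."""
    h, w = len(ex), len(ex[0])
    reach = int(math.ceil(r_max))
    # Kernel offsets inside the annulus, with O = (dx, dy) / (dx^2 + dy^2).
    kernel = []
    for dy in range(-reach, reach + 1):
        for dx in range(-reach, reach + 1):
            d2 = dx * dx + dy * dy
            if d2 > 0 and r_min ** 2 <= d2 <= r_max ** 2:
                kernel.append((dx, dy, dx / d2, dy / d2))
    norm = 2.0 * math.pi * (r_max - r_min)
    u = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            total = 0.0
            for dx, dy, ox, oy in kernel:
                a, b = x + dx, y + dy
                if 0 <= a < w and 0 <= b < h:
                    # Dot product between the unit gradient and the kernel vector.
                    total += ex[b][a] * ox + ey[b][a] * oy
            u[y][x] = total / norm
    return u


def synthetic_ring(size: int, cx: int, cy: int, radius: float) -> Tuple[Field, Field]:
    """Unit gradient field of an ideal one-pixel-thick circle (test data only)."""
    ex = [[0.0] * size for _ in range(size)]
    ey = [[0.0] * size for _ in range(size)]
    for y in range(size):
        for x in range(size):
            d = math.hypot(x - cx, y - cy)
            if abs(d - radius) < 0.5:
                ex[y][x], ey[y][x] = (x - cx) / d, (y - cy) / d
    return ex, ey


ex, ey = synthetic_ring(25, 12, 12, 8.0)
u = circle_operator(ex, ey, r_min=7.5, r_max=8.5)
peak, px, py = max((u[y][x], x, y) for y in range(25) for x in range(25))
```

On this synthetic ring of radius 8, the response peaks at the true center (12, 12). A real implementation would compute the same sums as two convolutions, one per gradient component, rather than with explicit loops.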

The validation step assesses the similarity of appearance between the candidate region and a set of positive examples stored previously. The similarity is evaluated by computing the Bhattacharyya distance between histograms (reduced to 64 bins): this measure is computationally inexpensive, but at the same time it is sufficiently robust, as it is invariant to the rotation of the target (textured balls are considered) and also to slight changes in scale. One of the strengths of the proposed system is that the construction and updating of the set of reference examples for validation takes place automatically.
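A minimal sketch of the validation measure follows. The text does not state which form of the Bhattacharyya distance is used; the form sqrt(1 - BC), where BC is the Bhattacharyya coefficient, is assumed here, as is the mapping of 8-bit intensities to 64 bins. The function names are illustrative.

```python
import math
from typing import Iterable, List


def histogram64(pixels: Iterable[int]) -> List[float]:
    """Normalized 64-bin histogram of 8-bit intensities (4 grey levels per bin)."""
    counts = [0] * 64
    n = 0
    for p in pixels:
        counts[p // 4] += 1
        n += 1
    if n == 0:
        raise ValueError("empty region")
    return [c / n for c in counts]


def bhattacharyya_distance(p: List[float], q: List[float]) -> float:
    """Assumed form: sqrt(1 - BC), with BC = sum_i sqrt(p_i * q_i)."""
    bc = sum(math.sqrt(a * b) for a, b in zip(p, q))
    # Clamp to guard against tiny negative values caused by rounding.
    return math.sqrt(max(0.0, 1.0 - bc))
```

Because a histogram ignores pixel positions, any rearrangement of the same pixel values (such as a rotation of the ball texture) gives a distance of zero, which is the invariance property mentioned in the text.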

Initially, the set of reference examples is empty, and all the moving objects with the highest value of circularity (greater than a weak threshold) and with an area compatible with that of the ball are taken into account. Their displacement on the image plane, frame after frame, is then evaluated in order to estimate some motion parameters, e.g. direction and velocity. The candidate regions are then included in the reference set if the associated motion parameters are compatible with those that only a ball can have in case of a shot on goal (direction towards the goal and a plausible pixel displacement between frames). At the same time, the relative distance in the image plane between the candidate regions and the other moving objects in the scene is evaluated: if the relative distance is low and almost constant, the ball candidate is discarded, since it has likely been produced by incorrect segmentation of players' bodies.

The same criteria are used to add new examples to the reference set, additionally considering the value of the similarity measure with respect to the pre-existing examples. The maximum number of examples in the reference set is 100, and it is managed as a circular buffer.
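The bounded reference set can be sketched with a fixed-length double-ended queue, which discards the oldest example when a new one is appended to a full buffer. The class and method names are hypothetical; only the capacity of 100 comes from the text.

```python
from collections import deque
from typing import Callable, List, Optional

Histogram = List[float]


class ReferenceSet:
    """Circular buffer of reference ball histograms (capacity from the text: 100)."""

    def __init__(self, capacity: int = 100) -> None:
        self._examples = deque(maxlen=capacity)

    def __len__(self) -> int:
        return len(self._examples)

    def add(self, example: Histogram) -> None:
        # When the buffer is full, the oldest example is dropped automatically.
        self._examples.append(example)

    def reset(self) -> None:
        # Re-initialization, e.g. after a change of ball or of lighting.
        self._examples.clear()

    def min_distance(self, candidate: Histogram,
                     distance: Callable[[Histogram, Histogram], float]) -> Optional[float]:
        """Smallest distance to any stored example, or None if the set is empty."""
        if not self._examples:
            return None
        return min(distance(candidate, ex) for ex in self._examples)
```

The distance function is passed in as a parameter so that the buffer does not depend on a particular similarity measure.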

The reference set is re-initialized when circular objects with peculiar motion parameters and low similarity (i.e. higher distances in the space of histograms) to the examples in the reference set appear in a number of consecutive frames. This way the introduction of a new type of ball or sudden and substantial changes in lighting conditions (due to clouds or floodlights) can be handled automatically.

The ball has to be detected in several consecutive images in order to be sure that a true positive has been found. Once this is the case, a different and more reliable procedure for selecting candidate moving regions is used (tracking phase). A ball position probability map, covering all the points of the processing image, is created as follows:

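$$P(x, y) = \frac{e^{-\frac{\left| (x - |\hat{x} + V_x\,\mathrm{sign}(\cos\theta)|) + (y - |\hat{y} + V_y\,\mathrm{sign}(\sin\theta)|) \right|^2}{2\sigma^2}}}{\sigma\sqrt{2\pi}} \tag{5}$$

where $(\hat{x}, \hat{y})$ is the last known ball position and

$$\sigma = \frac{R_p V_{max} n}{R_{cm} T} \tag{6}$$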

where V and θ are the local velocity and the direction of the ball in the image plane respectively, R_p is the ball radius in pixels, R_cm is the ball radius in centimeters, V_max is the maximum admissible velocity of the ball (in cm/sec), T is the camera frame rate and n is the number of frames between the past and the current ball detection (1 if the two frames are consecutive). This way the maximum probability value is related to the point where, on the basis of past information about the ball's movement, the ball should be found (predicted point). The probability value decreases exponentially with the distance from the predicted point and becomes close to 0 for points far from the last known ball position that cannot be reached considering the maximum speed limits (usually 120 km/h). In the following frames, the probability map is used to select candidate moving regions (those with a probability greater than zero). This way, the ball can be detected both in case of merging with players and in case of partial occlusions. The ball velocity, direction and probability map are always updated using the proper value for n (i.e. the number of frames between the current frame and the last ball detection). If the ball is not detected for three consecutive seconds (i.e. n becomes greater than 3·T), the past information is considered outdated and the ball detection procedure starts again, considering all the candidate ball regions in the whole image.
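As a worked example of equation (6) and of the three-second reset rule, the following sketch computes σ for illustrative values. The ball radius in pixels (10) and the frame rate (200 fps) are assumptions chosen for the example; the 11 cm radius follows from the 22 cm ball diameter quoted in section 5, and the speed limit is the 120 km/h mentioned in the text. Function names are hypothetical.

```python
def kmh_to_cm_per_s(speed_kmh: float) -> float:
    """Convert km/h to cm/s (1 km = 100000 cm, 1 h = 3600 s)."""
    return speed_kmh * 100000.0 / 3600.0


def search_sigma(r_px: float, r_cm: float, v_max_cm_s: float,
                 frame_rate: float, n_frames: int) -> float:
    """Equation (6): sigma = (R_p * V_max * n) / (R_cm * T), in pixels."""
    return (r_px * v_max_cm_s * n_frames) / (r_cm * frame_rate)


def tracking_outdated(n_frames: int, frame_rate: float, timeout_s: float = 3.0) -> bool:
    """True when the ball has been missing for more than timeout_s seconds."""
    return n_frames > frame_rate * timeout_s


# Assumed values: 10 px radius in the image, 200 fps camera.
sigma_1 = search_sigma(10.0, 11.0, kmh_to_cm_per_s(120.0), 200.0, 1)
sigma_2 = search_sigma(10.0, 11.0, kmh_to_cm_per_s(120.0), 200.0, 2)
```

With these values σ is about 15 pixels after one frame and doubles to about 30 pixels when one detection is missed, so the search area widens in proportion to the time elapsed since the last detection.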

5. Supervisor node

The supervisor node has a decision-making function based on the processing results coming from the nodes. For each frame the processing units send several items of information to the supervisor, including the last frame number processed, the position of the ball (if detected), and the number of consecutive frames in which the ball has been correctly tracked. It should be noted that even if the images are synchronized in the acquisition process, the processing results are not necessarily synchronized, since each node works independently of the others. Moreover, a node may skip some frames after accumulating a significant delay during processing. When the supervisor receives the results of three nodes for a given frame, or when it detects that synchronized data cannot be retrieved for a given frame, it processes the available information to evaluate the occurrence of a goal event. This is done by evaluating the goal line crossing in the available 2D images. However, this way it is not possible to evaluate whether the ball crossed the goal line inside the goal posts or not. For this reason, the 3D ball position and its trajectory before crossing the goal line are evaluated. This requires a calibration procedure (described in section ??) and an accurate evaluation of the 3D position of the ball.

If the ball position is evaluated in the image plane, it is possible to estimate the corresponding projection line. The intersection of the three projection lines provides the estimate of the ball position in the real world coordinate system, as shown in figure 4. In practice, this process entails uncertainty, so corresponding lines of sight may not meet in the scene. Furthermore, it is likely that at certain moments the ball cannot be seen by one or more cameras because of occlusions, for example created by the players, the goalkeeper or the goalposts. For these reasons a special procedure for estimating the 3D position of the ball was introduced. If the ball is visible in only two cameras, the 3D distance between the two projection lines is first computed. Then, if this distance is smaller than a selected threshold (typically about the size of the diameter of the ball, i.e. 22 cm), the two projection lines are considered as referring to the same object (thus tolerating possible errors of the ball detection algorithms described in section 4), and the mid-point of the common perpendicular to the two projection lines is chosen as an estimate of the 3D position of the ball. If the ball is visible in all three cameras, the mutual 3D distances among the three projection lines are calculated. The two projection lines with the shortest distance are then considered the most reliable, which reduces the calculation to the previous case. We are aware that different approaches have been proposed in the literature to handle the 3D position estimation issue by triangulation ([9], [8]), but we have not considered using them because of the difficulties of their implementation and their high computational costs, which make them unsuitable for a real-time system.

Finally, if the ball is visible in only a single camera, its 3D position can be estimated if some previous, temporally close 3D positions are available. In this case, a linear filter is used to predict the next 3D position and then to estimate the projection lines of the missing views.
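The two-camera and three-camera cases described above can be sketched as follows, assuming each projection line is given by a point and a direction vector in world coordinates (in cm). The function names are hypothetical; the 22 cm threshold is the typical value quoted in the text.

```python
import math
from itertools import combinations
from typing import List, Optional, Tuple

Vec = Tuple[float, float, float]
Line = Tuple[Vec, Vec]  # (point on the line, direction vector)


def _dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def closest_approach(l1: Line, l2: Line) -> Optional[Tuple[float, Vec]]:
    """Return (distance, mid-point of the common perpendicular), or None if parallel."""
    (p1, d1), (p2, d2) = l1, l2
    w0 = (p1[0] - p2[0], p1[1] - p2[1], p1[2] - p2[2])
    a, b, c = _dot(d1, d1), _dot(d1, d2), _dot(d2, d2)
    d, e = _dot(d1, w0), _dot(d2, w0)
    denom = a * c - b * b
    if abs(denom) < 1e-12:
        return None  # parallel lines: no unique common perpendicular
    s = (b * e - c * d) / denom  # parameter of the closest point on line 1
    t = (a * e - b * d) / denom  # parameter of the closest point on line 2
    q1 = tuple(p1[i] + s * d1[i] for i in range(3))
    q2 = tuple(p2[i] + t * d2[i] for i in range(3))
    mid = tuple((q1[i] + q2[i]) / 2.0 for i in range(3))
    return math.dist(q1, q2), mid


def estimate_ball_3d(lines: List[Line], threshold_cm: float = 22.0) -> Optional[Vec]:
    """Use the closest pair of projection lines; reject it if too far apart."""
    best = None
    for l1, l2 in combinations(lines, 2):
        result = closest_approach(l1, l2)
        if result is not None and (best is None or result[0] < best[0]):
            best = result
    if best is None or best[0] >= threshold_cm:
        return None
    return best[1]
```

With three lines, the function evaluates the three pairwise distances and keeps the closest pair, which mirrors the reduction to the two-camera case described in the text. The single-camera case, which relies on a linear filter over previous 3D positions, is not covered by this sketch.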

Figure 4. The intersection of the three projection lines produces the estimated ball position.

6. Experimental Results

A prototype system was installed at the Friuli Stadium in Udine. The system uses Mikrotron MC1362 cameras with a spatial resolution of 1024x768 pixels at 504 fps, and a Xeon QuadCore E5606 2.13 GHz as the processing node. Each node is equipped with an X64-CL iPro PCI frame grabber capable of acquiring images from one Medium Camera Link™ camera and performing image transfers at rates of up to 528 MB/s. The system was extensively tested during real "Serie A" championship matches, and a specific experimental session (making use of the impact wall, sled, ball shooting machine, etc.) was conceived to evaluate goal event detection accuracy. Thus, both the system's reliability in the real context of a soccer match and its performance in very stressful sessions of shots executed using a ball shooting machine (which also allows repeatable results to be obtained) were tested.

An observation about the experimental tests is in order: a comparison with other approaches is unfeasible, due to the complexity of the whole system. A comparison with commercial/industrial systems ([1], [2], [3]) would be interesting, but companies do not release the technical information and data needed to make such comparisons possible.


6.1. Benchmark Results

Here we focus our attention on benchmark sessions performed outside of matches. In detail, four different experiments were carried out:

• Impact Wall shots (fig. 5(a) - 5(b)): during this test, an artificial wall was placed behind the goal line in two different positions: in the first, the ball position at the moment of impact is "No Goal", while in the second it is "Goal" (ground truth). The ball was shot by a shooting machine at different speeds. This tests the accuracy of the system in detecting the goal event in terms of both spatial resolution and, above all, temporal resolution (at high ball speed, even at 200 fps, there are just 1-2 frames in which to correctly detect the goal event).

• Sled (fig. 5(c)): in this test, the ball is positioned on a mobile sled and slowly moved from a non-goal to a goal position. The system's precision in detecting the exact ball position (in terms of cm over the goal line) was tested.

• Free Shots: during these experiments, several shots were performed by means of the shooting machine, in different positions with respect to the goal: left, right, middle, just under the horizontal crossbar, and just over the ground; each of them at different ball speeds. We tested whether the system fails to detect shots in a specific portion of the goal area. Moreover, the reliability of the 3D reconstruction procedure (to separate shots inside and outside the goal posts) was tested.

• Outer Net (fig. 5(d)): in this session, specific shots on the outer net were performed with the final position of the ball inside the volumetric area of the goal, but arriving from outside the goal posts. We tested the system's capability of tracking the ball accurately (even in 3D), by separating balls that arrived in the goal volumetric area along a 'goal trajectory' (inside the goal posts) from balls that arrived along an external trajectory.
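The remark on temporal resolution in the Impact Wall test can be checked with a short calculation. The sketch below assumes a ball speed of 120 km/h (the maximum quoted in the introduction) and the 22 cm ball diameter quoted in section 5; the function name is illustrative.

```python
def displacement_per_frame_cm(speed_kmh: float, fps: float) -> float:
    """Distance travelled by the ball between two consecutive frames, in cm."""
    return speed_kmh * 100000.0 / 3600.0 / fps


BALL_DIAMETER_CM = 22.0

step_200 = displacement_per_frame_cm(120.0, 200.0)  # about 16.7 cm per frame
frames_per_diameter = BALL_DIAMETER_CM / step_200    # about 1.3 frames
step_25 = displacement_per_frame_cm(120.0, 25.0)     # about 133 cm per frame
```

At 200 fps the ball advances by roughly three quarters of its diameter per frame, which is consistent with the 1-2 frames mentioned in the text; at the 25 fps of broadcast video it would move more than one meter between frames.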

In order to show the weak impact of light conditions, some tests were also performed at night, under artificial stadium light. The final results are reported in table 1. The Impact Wall sessions gave good results, with an overall success rate of more than 92%. The experiments were carried out by shooting the ball at different speeds, impacting the wall in different positions.

All shots in the Outer Net session were correctly detected. For the Sled session, we reported the mean distance over the goal line detected by the system, while a more detailed analysis of this data, emphasizing that the system mostly detected the event within 2 cm, is reported in table 2.

Figure 5. Example of different tests: (a) Impact Wall position for No Goal simulation; (b) Impact Wall position for Goal simulation (note that the ball is over the net).

The Free Shots session yielded different results depending on the lighting conditions: in the daylight test a success rate of over 98% was obtained, whereas in the night test the success rate was 88.54%. This difference was due to the different performance of the algorithms, in particular the interaction between background subtraction and computational load: if the segmentation algorithm, due to the flickering of artificial light, detects more moving points than are really present, the subsequent algorithms have to process more points, causing a growing computational load. This leads to problems with memory buffers, and consequently some frames are discarded (it should be noted that all our experiments were performed in real time). To confirm this, this test session was processed again off-line, obtaining results comparable to those in daylight. It can be concluded that this drawback can easily be overcome by optimizing the code and/or upgrading the hardware with better-performing components. The same observations are valid for the night session of the Impact Wall test.

Figure 6 reports images acquired during the experimental phase and corresponding to two goal events: the first row refers to a goal event during the Free Shots session, while the second row shows images from the Impact Wall session (just 2 cameras are reported for this experiment, since the third one is evidently occluded and cannot help in any way).

6.2. Real match results

In order to evaluate the system’s robustness in real

uncontrolled conditions, test were conducted during real

matches of the Italian Serie A Championship; in particular,

995100310101016

Table 1. Overall performance of the GLT system during extensive

experimental sessions.

Test Results

Impact Wall - Daylight 175/186 - 94.09%

Outer Net - Daylight 70/70 - 100%

Free Shots - Daylight 165/168 - 98.21%

Impact Wall - Night 84/93 - 90.32%

Free Shots - Night 85/96 - 88.54%

Sled - Daylight average of 3.8 cm

Table 2. Sensibility evaluation of the system in the sled test.

Distance Results

0-2cm 17/32 - 53.125%

2-3cm 8/32 - 25.00%

3-5cm 4/32 - 12.50%

>5cm 3/32 - 9.375%

Figure 6. Some images acquired during the experimental phase. First row: (a) camera 1, (b) camera 2, (c) camera 3. Second row: (d) camera 1, (e) camera 2.

the system was tested during 19 matches played at the Friuli Stadium in Udine (Italy). Table 3 reports the goal detection results. In a real context, the important events (goals) are limited in number, so the benchmark sessions reported in the previous section are necessary to test the system exhaustively. On the other hand, in a benchmark session it is very hard to introduce and simulate all the events that could happen during a real match: players in the scene who could alter the detection of the ball (some body parts, such as legs and shoulders, could be erroneously detected as the ball); logos and/or particular color combinations on the players' uniforms that can influence the ball detection procedure; players randomly crossing the goal line (goalkeepers often do); and objects in the goal area (towels, bottles, etc.) that could lead to misclassifications.

As can be seen, during the 19 matches there were 33 goal events, all correctly detected (no missed detections), and just 1 false positive.
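The counts above (33 true positives, 0 false negatives, 1 false positive) can also be summarized with the standard precision, recall, and F1 definitions. These derived figures are not reported by the authors; the sketch below only applies the textbook formulas to the counts in Table 3.

```python
def precision(tp, fp):
    """Fraction of reported goals that were real goals."""
    return tp / (tp + fp)


def recall(tp, fn):
    """Fraction of real goals that the system reported."""
    return tp / (tp + fn)


# Counts taken from Table 3.
tp, fn, fp = 33, 0, 1

p = precision(tp, fp)
r = recall(tp, fn)
f1 = 2 * p * r / (p + r)

print(f"precision = {p:.4f}, recall = {r:.4f}, F1 = {f1:.4f}")
```

With these counts recall is 1.0 and precision is 33/34, roughly 0.97. Note that true negatives are not counted in Table 3, so an accuracy figure cannot be derived from it.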

Figure 7 shows one of the correctly detected goal events (even though the ball was occluded in one camera view and its appearance is not very different from the player's jersey). During this experimental phase, in addition to goal events, a very controversial situation occurred: the goalkeeper saved a shot when the ball was on the goal line (see Fig. 8). The system evaluated that situation as no-goal, and the image clearly shows that it was right.

A false positive also occurred during a complex defensive action: four defenders were close to the goal line, trying to protect the goal, and one of them kicked the ball away clearly before it crossed the line (see Fig. 9). Afterwards, a defender crossed the goal line and, unfortunately, the system recognized the pattern of the ball on his shorts (whose position was also consistent with the trajectory predicted by the ball tracking procedure). Cases like this, although rare, could happen again and certainly need further investigation. Considering that the system processed a huge amount of data, i.e., a total of over 1.7K minutes of play, corresponding to about 20M images, the percentage of errors can be considered acceptable.
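To put these figures in perspective, the rounded numbers quoted above can be turned into rates. This is only back-of-the-envelope arithmetic: it treats "over 1.7K minutes" as exactly 1,700 minutes and "about 20M" as exactly 20,000,000 images, and it does not assume anything about how the images are split across cameras, which this section does not restate.

```python
minutes_of_play = 1_700          # "over 1.7K minutes", taken as a lower bound
images_processed = 20_000_000    # "about 20M" images
false_positives = 1              # the single false positive discussed above

seconds_of_play = minutes_of_play * 60
images_per_second = images_processed / seconds_of_play
false_positives_per_hour = false_positives / (minutes_of_play / 60)

print(f"{seconds_of_play} s of play")
print(f"{images_per_second:.0f} images per second of play")
print(f"{false_positives_per_hour:.3f} false positives per hour of play")
```

Under these assumptions the system processed roughly 196 images per second of play and produced one false positive in a little over 28 hours of play.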

Finally, a note on computational load: a speedy response is mandatory for the system to be usable in practice. For this reason we evaluated the response delay for each test. Fig. 10 summarizes the response times. Taking 2 seconds as a realistic threshold for the system's response, the response time was acceptable in about 80% of the experiments. Considering that the algorithms can be further improved and optimized, it can be concluded that the real-time constraint is within reach.
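The evaluation described here amounts to measuring the share of tests whose response delay stays within the threshold. A minimal sketch follows. The per-test delays are hypothetical values, since the measured ones appear only as a plot in Fig. 10, and the function name is illustrative.

```python
def fraction_within(times_s, threshold_s=2.0):
    """Share of response times (in seconds) at or below the threshold."""
    if not times_s:
        raise ValueError("at least one response time is required")
    return sum(t <= threshold_s for t in times_s) / len(times_s)


# Hypothetical response delays in seconds, NOT the measured data of Fig. 10.
sample_delays = [0.8, 1.2, 1.5, 1.9, 2.0, 2.4, 1.1, 3.0, 1.7, 0.9]

share = fraction_within(sample_delays)
print(f"{share:.0%} of responses within 2 s")
```

The sample was chosen so that 8 of the 10 delays meet the 2-second threshold, mirroring the roughly 80% figure quoted in the text. Applied to the real delays, the same function would show how much the share improves after optimization.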

Table 3. Performance in real conditions.

Goal Events    True Positives    False Negatives    False Positives
33 Goals       33                0                  1

7. Acknowledgments

The authors thank Liborio Capozzo and Arturo Argentieri for technical support in the setup of the hardware used for data acquisition and processing.

References

[1] http://en.wikipedia.org/wiki/Hawk-Eye.

[2] http://goalcontrol.de/gc4d.html.

[3] www.iis.fraunhofer.de/en/bf/ln/referenzprojekte/goalref.html.


Figure 7. One of the 33 goal events that occurred during the tests on real matches.

Figure 8. A controversial situation that occurred during a real match and was rightly evaluated by the system as no-goal.

Figure 9. A situation in which the system erroneously detected a goal.

Figure 10. Plot of the response time

[4] T. D'Orazio, M. Leo, P. Spagnolo, P. Mazzeo, M. Nitti, N. Mosca, and A. Distante. An investigation into the feasibility of real-time soccer offside detection from a multiple camera system. IEEE Transactions on Circuits and Systems for Video Technology, 19(12):1804–1817, 2009.

[5] T. D'Orazio, M. Leo, P. Spagnolo, M. Nitti, N. Mosca, and A. Distante. A visual system for real time detection of goal events during soccer matches. Computer Vision and Image Understanding, 113(5):622–632, 2009.

[6] M. A. Fischler and R. C. Bolles. Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography. Commun. ACM, 24(6):381–395, 1981.

[7] S. Hashimoto and S. Ozawa. A system for automatic judgment of offsides in soccer games. In IEEE International Conference on Multimedia and Expo, pages 1889–1892, 2006.

[8] K. Kanatani, Y. Sugaya, and H. Niitsuma. Triangulation from two views revisited: Hartley-Sturm vs. optimal correction. In BMVC, 2008.

[9] P. Lindstrom. Triangulation made easy. In Computer Vision and Pattern Recognition (CVPR), 2010 IEEE Conference on, pages 1554–1561, June 2010.

[10] Y. Liu, D. Liang, Q. Huang, and W. Gao. Extracting 3D information from broadcast soccer video. Image and Vision Computing, 24(10):1146–1162, 2006.

[11] V. Pallavi, J. Mukherjee, A. Majumdar, and S. Sural. Ball detection from broadcast soccer videos using static and dynamic features. Journal of Visual Communication and Image Representation, 19(7):426–436, 2008.

[12] M. Piccardi. Background subtraction techniques: a review. In IEEE SMC 2004 International Conference on Systems, Man and Cybernetics, 2004.

[13] J. Ren, J. Orwell, G. Jones, and M. Xu. Real-time modeling of 3-D soccer ball trajectories from multiple fixed cameras. IEEE Transactions on Circuits and Systems for Video Technology, 18(3):350–362, 2008.

[14] P. Spagnolo, N. Mosca, M. Nitti, and A. Distante. An unsupervised approach for segmentation and clustering of soccer players. In IMVIP Conference, pages 133–142, 2007.

[15] X. Yu, H. Leong, C. Xu, and Q. Tian. Trajectory-based ball detection and tracking in broadcast soccer video. IEEE Transactions on Multimedia, 8(6):1164–1178, 2006.



