Date of publication xxxx 00, 0000, date of current version xxxx 00, 0000.
Digital Object Identifier
An Overview of Data Extraction from Invoices
THOMAS SAOUT, FRÉDÉRIC LARDEUX, and FRÉDÉRIC SAUBION
Univ Angers, LERIA, SFR MATHSTIC, F-49000 Angers, France (e-mail: firstname.lastname@univ-angers.fr)
Corresponding author: Thomas Saout (e-mail: thomas.saout@etud.univ-angers.fr).
This work is supported by the KS2 company.
ABSTRACT This paper provides a comprehensive overview of the process for information retrieval
from invoices. Invoices serve as proof of purchase and contain important information, including the
date, description, quantity, and the price of goods or services, as well as the terms of payment.
Companies must process invoices quickly and accurately to maintain proper financial records. To
automate this workflow, commercial systems have been developed. Despite the complexity involved,
realizing automated processing of invoices necessitates the harmonious integration of a wide range of
techniques and methods. While several surveys have shed light on different aspects of this workflow,
our objective in this paper is to present a synthetic view of the process and emphasize the most
pertinent challenges. We discuss the digitalization of invoices and the use of natural language
processing techniques to extract relevant information. We also review machine learning and deep
learning techniques that are widely used to handle the variability of layouts, minimize end-user
tasks, and train and adapt to new contexts. The purpose of this overview is not to evaluate various
systems and algorithms, but rather to propose a survey that reviews a wide scope of techniques for
different data extraction tasks, addressing both information extraction and structure recognition
for invoice processing. Specifically, we focus on table processing, paying particular attention to
graph-based approaches.
INDEX TERMS Invoice Processing - Table Recognition - Information Extraction
I. INTRODUCTION
Invoices are crucial documents for companies as they
serve as proof of purchase and are necessary for account-
ing and tax purposes. They are created by the seller
and sent to the buyer to request payment for goods or
services. Invoices typically contain essential information
such as the purchase date, the description of goods
or services, the quantity and price, and the payment
terms. Companies need to process invoices promptly
and accurately to maintain proper financial records and
avoid potential payment delays. Digitizing invoices can
help streamline the process and reduce the risk of errors.
Paper invoices can be converted into a digital format,
and automated systems can extract critical information
like invoice numbers, amounts, and dates. This approach
can speed up processing time and improve accuracy.
Furthermore, digital invoices can be easily stored and
accessed through document management systems, mak-
ing it simpler to keep track of them and retrieve them
when needed.
Automated invoice processing requires handling several document characteristics, such as varying formats and layouts of invoices, differences in language and terminology, and errors or inaccuracies in the data [31]. This presents challenges, but with advanced techniques such as machine learning and deep learning, the process can be automated and made more accurate so as to accomplish the following objectives:
- effectively handle the variability of layouts: due
to the lack of a global standard, invoices often
exhibit significantly different formatting. Naturally,
the required legal information varies from country
to country, and furthermore, it can be arranged
in various ways within the document. Hence, it is
crucial to have labeling and typing techniques in
place to isolate the key elements of an invoice.
- train and rapidly adapt to new contexts: in a practical scenario, companies often lack a substantial
corpus of invoices that are properly labeled for
learning or testing purposes. However, for small
companies, the invoices they handle are typically
specific since they originate from a relatively limited
number of customers and suppliers. Consequently, it
should be feasible to customize a system effortlessly
VOLUME 10, 2022
This article has been accepted for publication in IEEE Access. This is the author's version which has not been fully edited and
content may change prior to final publication. Citation information: DOI 10.1109/ACCESS.2024.3360528
This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
for a particular situation,
- minimize the end-user task: while some systems rely
on predefined invoice presentation styles, modify-
ing these layouts typically requires extensive user
interaction. Although it is important to engage the
user in formulating their needs and specifying the
desired information and management rules, it is
essential to minimize the laborious manual tasks
involved in system tuning,
- efficiently detect and extract tables from the invoices: tables play a crucial role in invoices, as they are primarily used to present accounting information. However, their formatting can vary significantly, and in some cases they may only be suggested, without explicit graphic delimiters. Consequently, detecting tables within invoices is a significant challenge for automated systems, which must exploit the distinctive characteristics of invoices rather than the headings or predefined elements that more general documents rely on.
Automated processing of documents requires dedi-
cated approaches based on the targeted domain. For in-
stance, legal texts require specific techniques [17], [42].
The analysis of administrative documents, including
invoices, has been an active area of research for many
years [13]. The task is complex because invoices can
come in various formats and contain a wide range of
information such as invoice numbers, amounts, dates,
and payment terms [31]. The lack of structure in
documents poses a real challenge for companies [12].
To address this complexity, various techniques have
been developed, such as Optical Character Recognition
(OCR) [127] for digitizing paper invoices and natural
language processing (NLP) techniques for extracting
relevant information from the text. Neural networks are
also frequently used for document classification tasks
[138].
Commercial systems have been developed by companies like ITESOFT¹ and ABBYY² [123] to automate
the processing of invoices. These systems use a combi-
nation of OCR, NLP, and machine learning techniques
to extract information from invoices and process them
automatically. By integrating with the company’s exist-
ing systems, such as accounting and enterprise resource
planning (ERP) systems, these systems streamline the
invoice processing workflow into a global electronic
document management system (EDMS) [63]. Recent
advances have led to the development of other end-to-
end solutions for invoices [6].
Processing invoices requires complex administrative
procedures and involves different departments such as
accounting, logistics, and supply chain. To ensure effi-
ciency and accuracy, specific workflows are often used
¹ https://www.itesoft.com
² https://www.abbyy.com
[56]. These workflows typically involve multiple steps,
such as document digitization, information extraction,
and data validation, as well as security considerations
[97]. Since invoices can take on various forms, statistical
learning methods have been used to detect their possible
classes [129].
The step of digitizing documents involves utilizing OCR technology to convert paper invoices into a digital format, allowing them to be processed and stored electronically with ease. Next comes the information extraction phase, which entails identifying key fields such as types, amounts, dates, and other crucial details on the invoices. To achieve this, natural language processing (NLP) techniques, such as named entity recognition (NER), are typically employed to recognize and extract specific information from the text [50], [52].
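As a toy illustration of such rule-based field identification, the following sketch labels a few key fields in raw OCR text with regular expressions; the field names and patterns are illustrative assumptions, not taken from any system surveyed here.

```python
import re

# Hedged sketch: simple rule-based labeling of key invoice fields on raw OCR
# text. The patterns and field names below are illustrative assumptions.
PATTERNS = {
    "invoice_number": re.compile(r"Invoice\s*(?:No\.?|#)\s*[:\-]?\s*(\w+)", re.I),
    "date": re.compile(r"\b(\d{1,2}[/-]\d{1,2}[/-]\d{2,4})\b"),
    "total": re.compile(r"Total\s*[:\-]?\s*\$?\s*([\d,]+\.\d{2})", re.I),
}

def extract_fields(text: str) -> dict:
    """Return the first match for each known field, or None if absent."""
    out = {}
    for field, pattern in PATTERNS.items():
        m = pattern.search(text)
        out[field] = m.group(1) if m else None
    return out
```

For example, `extract_fields("Invoice No: A123 Date 12/03/2023 Total: $1,250.00")` labels the three fields; systems surveyed later in this paper replace such brittle patterns with learned models.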
Although outside the scope of this overview, it is worth noting that classification techniques have been proposed for managing sets of invoices and categorizing financial transactions based on their economic nature [9], [132].
Machine learning can also be used to forecast financial
data [55] related to invoicing, and time series tools such
as [141]–[143] are particularly useful for this purpose.
There have been many proposed solutions for man-
aging information contained in scanned invoices, and
most of these solutions are based on machine learning
techniques, which have seen recent advances [50], [102].
In general, probabilistic and statistical approaches seem
to be a natural way of understanding documents [88].
The first challenge in this field was identifying invoices
from a set of documents [71], and models have been
proposed to streamline this process [22].
Once invoices have been correctly scanned and identi-
fied, the next challenge is to extract relevant information
from them. Labeling techniques can be applied using
rules [33], but recent research has focused on using neu-
ral networks (NN) for named entity recognition (NER)
tasks [73], [75]. Invoices often contain text sequences that are vastly different from natural language, and dedicated information extraction methods have been proposed to account for the particular structures of these documents. For example, [31] uses a star graph to model the neighborhood of a text token, allowing the context of a token to be taken into account when extracting information. This is a powerful method, as it allows meaningful information to be extracted from the document.
Several surveys provide an overview of general pro-
cessing techniques for image documents, such as OCR
techniques [59], text detection techniques [16], [147],
NER approaches [75], [95], [145], and table processing
[30], [38], [64]. However, few papers provide general
considerations for invoice processing. One such paper
is [52], which does not cover table extraction. In
[6], a very interesting end-to-end system is proposed
for processing invoices, covering the different steps mentioned above. Specific techniques are selected, and the resulting system focuses on key field extraction. From these considerations, our motivation is to offer a more comprehensive overview of available methods that practitioners can use to design end-to-end solutions for invoice processing. Note that we also pay particular attention to recent approaches based on graph representations. Let us also mention that our study is rooted in practical experience, underpinned by the effective implementation of an electronic document management system in partnership with a company.
This overview aims to examine data extraction in the
context of automated invoice processing. In Section II,
we provide a comprehensive description of an invoice to
highlight the critical data and structures that require
attention. In our main section, Section III, we discuss
the different components necessary for extracting this
data. These include the digitization of the invoice using
OCR (Section III-A), the development of a data extrac-
tion process (Section III-B), which involves recognizing
specific entities (Section III-C) and identifying tables
(Section III-D). Section III-E explores how geographical
information can be utilized, with a particular focus on
the use of graph-based representations.
Since such a survey involves numerous references, we provide an appendix with bibliographic tables that help the reader quickly identify the cited references according to the organization of the sections described above.
II. INVOICE MODELING
Defining a suitable representation of an invoice is an important step toward clearly understanding its specifications. An invoice processing application must capture data such as the invoice number, date, and amounts, and may assign the invoice to a specific customer or project. In [22], a semantic network was used to describe the invoice domain at different levels of abstraction. Before moving on to invoice processing techniques, we propose here a model that focuses on the relevant extraction tasks that an invoice processing application is expected to handle.
We chose to initially limit the scope of invoice ex-
traction. Figure 1 illustrates a basic sample of an
invoice, emphasizing key information sought by auto-
mated document processing tools. The extraction of
specific fields, such as the invoice date (highlighted in
the purple box), supplier address (in the orange box),
and organizational providers (within the cyan box),
is crucial. This survey places particular emphasis on
table extraction, as indicated by data enclosed in blue
and red boxes. Additionally, it is worth noting that
the invoice contains other pertinent information that
may be valuable for Named Entity Recognition (NER)
processes, including the identification of both the sender
FIGURE 1. An invoice sample
and receiver. Figure 2 provides a comprehensive view of the typical content of an invoice by means of a UML class diagram.
Different types of information must be highlighted
such as addresses, tables, dates, and actors (organi-
zations or individuals identified on the invoice). This
selected information seems coherent with the analysis
of multiple invoice models and the usual requirements
of the companies. One may identify six groups of data:
- Actors: individuals or companies involved in the invoice, such as a customer or a supplier.
- Independent fields: fields whose value is not linked to one of the other fields and that often represent essential data for the invoice.
- Information on the document: information specific to the management of the document, such as its name or identifier in the file system and the dates of creation and processing of the document: all the data that are not extracted from the document but that come from its processing.
- Addresses: addresses contained in the document, with, where possible, their types: billing, delivery, or sender address, for example.
- Tables: data tables are essential in invoices. They
often include several lines of invoiced items, prices, quantities, etc.
- Dates: the set of dates specific to invoice processing, such as the date of issue of the invoice and the dates of payment or delivery.
FIGURE 2. A UML model for invoices
Among these data, tables are considered complex to
extract in this model because they often contain a large
amount of structured data that needs to be parsed
and understood. Companies need to perform verification
operations on the table data, such as verifying VAT
amounts and rates, or ensuring that the sum of the table
lines matches the invoice amount. Efficient methods
for extracting and analyzing table data are crucial due
to the time-consuming and error-prone nature of the
process.
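The verification operations described above can be sketched as follows; the field names and the VAT convention (a single rate applied to the pre-tax subtotal) are simplifying assumptions for illustration, not a normative invoice schema.

```python
from dataclasses import dataclass
from decimal import Decimal

# Hedged sketch of the verification step: check that the sum of the extracted
# table lines, plus VAT, matches the grand total printed on the invoice.

@dataclass
class Line:
    description: str
    quantity: int
    unit_price: Decimal  # pre-tax price per unit

@dataclass
class Invoice:
    lines: list
    vat_rate: Decimal    # e.g. Decimal("0.20") for 20%
    total: Decimal       # grand total on the invoice, VAT included

def check_totals(inv: Invoice) -> bool:
    """Verify that the sum of the line amounts plus VAT matches the total."""
    subtotal = sum(l.quantity * l.unit_price for l in inv.lines)
    return subtotal * (1 + inv.vat_rate) == inv.total
```

Using `Decimal` rather than floating point avoids rounding surprises when comparing monetary amounts.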
III. INVOICE PROCESSING
As mentioned in the introduction, automated invoice
processing requires a complete chain of software tools
to automate the tasks involved in processing invoices.
Hence, we could consider the following key features:
1) Optical Character Recognition (OCR): OCR is
used to extract data from scanned or PDF invoices,
making them searchable and easily readable by the
system.
2) Machine Learning (ML): ML algorithms are widely used to classify and extract data from invoices, such as vendor information, invoice numbers, and amounts. They are also expected to extract structured information such as tables.
3) Workflow Automation: the system automatically routes invoices for approval, flagging any discrepancies or errors for manual review.
4) Integration with ERP: automated invoice process-
ing systems can integrate with enterprise resource
planning (ERP) systems, allowing for seamless
data transfer and real-time visibility into the in-
voice process.
5) Real-time Analytics: automated invoice process-
ing systems can provide real-time analytics and
reporting on invoice data, allowing businesses to
track and analyze their spending. This is strongly
related to business intelligence modules.
6) Compliance and Security: one may want to check
compliance with tax regulations, and protect sen-
sitive data through security measures such as
encryption and secure data storage.
In this overview, we restrict our scope to information extraction from raw scanned documents, and hence to the first two of the features listed above.
A. OPTICAL CHARACTER RECOGNITION
OCR systems have a long history, starting with early
mechanical devices that were developed in the 1950s,
such as GISMO (built by Sheppard in 1951). During
the 1960s and 1970s, not much research was done on
OCR due to the errors and slow recognition speed of
the early systems [72]. Over the past 40 years, however, there has been substantial research on OCR, which has led to the development of document image analysis and of multilingual, handwritten, and omni-font OCR systems. Nevertheless, OCR technology is still far from matching human reading abilities, and current research focuses on improving accuracy and speed across diverse document styles and languages, including complex scripts.
Let us mention several state-of-the-art reviews [37], [93] that were already synthesizing the work in the early 1990s. The seminal roots of OCR can be explored through the survey of Mantas et al. [83]. A good practical starting point is the work of Breuel et al. [18], which presents an open-source OCR solution. A more recent survey of OCR was published in 2017 by Islam et al. [59].
Hence, OCR is a crucial discipline in image interpretation, with highly important potential applications. A major problem was handwritten character recognition [89], which also required suitable databases. Note that major conferences have focused on OCR since the 1990s, e.g., ICDAR [1], with dedicated workshops [49]. Neural networks were then considered to overcome the earlier limitations. In [28], the use of projection profile features coupled with a back-propagation neural network classifier has proven highly effective. Nowadays, neural
networks are widely used in OCR technologies. Let us quote some recent works: in [96], the authors consider an extensive Urdu corpus well suited to applications involving deep learning techniques; [62] introduced end-to-end learning methods for recognizing arithmetic expressions, combining a deep convolutional neural network with a convolutional recurrent neural network; and in [66], the authors explore character recognition, encompassing both monolingual and multilingual contexts, using both deep and shallow architectures.
Among the impressive number of works related to OCR, let us mention the work of Mithe et al. [92], which presents a system that uses OCR to extract text and then sends it to a voice synthesizer. The main objective is to transform an image into speech for the text contained in the picture. This article shows that processing an image makes it possible to obtain fully structured information.
Of course, it is also very important to properly assess the performance of OCR using suitable measures and available benchmark sets [98]. Let us note here that image processing techniques can be used to obtain cleaner documents even before applying OCR. Morphological operations, such as dilation, erosion, and opening, are commonly used in image processing to remove noise, blur, and skew from document images. These techniques have been applied to prepare images for OCR and to locate text-containing parts of an image [147], for instance using OpenCV [44].
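As a minimal illustration of these morphological operations, the sketch below implements dilation and erosion on a tiny binary image in plain Python; production pipelines would instead apply a library such as OpenCV to full-page scans.

```python
# Hedged illustration of morphological dilation and erosion on a binary image
# represented as nested lists of 0/1 values, with a square structuring element
# of radius k (Chebyshev distance).

def dilate(img, k=1):
    """Set a pixel if any neighbour within distance k is set."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            out[y][x] = int(any(
                img[ny][nx]
                for ny in range(max(0, y - k), min(h, y + k + 1))
                for nx in range(max(0, x - k), min(w, x + k + 1))
            ))
    return out

def erode(img, k=1):
    """Keep a pixel only if all neighbours within distance k are set."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            out[y][x] = int(all(
                img[ny][nx]
                for ny in range(max(0, y - k), min(h, y + k + 1))
                for nx in range(max(0, x - k), min(w, x + k + 1))
            ))
    return out
```

Opening (erosion then dilation) removes isolated noise pixels, while closing (dilation then erosion) fills small gaps in strokes; both help before OCR and text localization.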
Returning to our concern with structured information extraction, a dedicated challenge was recently proposed by Huang et al. [58] at the ICDAR 2019 conference. The best paper prize was awarded to Zhong et al. [158], who offer a solution based on neural networks for the recognition of certain entities related to the formatting of documents.
In recent times, there has been ongoing research in the field of OCR. Let us mention a first work [10] that specifically focuses on the application of OCR to the recognition of written texts in a medical context. A promising development in OCR techniques aligns with the progress in deep learning, as exemplified by the work of Li et al. [77]. In this work, the authors adapt the transformer architecture to address OCR challenges and present a comprehensive benchmark featuring many contemporary techniques. This reflects the dynamic evolution of OCR methodologies, where advances in deep learning play a pivotal role.
B. DATA EXTRACTION
Once OCR has been applied, we are generally left with a set of PDF documents that are expected to be searchable and exploitable. Let us first begin with a general consideration of the data that can be extracted at this stage. At first glance, we may consider the visual aspect of the document and the relative positions of the information it contains.
The work of Suzanne Liebowitz Taylor et al. [133] presents an overview of the problem of extracting information from scanned documents. This article highlights the problems of text alignment and points out that only part of the information is relevant to extract.
The global layout of the document has to be taken
into account [7]. Ahmad et al. [2] use the concept of
unstructured, semi-structured, or structured documents.
The work of Yao et al. [146] on the relationships between entities, which also deals with unlabeled data, seems very relevant. Sun et al. [131] present a solution for orienting
documents according to a specific entity (QR Code
in the article). These methods address two common
challenges in data extraction: document orientation and
scale. The invoices, which are in the form of images,
are first preprocessed to remove any unnecessary back-
ground and to correct the angle of the invoice. Then, the
region containing the desired information on the invoice
is identified using template matching. Another system
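A toy version of this template-matching step might look as follows, locating a small template inside a grayscale image by minimizing the sum of absolute differences (SAD); real systems typically use normalized cross-correlation on full scans, and the data layout here is an illustrative assumption.

```python
# Hedged toy template matching: slide the template over every valid position
# in the image (nested lists of grayscale values) and keep the placement with
# the smallest sum of absolute differences.

def match_template(image, template):
    """Return (row, col) of the best-matching placement of template in image."""
    ih, iw = len(image), len(image[0])
    th, tw = len(template), len(template[0])
    best, best_pos = None, None
    for y in range(ih - th + 1):
        for x in range(iw - tw + 1):
            sad = sum(
                abs(image[y + dy][x + dx] - template[dy][dx])
                for dy in range(th)
                for dx in range(tw)
            )
            if best is None or sad < best:
                best, best_pos = sad, (y, x)
    return best_pos
```

In an invoice pipeline, the template would be a known anchor (a logo or a QR code region) whose position then determines orientation and scale.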
(BINYAS) [16] performs document layout analysis for
document image processing. This system uses connected
components and pixel analysis for classifying elements
such as paragraphs, graphics, images, and tables in
the document. In [11] the authors propose a dataset
for unstructured invoice documents that covers a wide
range of layouts, which is designed to generalize key
field extraction tasks for unstructured documents. The
dataset is evaluated using various feature extraction
techniques as well as Artificial Intelligence methods.
As already mentioned, tabular content extraction
from PDF documents is of great importance, in partic-
ular for benefiting from available open-source document
repositories [30]. The extraction and processing of data
from PDF files have indeed always been studied [81].
Data in such documents is often displayed in a tabular format.
Although tables may appear simple, extracting and
processing them from PDFs can be difficult and require
complex computational methods [48]. The purpose is
often to produce new formats from initial PDFs such
as XML files [113]. Note that PDFs do not typically
record the structure of their graphical objects in their
description, although it could be done.
Of course, visual separators are important for identifying tables in documents, as they reveal the table structures [41]. When tables include visible lines that can be extracted from the document, considering the maximum independent set of rectangles (MISR) problem seems relevant [24]: MISR consists of finding, within a set of rectangles, a largest subset of pairwise non-intersecting rectangles. Unfortunately, many tables lack lines separating some columns or rows, and such techniques do not apply in these cases. Yildiz et al. [149] present approaches based on line intervals and columns to identify the entities corresponding to table cells. Note that table extraction will be detailed in Section III-D.
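For the small rectangle sets found on a single page, MISR can be sketched by exhaustive search; the representation of rectangles as (x1, y1, x2, y2) tuples is an assumption for illustration, and the general problem does not admit an efficient exact algorithm.

```python
from itertools import combinations

# Hedged brute-force sketch of MISR: among axis-aligned rectangles
# (x1, y1, x2, y2), find a largest subset of pairwise non-intersecting ones.

def intersects(a, b):
    """True if rectangles a and b overlap (touching edges do not count)."""
    return not (a[2] <= b[0] or b[2] <= a[0] or a[3] <= b[1] or b[3] <= a[1])

def max_independent_rectangles(rects):
    # Try subsets from largest to smallest; the first independent one wins.
    for size in range(len(rects), 0, -1):
        for subset in combinations(rects, size):
            if all(not intersects(a, b) for a, b in combinations(subset, 2)):
                return list(subset)
    return []
```

The selected rectangles can then be interpreted as candidate table cells.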
Deep learning techniques are now widely used to
identify and extract tables in PDF documents [46], [152].
This aspect will be detailed later. Note that some work
uses libraries such as PDFMiner to transform PDF into XML and perform supervised learning on the XML [104].
C. ADDRESSING SPECIFIC INFORMATION EXTRACTION:
NAMED ENTITY RECOGNITION
In the scope of this study, we are not concerned with general document processing but with invoices, which are restricted to a specific domain whose terms and concepts are known. Hence, we are concerned with the semantics of the documents. The analysis of invoices is thus related to Natural Language Processing (NLP) and, more specifically, to Named Entity Recognition (NER) (see [95], [145] for dedicated surveys).
The problem of named entity recognition (NER) was
presented by Marsh and Perzanowski at the MUC con-
ference [85]. NER involves labeling a text by associating
each character string with a specific category, such as
a person, location, organization, temporality, amount,
or percentage. This problem is also referred to as entity
labeling or entity extraction. Research then intensified on this topic. CoNLL-2003 [135] focused on language-independent named entity recognition. The challenge concentrated on four types of named entities: persons, locations, organizations, and names of miscellaneous entities that do not belong to the previous three groups. During the same period, the goal of the ACE program [35] was to advance technology for automatically extracting information from human language data. This includes identifying mentioned entities, determining the relationships between these entities as expressed in the text, and recognizing the events in which these entities are involved. The program encompasses various data sources.
At that time, NER was restricted to the names of people, locations, and organizations, and sometimes to some other proper names, which does not cover all the types expected in an invoice.
The specific set of labels used in NER depends on
the data and the task at hand. NER is, of course,
strongly dependent on the application domain (e.g.,
[107], [154]). Some researchers limit themselves to the 6
initial categories (person, location, organization, tempo-
rality, amount, and percentage) and believe that these
labels are sufficient for all NER tasks. However, other
researchers argue that specific labels may be necessary to
effectively solve specific NER tasks [20]. The number of
labels used can vary depending on the complexity of the
data and the specific requirements of the task. Therefore,
the choice of labels is often a trade-off between the need
for specific information and the complexity of the model.
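As an illustration of such a task-specific label set, the sketch below decodes entities from tokens tagged with the common BIO scheme (B- begins an entity, I- continues it, O is outside); the invoice-oriented label inventory is an illustrative assumption, not a standard.

```python
# Hedged illustration of an invoice-specific NER label set with BIO tags.
# The labels below (INVOICE_NUM, DATE, AMOUNT, ORG) are illustrative.

def decode_bio(tokens, tags):
    """Group BIO-tagged tokens into (label, text) entities."""
    entities, current = [], None
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            if current:
                entities.append(current)
            current = (tag[2:], [token])
        elif tag.startswith("I-") and current and tag[2:] == current[0]:
            current[1].append(token)
        else:
            if current:
                entities.append(current)
            current = None
    if current:
        entities.append(current)
    return [(label, " ".join(words)) for label, words in entities]
```

A model trained on invoice data would produce the tag sequence; decoding then yields typed spans such as invoice numbers and organization names.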
Let us particularly mention the works of Alfonseca et
al. [3] and R. Evans [40] that use the notion of “open
domain”. Recently, data sets have been made available
for NER related to invoices [11]. From a practical point
of view, Mikolov et al. [90] demonstrate the benefit of
using vector representation of words and also that it
is possible to train a model of neural networks on a
large training set, including a large number of sentences
with approximately one billion words and a vocabulary
of more than one million different words. A month
later, Mikolov et al. [91] considered a distributed rep-
resentation of words and prove that, by adding certain
vectors of words, the learning process allows one to
learn the meaning of the words. The linguist Scharolta
Katharina Siencnik [126] attempts then to demonstrate
the possible application of these algorithms to named
entity recognition.
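The word-vector arithmetic mentioned above can be illustrated with a toy example; the vectors below are hand-crafted for the demonstration, whereas real embeddings are learned from large corpora.

```python
import math

# Hedged toy demonstration of word-vector arithmetic: the vocabulary and
# vector values are illustrative assumptions, not trained embeddings.
VECS = {
    "king":  [0.9, 0.8, 0.1],
    "queen": [0.9, 0.1, 0.8],
    "man":   [0.1, 0.9, 0.1],
    "woman": [0.1, 0.1, 0.9],
}

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def analogy(a, b, c):
    """Return the vocabulary word closest to vec(a) - vec(b) + vec(c)."""
    target = [x - y + z for x, y, z in zip(VECS[a], VECS[b], VECS[c])]
    return max((w for w in VECS if w not in (a, b, c)),
               key=lambda w: cosine(VECS[w], target))
```

With these toy vectors, `analogy("king", "man", "woman")` recovers "queen", the classic example of additive vector semantics.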
While state-of-the-art named entity recognition sys-
tems relied heavily on hand-crafted features and domain-
specific knowledge, new neural architectures for NER
were proposed [27], [73]. These architectures aim to
improve performance by leveraging the strengths of
neural networks, such as their ability to learn useful
features from data, while still addressing some of the
limitations of previous methods. Convolutional neural networks (CNNs) [79] were then applied to NER problems [139], [151], as well as bidirectional networks [4]. Let us mention work on the identification of depression from patients' answers in interviews [115], as well as the work of He et al. [54] on establishing distant dependencies between entity terms via CNN processing.
ELMo is a language model developed by Matthew E.
Peters and his team [105]. Unlike traditional word
embeddings that represent words as fixed vectors, ELMo
uses the context in which words appear to generate more
dynamic and informative word embeddings. The model is
only shallowly bidirectional: it combines a forward and a
backward language model, so both the preceding and the
succeeding words in a sentence inform the representation
of the word being encoded.
ELMo's innovative approach to word embeddings
quickly gained attention from researchers in the natural
language processing (NLP) community. Dogan et al. [36]
applied ELMo's neural network architecture to Named
Entity Recognition (NER) problems, which involve
identifying and classifying entities in text such as names,
dates, and locations. While ELMo showed promising
results, it has a limitation: it cannot be deeply fine-tuned
together with downstream models in the way later
approaches trained with a "masked language model"
objective can.
To address this shortcoming, Devlin et al. proposed
BERT [34], a deeply bidirectional Transformer model
that builds on ELMo's idea of contextual embeddings
and has since become one of the most widely used
pre-training models for NLP tasks. BERT is pre-trained
with a "masked language model" objective: tokens are
hidden at random and the model learns to recover them
from both their left and right context, in an unsupervised
fashion. This allows BERT to produce high-quality word
representations that can be fine-tuned together with
task-specific layers to achieve state-of-the-art performance
on a wide range of language tasks.
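The "masked language model" objective can be sketched in a few lines; the masking routine below is a simplified stand-in for BERT's actual procedure (which additionally replaces some selected tokens with random words or leaves them unchanged):

```python
import random

def mask_tokens(tokens, mask_prob=0.3, mask_token="[MASK]", seed=13):
    """BERT-style masking: hide a fraction of tokens at random; during
    pre-training the model must recover them from the surrounding context."""
    rng = random.Random(seed)
    masked, targets = [], {}
    for i, tok in enumerate(tokens):
        if rng.random() < mask_prob:
            masked.append(mask_token)
            targets[i] = tok  # what the model should predict at position i
        else:
            masked.append(tok)
    return masked, targets

sentence = "the total amount due is 42 euros".split()
masked, targets = mask_tokens(sentence)
print(masked, targets)
```

The model's loss is then computed only at the masked positions, which is what lets both left and right context be used without the model trivially "seeing" the answer.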
Ali Safaya et al. [116] demonstrate a possible association
between CNN and BERT and study its efficiency.
VOLUME 10, 2022
This article has been accepted for publication in IEEE Access. This is the author's version which has not been fully edited and
content may change prior to final publication. Citation information: DOI 10.1109/ACCESS.2024.3360528
This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
Saout et al.: An Overview of Data Extraction from Invoices
Their work focuses on BERT for Arabic, Turkish, and
Greek, languages whose construction is more structured
than some others, and achieves better results in the
recognition of hateful content for these languages.
GPT models have become essential in natural language
processing (NLP) due to their ability to be fine-tuned for
specific NLP tasks. Radford's work on language-model
transformers, particularly the GPT model [111], has had
a profound influence on the field. Unlike bidirectional
models such as BERT and ELMo, GPT is a unidirectional
model: each representation attends only to the preceding
context, typically from left to right. This architecture
makes GPT particularly well suited to language prediction
tasks, where the model predicts the next word of a
sentence from the words before it.
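This left-to-right prediction objective can be illustrated with a toy bigram model standing in for the Transformer (the corpus and counts below are invented for illustration):

```python
from collections import Counter, defaultdict

# Toy corpus of invoice-like phrases (purely illustrative).
corpus = "total amount due total amount payable total price due".split()

# Count bigrams: how often does each word follow another?
following = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    following[prev][nxt] += 1

def predict_next(word):
    # Greedy left-to-right prediction: most frequent successor.
    return following[word].most_common(1)[0][0]

print(predict_next("total"))  # amount
```

GPT replaces the bigram table with a deep attention-based network, but the training signal is the same: given the prefix, score the next token.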
Returning to our main concern, Francis et al. [43]
present a solution for extracting data from financial
or medical documents using a neural network trained
for named entity recognition, and compare the efficiency
of character-based and word-based models. General
language processing must also be considered: for instance,
the work of Suárez et al. [130] on the state of the art of
named entity recognition for French can be useful when
dealing with French invoices. Hamdi et al. [52] present
tools that improve the learning of invoice-specific labeling
while reducing time and human intervention.
To ensure better explainability, rule-based approaches
remain useful alternative techniques for NER [26].
Shreeshiv Patel et al. [103] address key invoice parameter
extraction (KPIE), proposing both a rule-based approach
and a neural-network approach to recognize these
parameters. Declarative approaches based on constraint
solving should also be considered a promising research
direction [5].
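As a minimal illustration of the rule-based style (the patterns below are our own toy rules, not those of [103]):

```python
import re

# Minimal rule-based extractor for two invoice fields.
# The regular expressions are illustrative, not production-grade.
RULES = {
    "date":   re.compile(r"\b(\d{2}/\d{2}/\d{4})\b"),
    "amount": re.compile(r"\b(\d+(?:[.,]\d{2}))\s?(?:EUR|USD)"),
}

def extract(text):
    """Apply every rule to the text; return all matches per field."""
    return {field: rx.findall(text) for field, rx in RULES.items()}

invoice = "Invoice 2024-17, issued 05/03/2024. Total due: 1299,00 EUR by 04/04/2024."
print(extract(invoice))  # {'date': ['05/03/2024', '04/04/2024'], 'amount': ['1299,00']}
```

The strengths and weaknesses discussed above are visible even here: the rules are transparent and precise, but each new date format or currency notation requires another hand-written pattern.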
Practical NER solutions are available, such as those of
Nanonets3 and ABBYY. A well-documented example
explains the use of BERT for NER [34].
In summary, there are two main approaches. The
historical rule-based approach draws on the rules of
traditional grammar to label words in their textual
context. It is very efficient in specific domains, because
the rules are usually written for the target domain so as
to avoid ambiguity. However, this specialization makes it
hard to handle new contexts that were not anticipated
during implementation: extending the model's capacity
requires reworking the rules, a step that often demands
the intervention of an expert.
3https://www.nanonets
The neural-network approach to labeling the entities of
a document is attractive because it avoids spending too
much time defining labeling rules by hand. It copes
better with new domains, and relearning for newly
treated concepts can more easily be automated.
Nevertheless, neural networks require large computational
resources and training corpora to be efficient.
In Figure 3 we propose an empirical evaluation of NER
systems, based on the state of the art, statements from
specialists in this field, and the needs encountered in
companies. This evaluation is therefore partly subjective.
FIGURE 3. Advantages and disadvantages of NER methods according to
the state of the art
D. FOCUS ON TABLE EXTRACTION
A closer examination of invoices shows that most of
them include tables as a main structural feature. Table
detection within invoices therefore appears to be an
important processing task [122]. Table processing is
indeed an old challenge (a 2004 survey [150] already
proposed an overview of the field), yet these challenges
remain active [45].
Understanding information embedded into tables in-
volves three steps as quoted in [61]: detecting the table
boundaries, identifying the structure of the table includ-
ing rows, columns, and cell positions, and recognizing
the contents of the table (tokens of information that are
expected to be presented in a more readable format).
The layout is an important aspect [69]. Techniques used
for detection include object detection models [23] like
Faster-RCNN (Region Based Convolutional Neural Net-
works) and Mask-RCNN [108] and NLP-based methods
that incorporate both textual and visual features [57].
Note that TableBank [76] includes a new image-based
table detection and recognition dataset. PubLayNet
[158] can accurately recognize the layout of scientific
articles after training on over one million PDF articles.
LayoutLMv3 [57] is pre-trained with a word-patch align-
ment objective to improve cross-modal alignment. This
allows the model to predict whether the image patch
associated with a text word has been masked.
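Detection approaches such as these are typically evaluated by the intersection-over-union (IoU) between predicted and ground-truth table boxes; a minimal sketch:

```python
def iou(box_a, box_b):
    """Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    ix1, iy1 = max(ax1, bx1), max(ay1, by1)
    ix2, iy2 = min(ax2, bx2), min(ay2, by2)
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    return inter / (area_a + area_b - inter)

print(iou((0, 0, 10, 10), (5, 0, 15, 10)))  # ≈ 0.333
```

A detection is usually counted as correct when its IoU with a ground-truth table exceeds a threshold (0.5 being a common choice in the competitions cited above).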
Deep learning techniques are now widely used for
achieving table structure recognition. Recently, Kavasidis
et al. [67] introduced a fully-convolutional neural
network that utilizes saliency-based techniques for
multi-scale reasoning with visual cues. They also in-
corporate a fully-connected conditional random field
to precisely locate tables and charts within digital or
digitized documents. A common approach consists of
using a bi-directional RNN with Gated Recurrent Units
(GRUs) to process image data [68]. A pre-processing
step formats the image data so that it can be fed
into the network. The bi-directional RNN with GRUs
is then used to analyze the image data and extract
features. Finally, a fully connected layer with a softmax
activation function is used to classify the image based
on the features extracted by the RNN. Gilani et al. [47]
introduced a deep-learning approach to table detection:
document images are first pre-processed and then fed
into a Region Proposal Network (RPN), followed by a
fully connected neural network that identifies tables.
Their method demonstrates remarkable precision on
document images with diverse layouts, encompassing
documents, research papers, and magazines. Vine et
al. [137] introduce a
two-step approach including a generative adversarial
network (GAN) and a genetic algorithm to optimize
a distance measure between candidate table structures.
Another two-step process that uses cell detection and
interaction modules to recognize the structure of a
table is proposed in [112]. The cell detection module
is used to locate and identify individual cells in the
table image. The interaction module then predicts the
associations between the detected cells, such as their
row and column associations. This approach can be
useful for determining the overall structure of a table,
including the number of rows and columns, as well
as the relationships between cells within the table.
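A simple coordinate-based heuristic conveys the idea of recovering row associations from detected cell boxes (a hand-written stand-in for the learned interaction module, with boxes given as (x1, y1, x2, y2) and invented coordinates):

```python
def group_rows(cells, tol=5):
    """Cluster detected cell boxes into rows by their top edge (y1):
    cells whose tops are within `tol` pixels share a row."""
    rows = []
    for cell in sorted(cells, key=lambda c: c[1]):  # sort by y1
        if rows and abs(cell[1] - rows[-1][0][1]) <= tol:
            rows[-1].append(cell)
        else:
            rows.append([cell])
    return [sorted(r) for r in rows]  # left-to-right within each row

cells = [(120, 10, 200, 30), (10, 12, 100, 30),
         (10, 50, 100, 70), (120, 52, 200, 70)]
print(group_rows(cells))
```

Learned interaction modules replace the fixed tolerance with predicted pairwise associations, which is what lets them cope with skewed scans and irregular layouts.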
Convolutional networks have, of course, also been
explored [67], [125], together with Split-and-Merge
models [134]. In [99], the authors additionally consider
explainability as an issue in neural networks. Global
end-to-end solutions are now available, such as TableNet
[100], DeepDeSRT [120], PubTabNet [156], and GTE
[155]. Dedicated benchmark repositories have been
proposed to evaluate these methods: TableBank [76]
(417K high-quality labeled tables) and a novel dataset
derived from the RVL-CDIP invoice data [114].
Table detection may also rely on more specific
knowledge. In [140], the authors propose a system for
automatically generating ground-truth data for training
table detection algorithms. The literature also contains
important work on layouts: for example, in [106], David
Pinto et al. use Conditional Random Fields (CRFs) to
model the different layouts of a table, which can
sometimes overlap and may be misinterpreted by other
modeling languages. Tools such as TableSeer [80] search
for shapes that may correspond to tables in order to
extract them and allow queries over their contents.
The specific structure of invoices leads to considering
the geographical organization of the document, and
graph-based models are thus relevant [122]. Recent work
[65] proposes an approach to detect the general frame of
a table and extract its content. For more specific tables,
characteristic elements such as headers can also support
these tasks [121]. Rule-based systems, which were among
the seminal table extraction techniques, may also remain
relevant [124].
Graph-based approaches also seem a natural way to
handle tables. In [117], the authors use graph mining to
extract tables using key fields. Graph Neural Networks
(GNNs) [119] thus appear well suited to handling
graph-based knowledge [148]; they can indeed capture
the locally repeating structural information in invoice
document tables [114]. In [78], the authors propose a
GNN-based method that mixes position and text; their
algorithm also uses visual recognition to predict the
correct numbers of columns and rows. In [109], an
architecture is introduced that combines the benefits
of convolutional neural networks for visual feature
extraction with graph networks for dealing with the
problem structure. Cell detection and cell logic are used
to predict the location of cells in [144]. [153] presents a
unified framework that combines vision, semantics, and
relations for analyzing document layouts, supporting
both natural language processing and computer
vision-based methods. Slightly differently, LGPMA [110]
employs a soft pyramid mask learning approach to
recover table structure by analyzing both local and
global feature maps, additionally taking the location of
empty cells into account.
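The input to such GNN-based recognizers is typically a spatial graph over OCR tokens; a minimal sketch of a k-nearest-neighbour construction (token positions are invented for illustration):

```python
import math

# Hypothetical tokens: (text, x, y) centre positions on the page.
tokens = [("Qty", 50, 100), ("Price", 150, 100), ("2", 50, 130), ("9.99", 150, 130)]

def knn_graph(tokens, k=2):
    """Connect each token to its k spatially nearest neighbours: the usual
    input graph for GNN-based table recognizers."""
    edges = set()
    for i, (_, xi, yi) in enumerate(tokens):
        dists = sorted(
            (math.hypot(xi - xj, yi - yj), j)
            for j, (_, xj, yj) in enumerate(tokens) if j != i)
        for _, j in dists[:k]:
            edges.add((min(i, j), max(i, j)))  # undirected edge
    return sorted(edges)

print(knn_graph(tokens))  # [(0, 1), (0, 2), (1, 3), (2, 3)]
```

The GNN then classifies nodes (header, value, etc.) and edges (same row, same column), from which the table structure is reassembled.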
E. HANDLING GEOGRAPHIC INFORMATION IN THE
INVOICES: POSSIBLE PERSPECTIVES FOR GRAPH-BASED
REPRESENTATIONS
Since the layout of invoices is particularly relevant as
described above, let us explore the modeling and the
processing of geometric or geographic information, to
discover links that cannot be handled by a purely seman-
tic analysis of the document. For example, an invoice
may contain a keyword and its expected associated value
close to it. Let us review some methods for representing
and exploring this structured data. For instance, Esser
et al. [39] try to extract templates from scanned documents.
This section is devoted to methods that do not rely on
image processing or neural networks trained to capture
the global layout of the document. We are instead
interested in representation models and the associated
solving techniques, which process geometric data in a
more frugal (no huge and costly training phase) and
more declarative way.
A long time ago, Cesarini et al. [21] were already
interested in the structural analysis of a document by
trying to label areas. They consider that an invoice is a
set of regions that can be identified using their relative
geometrical position.
As mentioned in Section III-D, graph-based represen-
tation has been explored for handling the structures of
the tables in documents. Therefore, we focus here on
such representations and how they can be exploited to
efficiently retrieve table structures and their content.
Since the structure of a table may contain different
levels, we argue that several levels of abstraction are
needed to represent the geometrical structure of a table.
Using models with geometric constraints and enabling
their declarative handling has been explored in [19].
An abstract model is linked to a graphic model and a
refinement process is proposed. Geometric constraints
[94] require dedicated constraint solvers according to
targeted domains. In [118], we propose a hypergraph-based
approach to table extraction. Hypergraphs [15] are classic
extensions of graphs that enable more expressive models.
Hence, after suitable modeling, table extraction in a
document can be cast as an isomorphism problem in
hypergraphs [14]. The subgraph isomorphism problem is
NP-complete [29], and its complexity has been refined
according to various parameters [86]. Solvers such as the
Glasgow solver [87] are available for this problem, as are
efficient algorithms [128], including recent quantum
search algorithms [84]. In a recent work [74], the authors
propose to represent tables as planar graphs with cell
regions as their faces; they generate junction confidence
maps and line fields using heatmap regression networks,
mixing deep neural networks with constrained
optimization.
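A brute-force check conveys the underlying combinatorial problem (exponential in the pattern size, which is precisely why dedicated solvers such as Glasgow [87] matter in practice):

```python
from itertools import permutations

def sub_isomorphic(pattern, target, n_p, n_t):
    """Naive (non-induced) subgraph-isomorphism test over edge sets.
    Exponential: fine for tiny patterns, hopeless at scale."""
    for mapping in permutations(range(n_t), n_p):
        if all((mapping[u], mapping[v]) in target
               or (mapping[v], mapping[u]) in target
               for u, v in pattern):
            return True
    return False

# Pattern: a small "table frame" path 0-1-2.
# Target: a 4-cycle with one chord, standing in for a document graph.
pattern = {(0, 1), (1, 2)}
target = {(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)}
print(sub_isomorphic(pattern, target, 3, 4))  # True
```

Extending the same idea from graphs to hypergraphs (edges covering more than two vertices) gives the formulation used in [118].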
F. TURNING TO EFFICIENT SOLUTIONS FOR INDUSTRY
As a starting point, it is worth examining Extraction,
Transformation, and Loading (ETL) processes [136],
which form the backbone of data warehouse architectures
and aim at acquiring data from diverse, potentially
multimodal document sources. A critical dimension in
this context is that data assimilation stems from a variety
of document origins, and the multifaceted nature of these
documents underscores the complexity of the task at
hand. Furthermore, automated document processing
systems must be able to update data at regular intervals,
emphasizing the need for real-time adaptability. Following
these lines, Figure 4 encapsulates the multifunctional
essence of information extraction from invoices, providing
a visual representation of the intricate multitasking
inherent in the extraction workflow.
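The three ETL stages can be sketched as plain functions (a toy pipeline with invented data; real systems add validation, scheduling, and incremental loading):

```python
# Minimal ETL sketch: each stage is a pure function, chained end to end.
def extract(raw_docs):
    # Pull text out of heterogeneous sources (here: already-OCRed strings).
    return [d.strip() for d in raw_docs]

def transform(texts):
    # Normalize into records; a stand-in for NER / table extraction.
    return [{"text": t, "n_words": len(t.split())} for t in texts]

def load(records, warehouse):
    # Append to the warehouse (here: a plain list).
    warehouse.extend(records)
    return warehouse

warehouse = []
docs = ["  Invoice 42: total 99.00 EUR ", "Receipt: 3 items"]
load(transform(extract(docs)), warehouse)
print(warehouse[0]["n_words"])  # 5
```

Keeping the stages separate is what lets an invoice pipeline swap, say, one OCR engine or one NER model without touching the rest of the workflow.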
Some industry solutions address parts of the total ETL
process. They are based on plugins designed for each
information retrieval task. For instance, the
FIGURE 4. Different steps in an ETL process
Azure4 solution developed by Microsoft offers numerous
APIs for processing documents, including OCR and
NER. The ABBYY solution is split into different
programs: FlexiCapture for OCR and FlexiLayout for
extracting data from a document using templates.
Transformers may now be used to provide end-to-end
solutions and address the various modalities involved in
document processing tasks, such as classification, question
answering, or NER [32], [70]. The diverse nature of
documents necessitates multimodal reasoning over various
types of inputs [8], including the visual, textual, and
layout elements found in a variety of document sources.
These aspects should be considered when developing
efficient invoice processing tools.
IV. CONCLUSION
In conclusion, invoices are crucial documents for compa-
nies as they serve as proof of purchase and are necessary
for accounting and tax purposes. The processing of
invoices can be time-consuming and prone to errors,
but recent advances in technology have led to the de-
velopment of systems that automate the process. These
systems use a combination of OCR, NLP, and machine
learning techniques to digitize paper invoices and extract
relevant information. The processing of invoices involves
different steps such as document digitization, informa-
tion extraction, and data validation, and specific work-
flows are often used to ensure efficiency and accuracy.
The challenge of processing invoices lies in handling the
variability of layouts, language, and terminology, and
the presence of errors or inaccuracies in the data.
In this survey, we have reviewed the essential com-
ponents that must be taken into account when de-
veloping an automated invoice processing system. Our
goal is to provide valuable insights to researchers and
engineers striving to create end-to-end solutions, and
in this pursuit, several critical factors demand careful
consideration:
Document Quality: The quality of the documents
input for processing plays a crucial role. Standard
digitized invoices can often be handled with rela-
tively basic OCR systems. However, when dealing
4https://azure.microsoft.com/en-us
with documents exhibiting orientation issues or
containing handwritten sections, a more sophisti-
cated image processing pipeline and highly efficient
text recognition are imperative. Real-world finan-
cial documents, for instance, may feature handwrit-
ten notes from employees seeking reimbursements,
making document quality a critical determinant.
Invoice Content: The nature of the invoice content
is another crucial consideration. In cases where
invoices consist of limited and concise information,
without extensive descriptions or intricate commer-
cial terms, employing simple Named Entity Recog-
nition (NER) techniques based on a compact model,
as exemplified in Figure 2, suffices. Conversely, for
more complex scenarios, the integration of Natural
Language Processing (NLP) techniques becomes
essential to delve into the semantic nuances of
scanned texts.
Layout Diversity: The diversity of invoice layouts
cannot be underestimated. When documents are
associated with a finite number of suppliers or
clients, rule-based techniques designed to match
predefined layouts can be harnessed. Moreover,
these techniques may offer flexibility, allowing end-
users to fine-tune the system to visually locate and
extract key information from invoices.
Annotated Data Sets: Machine learning techniques,
while powerful, rely heavily on sizable and repre-
sentative training datasets for optimal performance.
As mentioned in this survey, rule-based approaches
can often be generic enough to process invoices ef-
fectively without necessitating extensive supervised
learning processes.
Table Diversity and Quality: Tables within invoices
represent a pivotal aspect of the processing pipeline.
While basic tables can be detected using image
processing and neural network-based algorithms,
more complex scenarios emerge when tables are
incomplete and exhibit considerable diversity, often
due to variations in invoice layouts. In such cases,
recent graph-based algorithms present a compelling
and efficient alternative.
By taking these facets into account, engineers can
embark on the development of robust, efficient, and
adaptable automated invoice processing systems that
cater to a wide spectrum of real-world invoice scenarios.
In this context, hybrid methods combining rule-based
and neural network approaches appear particularly promising.
In recent times, large language models (LLMs) have
emerged as a notable development. These models offer
promising prospects for document processing by
integrating structural and semantic recognition to achieve
effective extraction of information from both structured
and semi-structured documents.
V. APPENDIX: BIBLIOGRAPHIC TABLES
Main topic                 References
OCR techniques             [37], [59], [66], [83], [93], [98], [10], [77]
Text detection techniques  [16], [147]
NER approaches             [75], [95], [107], [145], [154]
Table processing           [30], [38], [64], [150]
Convolutional networks     [79]
Invoice processing         [52]
Graph neural networks      [148]
Information retrieval      [51]
TABLE 1. Summary of cited surveys
Name              Desc.                                                           Ref.
CORD              Receipt dataset for post-OCR parsing                            [101]
rvl-cdip-invoice  Set of invoices extracted from RVL-CDIP                         [53]
GHEGA-DATASET     Labeled dataset for document understanding research experiments [88]
ICDAR2019         Competition on scanned receipt OCR and information extraction   [58]
FUNSD             Form Understanding in Noisy Scanned Documents challenge         [60]
PubLayNet         Dataset for document layout analysis                            [158]
TABLE 2. Summary of available datasets for document analysis and recognition
Name          Desc.                                                               Ref.
cTDaR         Annotated documents with table entities                             [45]
SciTSR        Large-scale table structure recognition dataset                     [25]
PubTabNet     Image-based table recognition                                       [157]
WikiTableSet  Publicly available image-based table recognition dataset in         [82]
              three languages, built from Wikipedia
TABLE 3. Summary of available datasets for table analysis
Reference  Topic
[18]       Open-source OCR solution
[89]       Handwritten character recognition
[28]       Handwritten OCR
[96]       Text recognition using deep learning
[62]       Deep-learning-based OCR
[92]       OCR solution including image-to-speech transformation
[98]       Benchmark sets for OCR
[44]       OpenCV system
[158]      Neural-network-based OCR
[58]       Description of the ICDAR2019 competition on scanned receipts
[10]       A survey of OCR specialized for medical reports
[77]       A transformer-based technique for OCR, with a benchmark against modern solutions
TABLE 4. Summary of main cited works on OCR
Reference  Topic
[133]      Seminal work on data extraction
[7]        Computational-geometry algorithms for analyzing document structures
[2]        Handling multiple types of data structures
[146]      Considering relations between data
[131]      Orientation of documents
[16]       Document layout analysis
[11]       Data sets for evaluation
[81]       Seminal work on PDF document management
[48]       Data extraction from tables
[113]      Table extraction for PDF documents
[41]       Table detection for multipage PDF documents
[24]       Solving the maximum independent set of rectangles problem
[149]      pdf2table: a method for extracting tables
[46]       Graph neural network for extracting tables from PDF documents
[152]      Deep learning for PDF table extraction
[104]      Presentation of TAO for table detection and extraction
TABLE 5. Summary of main cited works on data extraction
Reference  Topic
[85]       Seminal work on NER
[135]      NER challenge at CoNLL
[35]       ACE program: challenge for NER systems
[20]       Empirical study of NER
[3]        Procedure to automatically extend an ontology with domain-specific knowledge
[40]       System for NER in the open domain
[90]       Model architectures for computing continuous vector representations of words (word2vec)
[91]       Distributed architectures for word2vec
[126]      Adaptation of word2vec to NER
[73]       Neural networks for NER
[27]       Neural networks for NER
[4]        Bidirectional recurrent neural network for NER
[34]       Presentation of BERT
[54]       Combination of a convolutional neural network with BERT
[115]      Application of BERT-CNN in health care
[105]      Presentation of ELMo, a language-model word representation
[36]       Use of ELMo for NER
[116]      BERT-CNN for speech identification
[111]      Enhancing language comprehension through pre-training
[43]       Data extraction from financial documents
[130]      State of the art of NER for the French language
[52]       Specific work on invoices
[26]       Rule-based information extraction systems
[103]      Information extraction from scanned invoices
[5]        Constraint satisfaction for invoice processing
ABBYY      A commercial system for NER
TABLE 6. Summary of main cited works on NER
Ref.   Topic
[122]  Reference work on table extraction
[45]   ICDAR 2019 Competition on Table Detection and Recognition
[69]   The T-Recs system for table recognition
[23]   Algorithm for searching parallel lines in documents to extract tables
[61]   Proposal for the representation of tables (Wang Notation Tool)
[158]  PubLayNet, a data bank for table extraction
[76]   TableBank, a data bank for table extraction
[108]  Presentation of CascadeTabNet, an end-to-end system using convolutional neural networks
[57]   LayoutLMv3: a general-purpose pre-trained model for documents
[67]   Convolutional neural network for table detection
[68]   Approach based on bi-directional gated recurrent unit networks
[47]   Deep learning for table detection
[137]  Approach based on a generative adversarial network
[112]  Two-step approach combining cell detection and an interaction module
[125]  DeepTabStR: a deep-learning-based system for table recognition
[134]  Use of novel deep learning models (Split and Merge models)
[99]   Explainability for obtaining the semantic structure of tables
[100]  TableNet: end-to-end solution for table extraction
[120]  DeepDeSRT: end-to-end solution for table extraction
[156]  PubTabNet: end-to-end solution for table extraction
[155]  GTE: end-to-end solution for table extraction
[114]  Use of graph neural networks for table extraction
[140]  System for automatically generating ground-truth data for training table detection algorithms
[106]  Introduction of conditional random fields to manage table layouts
[80]   Presentation of TableSeer, a search engine for tables
[65]   An end-to-end table structure recognition system using a YOLO-based object detector
[121]  Segmentation techniques for tables
[124]  Presentation of TabbyPDF: heuristic-based approach to table detection and structure recognition
[117]  Approach using a graph-based representation of documents
[109]  Architecture combining convolutional neural networks and graph networks
[144]  Presentation of TGRNet, an end-to-end trainable table graph reconstruction network
[153]  Presentation of VSR, a combination of computer vision and NLP techniques
[110]  LGPMA, a system using Local and Global Pyramid Mask Alignment
TABLE 7. Summary of main cited works on table extraction
REFERENCES
[1] ICDAR 2nd International Conference Document Analysis
and Recognition. IEEE Computer Society, 1993.
[2] Ily Amalina Sabri Ahmad and Mustafa Man. Multiple types
of semi-structured data extraction using wrapper for extrac-
tion of image using dom (weid). In Regional Conference on
Science, Technology and Social Sciences (RCSTSS 2016),
pages 67–76. Springer, 2018.
[3] Enrique Alfonseca and Suresh Manandhar. An unsupervised
method for general named entity recognition and automated
concept discovery. In Proceedings of the 1st international
conference on general WordNet, 2002.
[4] Mohammed N. A. Ali, Guanzheng Tan, and Aamir Hussain.
Bidirectional recurrent neural network approach for arabic
named entity recognition. Future Internet, 10(12):123, 2018.
[5] Jakob Andersson. Automatic invoice data extraction as a
constraint satisfaction problem, 2020.
[6] Halil Arslan. End to end invoice processing application
based on key fields extraction. IEEE Access, 10:78398–
78413, 2022.
[7] Henry S Baird. Background structure in document images.
International Journal of Pattern Recognition and Artificial
Intelligence, 8(05):1013–1030, 1994.
[8] Souhail Bakkali, Zuheng Ming, Mickaël Coustaty, Marçal
Rusiñol, and Oriol Ramos Terrades. Vlcdoc: Vision-
language contrastive pre-training model for cross-modal
document classification. Pattern Recognit., 139:109419,
2023.
[9] Chiara Bardelli, Alessandro Rondinelli, Ruggero Vecchio,
and Silvia Figini. Automatic electronic invoice classification
using machine learning models. Mach. Learn. Knowl. Extr.,
2(4):617–629, 2020.
[10] Pulkit Batra, Nimish Phalnikar, Deepesh Kurmi, Jitendra
Tembhurne, Parul Sahare, and Tausif Diwan. Ocr-mrd: Per-
formance analysis of different optical character recognition
engines for medical report digitization. 2023.
[11] Dipali Baviskar, Swati Ahirrao, and Ketan Kotecha. Multi-
layout invoice document dataset (MIDD): A dataset for
named entity recognition. Data, 6(7):78, 2021.
[12] Dipali Baviskar, Swati Ahirrao, Vidyasagar M. Potdar,
and Ketan Kotecha. Efficient automated processing of
the unstructured documents using artificial intelligence: A
systematic literature review and future directions. IEEE
Access, 9:72894–72936, 2021.
[13] Abdel Belaïd, Vincent Poulain D’Andecy, Hatem Hamza,
and Yolande Belaïd. Administrative document analysis
and structure. In Marenglen B. and Fatos X., editors,
Learning Structure and Schemas from Documents, Studies
in Computational Intelligence. Springer, 2011.
[14] Claude Berge. Isomorphism problems for hypergraphs. In
Hypergraph Seminar, pages 1–12. Springer, 1974.
[15] Claude Berge. Graphs and Hypergraphs. Elsevier Science
Ltd, 1985.
[16] Showmik Bhowmik, Ram Sarkar, Mita Nasipuri, and
David S. Doermann. Text and non-text separation in
offline document images: a survey. Int. J. Document Anal.
Recognit., 21(1-2):1–20, 2018.
[17] Carlo Biagioli, Enrico Francesconi, Andrea Passerini, Si-
monetta Montemagni, and Claudia Soria. Automatic se-
mantics extraction in law documents. In Proceedings of the
10th international conference on Artificial intelligence and
law, pages 133–140, 2005.
[18] Thomas M Breuel. The ocropus open source ocr system. In
Document recognition and retrieval XV, volume 6815, page
68150F. International Society for Optics and Photonics,
2008.
[19] Théo Le Calvar, Fabien Chhel, Frédéric Jouault, and
Frédéric Saubion. Toward a declarative language to generate
explorable sets of models. In Proceedings of the 34th
ACM/SIGAPP Symposium on Applied Computing, pages
1837–1844, 2019.
[20] Helena Ceovic, Adrian Satja Kurdija, Goran Delac, and
Marin Silic. Named entity recognition for addresses: An
empirical study. IEEE Access, 10:42094–42106, 2022.
[21] Francesca Cesarini, Enrico Francesconi, Marco Gori, Simone
Marinai, JQ Sheng, and Giovanni Soda. Rectangle labelling
for an invoice understanding system. In Proceedings of the
Fourth ICDAR. IEEE, 1997.
[22] Francesca Cesarini, Enrico Francesconi, Simone Marinai,
Jianqing Sheng, and Giovanni Soda. Conceptual modelling
for invoice document processing. In Roland R. Wagner,
editor, Eighth International Workshop on Database and
Expert Systems Applications, DEXA ’97, Toulouse, France,
September 1-2, 1997, Proceedings, pages 596–603. IEEE
Computer Society, 1997.
[23] Francesca Cesarini, Simone Marinai, L. Sarti, and Giovanni
Soda. Trainable table location in document images. In 16th
International Conference on Pattern Recognition, ICPR
2002, Quebec, Canada, August 11-15, 2002, pages 236–240.
IEEE Computer Society, 2002.
[24] Parinya Chalermsook and Julia Chuzhoy. Maximum inde-
pendent set of rectangles. In Proceedings of the twenti-
eth annual ACM-SIAM symposium on discrete algorithms,
pages 892–901. SIAM, 2009.
[25] Zewen Chi, Heyan Huang, Heng-Da Xu, Houjin Yu, Wanx-
uan Yin, and Xian-Ling Mao. Complicated table structure
recognition. arXiv preprint arXiv:1908.04729, 2019.
[26] Laura Chiticariu, Yunyao Li, and Frederick Reiss. Rule-
based information extraction is dead! long live rule-based
information extraction systems! In Proceedings of the
2013 conference on empirical methods in natural language
processing, pages 827–832, 2013.
[27] Jason PC Chiu and Eric Nichols. Named entity recognition
with bidirectional lstm-cnns. Transactions of the Associa-
tion for Computational Linguistics, 4:357–370, 2016.
[28] Amit Choudhary, Rahul Rishi, and Savita Ahlawat. Un-
constrained handwritten digit ocr using projection profile
and neural network approach. In Proceedings of the In-
ternational Conference on Information Systems Design and
Intelligent Applications 2012 (INDIA 2012) held in Visakha-
patnam, India, January 2012, pages 119–126. Springer, 2012.
[29] Stephen A Cook. The complexity of theorem-proving proce-
dures. In Proceedings of the third annual ACM symposium
on Theory of computing, pages 151–158, 1971.
[30] Andreiwid Sheffer Corrêa and Pär-Ola Zander. Unleashing
tabular content to open data: A survey on PDF table
extraction methods and tools. In Charles C. Hinnant and
Adegboyega Ojo, editors, Proceedings of the 18th Annual
International Conference on Digital Government Research,
DG.O 2017, Staten Island, NY, USA, June 7-9, 2017, pages
54–63. ACM, 2017.
[31] Vincent Poulain D’Andecy, Emmanuel Hartmann, and
Marçal Rusiñol. Field extraction by hybrid incremental and
a-priori structural templates. In 13th IAPR, DAS 2018,
pages 251–256. IEEE Computer Society, 2018.
[32] Brian L. Davis, Bryan S. Morse, Brian L. Price, Chris
Tensmeyer, Curtis Wigington, and Vlad I. Morariu. End-to-
end document recognition and understanding with dessurt.
In Leonid Karlinsky, Tomer Michaeli, and Ko Nishino,
editors, Computer Vision - ECCV 2022 Workshops - Tel
Aviv, Israel, October 23-27, 2022, Proceedings, Part IV,
volume 13804 of Lecture Notes in Computer Science, pages
280–296. Springer, 2022.
[33] Andreas Dengel and Bertin Klein. smartfix: A requirements-
driven system for document analysis and understanding. In
Daniel P. Lopresti, Jianying Hu, and Ramanujan S. Kashi,
editors, 5th DAS, volume 2423 of Lecture Notes in Computer
Science, pages 433–444. Springer, 2002.
[34] Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina
Toutanova. Bert: Pre-training of deep bidirectional trans-
formers for language understanding. In Jill Burstein, Christy
Doran, and Thamar Solorio, editors, Proceedings of the
2019 Conference of the North American Chapter of the As-
sociation for Computational Linguistics: Human Language
Technologies, NAACL-HLT 2019, Minneapolis, MN, USA,
June 2-7, 2019, Volume 1 (Long and Short Papers), pages
4171–4186. Association for Computational Linguistics, 2019.
12 VOLUME 10, 2022
This article has been accepted for publication in IEEE Access. This is the author's version which has not been fully edited and
content may change prior to final publication. Citation information: DOI 10.1109/ACCESS.2024.3360528
This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
Saout et al.: An Overview of Data Extraction from Invoices
[35] George R. Doddington, Alexis Mitchell, Mark A. Przybocki,
Lance A. Ramshaw, Stephanie M. Strassel, and Ralph M.
Weischedel. The automatic content extraction (ACE) pro-
gram - tasks, data, and evaluation. In Proceedings of the
Fourth International Conference on Language Resources
and Evaluation, LREC 2004, May 26-28, 2004, Lisbon,
Portugal. European Language Resources Association, 2004.
[36] Cihan Dogan, Aimore Dutra, Adam Gara, Alfredo Gemma,
Lei Shi, Michael Sigamani, and Ella Walters. Fine-grained
named entity recognition using elmo and wikidata. arXiv
preprint arXiv:1904.10503, 2019.
[37] Line Eikvil. Optical character recognition. citeseer.ist.psu.edu/142042.html, 26, 1993.
[38] David W. Embley, Matthew Hurst, Daniel P. Lopresti,
and George Nagy. Table-processing paradigms: a research
survey. Int. J. Document Anal. Recognit., 8(2-3):66–86,
2006.
[39] Daniel Esser, Daniel Schuster, Klemens Muthmann, Michael
Berger, and Alexander Schill. Automatic indexing of
scanned documents: a layout-based approach. In Document
recognition and retrieval XIX, volume 8297, page 82970H.
International Society for Optics and Photonics, 2012.
[40] Richard Evans and Stafford Street. A framework for named
entity recognition in the open domain. Recent advances
in natural language processing III: selected papers from
RANLP, 260(267-274):110, 2003.
[41] Jing Fang, Liangcai Gao, Kun Bai, Ruiheng Qiu, Xin Tao,
and Zhi Tang. A table detection method for multipage PDF
documents via visual separators and tabular structures. In
2011 International Conference on Document Analysis and
Recognition, ICDAR 2011, Beijing, China, September 18-
21, 2011, pages 779–783. IEEE Computer Society, 2011.
[42] Enrico Francesconi, Simonetta Montemagni, Wim Peters,
and Daniela Tiscornia. Semantic processing of legal texts:
Where the language of law meets the law of language,
volume 6036. Springer, 2010.
[43] Sumam Francis, Jordy Van Landeghem, and Marie-Francine
Moens. Transfer learning for named entity recognition in
financial and biomedical documents. Information, 10(8):248,
2019.
[44] Ayushe Gangal, Peeyush Kumar, and Sunita Kumari.
Complete scanning application using opencv. CoRR,
abs/2107.03700, 2021.
[45] Liangcai Gao, Yilun Huang, Hervé Déjean, Jean-Luc Meu-
nier, Qinqin Yan, Yu Fang, Florian Kleber, and Eva Maria
Lang. ICDAR 2019 competition on table detection and
recognition (ctdar). In 2019 ICDAR, pages 1510–1515.
IEEE, 2019.
[46] Andrea Gemelli, Emanuele Vivoli, and Simone Marinai.
Graph neural networks and representation embedding for
table extraction in PDF documents. In 26th International
Conference on Pattern Recognition, ICPR 2022, Montreal,
QC, Canada, August 21-25, 2022, pages 1719–1726. IEEE,
2022.
[47] Azka Gilani, Shah Rukh Qasim, Muhammad Imran Malik,
and Faisal Shafait. Table detection using deep learning. In
14th IAPR, ICDAR 2017, pages 771–776. IEEE, 2017.
[48] Max C. Göbel, Tamir Hassan, Ermelinda Oro, Giorgio
Orsi, and Roya Rastan. Table modelling, extraction and
processing. In Robert Sablatnig and Tamir Hassan, editors,
Proceedings of the 2016 ACM Symposium on Document
Engineering, DocEng 2016, Vienna, Austria, September 13
- 16, 2016, pages 1–2. ACM, 2016.
[49] Venu Govindaraju, Prem Natarajan, Santanu Chaudhury,
and Daniel P. Lopresti, editors. Proceedings of the Inter-
national Workshop on Multilingual OCR, MOCR@ICDAR
2009, Barcelona, Spain, July 25, 2009. ACM, 2009.
[50] Hien Thi Ha and Ales Horák. Information extraction
from scanned invoice images using text analysis and layout
features. Signal Process. Image Commun., 102:116601, 2022.
[51] Kailash A Hambarde and Hugo Proenca. Information
retrieval: Recent advances and beyond. arXiv preprint
arXiv:2301.08801, 2023.
[52] Ahmed Hamdi, Elodie Carel, Aurélie Joseph, Mickaël Cous-
taty, and Antoine Doucet. Information extraction from
invoices. In Josep Lladós, Daniel Lopresti, and Seiichi
Uchida, editors, ICDAR 2021, Proceedings, Part II, volume
12822 of Lecture Notes in Computer Science, pages 699–714.
Springer, 2021.
[53] Adam W. Harley, Alex Ufkes, and Konstantinos G. Derpa-
nis. Evaluation of deep convolutional nets for document
image classification and retrieval. In 13th International
Conference on Document Analysis and Recognition, ICDAR
2015, Nancy, France, August 23-26, 2015, pages 991–995.
IEEE Computer Society, 2015.
[54] Changai He, Sibao Chen, Shilei Huang, Jian Zhang, and
Xiao Song. Using convolutional neural network with bert
for intent determination. In 2019 International Conference
on Asian Language Processing (IALP), pages 65–70. IEEE,
2019.
[55] Kaijian He, Qian Yang, Lei Ji, Jingcheng Pan, and Yingchao
Zou. Financial time series forecasting with the deep learning
ensemble model. Mathematics, 11(4), 2023.
[56] David Hollingsworth. The workflow reference model, 1994.
[57] Yupan Huang, Tengchao Lv, Lei Cui, Yutong Lu, and Furu
Wei. Layoutlmv3: Pre-training for document AI with unified
text and image masking. In João Magalhães, Alberto Del
Bimbo, Shin’ichi Satoh, Nicu Sebe, Xavier Alameda-Pineda,
Qin Jin, Vincent Oria, and Laura Toni, editors, MM ’22: The
30th ACM International Conference on Multimedia, Lisboa,
Portugal, October 10 - 14, 2022, pages 4083–4091. ACM,
2022.
[58] Zheng Huang, Kai Chen, Jianhua He, Xiang Bai, Dimos-
thenis Karatzas, Shijian Lu, and CV Jawahar. Icdar2019
competition on scanned receipt ocr and information extrac-
tion. In 2019 ICDAR, pages 1516–1520. IEEE, 2019.
[59] Noman Islam, Zeeshan Islam, and Nazia Noor. A survey on
optical character recognition system. Journal of Information
and Communication Technology (JICT), 2017.
[60] Guillaume Jaume, Hazim Kemal Ekenel, and Jean-Philippe
Thiran. FUNSD: A dataset for form understanding in
noisy scanned documents. In 2nd International Work-
shop on Open Services and Tools for Document Analysis,
OST@ICDAR 2019, Sydney, Australia, September 22-25,
2019, pages 1–6. IEEE, 2019.
[61] Piyushee Jha and George Nagy. Wang notation tool: Layout
independent representation of tables. In ICPR 2008, pages
1–4. IEEE Computer Society, 2008.
[62] Yuxiang Jiang, Haiwei Dong, and Abdulmotaleb El-Saddik.
Baidu meizu deep learning competition: Arithmetic opera-
tion recognition using end-to-end learning OCR technolo-
gies. IEEE Access, 6:60128–60136, 2018.
[63] Ralph H. Sprague Jr. Electronic document management:
Challenges and opportunities for information systems man-
agers. MIS Q., 19(1):29–49, 1995.
[64] Mahmoud Kasem, Abdelrahman Abdallah, Alexander
Berendeyev, Ebrahem Elkady, Mahmoud Abdalla, Mo-
hamed Mahmoud, Mohamed Hamada, Daniyar Nurseitov,
and Islam Taj-Eddin. Deep learning for table detection and
structure recognition: A survey, 2022.
[65] Tejas Kashinath, Twisha Jain, Yash Agrawal, Tanvi Anand,
and Sanjay Singh. End-to-end table structure recognition
and extraction in heterogeneous documents. Appl. Soft
Comput., 123:108942, 2022.
[66] Sukhandeep Kaur, Seema Bawa, and Ravinder Kumar. A
survey of mono- and multi-lingual character recognition
using deep and shallow architectures: indic and non-indic
scripts. Artif. Intell. Rev., 53(3):1813–1872, 2020.
[67] I. Kavasidis, C. Pino, S. Palazzo, F. Rundo, D. Giordano,
P. Messina, and C. Spampinato. A saliency-based convo-
lutional neural network for table and chart detection in
digitized documents. In Elisa Ricci, Samuel Rota Bulò, Cees
Snoek, Oswald Lanz, Stefano Messelodi, and Nicu Sebe,
editors, Image Analysis and Processing - ICIAP 2019, pages
292–302, Cham, 2019. Springer International Publishing.
[68] Saqib Ali Khan, Syed Muhammad Daniyal Khalid, Muham-
mad Ali Shahzad, and Faisal Shafait. Table structure
extraction with bi-directional gated recurrent unit networks.
In 2019 ICDAR, pages 1366–1371. IEEE, 2019.
[69] Thomas Kieninger and Andreas Dengel. The t-recs table
recognition and analysis system. In Seong-Whan Lee and
Yasuaki Nakano, editors, Document Analysis Systems: The-
ory and Practice, Third IAPR Workshop, DAS’98, Nagano,
Japan, November 4-6, 1998, Selected Papers, volume 1655
of Lecture Notes in Computer Science, pages 255–269.
Springer, 1998.
[70] Geewook Kim, Teakgyu Hong, Moonbin Yim, JeongYeon
Nam, Jinyoung Park, Jinyeong Yim, Wonseok Hwang, Sang-
doo Yun, Dongyoon Han, and Seunghyun Park. Ocr-
free document understanding transformer. In Shai Avi-
dan, Gabriel J. Brostow, Moustapha Cissé, Giovanni Maria
Farinella, and Tal Hassner, editors, Computer Vision -
ECCV 2022 - 17th European Conference, Tel Aviv, Israel,
October 23-27, 2022, Proceedings, Part XXVIII, volume
13688 of Lecture Notes in Computer Science, pages 498–
517. Springer, 2022.
[71] Mario Köppen, Dörte Waldöstl, and Bertram Nickolay.
A system for the automated evaluation of invoices. In
Jonathan J. Hull and Suzanne Liebowitz Taylor, editors,
DAS 1996, volume 29 of Series in Machine Perception and
Artificial Intelligence, pages 223–241. World Scientific, 1996.
[72] Sargur N. Srihari and Stephen W. Lam. Character recognition.
IETE Journal of Education, 17(3):154–156, 1976.
[73] Guillaume Lample, Miguel Ballesteros, Sandeep Subrama-
nian, Kazuya Kawakami, and Chris Dyer. Neural archi-
tectures for named entity recognition. In Kevin Knight,
Ani Nenkova, and Owen Rambow, editors, NAACL HLT
2016, The 2016 Conference of the North American Chapter
of the Association for Computational Linguistics: Human
Language Technologies, San Diego California, USA, June
12-17, 2016, pages 260–270. The Association for Computa-
tional Linguistics, 2016.
[74] Eunji Lee, Jaewoo Park, Hyung Il Koo, and Nam Ik Cho.
Deep-learning and graph-based approach to table structure
recognition. Multim. Tools Appl., 81(4):5827–5848, 2022.
[75] Jing Li, Aixin Sun, Jianglei Han, and Chenliang Li. A survey
on deep learning for named entity recognition. IEEE Trans.
Knowl. Data Eng., 34(1):50–70, 2022.
[76] Minghao Li, Lei Cui, Shaohan Huang, Furu Wei, Ming Zhou,
and Zhoujun Li. Tablebank: Table benchmark for image-
based table detection and recognition. In Proceedings of
The 12th Language Resources and Evaluation Conference,
LREC 2020, Marseille, France, May 11-16, 2020, pages 1918–
1925. European Language Resources Association, 2020.
[77] Minghao Li, Tengchao Lv, Jingye Chen, Lei Cui, Yijuan
Lu, Dinei Florencio, Cha Zhang, Zhoujun Li, and Furu Wei.
Trocr: Transformer-based optical character recognition with
pre-trained models. In Proceedings of the AAAI Conference
on Artificial Intelligence, volume 37, pages 13094–13102,
2023.
[78] Yiren Li, Zheng Huang, Junchi Yan, Yi Zhou, Fan Ye, and
Xianhui Liu. Gfte: graph-based financial table extraction.
In International Conference on Pattern Recognition, pages
644–658. Springer, 2021.
[79] Zewen Li, Fan Liu, Wenjie Yang, Shouheng Peng, and Jun
Zhou. A survey of convolutional neural networks: analysis,
applications, and prospects. IEEE transactions on neural
networks and learning systems, 2021.
[80] Ying Liu, Kun Bai, Prasenjit Mitra, and C Lee Giles. Table-
seer: automatic table metadata extraction and searching in
digital libraries. In Proceedings of the 7th ACM/IEEE-CS
joint conference on Digital libraries, pages 91–100, 2007.
[81] Will Lovegrove. Advanced document analysis and automatic
classification of PDF documents. PhD thesis, University of
Nottingham, UK, 1996.
[82] Nam Tuan Ly, Atsuhiro Takasu, Phuc Nguyen, and
Hideaki Takeda. Rethinking image-based table recogni-
tion using weakly supervised methods. arXiv preprint
arXiv:2303.07641, 2023.
[83] John Mantas. An overview of character recognition method-
ologies. Pattern recognition, 19(6):425–430, 1986.
[84] Nicola Mariella and Andrea Simonetto. A quantum al-
gorithm for the sub-graph isomorphism problem. ACM
Transactions on Quantum Computing, 4(2):1–34, 2023.
[85] Elaine Marsh and Dennis Perzanowski. Muc-7 evaluation
of ie technology: Overview of results. In Seventh Message
Understanding Conference (MUC-7): Proceedings of a Con-
ference Held in Fairfax, Virginia, April 29-May 1, 1998, 1998.
[86] Dániel Marx and Michał Pilipczuk. Everything you always
wanted to know about the parameterized complexity of
subgraph isomorphism (but were afraid to ask). In 31st In-
ternational Symposium on Theoretical Aspects of Computer
Science, page 542, 2014.
[87] Ciaran McCreesh, Patrick Prosser, and James Trimble. The
glasgow subgraph solver: using constraint programming to
tackle hard subgraph isomorphism problem variants. In
International Conference on Graph Transformation, pages
316–324. Springer, 2020.
[88] Eric Medvet, Alberto Bartoli, and Giorgio Davanzo. A
probabilistic approach to printed document understanding.
Int. J. Document Anal. Recognit., 14(4):335–347, 2011.
[89] Jamshed Memon, Maira Sami, Rizwan Ahmed Khan, and
Mueen Uddin. Handwritten optical character recogni-
tion (OCR): A comprehensive systematic literature review
(SLR). IEEE Access, 8:142642–142668, 2020.
[90] Tomás Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean.
Efficient estimation of word representations in vector space.
In Yoshua Bengio and Yann LeCun, editors, 1st Interna-
tional Conference on Learning Representations, ICLR 2013,
Scottsdale, Arizona, USA, May 2-4, 2013, Workshop Track
Proceedings, 2013.
[91] Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S Corrado,
and Jeff Dean. Distributed representations of words and
phrases and their compositionality. In Advances in neural
information processing systems, pages 3111–3119, 2013.
[92] Ravina Mithe, Supriya Indalkar, and Nilam Divekar. Op-
tical character recognition. International journal of recent
technology and engineering (IJRTE), 2(1):72–75, 2013.
[93] Shunji Mori, Ching Y Suen, and Kazuhiko Yamamoto. His-
torical review of ocr research and development. Proceedings
of the IEEE, 80(7):1029–1058, 1992.
[94] Adel Moussaoui. Geometric Constraint Solver. PhD thesis,
Ecole nationale Supérieure d'Informatique (ex INI), Algiers,
2016.
[95] David Nadeau and Satoshi Sekine. A survey of named entity
recognition and classification. Lingvisticae Investigationes,
30(1):3–26, 2007.
[96] Tayyab Nasir, Muhammad Kamran Malik, and Khurram
Shahzad. MMU-OCR-21: towards end-to-end urdu text
recognition using deep learning. IEEE Access, 9:124945–
124962, 2021.
[97] Michael Netter, Eduardo B. Fernández, and Günther Pernul.
Refining the pattern-based reference model for electronic
invoices by incorporating threats. In ARES 2010, Fifth
International Conference on Availability, Reliability and
Security, 15-18 February 2010, Krakow, Poland, pages 560–
564. IEEE Computer Society, 2010.
[98] Clemens Neudecker, Konstantin Baierer, Mike Gerber,
Christian Clausner, Apostolos Antonacopoulos, and Stefan
Pletschacher. A survey of OCR evaluation tools and metrics.
In Apostolos Antonacopoulos, Christian Clausner, Maud
Ehrmann, Clemens Neudecker, and Stefan Pletschacher,
editors, HIP@ICDAR 2021: The 6th International Workshop
on Historical Document Imaging and Processing, Lausanne,
Switzerland, September 5-6, 2021, pages 13–18. ACM, 2021.
[99] Kyosuke Nishida, Kugatsu Sadamitsu, Ryuichiro Hi-
gashinaka, and Yoshihiro Matsuo. Understanding the se-
mantic structures of tables with a hybrid deep neural
network architecture. In Thirty-First AAAI Conference on
Artificial Intelligence, 2017.
[100] Shubham Singh Paliwal, Vishwanath D, Rohit Rahul,
Monika Sharma, and Lovekesh Vig. Tablenet: Deep learning
model for end-to-end table detection and tabular data
extraction from scanned document images. In 2019 Interna-
tional Conference on Document Analysis and Recognition,
ICDAR 2019, Sydney, Australia, September 20-25, 2019,
pages 128–133. IEEE, 2019.
[101] Seunghyun Park, Seung Shin, Bado Lee, Junyeop Lee,
Jaeheung Surh, Minjoon Seo, and Hwalsuk Lee. Cord: A
consolidated receipt dataset for post-ocr parsing. 2019.
[102] Shreeshiv Patel and Dvijesh Bhatt. Abstractive information
extraction from scanned invoices (AIESI) using end-to-end
sequential approach. CoRR, abs/2009.05728, 2020.
[103] Shreeshiv Patel and Dvijesh Bhatt. Abstractive information
extraction from scanned invoices (aiesi) using end-to-end
sequential approach. arXiv preprint arXiv:2009.05728, 2020.
[104] Martha O Perez-Arriaga, Trilce Estrada, and Soraya Abad-
Mota. Tao: system for table detection and extraction from
pdf documents. In The Twenty-Ninth International Flairs
Conference, 2016.
[105] Matthew E. Peters, Mark Neumann, Mohit Iyyer, Matt
Gardner, Christopher Clark, Kenton Lee, and Luke Zettle-
moyer. Deep contextualized word representations. In
Marilyn A. Walker, Heng Ji, and Amanda Stent, editors,
Proceedings of the 2018 Conference of the North American
Chapter of the Association for Computational Linguistics:
Human Language Technologies, NAACL-HLT 2018, New
Orleans, Louisiana, USA, June 1-6, 2018, Volume 1 (Long
Papers), pages 2227–2237. Association for Computational
Linguistics, 2018.
[106] David Pinto, Andrew McCallum, Xing Wei, and W Bruce
Croft. Table extraction using conditional random fields. In
Proceedings of the 26th annual international ACM SIGIR
conference on Research and development in informaion
retrieval, pages 235–242, 2003.
[107] Gorjan Popovski, Barbara Korousic-Seljak, and Tome Efti-
mov. A survey of named-entity recognition methods for food
information extraction. IEEE Access, 8:31586–31594, 2020.
[108] Devashish Prasad, Ayan Gadpal, Kshitij Kapadni, Manish
Visave, and Kavita Sultanpure. Cascadetabnet: An ap-
proach for end to end table detection and structure recog-
nition from image-based documents. In 2020 IEEE/CVF
Conference on Computer Vision and Pattern Recognition,
CVPR Workshops 2020, Seattle, WA, USA, June 14-19,
2020, pages 2439–2447. Computer Vision Foundation /
IEEE, 2020.
[109] Shah Rukh Qasim, Hassan Mahmood, and Faisal Shafait.
Rethinking table recognition using graph neural networks.
In 2019 International Conference on Document Analysis and
Recognition, ICDAR 2019, Sydney, Australia, September
20-25, 2019, pages 142–147. IEEE, 2019.
[110] Liang Qiao, Zaisheng Li, Zhanzhan Cheng, Peng Zhang,
Shiliang Pu, Yi Niu, Wenqi Ren, Wenming Tan, and Fei Wu.
LGPMA: complicated table structure recognition with local
and global pyramid mask alignment. In Josep Lladós, Daniel
Lopresti, and Seiichi Uchida, editors, 16th International
Conference on Document Analysis and Recognition, ICDAR
2021, Lausanne, Switzerland, September 5-10, 2021, Pro-
ceedings, Part I, volume 12821 of Lecture Notes in Computer
Science, pages 99–114. Springer, 2021.
[111] Alec Radford, Karthik Narasimhan, Tim Salimans, Ilya
Sutskever, et al. Improving language understanding by
generative pre-training. 2018.
[112] Sachin Raja, Ajoy Mondal, and C. V. Jawahar. Table
structure recognition using top-down and bottom-up cues.
In Andrea Vedaldi, Horst Bischof, Thomas Brox, and Jan-
Michael Frahm, editors, Computer Vision - ECCV 2020 -
16th European Conference, Glasgow, UK, August 23-28,
2020, Proceedings, Part XXVIII, volume 12373 of Lecture
Notes in Computer Science, pages 70–86. Springer, 2020.
[113] Roya Rastan, Hye-Young Paik, John Shepherd, Seung Hwan
Ryu, and Amin Beheshti. TEXUS: table extraction sys-
tem for PDF documents. In Junhu Wang, Gao Cong,
Jinjun Chen, and Jianzhong Qi, editors, Databases Theory
and Applications - 29th Australasian Database Conference,
ADC 2018, Gold Coast, QLD, Australia, May 24-27, 2018,
Proceedings, volume 10837 of Lecture Notes in Computer
Science, pages 345–349. Springer, 2018.
[114] Pau Riba, Anjan Dutta, Lutz Goldmann, Alicia Fornés,
Oriol Ramos Terrades, and Josep Lladós. Table detection
in invoice documents by graph neural networks. In 2019
ICDAR. IEEE, 2019.
[115] Mariana Rodrigues Makiuchi, Tifani Warnita, Kuniaki Uto,
and Koichi Shinoda. Multimodal fusion of bert-cnn and
gated cnn representations for depression detection. In Pro-
ceedings of the 9th International on Audio/Visual Emotion
Challenge and Workshop, pages 55–63, 2019.
[116] Ali Safaya, Moutasem Abdullatif, and Deniz Yuret. Kuisail
at semeval-2020 task 12: Bert-cnn for offensive speech iden-
tification in social media. In Proceedings of the Fourteenth
Workshop on Semantic Evaluation, pages 2054–2059, 2020.
[117] KC Santosh and Abdel Belaïd. Pattern-based approach
to table extraction. In Iberian Conference on Pattern
Recognition and Image Analysis, pages 766–773. Springer,
2013.
[118] Thomas Saout, Frédéric Lardeux, and Frédéric Saubion. A
two-stage approach for table extraction in invoices. CoRR,
abs/2210.04716, 2022.
[119] Franco Scarselli, Marco Gori, Ah Chung Tsoi, Markus
Hagenbuchner, and Gabriele Monfardini. The graph neural
network model. IEEE Trans. Neural Networks, 20(1):61–80,
2009.
[120] Sebastian Schreiber, Stefan Agne, Ivo Wolf, Andreas Dengel,
and Sheraz Ahmed. Deepdesrt: Deep learning for detection
and structure recognition of tables in document images. In
14th IAPR International Conference on Document Analysis
and Recognition, ICDAR 2017, Kyoto, Japan, November 9-
15, 2017, pages 1162–1167. IEEE, 2017.
[121] Sharad C. Seth and George Nagy. Segmenting tables via
indexing of value cells by table headers. In 12th ICDAR
2013, pages 887–891. IEEE Computer Society, 2013.
[122] Faisal Shafait and Ray Smith. Table detection in heteroge-
neous documents. In David S. D., Venu G., Daniel P. L;,
and Premkumar N., editors, The Ninth IAPR, DAS 2010.
ACM, 2010.
[123] Andrey Shapenko, Vladimir Korovkin, and Benoit Leleux.
Abbyy: the digitization of language and text. Emerald
Emerging Markets Case Studies, 8:1–26, April 2018.
[124] Alexey O. Shigarov, Andrey Altaev, Andrey A. Mikhailov,
Viacheslav Paramonov, and Evgeniy A. Cherkashin. Tab-
bypdf: Web-based system for PDF table extraction. In
Robertas D. and Giedre Vasiljeviene, editors, ICIST 2018,
Proceedings, Communications in Computer and Informa-
tion Science, 2018.
[125] Shoaib Ahmed Siddiqui, Imran Ali Fateh, Syed Tah-
seen Raza Rizvi, Andreas Dengel, and Sheraz Ahmed.
Deeptabstr: Deep learning based table structure recognition.
In 2019 International Conference on Document Analysis and
Recognition, ICDAR 2019, Sydney, Australia, September
20-25, 2019, pages 1403–1409. IEEE, 2019.
[126] Scharolta Katharina Sienčnik. Adapting word2vec to named
entity recognition. In Proceedings of the 20th Nordic Con-
ference of Computational Linguistics (NODALIDA 2015),
pages 239–243, 2015.
[127] Ray Smith. An overview of the tesseract OCR engine. In
9th ICDAR, pages 629–633. IEEE Computer Society, 2007.
[128] Christine Solnon. Experimental evaluation of subgraph
isomorphism solvers. In International Workshop on Graph-
Based Representations in Pattern Recognition, pages 1–13.
Springer, 2019.
[129] Enrico Sorio, Alberto Bartoli, Giorgio Davanzo, and Eric
Medvet. Open world classification of printed invoices. In
Apostolos Antonacopoulos, Michael J. Gormish, and Rolf
Ingold, editors, Proceedings of the 2010 ACM Symposium
on Document Engineering, Manchester, United Kingdom,
September 21-24, 2010, pages 187–190. ACM, 2010.
[130] Pedro Javier Ortiz Suárez, Yoann Dupont, Benjamin Muller,
Laurent Romary, and Benoît Sagot. Establishing a new
state-of-the-art for french named entity recognition. In
LREC 2020-12th Language Resources and Evaluation Con-
ference, 2020.
[131] Yingyi Sun, Xianfeng Mao, Sheng Hong, Wenhua Xu, and
Guan Gui. Template matching-based method for intelligent
invoice information identification. IEEE access, 7:28392–
28401, 2019.
[132] Ahmad Tarawneh, Ahmad Hassanat, Dmitry Chetverikov,
Imre Lendak, and Chaman Verma. Invoice classification
using deep features and machine learning techniques, March 2019.
[133] Suzanne Liebowitz Taylor, Richard Fritzson, and Jon A
Pastor. Extraction of data from preprinted forms. Machine
Vision and Applications, 5(3):211–222, 1992.
[134] Chris Tensmeyer, Vlad I. Morariu, Brian L. Price, Scott
Cohen, and Tony R. Martinez. Deep splitting and merging
for table structure decomposition. In 2019 International
Conference on Document Analysis and Recognition, ICDAR
2019, Sydney, Australia, September 20-25, 2019, pages 114–
121. IEEE, 2019.
[135] Erik F Tjong Kim Sang and Fien De Meulder. Introduction
to the conll-2003 shared task: language-independent named
entity recognition. In Proceedings of the seventh conference
on Natural language learning at HLT-NAACL 2003-Volume
4, pages 142–147, 2003.
[136] Panos Vassiliadis and Alkis Simitsis. Extraction, transfor-
mation, and loading. Encyclopedia of Database Systems,
10, 2009.
[137] Nataliya Le Vine, Matthew Zeigenfuse, and Mark Rowan.
Extracting tables from documents using conditional gen-
erative adversarial networks and genetic algorithms. In
International Joint Conference on Neural Networks, IJCNN
2019 Budapest, Hungary, July 14-19, 2019, pages 1–8. IEEE,
2019.
[138] Joris Voerman, Aurélie Joseph, Mickaël Coustaty, Vin-
cent Poulain D’Andecy, and Jean-Marc Ogier. Evaluation of
neural network classification systems on document stream.
In Xiang Bai, Dimosthenis Karatzas, and Daniel Lopresti,
editors, Document Analysis Systems - 14th IAPR Inter-
national Workshop, DAS 2020, Wuhan, China, July 26-
29, 2020, Proceedings, volume 12116 of Lecture Notes in
Computer Science, pages 262–276. Springer, 2020.
[139] Qianwen Wang and Mizuho Iwaihara. Deep neural architec-
tures for joint named entity recognition and disambiguation.
In IEEE International Conference on Big Data and Smart
Computing, BigComp 2019, Kyoto, Japan, February 27 -
March 2, 2019, pages 1–4. IEEE, 2019.
[140] Yalin Wang, Ihsin T. Phillips, and Robert Haralick. Automatic table ground truth generation and a background-
analysis-based table structure extraction method. In Pro-
ceedings of Sixth International Conference on Document
Analysis and Recognition, pages 528–532. IEEE, 2001.
[141] Zhiwen Xiao, Haoxi Zhang, Huagang Tong, and Xin Xu. An
efficient temporal network with dual self-distillation for elec-
troencephalography signal classification. In 2022 IEEE In-
ternational Conference on Bioinformatics and Biomedicine
(BIBM), pages 1759–1762, 2022.
[142] Huanlai Xing, Zhiwen Xiao, Rong Qu, Zonghai Zhu, and
Bowen Zhao. An efficient federated distillation learning sys-
tem for multitask time series classification. IEEE Transac-
tions on Instrumentation and Measurement, 71:1–12, 2022.
[143] Huanlai Xing, Zhiwen Xiao, Dawei Zhan, Shouxi Luo,
Penglin Dai, and Ke Li. Selfmatch: Robust semisupervised
time-series classification with self-distillation. International
Journal of Intelligent Systems, 37(11):8583–8610, 2022.
[144] Wenyuan Xue, Baosheng Yu, Wen Wang, Dacheng Tao,
and Qingyong Li. Tgrnet: A table graph reconstruction
network for table structure recognition. In 2021 IEEE/CVF
International Conference on Computer Vision, ICCV 2021,
Montreal, QC, Canada, October 10-17, 2021, pages 1275–
1284. IEEE, 2021.
[145] Vikas Yadav and Steven Bethard. A survey on recent
advances in named entity recognition from deep learning
models. In Proceedings of the 27th International Conference
on Computational Linguistics, pages 2145–2158, 2018.
[146] Limin Yao, Sebastian Riedel, and Andrew McCallum. Col-
lective cross-document relation extraction without labelled
data. In Proceedings of the 2010 Conference on Empirical
Methods in Natural Language Processing, pages 1013–1023,
2010.
[147] Qixiang Ye and David S. Doermann. Text detection and
recognition in imagery: A survey. IEEE Trans. Pattern Anal.
Mach. Intell., 37(7):1480–1500, 2015.
[148] Zi Ye, Yogan Jaya Kumar, Goh Ong Sing, Fengyan Song,
and Junsong Wang. A comprehensive survey of graph neural
networks for knowledge graphs. IEEE Access, 10:75729–
75741, 2022.
[149] Burcu Yildiz, Katharina Kaiser, and Silvia Miksch.
pdf2table: A method to extract table information from pdf
files. In IICAI, pages 1773–1785, 2005.
[150] Richard Zanibbi, Dorothea Blostein, and James R. Cordy.
A survey of table recognition. Int. J. Document Anal.
Recognit., 2004.
[151] Liang Zhang and Huan Zhao. Named entity recognition
for chinese microblog with convolutional neural network. In
Yong Liu, Liang Zhao, Guoyong Cai, Guoqing Xiao, Kenli
Li, and Lipo Wang, editors, 13th International Conference
on Natural Computation, Fuzzy Systems and Knowledge
Discovery, ICNC-FSKD 2017, Guilin, China, July 29-31,
2017, pages 87–92. IEEE, 2017.
[152] Mengshi Zhang, Daniel Perelman, Vu Le, and Sumit Gul-
wani. An integrated approach of deep learning and symbolic
analysis for digital PDF table extraction. In 25th Inter-
national Conference on Pattern Recognition, ICPR 2020,
Virtual Event / Milan, Italy, January 10-15, 2021, pages
4062–4069. IEEE, 2020.
[153] Peng Zhang, Can Li, Liang Qiao, Zhanzhan Cheng, Shiliang
Pu, Yi Niu, and Fei Wu. VSR: A unified framework for
document layout analysis combining vision, semantics and
relations. In Josep Lladós, Daniel Lopresti, and Seiichi
Uchida, editors, 16th International Conference on Document
Analysis and Recognition, ICDAR 2021, Lausanne, Switzer-
land, September 5-10, 2021, Proceedings, Part I, volume
12821 of Lecture Notes in Computer Science, pages 115–
130. Springer, 2021.
[154] Kaihong Zheng, Lingyun Sun, Xin Wang, Shangli Zhou,
Hanbin Li, Sheng Li, Lukun Zeng, and Qihang Gong. Named
entity recognition in electric power metering domain based
on attention mechanism. IEEE Access, 9:152564–152573,
2021.
[155] Xinyi Zheng, Douglas Burdick, Lucian Popa, Xu Zhong,
and Nancy Xin Ru Wang. Global table extractor (GTE):
A framework for joint table identification and cell struc-
ture recognition using visual context. In IEEE Winter
Conference on Applications of Computer Vision, WACV
2021, Waikoloa, HI, USA, January 3-8, 2021, pages 697–706.
IEEE, 2021.
[156] Xu Zhong, Elaheh ShafieiBavani, and Antonio Jimeno-
Yepes. Image-based table recognition: Data, model, and
evaluation. In Andrea Vedaldi, Horst Bischof, Thomas Brox,
and Jan-Michael Frahm, editors, Computer Vision - ECCV
2020 - 16th European Conference, Glasgow, UK, August 23-
28, 2020, Proceedings, Part XXI, volume 12366 of Lecture
Notes in Computer Science, pages 564–580. Springer, 2020.
[157] Xu Zhong, Elaheh ShafieiBavani, and Antonio Ji-
meno Yepes. Image-based table recognition: data, model,
and evaluation. In European conference on computer vision,
pages 564–580. Springer, 2020.
[158] Xu Zhong, Jianbin Tang, and Antonio Jimeno Yepes. Pub-
laynet: largest dataset ever for document layout analysis. In
2019 (ICDAR), pages 1015–1022. IEEE, 2019.
16 VOLUME 10, 2022
This article has been accepted for publication in IEEE Access. This is the author's version which has not been fully edited and
content may change prior to final publication. Citation information: DOI 10.1109/ACCESS.2024.3360528
This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
Saout et al.: An Overview of Data Extraction from Invoices
THOMAS SAOUT was born in Brest, France, in 1992. In 2020, he earned an MS degree in Computer Science, specializing in Decision Intelligence, from the University of Angers. After working for six months as a Java developer at KS2, a French company that produces ERP solutions, he began his PhD at the University of Angers with the LERIA in 2021. His research focuses on Evolutionary Algorithms, Information Retrieval, Natural Language Processing, and Graph Pattern Recognition.
FRÉDÉRIC LARDEUX was born in France in 1979. He received the MS and PhD degrees in computer science from the University of Angers, France, in 2002 and 2005, respectively. Since 2006, he has been a professor with the LERIA, University of Angers, France. His research interests include Constraints (CSP, SAT), Model Transformations, Combinatorial Optimization, Metaheuristics, Evolutionary Computation, Learning (Reinforcement Learning, Machine Learning), and Logical Analysis of Data.
FRÉDÉRIC SAUBION received his MS and PhD degrees in computer science from the University of Orléans, France, in 1996. From 1997 to 2003, he was an assistant professor at the University of Angers, France. He has been a full professor with the Faculty of Science at the University of Angers since 2004. His research interests include Metaheuristics, Evolutionary Computation, and Machine Learning. He has supervised a dozen PhD students. He has contributed to the autonomous search paradigm, which consists in improving the automated setting and control of solving algorithms, in particular through machine learning techniques. He has also investigated various application domains (biology, information retrieval, etc.).