A Novel Method for Filtering a Useful Subset
of Composite Linguistic Summaries
Carlos R. Rodríguez Rodríguez^(1,2), Marieta Peña Abreu^1, Denis Sergeevich Zuev^2, Yarina Amoroso Fernández^(1,3), and Yeleny Zulueta Véliz^1

^1 University of Informatics Sciences, Havana, Cuba
crodriguezr@uci.cu
^2 Kazan Federal University, Kazan, Russia
^3 National Union of Cuban Jurists, Havana, Cuba
Abstract. Selecting a subset of linguistic summaries and providing them in a
user-friendly and compact form is a latent issue in the field of Linguistic Data
Summarization. The paper proposes a method for filtering the most useful subset,
for a given decision problem, from a set of composite linguistic summaries. Those
summaries embody Evidence, Contrast or Emphasis relations, inspired by the
Rhetorical Structure Theory. The summaries’ usefulness is determined according
to the relevance of the attributes contained in each one. The strategy followed
is based on first finding the Evidence relation whose nucleus contains the better
possible representation of the problem attributes, then searching for a Contrast
relation and an Emphasis relation that share that nucleus. The method output is a
scheme that synthesizes and combines the texts of the three relations. The paper
provides an illustrative example in which the most useful relations are found from
a dataset of 63 crimes to solve a case of bank document forgery.
Keywords: Linguistic descriptions of data · Linguistic data summarization · Natural language generation · Expressiveness of linguistic summaries
1 Introduction
Linguistic data summarization (LDS) is a descriptive knowledge discovery technique
to produce summaries from a database using natural language [1]. Several authors have
extended the original LDS approach [2,3] by defining different stereotyped forms for
structuring summaries, proposing new indicators to measure their quality, using differ-
ent techniques to generate them, and applying these developments to a wide range of
problems. Pupo et al. [4] provide a comprehensive review about these topics.
The structure of linguistic summaries (LS), and of any kind of information, is a key
factor of their actual usefulness. The usefulness of LS depends, among other criteria, on
their expressiveness [5]. Stereotyped forms for structuring LS, called protoforms, were
initially proposed by Zadeh [6] and then presented as a hierarchy of abstract prototypes
[7]. The protoforms have been extended for different problems, but their original forms have remained the most widespread [4, 8]; these are defined as in (1) or (2):
T(Q X are/have Y)    (1)

T(Q F X are/have Y)    (2)
where Y is a summarizer (e.g., have sentences from 24 to 42 months); X is the object (e.g., FBCD crimes); Q is a quantity in agreement given as a fuzzy linguistic quantifier (e.g., many); and T is the truth degree of the summary in [0, 1]. In (Eq. 2), a qualifier F (e.g., with circumstances 80.1(c) and 79.1(a)) is added; F is a filter to get a specific data subset. The following is a summary like (Eq. 2): T(Many FBCD crimes with circumstances 80.1(c) and 79.1(a), have sentences from 24 to 42 months) = 1.
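For readers who want to experiment, a minimal sketch of how the truth degree of a type-II summary can be computed under Zadeh's calculus of quantified propositions (this is an illustration, not the CLS-QD implementation; the predicates and the dataset layout are hypothetical, and the quantifier Many uses the trapezoidal parameters listed later in Sect. 4):

```python
def trapezoid(x, a, b, c, d):
    """Membership of x in a trapezoidal fuzzy set with parameters (a, b, c, d)."""
    if x < a or x > d:
        return 0.0
    if b <= x <= c:
        return 1.0
    if x < b:
        return (x - a) / (b - a)
    return (d - x) / (d - c)

# Quantifier "Many", with the parameters given in Sect. 4.
MANY = (0.52, 0.58, 1.0, 1.0)

def truth_type2(records, f, y, quantifier):
    """T(Q F X are/have Y): the proportion of records satisfying the
    qualifier F that also satisfy the summarizer Y, evaluated through
    the quantifier's membership function (predicates crisp for brevity)."""
    qualified = [r for r in records if f(r)]
    if not qualified:
        return 0.0
    ratio = sum(1 for r in qualified if y(r)) / len(qualified)
    return trapezoid(ratio, *quantifier)

# Hypothetical usage mirroring the running example:
# truth_type2(crimes,
#             f=lambda r: {"80.1(c)", "79.1(a)"} <= set(r["circumstances"]),
#             y=lambda r: 24 <= r["punishment_months"] <= 42,
#             quantifier=MANY)
```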
These protoforms consist of quantified sentences that are said not to be delivered
directly to the user due to their lack of expressiveness. Moreover, they are usually handled
individually without taking into account the relationships between them [8]. For these
reasons, several approaches aim to improve the expressiveness of LS [8–13]. Among
these contributions, CLS-QD stands as a model for generating composite linguistic
summaries from qualitative data [13]. All these proposals improve the understanding of LS,
but they do not propose ways to select the proper subset of LS for an instance of a decision
problem. That is, they provide an improved description of data, but do not show how to
filter those LS in a dynamic decision-making environment.
In order to address this issue, this paper proposes a method to select the three most useful summaries, for a given situation, from all those generated with the CLS-QD model [13]. For this purpose, Sect. 2 briefly reviews the CLS-QD model; Sect. 3 presents the method for selecting and assembling the most useful summaries; and Sect. 4 describes an illustrative example of its applicability.
2 A Short Overview of CLS-QD Model
The CLS-QD model was formalized in [13] and presented in an implementable form in [14]. Its summaries embody relations of Evidence (P^e), Contrast (P^c) or Emphasis (P^h), inspired by the Rhetorical Structure Theory [15, 16]. A relation (P^r) involves at least two constituent statements, which can function as nuclei (P_N) or satellites (P_S), and which are semantically linked by a relation r, i.e., by a specific connector.
The constituent statements are the classical protoforms of LS (see Eq. (1) and Eq. (2)), which we will call type-I and type-II statements, respectively. X and Y can be simple or complex predicates. A simple predicate consists of a single pair (attribute: value), and a complex one comprises two or more pairs.
For measuring the quality of any relation P^r, CLS-QD defines three metrics: the truth degree T(P^r), the relation strength S(P^r) and the coverage degree C(P^r).
An Evidence relation P^e provides one main statement (the nucleus, P_N) and one or more supporting statements (the satellites, P_S), which supply finer-grained information that validates the nucleus. Its general structure is:

P^e = P_N ⟨evidence connector⟩ P_S    (3)
The nucleus of P^e can be a type-I or type-II statement. The satellite can be one or more non-overlapping type-II statements. In P^e, the satellites semantically support the nucleus, i.e., all constituent statements of the relation share the same consequent.
A Contrast relation P^c consists of two nuclei, which provide contrasting information about the same attributes of the analyzed problem. Its general structure is:

P^c = P_N1 ⟨contrast connector⟩ P_N2    (4)

Both nuclei can be type-I or type-II statements and can have complex predicates in their antecedents and consequents, but at least one pair of predicates must be different.
An Emphasis relation P^h combines two similar statements in which the second one (the satellite) has an additional predicate that specifies the main feature of the objects described by the first one (the nucleus). Its general structure is:

P^h = P_N ⟨emphasis connector⟩ P_S    (5)

In P^h, the statement that functions as the nucleus is, in turn, the antecedent of the satellite, and the consequent of the satellite contains a different predicate that emphasizes a feature of the nucleus. The nucleus can be a type-I or type-II statement. Meanwhile, the satellite has been constrained to only one statement of type-II.
3 A Method for Filtering and Assembling the Most Useful Relations
The strategy for selecting the most useful composite relations aims at identifying those Evidence, Contrast and Emphasis relations (one per type) which, according to the values of their attributes, are the most helpful for the specific situation addressed. Therefore, selecting these relations is not a simple search for those that maximize the T(P^r), S(P^r) and C(P^r) metrics defined in [13] for measuring their quality. That is to say, a relation that is the most useful one for solving a problem instance may not be useful for another one, even if in both cases the same subset of attributes is involved. The method comprises five activities (see Fig. 1).
Fig. 1. Method flowchart.
The strategy focuses attention on the nucleus of the Evidence relation because, by its very definition, this relation exposes a main statement (the nucleus) while the satellite provides information that helps to increase the credibility of the nucleus. Therefore, finding the nucleus most representative of the attributes implicitly brings other information (the satellite) that supports it. At the same time, from that nucleus it is possible to find a Contrast relation that shares it and contains another nucleus in which the target attribute (summarizer) has a different value. Similarly, it is possible to find an Emphasis relation that shares the nucleus of the Evidence relation and whose satellite highlights an additional property of the cases described by the nucleus. We believe that this compendium, harmoniously assembled, can help decision-makers.
Remark. The analysis and selection of the most useful relations works mainly with the relations' attributes; it does not take into account the Q values, and the T values are considered only as a secondary resource for selection. So, the constituent statements (Eq. (1) and Eq. (2)) will be handled as X → Y and FX → Y, respectively, where X and FX comprise the set of predictor attributes and Y comprises the set of target attributes.
3.1 Establishing an Attribute Preference Ranking
The selection strategy takes into account that not all predictor attributes of the analyzed problem may be present in the same relation. For this reason, it is initially necessary to establish a ranking among these attributes according to their relevance, significance, or the intensity of their values for the specific situation to be solved. That is to say, given a set of n predictor attributes, X = {x_i | i ∈ (1, …, n)}, it is necessary to obtain the ordered set X' = ⟨x_j ≻ … ≻ x_n⟩, where j ∈ (1, …, n) and x_j denotes the j-th most relevant attribute. Knowing this preference ranking, it is then possible to search for the relations that best represent such attributes.
Several approaches can be explored to set a ranking of attributes, including (a toy sketch of the last option follows the list):
– Employing feature selection techniques.
– Using a ranking pre-established by domain experts.
– Obtaining the ranking after the decision-makers assess the relevance of each attribute for the specific situation they are solving.
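For instance (a toy sketch under the assumption that decision-makers express relevance as scores in [0, 1]; the scores below are invented for illustration):

```python
def rank_attributes(relevance):
    """Order predictor attributes by decreasing relevance score."""
    return sorted(relevance, key=relevance.get, reverse=True)

# Hypothetical scores for the circumstances used in Sect. 4:
ranking = rank_attributes({"80.1(c)": 0.9, "79.1(a)": 0.7, "79.1(c)": 0.4})
# -> ['80.1(c)', '79.1(a)', '79.1(c)']
```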
3.2 Selecting the Most Useful Evidence Relation
In order to select the most useful Evidence relation, P^e, for each value of the target attribute, the relation whose nucleus contains in its antecedent the best possible representation of the attributes is found. The best representation of the attributes is the one in which all of them are present. Otherwise, the relation whose nucleus contains in its antecedent the best possible subset of attributes according to the previous ranking must be found.
Example 1: Let us consider a problem with three predictor attributes ranked as follows: X = {x1 ≻ x2 ≻ x3}, and a target attribute, y, with four possible values. By applying the CLS-QD model, it is theoretically possible to obtain, for any value of the target attribute y, the Evidence relations whose nuclei are shown in Fig. 2a). Such nuclei, with the form FX → Y, are ordered as shown in Fig. 2b) according to the attribute ranking, i.e., x1, x2, x3 → y is the most representative (useful) nucleus.
But in a real case it is unlikely that all relations would be obtained for a single value of the target attribute. Instead, it is usual to find relations for several or all values of the target attribute, as shown in Fig. 3. If the Evidence relations obtained were those whose nuclei are shown in Fig. 3a), the order of representativeness would be: x1, x2, x3 → y2 > x1, x2 → y1 > x1, x3 → y3 > x2, x3 → y4; therefore, the most useful relation would be the one to which the nucleus x1, x2, x3 → y2 belongs.
Fig. 2. Nuclei prototypes of all possible Evidence relations that contain at least one attribute in the antecedent for any value of the target attribute y. [Figure: panels a) and b); the tabulated nuclei could not be recovered from the extraction.]

Fig. 3. Prototypes of Evidence relation nuclei that contain at least one attribute in the antecedent for the four possible values of the target attribute. [Figure: panels a)–c); the tabulated nuclei could not be recovered from the extraction.]
On the other hand, if the relations obtained were those whose nuclei are shown in
Fig. 3b) or Fig. 3c), where several nuclei share the same antecedent, then it would be
necessary to apply the following decision rules:
R1. If the relation nuclei share the same antecedent, and it contains all the attributes (see Fig. 3b)), then select the one whose nucleus has the highest value of T.
R1.1. If the nuclei have the same value of T, then select the relation with the highest value of T(P^e); and if these values are equal, then select the one with the highest value of S(P^e).
R2. If the relation nuclei share the same antecedent, but it does not contain all the attributes (see Fig. 3c)), then select the one that has the most representative satellite.
R2.1. If the satellites share the same attributes (see Fig. 4c)), then select the relation with the highest value of T(P^e); and if these values are equal, then select the one with the highest value of S(P^e).
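Before the examples, a rough sketch of how the ranking-based ordering and the T/T(P^e)/S(P^e) tie-breaking of rules R1 and R1.1 might be coded (not the authors' implementation; the record fields are hypothetical, and rule R2's satellite comparison is omitted for brevity):

```python
def antecedent_key(attrs, ranking):
    """Sort key for nucleus antecedents: prefer those covering the
    highest-ranked attributes; shorter antecedents lose the comparison
    because missing attributes are padded with the worst possible rank."""
    ranks = sorted(ranking.index(a) for a in attrs)
    return ranks + [len(ranking)] * (len(ranking) - len(ranks))

def select_evidence(relations, ranking):
    """Pick the Evidence relation whose nucleus antecedent best represents
    the ranked attributes; ties fall back on T of the nucleus, then T(Pe),
    then S(Pe), as in rules R1 and R1.1."""
    return min(relations,
               key=lambda r: (antecedent_key(r["antecedent"], ranking),
                              -r["T_nucleus"], -r["T"], -r["S"]))

# Hypothetical check against Example 1's ordering:
# select_evidence(
#     [{"antecedent": {"x1", "x2"},       "T_nucleus": 1.0, "T": 0.9, "S": 0.8},
#      {"antecedent": {"x1", "x2", "x3"}, "T_nucleus": 0.9, "T": 0.8, "S": 0.7}],
#     ["x1", "x2", "x3"])   # -> the second relation
```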
Example 2: Let us consider now a problem with three predictor attributes ranked as follows: X = {x1 ≻ x2 ≻ x3}, and the Evidence relations shown in Fig. 4.
Given the P^e_1 and P^e_2 relations (Fig. 4a)), the satellite of P^e_1 contains all three attributes and is the most representative; so, the P^e_1 relation is the most useful one.
In P^e_3 and P^e_4 (Fig. 4b)), both satellites contain a subset of the attributes; the satellite of P^e_4 has the two most relevant attributes, so P^e_4 is the most useful relation.
In Fig. 4c), both satellites contain the same attributes, so the most useful relation is found by applying rule R2.1.
Fig. 4. Paired prototypes of Evidence relations with equal antecedents in their nuclei. The relations are compared according to their satellites. [Figure: panels a)–f); the tabulated relations could not be recovered from the extraction.]
On the other hand, it should be recalled that an Evidence relation may contain more than one satellite, all sharing the same set of attributes, so:
– In Fig. 4d), the satellite of P^e_8 contains a better representation of the attributes than the satellites of P^e_7, so P^e_8 is the most useful relation although it has only one satellite.
– In Fig. 4e), both relations contain two satellites, but the satellites of P^e_10 include more representative attributes than the satellites of P^e_9, so P^e_10 is the most useful relation.
– Finally, in Fig. 4f), both relations contain the same attributes in their satellites, so the most useful relation is found by applying rule R2.1.
3.3 Selecting the Most Useful Contrast Relation
Selecting the most useful Contrast relation, P^c, depends on the nucleus of the previously selected Evidence relation, P^e. This nucleus will be, in turn, the first nucleus of the Contrast relation, i.e., P_N1 ∈ P^c = P_N ∈ P^e. The aim of this dependence is to find another statement (the second nucleus of the Contrast relation, P_N2 ∈ P^c) that relates the same predictor attributes to another value of the target attribute.
In order to select the P^c relation, the following tasks are performed:
1. Find all Contrast relations which contain the P^e nucleus and a second constituent statement with another value of the target attribute given some combination of the same attributes.
Let X → Y be the nucleus of the P^e relation and A → B be the other constituent statement that will compose the Contrast relation. The constraints to be met by the candidate Contrast relations are specified below:

(((A = X) ∨ (A ⊂ X) ∨ (X ⊂ A)) ∧ P_Dif(Y, B)) ∨ (((X = B) ∨ (B ⊂ X) ∨ (X ⊂ B)) ∧ P_Dif(Y, A))    (6)

where P_Dif(K, L) is a constraint which checks that the predicates K ∈ P_N1 and L ∈ P_N2 have at least one equal attribute, but with different values (see Eq. 7).

P_Dif(K, L) = ∃ k_i ∈ K, l_j ∈ L : att_{k_i} = att_{l_j}; val_{k_i} ≠ val_{l_j}; i = 1, 2, …, n; j = 1, 2, …, n    (7)

2. When there are several relations that meet such restrictions, then select the relation whose nucleus P_N2 better represents the attributes, operating in the same way as described for selecting the nucleus of the P^e relation.
3. If in more than one relation the nucleus P_N2 has the same representation of the attributes, then select the relation with the highest value of T(P^c).
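A minimal sketch of our reading of Eqs. (6) and (7), with predicates represented as attribute → value dictionaries (an assumption; the equality tests of Eq. (6) are simplified here to comparisons of attribute sets):

```python
def p_dif(k, l):
    """Eq. (7): true if predicates k and l share at least one attribute
    whose values differ."""
    return any(attr in l and l[attr] != val for attr, val in k.items())

def contrast_candidate(x, y, a, b):
    """Eq. (6): can the nucleus (X -> Y) and the statement (A -> B)
    form a Contrast relation?"""
    def comparable(p, q):  # p = q, p subset of q, or q subset of p
        return set(p) <= set(q) or set(q) <= set(p)
    return ((comparable(a, x) and p_dif(y, b)) or
            (comparable(x, b) and p_dif(y, a)))
```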
3.4 Selecting the Most Useful Emphasis Relation
Selecting the most useful Emphasis relation, P^h, also depends on the nucleus of the previously selected Evidence relation, P^e. This nucleus will be, in turn, the nucleus of the Emphasis relation, i.e., P_N ∈ P^h = P_N ∈ P^e. The aim of this dependency is to find another statement (the satellite of the Emphasis relation, P_S ∈ P^h) that highlights the most common property among the objects described by the nucleus.
In order to select the P^h relation, the following tasks are performed:
1. Find all Emphasis relations whose nuclei are the same as that of the P^e relation.
2. Among them, select the one with the highest value of S(P^h).
3. If several relations have the same value of S(P^h), then select the one whose satellite has the highest value of T.
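These three steps reduce to a filter and a maximum; a sketch under the same hypothetical record layout as above:

```python
def select_emphasis(emphasis_relations, evidence_nucleus):
    """Keep the Emphasis relations sharing the selected Evidence nucleus,
    then maximize S, breaking ties with the satellite's truth degree T."""
    shared = [p for p in emphasis_relations
              if p["nucleus"] == evidence_nucleus]
    return (max(shared, key=lambda p: (p["S"], p["T_satellite"]))
            if shared else None)
```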
3.5 Assembling the P^e*, P^c* and P^h* Relations
Delivering linguistic summaries in a user-friendly way, which facilitates their understanding and, therefore, increases their usefulness in decision making, is still a latent need. For this reason, we first find the statement (nucleus) most representative of the problem attributes, and around it we select the most useful P^e, P^c and P^h relations.
These relations share the same nucleus. Therefore, in order to clean up the data to be delivered to the user and to facilitate its understanding, it is necessary to eliminate the fragments of information repeated in the three relations. To this end, the capabilities of graphic representation are used to provide information in a synthesized and logically structured manner. Thus, a knowledge graph (KG) representing the relationships between P^e, P^c and P^h is created (see Fig. 5). The KG is based on the relation assembly section of the CNLSummaries language (available at http://bit.ly/CNL_Summaries). CNLSummaries is a controlled natural language specified for creating the constituent summaries, generating the composite relations, and assembling a subset of relations.
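Purely as an illustration of the deduplication idea (the actual assembly is specified by the CNLSummaries language; the field names and graph encoding here are ours), the shared nucleus can be made the root, with each relation contributing only its non-repeated fragment and its connector:

```python
def assemble(pe, pc, ph):
    """Hang the three non-shared text fragments off the shared nucleus.
    The connectors mirror those of the verbalizations in Sect. 4."""
    return {
        "nucleus": pe["nucleus_text"],
        "edges": [
            ("since", pe["satellite_text"]),            # Evidence
            ("but", pc["second_nucleus_text"]),         # Contrast
            ("and especially", ph["satellite_text"]),   # Emphasis
        ],
    }
```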
Fig. 5. Knowledge graph of the most useful relations.
4 Illustrative Example
To better understand how the method works, an illustrative example is developed using
data from criminal cases. The procedure consists of three tasks:
1. Retrieving the data and results of experiment 3 reported in [17].
2. Generating candidate relations from a dataset of similar cases using CLS-QD [13].
3. Applying our proposal to the relation set obtained with CLS-QD, taking as input the circumstances of experiment 3 [17] ranked according to their intensity.
Completing Task 1. The following relevant case information was retrieved:
– Crime: 333.1 Forgery of banking or commercial documents (FBCD)
– Original punishment interval: 2–5 years (24–60 months)
– Mitigating circumstances: 79.1(a) and 79.1(c)
– Aggravating circumstances: 80.1(c)
– Ranking of circumstances according to their intensity: 80.1(c) > 79.1(a) > 79.1(c)
Completing Task 2. A dataset of 98 similar previous cases judged in cassation by the Criminal Chamber of the Supreme People's Court was compiled. It comprises six attributes without missing values (see Table 1). Based on the case information retrieved in Task 1, the dataset was preprocessed:
– Records of other crime types were eliminated, as well as FBCD offenses involving circumstances different from those of the analyzed case, finally leaving 63 cases.
Author Proof
A Novel Method for Filtering a Useful Subset 9
– The attribute "punishment" was discretized as follows. First, the value expressed in years was translated into months. Then, since the original punishment interval foreseen in the CPC for the FBCD offense ranges from two to five years of imprisonment, the numerical values of the attribute "punishment" were transformed into one of the following four labels: {i1 = [12 months; 23 months], i2 = [24 months; 42 months], i3 = [43 months; 60 months], i4 = [61 months; 90 months]}.
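A minimal sketch of that discretization (interval bounds taken from the labels above; rejecting out-of-range values is our assumption):

```python
INTERVALS = {"i1": (12, 23), "i2": (24, 42), "i3": (43, 60), "i4": (61, 90)}

def discretize_punishment(years):
    """Translate a punishment in years to months, then map it to one of
    the four labels defined for the FBCD offense."""
    months = round(years * 12)
    for label, (lo, hi) in INTERVALS.items():
        if lo <= months <= hi:
            return label
    raise ValueError(f"{months} months falls outside the known intervals")

# e.g. discretize_punishment(3) -> 'i2'  (36 months lies in [24; 42])
```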
Table 1. Attributes of the criminal case dataset.

#  Name                       Description                              Example
1  Crime                      Crime code and denomination              333.1 Forgery of banking or commercial documents
2  Mitigating circumstances   List of mitigating circumstance codes    From 79.1(a) to 79.1(k)
3  Aggravating circumstances  List of aggravating circumstance codes   From 80.1(a) to 80.1(r)
4  Response to appeal         Court response to defendant's appeal     Accepted / Rejected
5  Punishment                 Length of penalty (in years)             3 years
Then, to obtain the composite relations from the 63 cases obtained after data preprocessing, the CLS-QD model [13] was applied using the Java implementation developed in [14] and the CNLSummaries language, with the following settings:
– For generating the association rules, the Apriori algorithm (Weka version) was invoked with the following configuration: numRules = 100, metricType = Confidence, minMetric = 0.5, delta = 0.05, minSupport = 0.05.
– Parameters minT (Algorithms 1 and 2 in [14]) and minConf (Algorithm 1 in [14]) were set to 0.5.
– For computing T(P^r), the minimum t-norm was used.
– The set of fuzzy linguistic quantifiers was modeled with trapezoidal fuzzy sets as follows: Q = {About_half = [0.42, 0.48, 0.52, 0.58], Many = [0.52, 0.58, 1, 1], Most = [0.72, 0.78, 1, 1], Almost_all = [0.92, 0.98, 1, 1]}.
– For creating the Contrast relations, the contrast degree between each pair of labels (li, lj) of the attribute "punishment" was set to unity, i.e., µR(li, lj) = 1.
– We discarded the relations where S(P^r) < 0.5.
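With the trapezoid helper sketched after Eq. (2), this quantifier set can be reproduced directly (values copied from the configuration above):

```python
# Trapezoidal parameters (a, b, c, d) for each quantifier.
QUANTIFIERS = {
    "About_half": (0.42, 0.48, 0.52, 0.58),
    "Many":       (0.52, 0.58, 1.0, 1.0),
    "Most":       (0.72, 0.78, 1.0, 1.0),
    "Almost_all": (0.92, 0.98, 1.0, 1.0),
}

def truth_for(ratio):
    """Truth degree of a quantified summary for a given coverage ratio,
    under each quantifier."""
    return {q: trapezoid(ratio, *p) for q, p in QUANTIFIERS.items()}

# e.g. truth_for(0.95)
# -> {'About_half': 0.0, 'Many': 1.0, 'Most': 1.0, 'Almost_all': 0.5}
```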
As a result, four Evidence relations, seven Contrast relations and five Emphasis
relations were generated, whose statistical values are shown in Table 2.
Completing Task 3. Our proposal was applied to the relations derived from Task 2. Mitigating and aggravating circumstances were taken as predictor attributes and were ranked according to the intensity assigned by the judges: 80.1(c) > 79.1(a) > 79.1(c).
Performing the activities defined in Sect. 3.2, the P^e relation selected was the following:
P_N: Many FBCD (333.1) crimes in which circumstances 80.1(c) and 79.1(a) were met, have been sentenced with punishments from 24 to 42 months.
P_S: Almost all FBCD (333.1) crimes in which circumstances 80.1(c), 79.1(a) and 79.1(c) were met, have been sentenced with punishments from 24 to 42 months.
T(P^e) = 1; S(P^e) = 0.87; C(P^e) = 0.61
By verbalization via the evidence relation section of CNLSummaries, the relation is shown as:
Many FBCD (333.1) crimes in which circumstances 80.1(c) and 79.1(a) were
met, have been sentenced with punishments from 24 to 42 months, since almost
all FBCD (333.1) crimes in which circumstances 80.1(c), 79.1(a) and 79.1(c)
were met, have been sentenced with punishments from 24 to 42 months.
Table 2. Statistical values of the composite relations generated in Task 2.

Composite relations         Measures   T(P^r)   S(P^r)   C(P^r)
Evidence, P^e (number: 4)   Min        0.79     0.63     0.05
                            Max        1        0.94     0.69
                            Mean       0.9132   0.8832   0.4625
                            StdDev     0.1163   0.1045   0.1564
Contrast, P^c (number: 7)   Min        0.68     0.5      0.21
                            Max        1        0.75     0.78
                            Mean       0.7864   0.6721   0.5430
                            StdDev     0.1437   0.1889   0.1539
Emphasis, P^h (number: 5)   Min        0.71     0.61     0.05
                            Max        1        0.91     0.69
                            Mean       0.8949   0.8650   0.4807
                            StdDev     0.1266   0.1343   0.1398
Performing the activities defined in Sect. 3.3, the P^c relation selected was the following:
P_N1: Many FBCD (333.1) crimes in which circumstances 80.1(c) and 79.1(a) were met, have been sentenced with punishments from 24 to 42 months.
P_N2: About half of FBCD (333.1) crimes in which circumstance 80.1(c) was met, have been sentenced with punishments from 43 to 60 months.
T(P^c) = 1; S(P^c) = 0.75; C(P^c) = 0.66
By verbalization via the contrast relation section of CNLSummaries, the relation is shown as:
Many FBCD (333.1) crimes in which circumstances 80.1(c) and 79.1(a) were met, have been sentenced with punishments from 24 to 42 months, but about half of FBCD (333.1) crimes in which circumstance 80.1(c) was met, have been sentenced with punishments from 43 to 60 months.
Performing the activities defined in Sect. 3.4, the P^h relation selected was the following:
P_N: Many FBCD (333.1) crimes in which circumstances 80.1(c) and 79.1(a) were met, have been sentenced with punishments from 24 to 42 months.
P_S: In most of the FBCD (333.1) crimes in which circumstances 80.1(c) and 79.1(a) were met and which have been sentenced with punishments from 24 to 42 months, the appeals were rejected.
T(P^h) = 0.93; S(P^h) = 0.88; C(P^h) = 0.61
By verbalization via the emphasis relation section of CNLSummaries, the relation is shown as:
Many FBCD (333.1) crimes in which circumstances 80.1(c) and 79.1(a) were met, have been sentenced with punishments from 24 to 42 months, and especially in most of them, the appeals were rejected.
Finally, the relations are mapped onto the knowledge graph of Fig. 6, which is the method output, i.e., the information delivered to the user. In it, it is easy to grasp the most frequent behavior in previous cases similar to the case at hand.
Fig. 6. Knowledge graph of the P^e, P^c and P^h relations. [Figure: the shared nucleus "Many FBCD (333.1) crimes in which circumstances 80.1(c) and 79.1(a) were met, have been sentenced with punishments from 24 to 42 months." is linked by "since" to "Almost all FBCD (333.1) crimes in which circumstances 80.1(c), 79.1(a) and 79.1(c) were met, have been sentenced with punishments from 24 to 42 months.", by "but" to "About half of FBCD (333.1) crimes in which circumstance 80.1(c) was met, have been sentenced with punishments from 43 to 60 months.", and by "and especially" to "In most of them, the appeals were rejected."]
5 Concluding Remarks
The proposed method addresses the problem of selecting a proper subset of LS for an instance of a decision problem. The described approach makes better use, for a specific situation, of previous knowledge about similar cases. Its representation scheme improves the expressiveness and simplicity of the initially generated composite summaries, which increases their usefulness.
Defining an attribute preference ranking ensures that, in the absence of an all-inclusive relation, priority is given to those relations that contain the most relevant attributes. For this reason, the metrics T, T(P^r) and S(P^r) are used only as a secondary resource to select the most useful relations.
Focusing on the nucleus of the Evidence relation makes it possible to first find the most representative statement and then, for that nucleus, to identify the three most useful relations.
The proposed approach makes it possible to rescan, for each instance of a dynamic decision-making problem, a set of composite relations previously mined from a dataset.
References
1. Yager, R.R., Reformat, M.Z., To, N.D.: Drawing on the iPad to input fuzzy sets with an application to linguistic data science. Inf. Sci. 479, 277–291 (2019). https://doi.org/10.1016/J.INS.2018.11.048
2. Yager, R.R.: A new approach to the summarization of data. Inf. Sci. 28, 69–86 (1982)
3. Zadeh, L.A.: A computational approach to fuzzy quantifiers in natural languages. Comput. Math. Appl. 9(1), 149–184 (1983). https://doi.org/10.1016/0898-1221(83)90013-5
4. Pupo, I., Piñero, P.Y., Bello, R.E., García, R., Villavicencio, N.: Linguistic data summarization: a systematic review. In: Piñero Pérez, P.Y., Bello Pérez, R.E., Kacprzyk, J. (eds.) Artificial Intelligence in Project Management and Making Decisions, pp. 3–21. Springer International Publishing, Cham (2022). https://doi.org/10.1007/978-3-030-97269-1_1
5. Kuhn, T.: A survey and classification of controlled natural languages. Comput. Linguist. 40(1), 121–170 (2014)
6. Zadeh, L.A.: A prototype-centered approach to adding deduction capability to search engines – the concept of protoform. In: 2002 Annual Meeting of the North American Fuzzy Information Processing Society Proceedings, pp. 523–525 (2002)
7. Kacprzyk, J., Zadrozny, S.: Linguistic database summaries and their protoforms: towards natural language based knowledge discovery tools. Inf. Sci. 173(4), 281–304 (2005). https://doi.org/10.1016/j.ins.2005.03.002
8. Ramos-Soto, A., Martin-Rodilla, P.: Enriching linguistic descriptions of data: a framework for composite protoforms. Fuzzy Sets Syst. 407, 1–26 (2021). https://doi.org/10.1016/j.fss.2019.11.013
9. Cornejo, M.E., Medina, J., Rubio-Manzano, C.: Linguistic descriptions of data via fuzzy formal concept analysis. In: Harmati, I.Á., Kóczy, L.T., Medina, J., Ramírez-Poussa, E. (eds.) Computational Intelligence and Mathematics for Tackling Complex Problems 3, pp. 119–125. Springer International Publishing, Cham (2022). https://doi.org/10.1007/978-3-030-74970-5_14
10. To, N.D., Reformat, M.Z., Yager, R.R.: Question-answering system with linguistic summarization. In: 2021 IEEE International Conference on Fuzzy Systems (FUZZ-IEEE), pp. 1–8 (2021). https://doi.org/10.1109/FUZZ45933.2021.9494389
11. Trivino, G., Sugeno, M.: Towards linguistic descriptions of phenomena. Int. J. Approx. Reason. 54(1), 22–34 (2013). https://doi.org/10.1016/j.ijar.2012.07.004
12. Pérez, I., Piñero, P.Y., Al-subhi, S.H., Mahdi, G.S.S., Bello, R.E.: Linguistic data summarization with multilingual approach. In: Piñero Pérez, P.Y., Bello Pérez, R.E., Kacprzyk, J. (eds.) Artificial Intelligence in Project Management and Making Decisions, pp. 39–64. Springer International Publishing, Cham (2022). https://doi.org/10.1007/978-3-030-97269-1_3
13. Rodríguez, C.R., Peña, M., Zuev, D.S.: Extracting composite summaries from qualitative data. In: Heredia, Y.H., Núñez, V.M., Shulcloper, J.R. (eds.) Progress in Artificial Intelligence and Pattern Recognition: IWAIPR 2021, Havana, Cuba, October 5–7, 2021, Proceedings, pp. 260–269. Springer International Publishing, Cham (2021). https://doi.org/10.1007/978-3-030-89691-1_26
14. Rodríguez Rodríguez, C.R., Zuev, D.S., Peña Abreu, M.: Algorithms for linguistic description of categorical data. In: Piñero Pérez, P.Y., Bello Pérez, R.E., Kacprzyk, J. (eds.) UCIENCIA 2021. SCI, vol. 1035, pp. 79–97. Springer, Cham (2022). https://doi.org/10.1007/978-3-030-97269-1_5
15. Mann, W.C., Thompson, S.A.: Rhetorical structure theory: toward a functional theory of text organization. Text 8(3), 243–281 (1988)
16. Hou, S., Zhang, S., Fei, C.: Rhetorical structure theory: a comprehensive review of theory, parsing methods and applications. Expert Syst. Appl. 157, 113421 (2020)
17. Rodríguez, C.R., Amoroso, Y., Zuev, D.S., Peña, M., Zulueta, Y.: M-LAMAC: a model for linguistic assessment of mitigating and aggravating circumstances of criminal responsibility using computing with words. Artif. Intell. Law (2023). https://doi.org/10.1007/s10506-023-09365-8