All content in this area was uploaded by Carlos Rafael Rodríguez Rodríguez on Dec 23, 2023
A Novel Method for Filtering a Useful Subset
of Composite Linguistic Summaries
Carlos R. Rodríguez Rodríguez (1,2) (corresponding author), Marieta Peña Abreu (1),
Denis Sergeevich Zuev (2), Yarina Amoroso Fernández (1,3),
and Yeleny Zulueta Véliz (1)
(1) University of Informatics Sciences, Havana, Cuba
crodriguezr@uci.cu
(2) Kazan Federal University, Kazan, Russia
(3) National Union of Cuban Jurists, Havana, Cuba
Abstract. Selecting a subset of linguistic summaries and providing them in a
user-friendly and compact form is a latent issue in the field of Linguistic Data
Summarization. The paper proposes a method for filtering the most useful subset,
for a given decision problem, from a set of composite linguistic summaries. Those
summaries embody Evidence, Contrast or Emphasis relations, inspired by the
Rhetorical Structure Theory. The summaries’ usefulness is determined according
to the relevance of the attributes contained in each one. The strategy followed
is based on first finding the Evidence relation whose nucleus contains the best
possible representation of the problem attributes, then searching for a Contrast
relation and an Emphasis relation that share that nucleus. The method output is a
scheme that synthesizes and combines the texts of the three relations. The paper
provides an illustrative example in which the most useful relations are found from
a dataset of 63 crimes to solve a case of bank document forgery.
Keywords: Linguistic descriptions of data · Linguistic data summarization ·
Natural language generation · Expressiveness of linguistic summaries
1 Introduction
Linguistic data summarization (LDS) is a descriptive knowledge discovery technique
to produce summaries from a database using natural language [1]. Several authors have
extended the original LDS approach [2,3] by defining different stereotyped forms for
structuring summaries, proposing new indicators to measure their quality, using different
techniques to generate them, and applying these developments to a wide range of
problems. Pupo et al. [4] provide a comprehensive review of these topics.
The structure of linguistic summaries (LS), and of any kind of information, is a key
factor of their actual usefulness. The usefulness of LS depends, among other criteria, on
their expressiveness [5]. Stereotyped forms for structuring LS, called protoforms, were
initially proposed by Zadeh [6] and then presented as a hierarchy of abstract prototypes
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2024
Y. Hernández Heredia et al. (Eds.): IWAIPR 2023, LNCS 14335, pp. 1–13, 2024.
https://doi.org/10.1007/978-3-031-49552-6_16
[7]. The protoforms have been extended for different problems, but their original forms,
defined as in (1) and (2), remain the most widespread [4, 8]:
$$T\; Q\; X \xrightarrow[\text{have}]{\text{are}} Y \tag{1}$$
$$T\; Q\; F\; X \xrightarrow[\text{have}]{\text{are}} Y \tag{2}$$
where Y is a summarizer (e.g., have sentences from 24 to 42 months); X is the object
(e.g., FBCD crimes); Q is a quantity in agreement given as a fuzzy linguistic quantifier
(e.g., many); and T is the truth degree of the summary in [0, 1]. In Eq. (2), a qualifier
F (e.g., with circumstances 80.1(c) and 79.1(a)) is added. F is a filter to get a specific
data subset. The following is a summary of the form of Eq. (2): T(Many FBCD crimes with
circumstances 80.1(c) and 79.1(a), have sentences from 24 to 42 months) = 1.
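As a rough illustration of how the truth degree T of a type-II summary could be obtained for crisp records, the following Python sketch evaluates the quantifier on the proportion of filtered records that satisfy the summarizer. The quantifier "Many" uses the trapezoidal parameters reported in Sect. 4; the record layout, the function names, the toy data and the handling of an empty subset are assumptions made only for this example.

```python
def trapezoid(x, a, b, c, d):
    """Membership degree of x in the trapezoidal fuzzy set [a, b, c, d]."""
    if x < a or x > d:
        return 0.0
    if b <= x <= c:
        return 1.0
    if x < b:                       # rising edge between a and b
        return (x - a) / (b - a)
    return (d - x) / (d - c)        # falling edge between c and d


def truth_degree(records, qualifier, summarizer, quantifier):
    """T of 'Q F X have Y': membership in Q of the share of F-records satisfying Y."""
    subset = [r for r in records if qualifier(r)]
    if not subset:
        return 0.0                  # assumption: an empty filtered subset yields T = 0
    proportion = sum(1 for r in subset if summarizer(r)) / len(subset)
    return trapezoid(proportion, *quantifier)


MANY = (0.52, 0.58, 1.0, 1.0)       # "Many" as modeled in Sect. 4

# Hypothetical toy data: (set of circumstances, punishment in months).
cases = [({"80.1(c)", "79.1(a)"}, m) for m in (24, 30, 36, 40, 42, 50)]
cases.append(({"79.1(c)"}, 18))

t = truth_degree(
    cases,
    qualifier=lambda r: {"80.1(c)", "79.1(a)"} <= r[0],   # F: both circumstances met
    summarizer=lambda r: 24 <= r[1] <= 42,                # Y: sentence of 24 to 42 months
    quantifier=MANY,
)
print(t)
```

In the toy data, five of the six filtered cases fall in the 24 to 42 month interval, so the proportion lies on the plateau of "Many".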
These protoforms consist of quantified sentences that are generally considered unsuitable
for direct delivery to the user due to their lack of expressiveness. Moreover, they are usually handled
individually without taking into account the relationships between them [8]. For these
reasons, several approaches aim to improve the expressiveness of LS [8–13]. Among
these contributions, CLS-QD stands as a model for generating composite linguistic
summaries from qualitative data [13]. All these proposals improve the LS understanding,
but they do not propose ways to select the proper subset of LS for an instance of a decision
problem. That is, they provide an improved description of data, but do not show how to
filter those LS in a dynamic decision-making environment.
In order to address this issue, this paper proposes a method to select the three most
useful summaries, for a given situation, from all those generated with the CLS-QD model
[13]. For this purpose, Sect. 2 briefly reviews the CLS-QD model; Sect. 3 presents the
method for selecting and assembling the most useful summaries; and Sect. 4 describes an
illustrative example of its applicability.
2 A Short Overview of the CLS-QD Model
The CLS-QD model was formalized in [13] and presented in an implementable form in [14].
Its summaries embody relations of Evidence ($P^e$), Contrast ($P^c$) or Emphasis ($P^h$),
inspired by the Rhetorical Structure Theory [15, 16]. A relation ($P^r$) involves at least
two constituent statements, which can function as nuclei ($P_N$) or satellites ($P_S$), and
which are semantically linked by a relation r, i.e., by a specific connector.
The constituent statements are the classical protoforms of LS (see Eq. (1) and Eq. (2)),
which we will call type-I and type-II statements, respectively. X and Y can be simple or
complex predicates. A simple predicate consists of a single pair (attribute: value), and a
complex one comprises two or more pairs.
For measuring the quality of any relation $P^r$, CLS-QD defines three metrics: the
truth degree $T(P^r)$, the relation strength $S(P^r)$ and the coverage degree $C(P^r)$.
An Evidence relation $P^e$ provides one main statement (the nucleus, $P_N$) and one or
more supporting statements (the satellites, $P_S$), which supply finer-grained information
that validates the nucleus. Its general structure is:
$$P^e = P_N,\ \text{evidence connector}\ P_S \tag{3}$$
The nucleus of $P^e$ can be a type-I or type-II statement. The satellite can be one or
more non-overlapping type-II statements. In $P^e$, the satellites semantically support the
nucleus, i.e., all constituent statements of the relation share the same consequent.
A Contrast relation $P^c$ consists of two nuclei, which provide contrasting
information about the same attributes of the analyzed problem. Its general structure
is:
$$P^c = P_{N1},\ \text{contrast connector}\ P_{N2} \tag{4}$$
Both nuclei can be type-I or type-II statements and can have complex predicates in
their antecedents and consequents, but at least one pair of predicates must be different.
An Emphasis relation $P^h$ combines two similar statements in which the second one
(the satellite) has an additional predicate that specifies the main feature of the objects
described by the first one (the nucleus). Its general structure is:
$$P^h = P_N,\ \text{emphasis connector}\ P_S \tag{5}$$
In $P^h$, the statement that functions as the nucleus is, in turn, the antecedent of the
satellite, and the consequent of the satellite contains a different predicate that emphasizes
a feature of the nucleus. The nucleus can be a type-I or type-II statement. Meanwhile,
the satellite is constrained to a single type-II statement.
3 A Method for Filtering and Assembling the Most Useful Relations
The strategy for selecting the most useful composite relations aims at identifying those
Evidence, Contrast and Emphasis relations (one per type) which, according to the values
of their attributes, are the most helpful for the specific situation addressed. Therefore,
selecting these relations is not a simple search for those that maximize the values of the
$T(P^r)$, $S(P^r)$ and $C(P^r)$ metrics defined in [13] for measuring their quality. That is to say,
a relation that is the most useful one for solving a problem instance may not be useful for
another one, even if in both cases the same subset of attributes is involved. The method
comprises five activities (see Fig. 1).
Fig. 1. Method flowchart.
The strategy focuses attention on the nucleus of the Evidence relation because, by
definition, this relation exposes a main statement (the nucleus) and the satellite
provides information that helps to increase the credibility of the nucleus. Therefore,
finding the most representative nucleus of the attributes implicitly brings other information
(the satellite) that supports it. At the same time, from that nucleus it is possible
to find a Contrast relation that shares it and contains another nucleus in which the target
attribute (summarizer) has a different value. Similarly, it is possible to find an
Emphasis relation that shares the nucleus of the Evidence relation and whose satellite
highlights an additional property of the cases described by the nucleus. We believe that
this compendium, harmoniously assembled, could be useful to assist decision-makers.
Remark. The analysis and selection of the most useful relations works mainly with
the relations' attributes: it does not take into account the Q values, and the T values are
considered only as a second resource for selecting. So, the constituent statements (Eq. (1)
and Eq. (2)) will be handled as $X \to Y$ and $FX \to Y$, respectively, where X and FX
comprise the set of predictor attributes and Y comprises the set of target attributes.
3.1 Establishing an Attribute Preference Ranking
The selection strategy takes into account that not all predictor attributes of the analyzed
problem may be present in the same relation. For this reason, it is initially necessary to
establish a ranking among these attributes according to their relevance, significance, or the
intensity of their values for the specific situation to be solved. That is to say, given a set
of n predictor attributes, $X = \{x_i \mid i \in (1, \dots, n)\}$, it is necessary to obtain the ordered
set $X^{\geq} = \{x_1 \geq \dots \geq x_n\}$, where $x_j$, $j \in (1, \dots, n)$, denotes the j-th most relevant
attribute. Knowing this preference ranking, it is then possible to search for the relations
that best represent such attributes.
Several approaches can be explored to set a ranking of attributes, including:
– Employing feature selection techniques.
– Using a ranking pre-established by domain experts.
– Obtaining the ranking after the decision makers assess the relevance of each attribute
for the specific situation they are solving.
3.2 Selecting the Most Useful Evidence Relation
In order to select the most useful Evidence relation, $P^{e*}$, for each value of the target
attribute, the relation whose nucleus contains in its antecedent the best possible representation
of the attributes is found. The best representation of the attributes occurs when all of them
are present. Otherwise, the relation whose nucleus contains in its antecedent the best
possible subset of attributes according to the previous ranking must be found.
Example 1: Let us consider a problem with three predictor attributes ranked as follows:
$X^{\geq} = \{x_1 \geq x_2 \geq x_3\}$ and a target attribute, y, with four possible values. By
applying the CLS-QD model, it is theoretically possible to obtain, for any value of the
target attribute, $y^*$, the Evidence relations whose nuclei are shown in Fig. 2a). Such
nuclei, with the form $FX \to Y$, are ordered as shown in Fig. 2b) according to the
attribute ranking, i.e., $x_1, x_2, x_3 \to y^*$ is the most representative (useful) nucleus.
But in a real case it is unlikely that all relations would be obtained for a single value
of the target attribute. Instead, it is usual to find relations for several or all values of the
target attribute, as shown in Fig. 3. If the Evidence relations obtained were those whose
nuclei are shown in Fig. 3a), the order of representativeness would be:
$x_1, x_2, x_3 \to y_2 > x_1, x_2 \to y_1 > x_1, x_3 \to y_3 > x_2, x_3 \to y_4$; therefore, the most useful relation would
be the one to which the nucleus $x_1, x_2, x_3 \to y_2$ belongs.
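The ordering used in Example 1 can be sketched in Python as follows. The sketch assumes that representativeness is compared lexicographically on a presence vector of the ranked attributes (most relevant first); this reproduces the order given above, but how other pairs of subsets compare (for instance a nucleus containing only x1 against one containing x2 and x3) is an assumption of this example rather than a rule stated in the text. All function and variable names are hypothetical.

```python
def representation_key(antecedent, ranking):
    """Presence vector of the ranked attributes, most relevant attribute first."""
    return tuple(1 if attr in antecedent else 0 for attr in ranking)


def order_by_representativeness(nuclei, ranking):
    """Sort (antecedent, target) nuclei from most to least representative."""
    return sorted(
        nuclei,
        key=lambda nucleus: representation_key(nucleus[0], ranking),
        reverse=True,
    )


ranking = ["x1", "x2", "x3"]            # x1 >= x2 >= x3
nuclei = [                              # nuclei as in the order stated for Fig. 3a)
    (frozenset({"x2", "x3"}), "y4"),
    (frozenset({"x1", "x3"}), "y3"),
    (frozenset({"x1", "x2", "x3"}), "y2"),
    (frozenset({"x1", "x2"}), "y1"),
]

ordered = order_by_representativeness(nuclei, ranking)
targets = [target for _, target in ordered]
print(targets)
```

The first element of the ordered list corresponds to the nucleus of the most useful Evidence relation under this reading.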
Fig. 2. Nuclei prototypes of all possible Evidence relations that contain at least one attribute in
the antecedent for any value of the target attribute $y^*$.
Fig. 3. Prototypes of Evidence relation nuclei that contain at least one attribute in the antecedent
for the four possible values of the target attribute (panels a, b and c).
On the other hand, if the relations obtained were those whose nuclei are shown in
Fig. 3b) or Fig. 3c), where several nuclei share the same antecedent, then it would be
necessary to apply the following decision rules:
• R1. If the relations' nuclei share the same antecedent, and it contains all the attributes
(see Fig. 3b)), then select the one whose nucleus has the highest value of T.
– R1.1. If the nuclei have the same value of T, then select the relation with the
highest value of $T(P^e)$, and if these values are equal, then select the one with the
highest value of $S(P^e)$.
• R2. If the relations' nuclei share the same antecedent, but it does not contain all
the attributes (see Fig. 4c)), then select the one that has the most representative
satellite.
– R2.1. If the satellites share the same attributes (see Fig. 4c)), then select the relation
with the highest value of $T(P^e)$, and if these values are equal, then select the one
with the highest value of $S(P^e)$.
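One possible way to encode rules R1 to R2.1 is as tuple-valued sort keys, so that each later component only breaks ties left by the earlier ones. The Python sketch below assumes the candidate relations already share the same nucleus antecedent, and that all satellites of a relation share one attribute set. The `Evidence` record, its field names and the sample values are hypothetical, and the lexicographic comparison of satellite attributes is an assumption of this example.

```python
from dataclasses import dataclass


def presence(attrs, ranking):
    """Presence vector of the ranked attributes, most relevant first."""
    return tuple(int(a in attrs) for a in ranking)


@dataclass(frozen=True)
class Evidence:
    name: str
    nucleus_t: float            # truth degree T of the nucleus statement
    t: float                    # T(Pe) of the whole relation
    s: float                    # S(Pe) of the whole relation
    satellite_attrs: frozenset  # attributes shared by the relation's satellites


def select_r1(candidates):
    """R1 / R1.1: highest nucleus T, then highest T(Pe), then highest S(Pe)."""
    return max(candidates, key=lambda r: (r.nucleus_t, r.t, r.s))


def select_r2(candidates, ranking):
    """R2 / R2.1: most representative satellite, then T(Pe), then S(Pe)."""
    return max(
        candidates,
        key=lambda r: (presence(r.satellite_attrs, ranking), r.t, r.s),
    )


ranking = ["x1", "x2", "x3"]
p3 = Evidence("Pe3", 0.9, 0.9, 0.8, frozenset({"x2", "x3"}))
p4 = Evidence("Pe4", 0.8, 0.8, 0.7, frozenset({"x1", "x2"}))
p5 = Evidence("Pe5", 0.8, 0.9, 0.7, frozenset({"x1", "x2"}))

print(select_r2([p3, p4], ranking).name)  # satellite with x1 and x2 wins
print(select_r2([p4, p5], ranking).name)  # same satellites: tie broken by T(Pe)
print(select_r1([p4, p5]).name)           # same nucleus T: tie broken by T(Pe)
```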
Example 2: Let us now consider a problem with three predictor attributes ranked as
follows: $X^{\geq} = \{x_1 \geq x_2 \geq x_3\}$, and the Evidence relations shown in Fig. 4.
• Given the $P^e_1$ and $P^e_2$ relations (Fig. 4a)), the satellite of $P^e_1$ contains all three attributes
and is therefore the most representative, so $P^e_1$ is the most useful relation.
• In $P^e_3$ and $P^e_4$ (Fig. 4b)), both satellites contain a subset of the attributes; the satellite
of $P^e_4$ has the two most relevant attributes, so $P^e_4$ is the most useful relation.
• In Fig. 4c), both satellites contain the same attributes, so the most useful relation is
found by applying the R2.1 rule.
Fig. 4. Paired prototypes of Evidence relations with equal antecedents in their nuclei (panels a to f). The relations
are compared according to their satellites.
On the other hand, it should be recalled that an Evidence relation may contain
more than one satellite, but its satellites share the same set of attributes, so:
• In Fig. 4d), the satellite of $P^e_8$ contains a better representation of the attributes than
the satellites of $P^e_7$, so $P^e_8$ is the most useful relation although it has only one satellite.
• In Fig. 4e), both relations contain two satellites, but the satellites of $P^e_{10}$ include more
representative attributes than the satellites of $P^e_9$, so $P^e_{10}$ is the most useful relation.
• Finally, in Fig. 4f), both relations contain the same attributes in their satellites, so the
most useful relation is found by applying the R2.1 rule.
3.3 Selecting the Most Useful Contrast Relation
Selecting the most useful Contrast relation, $P^{c*}$, depends on the nucleus of the previously
selected Evidence relation, $P^{e*}$. This nucleus will be, in turn, the first nucleus of the
Contrast relation, i.e., $P_{N1} \in P^{c*} = P_N \in P^{e*}$. The aim of this dependence is to find
another statement (second nucleus of the Contrast relation, $P_{N2} \in P^{c*}$) that relates the
same predictor attributes to another value of the target attribute.
In order to select the $P^{c*}$ relation, the following tasks are performed:
1. Find all Contrast relations which contain the $P^{e*}$ nucleus and a second constituent
statement with another value of the target attribute given some combination of the
same attributes.
Let $X \to Y$ be the nucleus of the $P^{e*}$ relation and $A \to B$ be the other constituent
statement that will compose the Contrast relation. The constraints to be met by the
candidate Contrast relations are specified below:
$$\big(((A = X) \lor (A \subset X) \lor (X \subset A)) \land \mathrm{P\_Dif}(Y, B)\big) \lor \big(((X = B) \lor (B \subset X) \lor (X \subset B)) \land \mathrm{P\_Dif}(Y, A)\big) \tag{6}$$
where $\mathrm{P\_Dif}(K, L)$ is a constraint which checks that the predicates $K \in P_{N1}$ and
$L \in P_{N2}$ have at least one equal attribute, but with different values (see Eq. 7).
$$\mathrm{P\_Dif}(K, L) = \exists\, k_i \in K,\ l_j \in L :\ att_{k_i} = att_{l_j};\ val_{k_i} \neq val_{l_j};\ i = 1, 2, \dots, n;\ j = 1, 2, \dots, n \tag{7}$$
2. When several relations meet these restrictions, select the relation
whose nucleus $P_{N2}$ best represents the attributes, operating in the same way as
described for selecting the nucleus of the $P^{e*}$ relation.
3. If in more than one relation the nucleus $P_{N2}$ has the same representation of the
attributes, then select the relation with the highest value of $T(P^c)$.
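The candidate test of task 1 (Eqs. 6 and 7) can be sketched in Python by representing each predicate as a set of (attribute, value) pairs. This representation, the function names and the sample values are assumptions made for illustration; the sketch covers only the constraint check, not the ranking of tasks 2 and 3.

```python
def p_dif(pred_k, pred_l):
    """Eq. (7): some attribute occurs in both predicates with different values."""
    return any(
        att_k == att_l and val_k != val_l
        for att_k, val_k in pred_k
        for att_l, val_l in pred_l
    )


def nested(p, q):
    """True when p = q, p is a subset of q, or q is a subset of p."""
    return p <= q or q <= p


def is_contrast_candidate(x, y, a, b):
    """Eq. (6) for the Pe* nucleus X -> Y and the other statement A -> B."""
    return (nested(a, x) and p_dif(y, b)) or (nested(x, b) and p_dif(y, a))


# Predicates as frozensets of (attribute, value) pairs; labels follow Sect. 4.
x = frozenset({("80.1(c)", "met"), ("79.1(a)", "met")})
y = frozenset({("punishment", "i2")})
a = frozenset({("80.1(c)", "met")})
b_other = frozenset({("punishment", "i3")})   # different target value
b_same = frozenset({("punishment", "i2")})    # same target value

print(is_contrast_candidate(x, y, a, b_other))
print(is_contrast_candidate(x, y, a, b_same))
```

Here the antecedent A is a subset of X, so the pair qualifies only when the target attribute takes a different value in B.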
3.4 Selecting the Most Useful Emphasis Relation
Selecting the most useful Emphasis relation, $P^{h*}$, also depends on the nucleus of the
previously selected Evidence relation, $P^{e*}$. This nucleus will be, in turn, the nucleus of
the Emphasis relation, i.e., $P_N \in P^{h*} = P_N \in P^{e*}$. The aim of this dependency is to
find another statement (satellite of the Emphasis relation, $P_S \in P^{h*}$) that highlights the
most common property among objects described by the nucleus.
In order to select the $P^{h*}$ relation, the following tasks are performed:
1. Find all Emphasis relations whose nuclei are the same as that of the $P^{e*}$ relation.
2. Among them, select the one with the highest value of $S(P^h)$.
3. If several relations have the same value of $S(P^h)$, then select the one whose satellite
has the highest value of T.
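The three tasks above amount to a filter followed by a two-level maximum. A minimal Python sketch is given below; it reads the strength in tasks 2 and 3 as that of each candidate Emphasis relation, and the dictionary keys, names and sample values are hypothetical.

```python
def select_emphasis(relations, evidence_nucleus):
    """Pick the Emphasis relation sharing the Pe* nucleus with the highest
    strength, breaking ties by the truth degree T of its satellite."""
    candidates = [r for r in relations if r["nucleus"] == evidence_nucleus]
    if not candidates:
        return None                 # no Emphasis relation shares the nucleus
    return max(candidates, key=lambda r: (r["S"], r["satellite_T"]))


# Hypothetical candidates; "N" stands for the text of the shared nucleus.
emphasis_relations = [
    {"name": "Ph1", "nucleus": "N", "S": 0.88, "satellite_T": 0.90},
    {"name": "Ph2", "nucleus": "N", "S": 0.88, "satellite_T": 0.93},
    {"name": "Ph3", "nucleus": "other", "S": 0.95, "satellite_T": 1.00},
]

chosen = select_emphasis(emphasis_relations, "N")
print(chosen["name"])
```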
3.5 Assembling the $P^{e*}$, $P^{c*}$ and $P^{h*}$ Relations
Delivering linguistic summaries in a user-friendly way, which facilitates their under-
standing and, therefore, increases their usefulness in decision making, is still a latent
need. For this reason, we first find the statement (nucleus) most representative of the
problem attributes and around it we select the most useful $P^{e*}$, $P^{c*}$ and $P^{h*}$ relations.
These relations share the same nucleus. Therefore, in order to clean up the data to
be delivered to the user, to facilitate its understanding, it is necessary to eliminate the
fragments of information repeated in the three relations. To this end, the capabilities of
graphic representation are used to provide information in a synthesized and logically
structured manner. Thus, a knowledge graph (KG) representing the relationships between
$P^{e*}$, $P^{c*}$ and $P^{h*}$ is created (see Fig. 5). The KG is based on the relation assembly section
of the CNLSummaries language (available at http://bit.ly/CNL_Summaries). CNLSummaries
is a controlled natural language specified for creating the constituent summaries, for
generating the composite relations, and for assembling a subset of relations.
Fig. 5. Knowledge graph of the most useful relations.
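The assembly step can be pictured as building a small graph in which the shared nucleus appears once and every other statement hangs from it through its connector. The Python sketch below is only an illustration of that idea: it does not reproduce the CNLSummaries grammar, the dictionary layout is hypothetical, and the connector words are the ones used in the verbalizations of Sect. 4.

```python
def assemble(evidence, contrast, emphasis):
    """Merge Pe*, Pc* and Ph* into one graph that stores the shared nucleus once."""
    nucleus = evidence["nucleus"]
    if contrast["nucleus1"] != nucleus or emphasis["nucleus"] != nucleus:
        raise ValueError("the three relations must share the same nucleus")
    edges = [("since", satellite) for satellite in evidence["satellites"]]
    edges.append(("but", contrast["nucleus2"]))
    edges.append(("and especially", emphasis["satellite"]))
    return {"nucleus": nucleus, "edges": edges}


# Placeholder strings stand in for the full statement texts.
graph = assemble(
    evidence={"nucleus": "N", "satellites": ["S_e"]},
    contrast={"nucleus1": "N", "nucleus2": "N2"},
    emphasis={"nucleus": "N", "satellite": "S_h"},
)
print(graph)
```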
4 Illustrative Example
To better understand how the method works, an illustrative example is developed using
data from criminal cases. The procedure consists of three tasks:
1. Retrieving data and results of the experiment №3 reported in [17].
2. Generating candidate relations from a dataset of similar cases using CLS-QD [13].
3. Applying our proposal to the relation set obtained with CLS-QD taking as input the
circumstances of experiment №3 [17] ranked according to their intensity.
Completing Task 1. The following relevant case information was retrieved:
• Crime: 333.1 – Forgery of banking or commercial documents (FBCD)
• Original punishment interval: 2–5 years (24–60 months)
• Mitigating circumstances: 79.1(a) and 79.1(c)
• Aggravating circumstances: 80.1(c)
• Ranking of circumstances according to their intensity: 80.1(c) > 79.1(a) > 79.1(c)
Completing Task 2. A dataset of 98 similar previous cases judged in cassation by
the Criminal Chamber of the Supreme People’s Court was compiled. It comprises six
attributes without missing values (see Table 1). Based on the case information retrieved
in task 1, the dataset was preprocessed:
• Records of other crime types were eliminated, as well as FBCD offenses involving
circumstances different from those of the analyzed case, finally leaving 63 cases.
• The attribute “punishment” was discretized as follows. First, the value expressed in
years was converted into months. Then, since the original punishment interval foreseen
in the CPC for the FBCD offense ranges from two to five years of imprisonment,
the numerical values of the attribute “punishment” were transformed into one of the
following four labels: {i1 = [12 months; 23 months], i2 = [24 months; 42 months],
i3 = [43 months; 60 months], i4 = [61 months; 90 months]}.
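The discretization described above can be sketched in Python as follows. The interval bounds are those listed in the text; the function name, the rounding of fractional years to whole months and the error raised for out-of-range values are assumptions of this example.

```python
# Label intervals in months, inclusive on both ends, as listed above.
LABELS = {
    "i1": (12, 23),
    "i2": (24, 42),
    "i3": (43, 60),
    "i4": (61, 90),
}


def discretize_punishment(years):
    """Convert a penalty in years to months and map it to its interval label."""
    months = round(years * 12)      # assumption: round to the nearest whole month
    for label, (low, high) in LABELS.items():
        if low <= months <= high:
            return label
    raise ValueError(f"{months} months is outside the labeled intervals")


print(discretize_punishment(3))     # 36 months
print(discretize_punishment(5))     # 60 months
print(discretize_punishment(1.5))   # 18 months
```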
Table 1. Attributes of the criminal case dataset.
| № | Name | Description | Example |
|---|------|-------------|---------|
| 1 | Crime | Crime code and denomination | 333.1 – Forgery of banking or commercial documents |
| 2 | Mitigating circumstances | List of mitigating circumstance codes | From 79.1(a) to 79.1(k) |
| 3 | Aggravating circumstances | List of aggravating circumstance codes | From 80.1(a) to 80.1(r) |
| 4 | Response to appeal | Court response to defendant’s appeal | Accepted / Rejected |
| 5 | Punishment | Length of penalty (in years) | 3 years |
Then, to obtain the composite relations from the 63 cases remaining after data preprocessing,
the CLS-QD model [13] was applied using the Java implementation developed
in [14] and the CNLSummaries language, with the following settings:
• For generating the association rules, the Apriori algorithm (Weka version) was
invoked using the following configuration: numRules = 100, metricType = Confidence,
minMetric = 0.5, delta = 0.05, minSupport = 0.05.
• Parameters minT (Algorithms 1 and 2 in [14]) and minConf (Algorithm 1 in [14])
were set to 0.5.
• For computing $T(P^r)$, the minimum t-norm was used.
• The set of fuzzy linguistic quantifiers was modeled with trapezoidal fuzzy sets as
follows: Q = {About_half = [0.42, 0.48, 0.52, 0.58], Many = [0.52, 0.58, 1, 1], Most
= [0.72, 0.78, 1, 1], Almost_all = [0.92, 0.98, 1, 1]}.
• For creating the Contrast relations, the contrast degree between each pair of labels
$(l_i, l_j)$ for the attribute “punishment” was set to unity, i.e., $\mu_R(l_i, l_j) = 1$.
• We discarded the relations where $S(P^r) < 0.5$.
As a result, four Evidence relations, seven Contrast relations and five Emphasis
relations were generated, whose statistical values are shown in Table 2.
Completing Task 3. Our proposal was applied to the relations derived from task 2.
Mitigating and aggravating circumstances were taken as predictor attributes and were
ranked according to the intensity assigned by the judges: 80.1(c) > 79.1(a) > 79.1(c).
Performing the activities defined in Sect. 3.2, the $P^{e*}$ relation selected was the following:
$P_N$: Many FBCD (333.1) crimes in which circumstances 80.1(c) and 79.1(a) were met,
have been sentenced with punishments from 24 to 42 months.
$P_S$: Almost all FBCD (333.1) crimes in which circumstances 80.1(c), 79.1(a) and 79.1(c)
were met, have been sentenced with punishments from 24 to 42 months.
$T(P^e) = 1$; $S(P^e) = 0.87$; $C(P^e) = 0.61$
By verbalization via the evidence relation section of CNLSummaries, the relation is
shown as:
Many FBCD (333.1) crimes in which circumstances 80.1(c) and 79.1(a) were
met, have been sentenced with punishments from 24 to 42 months, since almost
all FBCD (333.1) crimes in which circumstances 80.1(c), 79.1(a) and 79.1(c)
were met, have been sentenced with punishments from 24 to 42 months.
Table 2. Statistical values of the composite relations generated in task 2.
| Composite relations | Measures | $T(P^r)$ | $S(P^r)$ | $C(P^r)$ |
|---|---|---|---|---|
| Evidence, $P^e$ (number: 4) | Min | 0.79 | 0.63 | 0.05 |
| | Max | 1 | 0.94 | 0.69 |
| | Mean | 0.9132 | 0.8832 | 0.4625 |
| | StdDev | 0.1163 | 0.1045 | 0.1564 |
| Contrast, $P^c$ (number: 7) | Min | 0.68 | 0.5 | 0.21 |
| | Max | 1 | 0.75 | 0.78 |
| | Mean | 0.7864 | 0.6721 | 0.5430 |
| | StdDev | 0.1437 | 0.1889 | 0.1539 |
| Emphasis, $P^h$ (number: 5) | Min | 0.71 | 0.61 | 0.05 |
| | Max | 1 | 0.91 | 0.69 |
| | Mean | 0.8949 | 0.8650 | 0.4807 |
| | StdDev | 0.1266 | 0.1343 | 0.1398 |
Performing the activities defined in Sect. 3.3, the $P^{c*}$ relation selected was the following:
$P_{N1}$: Many FBCD (333.1) crimes in which circumstances 80.1(c) and 79.1(a) were met,
have been sentenced with punishments from 24 to 42 months.
$P_{N2}$: About half of FBCD (333.1) crimes in which circumstances 80.1(c) was met, have
been sentenced with punishments from 43 to 60 months.
$T(P^c) = 1$; $S(P^c) = 0.75$; $C(P^c) = 0.66$
By verbalization via the contrast relation section of CNLSummaries, the relation is
shown as:
Many FBCD (333.1) crimes in which circumstances 80.1(c) and 79.1(a) were
met, have been sentenced with punishments from 24 to 42 months, but about
half of FBCD (333.1) crimes in which circumstances 80.1(c) was met, have been
sentenced with punishments from 43 to 60 months.
Performing the activities defined in Sect. 3.4, the $P^{h*}$ relation selected was the following:
$P_N$: Many FBCD (333.1) crimes in which circumstances 80.1(c) and 79.1(a) were met,
have been sentenced with punishments from 24 to 42 months.
$P_S$: In most of FBCD (333.1) crimes in which circumstances 80.1(c) and 79.1(a) were
met and have been sentenced with punishments from 24 to 42 months, the appeals were
rejected.
$T(P^h) = 0.93$; $S(P^h) = 0.88$; $C(P^h) = 0.61$
By verbalization via the emphasis relation section of CNLSummaries, the relation is
shown as:
Many FBCD (333.1) crimes in which circumstances 80.1(c) and 79.1(a) were
met, have been sentenced with punishments from 24 to 42 months, and especially
in most of them, the appeals were rejected.
Finally, the relations are mapped in the knowledge graph of Fig. 6, which is the
method output, i.e., the information delivered to the user. From it, the most frequent
behavior in previous cases similar to the one being handled is easy to understand.
Many FBCD (333.1) crimes in which circumstances 80.1(c) and 79.1(a) were met, have been sentenced with punishments from 24 to 42 months.
– since: Almost all FBCD (333.1) crimes in which circumstances 80.1(c), 79.1(a) and 79.1(c) were met, have been sentenced with punishments from 24 to 42 months.
– but: About half of FBCD (333.1) crimes in which circumstances 80.1(c) was met, have been sentenced with punishments from 43 to 60 months.
– and especially: In most of them, the appeals were rejected.
Fig. 6. Knowledge graph of Pe∗,Pc∗and Ph∗relations.
5 Concluding Remarks
The proposed method addresses the problem of selecting a proper subset of LS for an
instance of a decision problem. The described approach makes better use, for a specific
situation, of previous knowledge about similar cases. Its representation scheme improves
the expressiveness and simplicity of the initially generated composite summaries, which
increases their usefulness.
Defining an attribute preference ranking ensures that, in the absence of an all-inclusive
relation, priority is given to those relations that contain the most relevant attributes. For this
reason, the metrics T, $T(P^r)$ and $S(P^r)$ are used only as a second resource to select the most
useful relations.
Focusing on the nucleus of the Evidence relation makes it possible to first find the most
representative statement and then, for that nucleus, to identify the three most useful
relations.
The proposed approach makes it possible to rescan, for each instance of a dynamic decision-making
problem, a set of composite relations previously mined from a dataset.
References
1. Yager, R.R., Reformat, M.Z., To, N.D.: Drawing on the iPad to input fuzzy sets with an
application to linguistic data science. Inf. Sci. (Ny) 479, 277–291 (2019). https://doi.org/10.1016/J.INS.2018.11.048
2. Yager, R.R.: A new approach to the summarization of data. Inf. Sci. (Ny) 28, 69–86 (1982)
3. Zadeh, L.A.: A computational approach to fuzzy quantifiers in natural languages. Comput.
Math. Appl. 9(1), 149–184 (1983). https://doi.org/10.1016/0898-1221(83)90013-5
4. Pupo, I., Piñero, P.Y., Bello, R.E., García, R., Villavicencio, N.: Linguistic data summarization:
a systematic review. In: Piñero Pérez, P.Y., Bello Pérez, R.E., Kacprzyk, J. (eds.) Artificial
Intelligence in Project Management and Making Decisions, pp. 3–21. Springer International
Publishing, Cham (2022). https://doi.org/10.1007/978-3-030-97269-1_1
5. Kuhn, T.: A survey and classification of controlled natural languages. Comput. Linguist.
40(1), 121–170 (2014)
6. Zadeh, L.A.: A prototype-centered approach to adding deduction capability to search engines-
the concept of protoform. In: 2002 Annual Meeting of the North American Fuzzy Information
Processing Society Proceedings, pp. 523–525 (2002)
7. Kacprzyk, J., Zadrozny, S.: Linguistic database summaries and their protoforms: towards
natural language based knowledge discovery tools. Inf. Sci. (Ny) 173(4), 281–304 (2005).
https://doi.org/10.1016/j.ins.2005.03.002
8. Ramos-Soto, A., Martin-Rodilla, P.: Enriching linguistic descriptions of data: a framework
for composite protoforms. Fuzzy Sets Syst. 407, 1–26 (2021). https://doi.org/10.1016/j.fss.2019.11.013
9. Cornejo, M.E., Medina, J., Rubio-Manzano, C.: Linguistic descriptions of data via fuzzy
formal concept analysis. In: Harmati, I.Á., Kóczy, L.T., Medina, J., Ramírez-Poussa, E. (eds.)
Computational Intelligence and Mathematics for Tackling Complex Problems 3, pp. 119–125.
Springer International Publishing, Cham (2022). https://doi.org/10.1007/978-3-030-74970-5_14
10. To, N.D., Reformat, M.Z., Yager, R.R.: Question-answering system with linguistic summarization.
In: 2021 IEEE International Conference on Fuzzy Systems (FUZZ-IEEE), pp. 1–8
(2021). https://doi.org/10.1109/FUZZ45933.2021.9494389
11. Trivino, G., Sugeno, M.: Towards linguistic descriptions of phenomena. Int. J. Approx.
Reason. 54(1), 22–34 (2013). https://doi.org/10.1016/j.ijar.2012.07.004
12. Pérez, I., Piñero, P.Y., Al-subhi, S.H., Mahdi, G.S.S., Bello, R.E.: Linguistic data summarization
with multilingual approach. In: Piñero Pérez, P.Y., Bello Pérez, R.E., Kacprzyk, J. (eds.)
Artificial Intelligence in Project Management and Making Decisions, pp. 39–64. Springer
International Publishing, Cham (2022). https://doi.org/10.1007/978-3-030-97269-1_3
13. Rodríguez, C.R., Peña, M., Zuev, D.S.: Extracting composite summaries from qualitative
data. In: Heredia, Y.H., Núñez, V.M., Shulcloper, J.R. (eds.) Progress in Artificial Intelligence
and Pattern Recognition: 7th International Workshop on Artificial Intelligence and Pattern
Recognition, IWAIPR 2021, Havana, Cuba, October 5–7, 2021, Proceedings, pp. 260–269.
Springer International Publishing, Cham (2021). https://doi.org/10.1007/978-3-030-89691-1_26
14. Rodríguez Rodríguez, C.R., Zuev, D.S., Peña Abreu, M.: Algorithms for linguistic description
of categorical data. In: Piñero Pérez, P.Y., Bello Pérez, R.E., Kacprzyk, J. (eds.) UCIENCIA
2021. SCI, vol. 1035, pp. 79–97. Springer, Cham (2022). https://doi.org/10.1007/978-3-030-97269-1_5
15. Mann, W.C., Thompson, S.A.: Rhetorical structure theory: toward a functional theory of text
organization. Text 8(3), 243–281 (1988)
16. Hou, S., Zhang, S., Fei, C.: Rhetorical structure theory: a comprehensive review of theory,
parsing methods and applications. Expert Syst. Appl. 157, 113421 (2020)
17. Rodríguez, C.R., Amoroso, Y., Zuev, D.S., Peña, M., Zulueta, Y.: M-LAMAC: a model for
linguistic assessment of mitigating and aggravating circumstances of criminal responsibility
using computing with words. Artif. Intell. Law (2023). https://doi.org/10.1007/s10506-023-09365-8