A new network-based high-level data classification methodology (Quipus) by
modeling attribute-attribute interactions
Esteban Wilfredo Vilca Zúñiga
Dept. of Computing and Mathematics
FFCLRP-USP
Ribeirão Preto, Brasil
evilcazu@usp.br
Liang Zhao
Dept. of Computing and Mathematics
FFCLRP-USP
Ribeirão Preto, Brasil
zhao@usp.br
Abstract—High-level classification algorithms focus on the interactions between instances. These interactions provide a new way to evaluate and classify data. At the core of this process is a complex network building methodology. Current methodologies use variations of kNN to produce these graphs. However, these techniques ignore some hidden patterns between attributes and require data normalization to be accurate. In this paper, we propose a new network building methodology based on attribute-attribute interactions that does not require normalization. The current results show that this approach improves the accuracy of the high-level classification algorithm based on betweenness centrality.
1. Introduction
Machine learning classification algorithms are low level when they use only physical features to classify, usually distance measures like the Euclidean distance, and ignore the possible interactions between instances as a system [1]. High-level classification algorithms, in contrast, focus on exploiting these characteristics by using complex networks as the data representation [2] [3].
There is a variety of techniques to build networks, but they usually use kNN as the core [1] [4] [5]. These algorithms produce a network where each node represents an instance and each edge links a node to one of its kNN neighbors [6].
A complex network is defined as a non-trivial graph [7]. Usually, the number of instances in the dataset and their interactions generate a large graph with numerous edges. These large graphs present special characteristics that are exploited by many classification techniques, such as betweenness centrality [5], clustering coefficient [8], assortativity [6], and so on.
The current methodologies produce one network by reducing each instance to a node. This approach presents problems such as the need to normalize the data and the omission of hidden patterns in attribute-attribute interactions.
In this paper, we present a new methodology that captures these attribute-attribute interactions by building a network for each attribute, removing networks without useful information, and optimizing the importance of each one. We use the high-level classification technique presented in [5] to evaluate the results.
2. Model Description
In this section, we describe our algorithm. First, we review the related literature. Then, we describe our algorithm step by step.
2.1. Literature Review
2.1.1. Complex networks as data representation. In order to capture the instance interactions, we need to represent the data in a network structure. Different authors present building methodologies using the k-nearest neighbors algorithm. They transform each instance into a node, and the nearest neighbors become the neighborhood of that node [3].
Figure 1. Image of two related instances transformed into two linked nodes.
In Figure 1, we can observe how each instance in the dataset is represented as a node. The links depend on the k nearest neighbors. Since this methodology depends on kNN, dataset normalization is needed.
Equation 1 describes the general methodology to build the network representation of the data [6]:

$$
N(X_i) =
\begin{cases}
\epsilon\text{-}radius(X_i, y_i), & \text{if } |\epsilon\text{-}radius(X_i, y_i)| > k \\
kNN(X_i, y_i), & \text{otherwise}
\end{cases}
\qquad (1)
$$
where X_i is the instance i and y_i is its label. kNN returns the k nearest nodes related to the instance X_i that have the same label y_i, using a similarity function like the Euclidean distance [8]. ε-radius returns the set of nodes {V_j ∈ V : distance(X_i, X_j) < ε, y_i = y_j}, where V_j is the node representation of the instance X_j. The value ε is a percentile of the distances calculated with kNN; some authors consider just the median [6].

arXiv:2009.13511v1 [cs.LG] 28 Sep 2020
Following Equation 1, we build a complex network G from a dataset D = {(X_1, y_1), ..., (X_n, y_n)}, where each class is represented as a subgraph g_i.
Figure 2. Image of a complex network G with three classes g_red, g_green, g_blue based on the Wine UCI dataset, with modularity Q = 0.6562.
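A minimal sketch of this construction in Python with networkx (the library used later in Section 3); the function name and the median-based choice of ε are illustrative assumptions, not the authors' exact implementation:

```python
import numpy as np
import networkx as nx

def build_network(X, y, k=3):
    """kNN + epsilon-radius construction (Eq. 1): one node per instance,
    edges only between nodes that share a label."""
    n = len(X)
    G = nx.Graph()
    G.add_nodes_from(range(n))
    for i in range(n):
        same = [j for j in range(n) if j != i and y[j] == y[i]]
        dist = {j: float(np.linalg.norm(X[i] - X[j])) for j in same}
        knn = sorted(dist, key=dist.get)[:k]
        # epsilon taken here as the median of the kNN distances, as in [6]
        eps = float(np.median([dist[j] for j in knn]))
        radius = [j for j in same if dist[j] < eps]
        # Eq. 1: prefer the epsilon-radius set when it is denser than kNN
        neighbors = radius if len(radius) > k else knn
        G.add_edges_from((i, j) for j in neighbors)
    return G
```

Because edges are created only between same-label nodes, the resulting graph decomposes into one subgraph g_i per class, as in Figure 2.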
2.1.2. Important network measures. To exploit the high-level approach, we need measures that capture node interactions, such as structure, centrality, and connectivity measures.
Clustering Coefficient (CC): This metric gives information about the connectivity between the neighbors of a node [7]. It lies between 0 (no communication between the neighbors) and 1 (fully connected neighbors).
Betweenness Centrality (BC): This metric captures the communication between nodes using the shortest paths [2]. For each node, we calculate the number of geodesic paths where this node is present. A node with a higher BC plays an important role in the network communication.

$$
B(i) = \sum_{s \neq i \in V} \sum_{t \neq i \in V} \frac{\eta^{i}_{st}}{\eta_{st}}
\qquad (2)
$$

where η^i_st is 1 when the node i is part of the geodesic path from s to t and 0 otherwise, and η_st is the total number of shortest paths between s and t.
Modularity (Q): This metric provides information about the quality of a given partition [9]. Usually, it lies between 0 and 1, where 0 means poor community structure and 1 represents a strong differentiation between the communities. In supervised learning classification, the communities are the classes. A higher modularity represents a better separation of the subgraphs g_i and probably a better classification.

$$
Q = \frac{1}{2|E|} \sum_{i,j \in V} \left( A_{ij} - \frac{k_i k_j}{2|E|} \right)
\qquad (3)
$$

where A_ij is the weight of the edge that links the vertices i and j, |E| and V represent the number of edges and nodes respectively, and k_i is the degree of the node i.
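All three measures are available in networkx; the sketch below illustrates them on toy graphs (the star graph and Zachary's karate club are our own illustrative choices, not datasets from this paper):

```python
import networkx as nx

# Star graph: the hub lies on every shortest path between the leaves,
# so it maximizes BC (Eq. 2) while its neighbors share no edges (CC = 0).
star = nx.star_graph(4)                 # node 0 is the hub
bc = nx.betweenness_centrality(star)    # normalized betweenness
cc = nx.clustering(star)                # clustering coefficient

# Modularity (Eq. 3) of a labeled two-community partition.
G = nx.karate_club_graph()
parts = [{n for n in G if G.nodes[n]["club"] == "Mr. Hi"},
         {n for n in G if G.nodes[n]["club"] != "Mr. Hi"}]
Q = nx.community.modularity(G, parts)
```

Here the hub of the star has the maximal normalized BC of 1.0, and the karate-club faction labels give a moderate modularity, matching the intuition that well-separated classes yield higher Q.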
2.1.3. High-level classification algorithms. These kinds of algorithms apply these measures on complex networks to classify data.
Based on Impact Score (NBHL): This algorithm uses different network measures before and after an insertion [6]. The node is assigned to the class where the insertion produces the smallest variation in the metrics.
Based on Importance (PgRkNN): This algorithm uses the PageRank algorithm to evaluate the importance of the nodes [1]. For each subgraph g_i, we measure the importance of the nodes related to the node to be classified. The neighbors with higher importance will capture the new node.
Based on Betweenness (HLNB-BC): This algorithm inserts the new node into each subgraph g_i and searches for the nodes with similar BC in each subgraph [5]. These are compared and evaluated to provide a probability for each class, and the highest one gives the selected label.
$$
\mathcal{H} \approx \alpha W_n + (1 - \alpha) T_n
\qquad (4)
$$

where H is the list of per-label probabilities for one node, W_n is the average BC difference of the b closest nodes, T_n is the list of the number of links for each subgraph g_i, and α controls the weights between the structural information W_n and the number of links T_n.
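A sketch of how Eq. 4 can be evaluated; the final normalization of H into a probability list and the example vectors are assumptions for illustration:

```python
import numpy as np

def hlnb_bc_blend(W, T, alpha=0.5):
    """Eq. 4: combine the structural term W_n with the link-count term T_n.
    The normalization into per-class probabilities is an assumed final step."""
    W, T = np.asarray(W, float), np.asarray(T, float)
    H = alpha * W + (1.0 - alpha) * T
    return H / H.sum()

# Example with three classes; alpha = 1.0 uses only the structural term.
probs = hlnb_bc_blend(W=[0.6, 0.3, 0.1], T=[0.2, 0.2, 0.6], alpha=1.0)
```

With α = 1.0 (the value selected by grid search in Section 3), only the structural term W_n contributes to the output.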
2.2. High-Level Classification Algorithm Using
Attribute-Attribute Interaction (Quipus)
2.2.1. Attribute-attribute interactions. In Section 2.1.1, we analyzed how each instance is represented as a node, but this approach ignores some hidden patterns in attribute-attribute interactions.
Figure 3. Image with 4 graphs that capture the interactions within each attribute.
In this paper, we create a graph for each attribute to detect these hidden patterns. In Figure 3, we can appreciate how each attribute is represented as an independent graph. Using this approach, we can capture the attribute-attribute interactions. Since we use attributes that share the same scale to produce each graph, our method increases its resistance to non-normalized data. However, there are attributes that by themselves do not provide relevant information and require others to be useful. Thus, we use the modularity Q to evaluate each graph.
2.2.2. Proposed methodology. In supervised learning, we split the dataset in two: X_training and X_testing. In the training phase, we produce a model that helps us predict the testing dataset. In the training phase, we also need to split the data again into X_net and X_opt because we have an optimization phase. The proportion depends on the quantity of data available.
The next steps are for the training phase:
First, we build a graph for each attribute to capture the hidden patterns between them, following Equation 1 on X_net. Then, we build one more graph using each instance as a node to capture the instance-instance interactions.
Second, we calculate Q for each graph. To avoid possible noise from attributes without relevant information, we ignore the graphs with a modularity lower than that of the instance-instance network.
Third, we insert each instance from X_opt into the networks following the same strategy described in step 1. However, we keep the links between different labels because we want to simulate a real insertion. We introduce each attribute value into its corresponding graph and the complete instance into the instance-instance graph.
Fourth, we obtain the probability of being part of each class in each graph using the high-level algorithm HLNB-BC. For example, in a dataset with 3 classes and 4 attributes, it gives us a list with three probabilities for each graph (12 probabilities in total).
Fifth, we give a weight from 0 to 1 to each graph. This gives us a way to reduce or increase the classification probability of each graph. Then, we use an optimization algorithm like particle swarm optimization to determine the best weights for each graph, so as to increase the accuracy of the predicted instances in X_opt.
Finally, we save the weights and produce the final graphs following the same procedure as step 1, but with X_training.
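Steps 1 and 2 above can be sketched as follows; this is a simplified illustration that uses a plain same-label kNN graph (omitting the ε-radius term of Eq. 1) and hypothetical function names:

```python
import numpy as np
import networkx as nx

def knn_graph(F, y, k=2):
    """Same-label kNN graph over feature matrix F (a single column
    when building one attribute's graph)."""
    G = nx.Graph()
    G.add_nodes_from(range(len(F)))
    for i in range(len(F)):
        same = [j for j in range(len(F)) if j != i and y[j] == y[i]]
        knn = sorted(same, key=lambda j: float(np.linalg.norm(F[i] - F[j])))[:k]
        G.add_edges_from((i, j) for j in knn)
    return G

def quipus_graphs(X, y, k=2):
    """Steps 1-2: one graph per attribute plus the instance-instance graph;
    attribute graphs whose modularity falls below the instance-instance
    one are dropped."""
    parts = [{i for i in range(len(y)) if y[i] == c} for c in set(y)]
    base = knn_graph(X, y, k)                      # instance-instance graph
    q_base = nx.community.modularity(base, parts)  # Q baseline
    kept = []
    for a in range(X.shape[1]):
        Ga = knn_graph(X[:, [a]], y, k)            # one graph per attribute
        if nx.community.modularity(Ga, parts) >= q_base:
            kept.append((a, Ga))
    return base, kept
```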
In the testing phase, we insert each instance into the graphs following the same process as step 4 of the training phase and multiply the probabilities of each graph by the weights defined in step 5.
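The testing-phase combination can be sketched as follows; the weighted-sum aggregation across graphs is an assumption about how the per-graph scores are merged, and the example values are illustrative:

```python
import numpy as np

def quipus_classify(graph_probs, weights):
    """Multiply each graph's class-probability list by its optimized weight
    (step 5) and pick the label with the highest aggregated score."""
    P = np.asarray(graph_probs, float) * np.asarray(weights, float)[:, None]
    return int(P.sum(axis=0).argmax())

# Example: two well-weighted graphs favor class 1; a low-weight graph
# weakly favors class 0, so class 1 wins.
label = quipus_classify([[0.2, 0.8], [0.3, 0.7], [0.6, 0.4]],
                        weights=[0.9, 0.8, 0.2])
```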
3. Results
In this section, we present the performance of our
methodology Quipus. We use Python as programming
language, Scikit-learn library for machine learning algo-
rithms [?], networkx as graph library [10], and Pyswarms
as PSO optimizer [11]. Each algorithm were tested us-
ing 10-folds cross validation 10 times for training and
testing datasets, and Grid search to tune the param-
eters. We search kfrom 1 to 30, the percentile
was tested with these values [0.1,0.2,0.3,0.4,0.5],b
nearest nodes from 1 to 7 and αwith these values
[0.0,0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9,1.0]
The datasets used, their attributes, instances, and classes
are described on table 1.
Dataset Instances Attributes Classes
Iris 150 4 3
Wine 178 13 3
Zoo 101 16 7
TABLE 1. INFORMATION ABOUT THE UCI CLASSIFICATION DATASETS USED IN THIS PROJECT
In Section 2.2.2, we split the training data into X_net and X_opt. In our tests, we use a stratified random split of 80% and 20%, respectively. This value could be modified according to the quantity of data.
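With scikit-learn, the stratified 80/20 split can be written as follows (the data and variable names are illustrative):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy stand-in for the training data: 100 instances, 4 attributes, 2 classes.
X = np.random.rand(100, 4)
y = np.array([0] * 50 + [1] * 50)

# 80/20 stratified split into X_net (network building) and X_opt (weight optimization).
X_net, X_opt, y_net, y_opt = train_test_split(
    X, y, test_size=0.20, stratify=y, random_state=42)
```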
Then, we build a network for each attribute and one network for the instance-instance interactions. We calculate their modularities (Q) and compare each attribute network with the instance-instance network. The networks with lower modularity are ignored in the rest of the process. Table 2 shows the modularities of each network.
Network Modularity Q
Instance-instance 0.3181
Attribute 1 0.3189
Attribute 2 0.0924
Attribute 3 0.0500
Attribute 4 0.1689
...
Attribute 10 0.2288
Attribute 11 0.3008
Attribute 12 0.3333
TABLE 2. MODULARITIES OF ATTRIBUTE NETWORKS IN UCI WINE
DATASET
For instance, the modularities of the networks of attribute 1 and attribute 12 are higher than the modularity of the instance-instance network. So, these networks are used for optimization, classification, and insertion. The others are ignored because they do not have a strong community structure.
The insertion of the nodes into each graph follows Equation 1, but preserves the links with nodes of different labels, given that we want to capture the insertion probability for each class. Then, we create a weight for the probabilities of each graph and start an optimization phase. We use the particle swarm optimization from the Pyswarms library with the parameters {c1 = 0.5, c2 = 0.1, w = 0.9, iterations = 500}. These could be optimized, but we use these fixed values for the experiments.
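To keep this section self-contained, the sketch below shows a minimal global-best PSO in NumPy with the coefficients reported above (the experiments themselves use the Pyswarms library); the toy cost function stands in for the accuracy-based objective on X_opt:

```python
import numpy as np

def pso(cost, dim, n_particles=30, iters=500, c1=0.5, c2=0.1, w=0.9, seed=0):
    """Minimal global-best PSO over weight vectors in [0, 1]^dim."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(0.0, 1.0, (n_particles, dim))   # graph weights live in [0, 1]
    v = np.zeros_like(x)
    pbest = x.copy()
    pbest_cost = np.array([cost(p) for p in x])
    gbest = pbest[pbest_cost.argmin()].copy()
    for _ in range(iters):
        r1 = rng.random(x.shape)
        r2 = rng.random(x.shape)
        # cognitive (c1) and social (c2) pulls plus inertia (w)
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
        x = np.clip(x + v, 0.0, 1.0)
        c = np.array([cost(p) for p in x])
        improved = c < pbest_cost
        pbest[improved] = x[improved]
        pbest_cost[improved] = c[improved]
        gbest = pbest[pbest_cost.argmin()].copy()
    return gbest

# Toy cost: distance to a hypothetical "ideal" weight vector, in place of
# 1 - accuracy on X_opt.
target = np.array([0.9, 0.8, 0.2])
best = pso(lambda p: float(np.sum((p - target) ** 2)), dim=3)
```

In the real pipeline the cost for each particle would be one minus the classification accuracy obtained on X_opt with that particle's weights.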
For example, in one iteration, our algorithm obtained the weights in Table 3.
Network Modularity Q Weights Ignored
Instance-instance 0.3181 0.9083 False
Attribute 1 0.3189 0.8065 False
Attribute 2 0.0924 - True
Attribute 3 0.0500 - True
Attribute 4 0.1689 - True
...
Attribute 10 0.2288 - True
Attribute 11 0.3008 - True
Attribute 12 0.3333 0.1746 False
TABLE 3. MODULARITIES AND WEIGHTS OF ATTRIBUTE NETWORKS IN
UCI WINE DATASET
Once the weights are defined, we proceed to rebuild the graphs using the entire training dataset X_training. Finally, the classification phase follows the same process as the optimization phase, but using the optimized weights.
Figure 4. Image of the instance-instance network from one iteration of the UCI wine dataset classification.
Figure 5. Image of the attribute 1 network from one iteration, with a higher modularity than its instance-instance network, in the UCI wine dataset classification.
In Figure 4, we can observe the instance-instance network from one iteration of the wine dataset classification. The black nodes represent the instances already classified. The network presents a structure where the red nodes are on one side, the blue nodes on the other side, and the green nodes in the middle of them. In Figure 5, we observe the network from the first attribute of the wine dataset, which had a modularity of 0.3189. Once the nodes are inserted, this graph presents a higher modularity, Q = 0.6553. Without this methodology, we would lose these attribute-attribute interactions. These networks give us an accuracy of 91.11%.
Dataset k ε b α
Iris 12 0.0 3 1.0
Wine 7 0.0 3 1.0
Zoo 1 0.0 1 1.0
TABLE 4. PARAMETER VALUES USED BY HLNB-BC WITH THE QUIPUS METHODOLOGY IN UCI DATASETS
In Table 5, we observe the accuracy of Quipus against the network building technique from the literature, kNN+ε-radius. The latter presents problems with non-normalized data, like the Wine UCI dataset. Using Quipus, we reduce this problem: the attribute networks build their relations on a single scale, and the optimized weights manage each network's contribution, reducing its impact on the final classification.
Results of 10 runs of 10-fold cross validation
Dataset Prediction Building (k) Accuracy
Iris HLNB-BC kNN+ε-radius (7) 95.33 ± 11.85
  Quipus (12) 95.80 ± 09.36
Wine HLNB-BC kNN+ε-radius (1) 75.84 ± 19.15
  Quipus (7) 93.03 ± 13.08
Zoo HLNB-BC kNN+ε-radius (1) 96.36 ± 12.98
  Quipus (1) 96.87 ± 04.97
TABLE 5. ACCURACY OF DIFFERENT BUILDING METHODOLOGIES ON UCI DATASETS WITHOUT NORMALIZATION
4. Conclusion
The new classification methodology we propose exploits the hidden patterns in attribute-attribute interactions, building networks for each attribute and ignoring the ones with lower modularity. It also uses the high-level classification technique HLNB-BC and introduces resilience to non-normalized data into the model.
Many different modifications, tests, and experiments are left for future work, such as testing with other high-level techniques (NBHL, PgRkNN), identifying a way to optimize the parameters of the particle swarm, and using the Quipus methodology on other real datasets.
References
[1] M. Carneiro and L. Zhao, “Organizational data classification based on the importance concept of complex networks,” IEEE Transactions on Neural Networks and Learning Systems, vol. 29, pp. 3361–3373, 2018.
[2] T. Christiano Silva and L. Zhao, Machine Learning in Complex
Networks. Springer International Publishing, 2016.
[3] M. Carneiro and L. Zhao, “Analysis of graph construction methods
in supervised data classification,” in 2018 7th Brazilian Conference
on Intelligent Systems (BRACIS), Oct 2018, pp. 390–395.
[4] S. A. Fadaee and M. A. Haeri, “Classification using link prediction,”
Neurocomputing, vol. 359, pp. 395 – 407, 2019.
[5] E. Vilca and L. Zhao, “A network-based high-level data classification
algorithm using betweenness centrality,” 2020.
[6] T. Colliri, D. Ji, H. Pan, and L. Zhao, “A network-based high level
data classification technique,” in 2018 International Joint Conference
on Neural Networks (IJCNN), July 2018, pp. 1–8.
[7] R. Albert and A.-L. Barabási, “Statistical mechanics of complex networks,” Rev. Mod. Phys., vol. 74, pp. 47–97, Jan 2002.
[8] T. C. Silva and L. Zhao, “Network-based high level data classifica-
tion,” IEEE Transactions on Neural Networks and Learning Systems,
vol. 23, no. 6, pp. 954–970, June 2012.
[9] A. Clauset, M. E. J. Newman, and C. Moore, “Finding community structure in very large networks,” Physical Review E, pp. 1–6, 2004. [Online]. Available: www.ece.unm.edu/ifis/papers/community-moore.pdf
[10] A. A. Hagberg, D. A. Schult, and P. J. Swart, “Exploring network
structure, dynamics, and function using networkx,” in Proceedings of
the 7th Python in Science Conference, G. Varoquaux, T. Vaught, and
J. Millman, Eds., Pasadena, CA USA, 2008, pp. 11 – 15.
[11] L. J. V. Miranda, “PySwarms, a research-toolkit for Particle Swarm
Optimization in Python,” Journal of Open Source Software, vol. 3,
2018. [Online]. Available: https://doi.org/10.21105/joss.00433