PAVE: Personalized Academic Venue Recommendation
Exploiting Co-publication Networks
Shuo Yu^a, Jiaying Liu^a, Zhuo Yang^{a,*}, Zhen Chen^a, Huizhen Jiang^a, Amr Tolba^{b,c}, Feng Xia^a
^a School of Software, Dalian University of Technology, Dalian 116620, China
^b Computer Science Department, Community College, King Saud University, Riyadh 11437, Saudi Arabia
^c Mathematics Department, Faculty of Science, Menoufia University, Shebin-El-kom 32511, Egypt
Abstract
The number of academic venues has grown far beyond expectation with the rapid development of information technology. Researchers need to identify high-quality and fruitful academic venues. However, information overload in big scholarly data creates tremendous challenges for mining these venues and the relevant information. In this work, we propose PAVE, a novel Personalized Academic Venue recommendation model Exploiting co-publication networks. PAVE runs a random walk with restart model on a co-publication network that contains two kinds of associations: coauthor relations and author-venue relations. We define a transfer matrix with bias to drive the random walk by exploiting three academic factors: co-publication frequency, relation weight, and researchers’ academic level. PAVE is inspired by the fact that researchers are more likely to contact those who have high co-publication frequencies and similar academic levels. Additionally, PAVE considers the difference in weight between the two kinds of associations. Extensive experiments on the DBLP data set demonstrate that, in comparison with relevant baseline approaches, PAVE performs better in terms of precision, recall, F1, and average venue quality.
Keywords: Big scholarly data, Recommender systems, Academic venue recommendation, Random walk, Network science
*Corresponding author
Email address: yangzhuo@dlut.edu.cn (Zhuo Yang)
Preprint submitted to Elsevier, November 23, 2017
1. Introduction
It is challenging to mine useful and effective information from big scholarly data due to information overload [38]. The numbers of researchers, publications, and academic venues have grown far beyond expectation with the rapid development of information technology. Recommender systems help researchers cope with the rapid growth and complexity of information and provide users with personalized information services. With the continuous growth of research paper repositories, recommendation technology for academic entities has gradually developed [11]. Nowadays, academic recommender systems mainly focus on four aspects: collaborator recommendation, paper recommendation, citation recommendation, and academic venue recommendation [41] [4]. In particular, the immense growth of academic venues makes it hard for researchers to choose the most relevant venue. This growth is evidenced by DBLP [18], a service that provides open bibliographic information on major computer science journals and proceedings and had recorded 3,711 conferences and 1,391 journals as of 2015. Academic venue recommender systems have proved necessary and important because they push personalized venue information to researchers.
In order to better recommend personalized venues to researchers, we consider researchers’ requirements for venues from the perspective of scientific research progress, as follows. (1) Where can researchers find high-quality venues? (2) Which conferences are the most relevant for researchers to participate in? (3) Which venues are the most suitable for researchers to submit papers to? Firstly, researchers usually get inspiration from papers in high-quality venues. When doing research, researchers are better off following high-quality conferences and journals, in which they can find more high-quality and relevant publications. Since researchers want to grow quickly in a certain domain by obtaining more specific knowledge and fresh ideas, they need to study more publications that can inspire them. However, new researchers usually do not know which conferences or journals are better choices for their research, because venues differ greatly in research focus, research method, writing style, etc. To avoid blind choices and detours, it is necessary to recommend more high-quality conferences and journals to researchers [30]. Secondly, researchers participate in conferences to communicate with other researchers and promote scientific collaboration. As is well known, almost all researchers participate in conferences every year. Academic conferences not only serve as platforms for presenting research work, but also connect researchers in a domain, enabling in-depth communication and fostering potential collaboration. Thus, researchers can benefit a great deal and make progress together [26]. However, choosing the most relevant conferences to attend is a tedious task, especially for new researchers, due to information overload. Finally, researchers need to submit papers to the most suitable high-quality venues. With the rapid growth in both the quantity and variety of publication venues in recent decades, it is difficult to decide where to submit papers. For experts with much publication experience, selecting suitable conferences, journals, or scientific forums in which to publish their papers might be a trivial task, since they already know these venues well. That is, they might have target venues in mind before they finish their papers. However, junior researchers who have few or no publication records may not be sure which specific venue their work should be submitted to. Under the guidance of senior researchers, junior researchers may have more or less definite target venues when preparing their work. Nonetheless, since submissions may be rejected, researchers always need backup plans. Thus, choosing an appropriate venue is essential [35] [40].
In recent years, a variety of approaches to academic venue recommendation have been proposed [25, 40, 22, 8, 42, 20]. There are also smart conference systems and solutions that help improve the participation experience and address the conference recommendation problem [34]. Although many methods, systems, and solutions exist, some factors that may influence recommendation in practice have not been fully taken into consideration. Moreover, some work recommends venues based on homogeneous networks [5]. However, an academic network is generally composed of authors, keywords, affiliations, and venues, and is therefore heterogeneous. In this work, we recommend venues to researchers based on a heterogeneous network and take the three aforementioned requirements into account. We propose a novel Personalized Academic Venue recommendation model Exploiting co-publication networks (PAVE). We first integrate the academic entities (i.e., authors, publications, and venues) into a co-publication network [17], which contains two kinds of nodes (authors and venues) and two kinds of associations (co-author relations and author-venue relations). Figure 1 shows an example of a co-publication network. Alice, Bob, Cindy, and David are four researchers whose papers have been published in the four venues. The links include co-author relations (e.g., the link between Alice and Bob) and author-venue relations (e.g., the link between venue A and Alice). These links, along with the nodes, compose the co-publication network. Furthermore, we introduce three factors in PAVE: 1) co-publication frequency, which reflects the number of times a relation occurs; 2) relation weight, which distinguishes the two kinds of relations on the network edges; and 3) academic level. Researchers are more likely to approach those with a similar academic level than those with a high academic level, and researchers with similar academic levels exhibit similar characteristics in some ways. For example, in Figure 1, Alice and Bob are neighbors. If they are of similar academic level, then Venue C and Venue D, in which Bob has published but Alice has not, should not only be taken into account when we recommend venues to Alice, but should also receive more attention. This is because researchers are more likely to contact other researchers with similar academic levels and to publish papers in venues that are most likely to accept their papers. Details are given in Section 3.2. Based on these three hypotheses, we define a transfer matrix with bias to drive the random walk with restart (RWR) model [2, 12, 36] by introducing these three academic factors: co-publication frequency, relation weight, and researchers’ academic level. In addition, we present a new metric called Ave-Quality to evaluate recommendation performance alongside precision, recall, and F1. Ave-Quality reflects the quality of the recommended venues. In our experiments, PAVE proves effective in producing better academic venue recommendations.
Figure 1: An example of co-publication networks.
In summary, we make the following contributions in this paper.
• We develop an innovative solution based on a random walk with restart model to address academic venue recommendation over big scholarly data. The proposed solution compares favourably in terms of personalized academic venue recommendation.
• To reveal researchers’ real preferences for academic venues, we define a transfer matrix with bias using the three aforementioned academic factors, which guides the random walk on the co-publication network with preference.
• In addition to precision, recall, and F1, we propose a new metric to evaluate performance. Extensive experiments on the DBLP data set compare PAVE with the basic RWR model, a topic-based model, and a friends-based model, and promising results are presented and analyzed.
The rest of the paper is organized as follows. Related work is discussed in the next section. Section 3 introduces the PAVE model. Section 4 presents the performance evaluation results of PAVE, followed by a section dedicated to the conclusion.
2. Related work
2.1. Academic Recommendation
Recommender systems are designed to deal with information overload and help people make decisions by providing accessible, high-quality recommendations. Academic recommendation generally consists of academic collaborator recommendation, paper recommendation, citation recommendation, and academic venue recommendation. The methods used for these tasks can be divided into four types: collaborative filtering (CF) recommendation, content-based recommendation, network-based recommendation, and hybrid recommendation. Details are as follows.
1. Collaborative filtering recommendation. CF is a popular and widely accepted approach for recommender systems, with variants such as user-based CF, item-based CF, and Matrix Factorization (MF). Several papers address academic recommendation by exploiting CF algorithms. For example, Yu et al. [43] present a prediction method based on collaborative filtering for personalized academic recommendation. Liang et al. [20] propose a new probabilistic approach that directly incorporates user exposure to items into collaborative filtering. They consider the continuity of users’ browsing content to help discover collaborative users.
2. Content-based Recommendation. Content-based recommendation mainly focuses on profiles, the content of papers, and the context. It is widely used in academic paper recommendation [19] [?] and citation recommendation [13] [3] [7]. Sugiyama and Kan [31] examine the effect of modelling a researcher’s past works when recommending scholarly papers to that researcher. The key part of their model is to enhance the profile derived directly from past works with information from the papers those works reference and the papers that cite them. High-quality papers can provide fresh ideas and can also be cited in our own papers. Since it is challenging to obtain relevant papers of high value, He et al. [13] present the initiative of building a context-aware citation recommendation system and implement a prototype in CiteSeerX. Caragea et al. [7] apply Singular Value Decomposition to build a reliable citation recommendation system that recommends the most relevant citations. Pan and Li [24] use topic model techniques to perform topic analysis on research papers.
3. Network-based Recommendation. In academia, collaboration makes researchers more fruitful and productive. The friends-based model is a neighborhood-based recommendation approach that is simple and fundamental among social network-based recommendation methods. Lopes et al. [21] present an approach to recommending collaborations in the context of academic social networks. Specifically, they introduce the architecture for this approach and the metrics involved in recommending collaborations, and present an initial case study to validate it. West et al. [33] propose a citation-based method that makes it possible to recommend at multiple scales of relevance for different users by using the hierarchical structure of scientific knowledge. Xia et al. [37] consider the features of different researchers and propose a novel recommendation method that results in better recommendations.
In addition, random walk models are frequently used to analyze collaboration networks. Fouss et al. [12] use a Markov-chain model of random walk to compute similarities between elements of a graph. Stokes et al. [29] use a biased random walk to estimate the expected time to find a maximum-degree node in a graph. Xia et al. [36] present MVCWalker (Random Walk-Based Most Valuable Collaborators Recommendation Exploiting Academic Factors), which takes three academic factors into consideration: coauthor order, latest collaboration time, and times of collaboration. They compare MVCWalker with the basic RWR model and a common neighbor-based model in various aspects and achieve better performance. Notably, researchers have already begun to study weights in random walk models using supervised learning algorithms. Backstrom and Leskovec [2] develop a method based on Supervised Random Walks that learns, in a supervised way, how to bias a PageRank-like random walk on the network so that it visits given nodes (i.e., positive training examples) more often than the others. With a similar goal, we propose the transfer matrix with bias by introducing three academic factors in this work.
4. Hybrid-based Recommendation. For collaboration recommendation, as more and more content is mined, hybrid methods [16] [10] [32] have gradually emerged. Cohen and Ebel [10] focus on one particular flavor of context-based collaborator recommendation in a social network, given a set of keywords. Collaborations across different domains, however, can broaden research considerably. Tang et al. [32] analyze cross-domain collaboration data from research publications and propose the Cross-domain Topic Learning (CTL) model for ranking and recommending potential cross-domain collaborators. They consider a linear combination of the scores obtained by the content-based and CF methods.
The cold start issue is one of the most fundamental and intractable issues in recommender systems. The methods mentioned above perform differently when tackling it. CF recommendation methods rely on the historical collaboration relationships of other scholars, which results in poorer performance compared with other kinds of methods. In contrast to CF, content-based recommendation methods mitigate this issue to some extent since they focus on researchers’ own profiles. Nevertheless, content-based methods still suffer from the cold start challenge when new scholars have no historical profiles or collaboration relationships [1]. Network-based and hybrid recommendation methods perform better on cold start issues [27].
2.2. Academic Venue Recommendation
Plenty of studies have been done on academic collaborator recommendation, academic paper and citation recommendation, and conference session recommendation, while few focus on academic venue recommendation. The traditional way of recommending a venue to a researcher is to analyze her/his papers and compare them with the topics of different conferences using content-based analysis. However, this approach is not very precise due to mismatches caused by ambiguity in text comparisons. As a result, many researchers focus on social network-based and CF methods. Additionally, some socially aware approaches and hybrid methods have also been proposed for academic venue recommendation, as mentioned above.
Several previous studies address this problem. Yang et al. [40] propose a memory-based neighborhood collaborative filtering model to recommend venues by incorporating both topic and writing-style information of papers. They assume that papers and venues are distinguishable by their writing styles [39]. Pham et al. [25] propose a clustering approach based on the social information of users to derive academic recommendations. They utilize clustering techniques to improve the accuracy of collaborative filtering. However, this approach mainly involves predicting the publishing venue for a manuscript. Similarly, Luong et al. [22] propose a social network-based approach to recommend publication venues by exploring an author’s network of related co-authors and other researchers in the same domain. In addition, Asabere et al. [42] propose a socially aware approach to recommend presentation session (community) venues to participants based on high research interest similarity, strong social relations, and the matching of contextual information between the presenters and participants at the conference venue. Similarly, Xia et al. [35] propose a presentation session recommender for smart conference participants by utilizing social properties such as tie strength and degree centrality. Hornick et al. [14] provide a framework for extending preference-based recommender systems to deal with problems such as conference recommendation. Huynh Hoang [15] propose a collaborative knowledge model that runs on the collaborative network and combines graph theory and probability theory, aiming to support publication venue recommendation. Besides, Wongchokprasitti et al. [34] present a design for a community-based conference navigator system that collects the wisdom of the community to help conference participants examine the schedule of paper presentations and add the most interesting sessions.
Previous works have not recommended venues according to their associations with researchers. In this paper, we describe the academic publishing scene with a co-publication network comprising an author-venue network and a co-author network, and model the real publishing process with an RWR model based on graph theory and probability theory. Our academic venue recommendation model, PAVE, extends the basic RWR model. We propose the transfer matrix with bias by introducing three academic factors, i.e., co-publication frequency, relation weight, and researchers’ academic level, so that the random walk performs better when making academic venue recommendations.
3. Design of PAVE
In this section, we describe the details of PAVE. Furthermore, we explain how to compute link importance in the co-publication network by taking three academic factors into consideration.
3.1. Overview of PAVE
We exploit PAVE to mine specific academic venues and make personalized recommendations for researchers. The model is inspired by the fact that researchers usually wish to keep in contact with suitable academic venues, i.e., identifying high-quality and fruitful academic venues, participating in the academic conferences most closely related to their research, and submitting to suitable venues where they are likely to publish their research papers and achievements. PAVE extends our previous work [9], which proposes a random walk based academic venue recommendation method and achieves good recommendation results. In this work, we regard the topic distributions of researchers’ publication content and venues’ publication content as feature vectors, which are calculated by an LDA (Latent Dirichlet Allocation) model [6]. We set the K argument of LDA to 10, which means we cluster 10 topics for each venue and researcher. We also consider more factors to evaluate the model. Most importantly, the three academic factors we introduce, co-publication frequency, relation weight, and researchers’ academic level, aim at improving the recommendations by biasing the random walk so that it reaches the positive nodes more easily. However, in order to improve the academic level of researchers, the high academic level of the recommended venues needs to be guaranteed. Therefore, we define a new metric called Ave-Quality to evaluate the academic level of the recommended venues. The detailed process of PAVE is described below, and the structure of the PAVE model is illustrated in Figure 2.
Figure 2: Structure of PAVE.
We model a co-publication network that consists of the author-venue network and the co-author network. As shown in Figure 1, there are two kinds of nodes (venues and researchers) and two kinds of links (co-author relations and author-venue relations). PAVE evolves from the basic RWR model, which has proved suitable for calculating the similarity of nodes in networks. In PAVE, whether a venue should be recommended depends on its importance to the target researcher. The importance is defined by the rank score of the venue, which is determined by two factors, i.e., the number of neighbor nodes and the rank scores of incident nodes. This idea resembles PageRank [23], a successful application of RWR, which provides a suitable reference. Equation (1) is similar to PageRank in form.
$$AR_u = \frac{1-\alpha}{N} + \alpha \sum_{v \in I_u} AR_v P_{u,v} \qquad (1)$$
$AR$ represents the rank score vector. $AR_u$ is the rank score (academic level) of node $u$. $I_u$ is the set of nodes incident to node $u$. $P_{u,v}$ is the transition probability from node $v$ to node $u$. $\alpha$ is the damping factor. $N$ is the number of nodes in the network. PAVE computes the node ranking by driving an imaginary walker that walks randomly in the network. At each step the walker has two choices: with probability $\alpha$, it walks to a next node $v$, which is one of $u$’s direct neighbors ($v \in I_u$), or with probability $1-\alpha$, it returns to the source vertex $u$. Equation (1) represents one step that yields one rank score for node $u$. For all nodes in the whole network, the approach is defined by Equation (2), which is an iterative process.
$$AR^{(t+1)} = \alpha S \cdot AR^{(t)} + (1-\alpha)\, q \qquad (2)$$
$AR^{(t)}$ is the rank score vector at step $t$. $q$ is a row vector $(q_0, q_1, \ldots, q_u, \ldots, q_n)$. For the target node $u$, $q_u = 1$, and all other entries equal 0. Note that $AR^{(0)} = q$. $S$ is the transfer matrix, representing the probability for each node to skip to the next node. For the basic RWR model, each cell of matrix $S$ (i.e., $P_{u,v}$ in Equation (1)) is defined as $\frac{1}{L_v}$, in which $L_v$ is the number of node $v$’s neighbors. This means that the walker has the same probability of skipping to each neighboring node. In PAVE, we guide the walker by introducing three academic factors. The change to $P_{u,v}$ enables the walker to skip based on preference, which is shown in Section 4 to be better for academic venue recommendation.
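As a rough illustration of the iteration in Equation (2), the following Python sketch runs the basic (unbiased) RWR on a small undirected graph. The function name `rwr`, the toy graph, the damping factor 0.8, and the convergence tolerance are assumptions made for this example only; they are not values taken from the experiments, and the toy graph is not claimed to match the edges of Figure 1.

```python
def rwr(neighbors, target, alpha=0.8, tol=1e-10, max_iter=1000):
    """Basic random walk with restart (unbiased transfer matrix).

    neighbors: dict mapping each node to a list of its adjacent nodes
               (undirected graph, every node has at least one neighbor).
    target:    the node the walker restarts from (q_target = 1).
    alpha, tol, max_iter: illustrative values, not taken from the paper.
    """
    q = {u: 1.0 if u == target else 0.0 for u in neighbors}
    ar = dict(q)  # AR(0) = q
    for _ in range(max_iter):
        nxt = {}
        for u in neighbors:
            # P[u, v] = 1 / L_v: each neighbor v passes on an equal share.
            walk = sum(ar[v] / len(neighbors[v]) for v in neighbors[u])
            nxt[u] = alpha * walk + (1 - alpha) * q[u]
        change = sum(abs(nxt[u] - ar[u]) for u in neighbors)
        ar = nxt
        if change < tol:
            break
    return ar


# Hypothetical toy network (not the exact edges of Figure 1).
toy = {
    "Alice": ["Bob", "Cindy", "A", "B"],
    "Bob": ["Alice", "C"],
    "Cindy": ["Alice", "D"],
    "David": ["B", "D"],
    "A": ["Alice"],
    "B": ["Alice", "David"],
    "C": ["Bob"],
    "D": ["Cindy", "David"],
}
scores = rwr(toy, "Alice")
# Venues Alice has no link to yet, ordered by rank score.
unseen = [v for v in ("A", "B", "C", "D") if v not in toy["Alice"]]
ranking = sorted(unseen, key=lambda v: -scores[v])
```

Because every column of the unbiased transfer matrix sums to 1 on such a graph, the scores keep summing to 1 from one iteration to the next, which is a convenient sanity check.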
With reference to Figure 2, the process of PAVE is described in detail as
follows.
Step 1. The initial input data is a set of publications with author and venue information. PAVE first extracts the co-author relations and author-venue relations and then generates the co-publication network. There is a link between two authors if they have coauthored at least one paper, and a link between a researcher and a venue if the researcher has published a paper in the venue.
Step 2. After initializing the rank scores of nodes and the weights of edges, PAVE runs on the network. During the random walk, the walker skips to the next node with a probability modified by the three academic factors. The walk stops when the rank scores approximately converge or the number of iterations reaches the upper limit.
Step 3. After obtaining the converged rank score of each node, PAVE sorts the venues by their rank scores. Finally, after removing the venues with which the target author already has contact, the top-N venues are recommended to the target author.
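Steps 1 and 3 can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the function names `build_network` and `top_n_venues`, the record format, and the sample data are hypothetical, and the rank scores that Step 2 would produce are supplied by hand here rather than computed by a walk.

```python
from collections import defaultdict
from itertools import combinations


def build_network(publications):
    """Step 1: build the co-publication network.

    publications: iterable of (author_list, venue) records.
    Nodes are tagged tuples so that an author and a venue may share a name.
    """
    neighbors = defaultdict(set)
    for authors, venue in publications:
        v = ("venue", venue)
        unique = sorted(set(authors))
        for a in unique:  # author-venue relations
            neighbors[("author", a)].add(v)
            neighbors[v].add(("author", a))
        for a, b in combinations(unique, 2):  # co-author relations
            neighbors[("author", a)].add(("author", b))
            neighbors[("author", b)].add(("author", a))
    return dict(neighbors)


def top_n_venues(scores, neighbors, author, n):
    """Step 3: rank venues by score, skipping venues the author already links to."""
    known = neighbors[("author", author)]
    candidates = [
        (node[1], score)
        for node, score in scores.items()
        if node[0] == "venue" and node not in known
    ]
    candidates.sort(key=lambda item: (-item[1], item[0]))
    return [name for name, _ in candidates[:n]]


# Hypothetical records and hand-written scores, for illustration only.
records = [(["Alice", "Bob"], "A"), (["Alice", "Cindy"], "B"),
           (["Bob"], "C"), (["Cindy", "David"], "D")]
network = build_network(records)
example_scores = {("venue", "A"): 0.30, ("venue", "B"): 0.25,
                  ("venue", "C"): 0.20, ("venue", "D"): 0.10,
                  ("author", "Bob"): 0.15}
recommended = top_n_venues(example_scores, network, "Alice", 2)
```

Venues A and B score highest in this example but are filtered out because Alice already has links to them, so only the venues she has not yet contacted remain as candidates.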
We then present details of how the transfer matrix with bias is computed by considering the three academic factors.
3.2. Transfer Matrix with Bias
A random walk in a network is a sequence of transitions from one node to another. If the walker is at node $u$, the probability that it walks to node $v$ at the next step is determined only by the conditions of nodes $u$ and $v$. That is, the probability that the walker walks to node $v$ is independent of the steps before node $u$. Such a process is called a Markov process, and a random walk is in fact a Markov process.
Let $p_{uv}$ represent the probability that the walker walks from node $u$ to node $v$; then the $p_{uv}$ can be arranged in the following matrix form. This matrix is called the transfer matrix.
$$P = \begin{pmatrix} p_{11} & \cdots & p_{1m} \\ \cdots & \cdots & \cdots \\ p_{m1} & \cdots & p_{mm} \end{pmatrix}$$
Obviously, $0 \le p_{uv} \le 1$ and $\sum_{v=1}^{m} p_{uv} = 1$.
Let $t_u(n)$ be the probability that the walker is at node $u$ after $n$ steps; $t_u(n)$ is called the $n$-step state probability. Then the state vector is
$$T(n) = (t_1(n), t_2(n), \cdots, t_m(n)) \qquad (3)$$
Apparently, $\sum_{u=1}^{m} t_u(n) = 1$.
According to the total probability formula, we get
$$t_v(n+1) = \sum_{u=1}^{m} t_u(n)\, p_{uv}, \quad n = 0, 1, 2, \cdots \qquad (4)$$
Then we get the general recursive formula
$$T(n) = T(0) P^n \qquad (5)$$
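To make Equations (4) and (5) concrete, the short Python sketch below propagates a state vector through a small chain by repeated application of Equation (4). The helper names `step` and `state_after` and the two-node matrix are hypothetical and chosen only so the arithmetic can be checked by hand.

```python
def step(state, P):
    """One application of Equation (4): t_v(n+1) = sum_u t_u(n) * p_uv."""
    m = len(P)
    return [sum(state[u] * P[u][v] for u in range(m)) for v in range(m)]


def state_after(t0, P, n):
    """State vector T(n) = T(0) P^n, computed by n successive steps."""
    state = list(t0)
    for _ in range(n):
        state = step(state, P)
    return state


# Hypothetical two-node chain; each row of P sums to 1.
P = [[0.5, 0.5],
     [0.0, 1.0]]
t2 = state_after([1.0, 0.0], P, 2)  # [0.25, 0.75]
```

Starting from node 1 with certainty, one step gives (0.5, 0.5) and a second step gives (0.25, 0.75); each additional step costs one more vector-matrix multiplication.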
It can be seen from Equation (5) that, to improve the efficiency of the algorithm, we should reduce $n$ to cut down the number of multiplications by the transition probability matrix. Whatever the final recommendation ranking in different algorithms is, a transition probability matrix with bias can make the walker reach suitable venues faster than a matrix without bias. This is because the walker walks purposefully under a transition probability matrix with bias, which reduces the number of steps. Thus both $n$ and the number of multiplications can be reduced, cutting down the running time of the algorithm. That is, we should guide the walker to more appropriate nodes by using a transfer matrix with bias instead of one without bias. Therefore, we use a transfer matrix with bias in PAVE, of which each element represents the transition probability between two corresponding nodes.
Figure 3: Process of random walk.
Figure 3 shows the process of the random walk with the new transfer matrix. After initializing the node and edge weights, we modify the transfer matrix by taking the following three steps to obtain the transfer matrix with bias.
• Differentiate the weights of author-venue relations and co-author relations.
• Explore the frequency of interactions among researchers.
• Take the academic level of researchers into consideration.
Referring to the example shown in Figure 1, there are eight academic entities. Consider recommending venues to Alice, who has never had contact with venues C and D. According to the characteristics of the RWR model, the walker can walk from Alice to venues C and D via Bob and Cindy, respectively. After several iterations of walking, venues C and D are recommended to Alice based on the sorted rank scores. However, several academic factors can be introduced to better match real scenarios. We exploit three of them to redefine the transfer matrix in RWR.
Generally, researchers prefer contacting the academic entities (researchers and venues) with which they interact frequently, i.e., venues in which they publish frequently or researchers with whom they collaborate frequently. As shown in Figure 1, Alice prefers contacting Bob rather than Cindy because Alice has collaborated with Bob twice and with Cindy once. Bob is therefore more important than Cindy to Alice. Furthermore, Alice prefers contacting venue A rather than B, since Alice has published two papers in venue A. Based on this assumption, we define co-publication frequency as Equation (6).
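$$F_{u,v} = \begin{cases} CP_{u,v}, & u \in Authors,\ v \in Venues \\ CT_{u,v}, & u, v \in Authors \end{cases} \qquad (6)$$
where $CP_{u,v}$ is the count of author $u$’s publications in venue $v$, and $CT_{u,v}$ is the number of times author $u$ has collaborated with author $v$.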
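The counts in Equation (6) can be gathered in a single pass over the publication records, as in the Python sketch below. The function name, the record format, and the paper list are hypothetical; the list is constructed only to reproduce the counts quoted for Alice in the example, and counting one collaboration per co-authored paper is an assumption about how collaboration times are measured.

```python
from collections import Counter
from itertools import combinations


def co_publication_frequency(publications):
    """Count CP (papers per author-venue pair) and CT (papers per author pair)."""
    cp = Counter()  # (author, venue) -> CP
    ct = Counter()  # frozenset({a, b}) -> CT (symmetric in a and b)
    for authors, venue in publications:
        unique = sorted(set(authors))
        for a in unique:
            cp[(a, venue)] += 1
        for a, b in combinations(unique, 2):
            ct[frozenset((a, b))] += 1
    return cp, ct


# Hypothetical paper list chosen to reproduce the counts quoted for Alice.
papers = [(["Alice", "Bob"], "A"), (["Alice", "Bob"], "A"),
          (["Alice", "Cindy"], "B")]
cp, ct = co_publication_frequency(papers)
```

With these records, Alice has two papers in venue A and one in venue B, two collaborations with Bob, and one with Cindy, matching the preferences described in the example.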
In addition, there are two kinds of associations in co-publication networks, i.e., co-author relations and author-venue relations. The basic random walk model ignores the difference between these two relations. Author-venue relations appear to be more important than co-author relations, because the event of publishing a paper in a venue is more informative when profiling a researcher’s interests. This proposition is supported by subsequent experiments, in which it leads to better performance when making academic recommendations. We measure the relation weight using Equation (7) based on a ratio $\beta$.
$$W_{u,v} = \beta F_{u,v} \qquad (7)$$
The ratio $\beta$ is a tunable empirical value used to regulate the relative importance of author-venue relations and co-author relations. The issue is how to set $\beta$ for each of these two kinds of relations to achieve the best recommendation performance. We conduct a number of experiments to determine $\beta$. In PAVE, $\beta$ is set to 20 for author-venue relations and 1 for co-author relations, which supports our hypothesis that author-venue relations are more important than co-author relations when profiling researchers’ interests.
Finally, we propose an assumption: the interest features of academic entities can be more accurately reflected by neighbors of a similar level. Researchers are more likely to contact other researchers with similar academic levels and to publish papers in venues that are most likely to accept their papers. In other words, the relations between similar-level academic entities carry more weight, and the walker should walk along these nodes with higher probability in PAVE. In order to measure the similarity of academic entities, we define a simple metric as shown in Equation (8).
$$LevSim_{u,v} = 1 - \frac{\|AR_u - AR_v\|}{\max_{x \in N_u}\left(\|AR_u - AR_x\|\right)} \tag{8}$$
Here $N_u$ is the set of neighbors of node u. Equation (8) identifies the
neighbors whose rank scores differ least from that of u, normalized by the
largest difference among u's neighbors. If node v shows the maximal gap with
node u compared with u's other neighbors, $LevSim_{u,v}$ is zero. When
computing the transfer probability $S_{u,v}$ from node u to node v, the PAVE
model adopts Equation (9), so that the walker moves on the network with a
modified bias.
$$S_{u,v} = \frac{W_{u,v}}{\sum_{x \in N_u} W_{u,x}} \cdot LevSim_{u,v} \tag{9}$$
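Equations (8) and (9) can be combined into a short sketch that computes the
outgoing transfer probabilities of one node. Everything here is illustrative:
the function names, the dictionary-based graph format, the rank scores, and the
handling of the case where all neighbors have the same rank as u (treated as
fully similar) are assumptions rather than details given in the paper.

```python
def level_similarity(rank, u, neighbors):
    """Equation (8): 1 - |AR_u - AR_v| / max_x |AR_u - AR_x| for v in N_u."""
    gaps = {v: abs(rank[u] - rank[v]) for v in neighbors}
    max_gap = max(gaps.values())
    if max_gap == 0:
        # Assumption: if no neighbor differs from u, treat all as fully similar.
        return {v: 1.0 for v in neighbors}
    return {v: 1.0 - gap / max_gap for v, gap in gaps.items()}


def transfer_probabilities(weights, rank, u):
    """Equation (9): normalized relation weight times level similarity."""
    neighbors = weights[u]
    total = sum(neighbors.values())
    lev = level_similarity(rank, u, neighbors)
    return {v: (w / total) * lev[v] for v, w in neighbors.items()}


# Hypothetical relation weights W (Equation (7)) and rank scores AR.
weights = {"Alice": {"Bob": 2.0, "Cindy": 1.0, "A": 40.0, "B": 20.0}}
rank = {"Alice": 5.0, "Bob": 4.0, "Cindy": 1.0, "A": 7.0, "B": 9.0}

s_alice = transfer_probabilities(weights, rank, "Alice")
for node, prob in sorted(s_alice.items()):
    print(node, round(prob, 4))
```

Note that, taken literally, Equation (9) gives values that need not sum to one
over the neighbors of u, because the neighbor with the largest rank gap always
receives zero. Whether the rows are renormalized afterwards is not stated in
this section, so the sketch leaves them as they are.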
4. Performance Evaluation
We conducted extensive experiments using data from DBLP [18], a computer
science bibliography website hosted at the University of Trier in Germany. In
this section, we describe the statistics of the data set, the evaluation
metrics, and our experimental procedure for evaluating the performance of
PAVE, followed by a detailed analysis of the results.
4.1. Experimental Settings
To measure the performance of PAVE, we implement three comparison approaches,
i.e., the basic RWR model, a topic-based model, and a friends-based model. The
detailed settings are as follows.
(1) RWR is a popular model widely used in recommender systems. The details and
verification method of RWR are the same as those of PAVE, except for the
definition of the transfer matrix with bias: in RWR, the probabilities of
moving to each neighbor node are equal.
(2) The topic-based method is a content-based recommendation approach in the
strict sense, and a well-known approach for content-based recommender systems.
The core of the approach is to compute the similarity between researchers and
venues. In this implementation, we regard the topic distributions of
researchers' publication content and venues' publication content as their
respective feature vectors, which are calculated by an LDA model [6]. We set
the argument K in LDA to 10, which means that each venue and each researcher
is described by 10 topics. The similarity of researchers and venues is defined
as the cosine similarity of these feature vectors.
(3) The friends-based model is a neighborhood-based recommendation approach of
a kind widely used in social network-based recommendation. The basic idea of
the friends-based model is to recommend venues according to the number of
neighbors who have relations with the venues. In this implementation, we treat
the researcher's collaborators and "collaborators of collaborators" as
neighbors. If many neighbors have contacted a venue, the venue should be
recommended to the researcher.
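The scoring steps of the two non-random-walk baselines can be sketched as
follows. This is a minimal illustration, not the authors' code: the function
names, the dictionary inputs, and the toy data are hypothetical, the topic
vectors are shortened to three dimensions for readability (the paper uses
K = 10), and the LDA step that would produce those vectors is omitted.

```python
import math
from collections import Counter


def cosine_similarity(p, q):
    """Cosine similarity of two equal-length topic vectors."""
    dot = sum(a * b for a, b in zip(p, q))
    norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    return dot / norm if norm else 0.0


def friends_based_scores(target, coauthors, author_venues):
    """Count, per venue, the neighbors of `target` that published there.

    Neighbors are collaborators and collaborators of collaborators.
    """
    direct = coauthors.get(target, set())
    neighbors = set(direct)
    for collaborator in direct:
        neighbors |= coauthors.get(collaborator, set())
    neighbors.discard(target)
    scores = Counter()
    for neighbor in neighbors:
        for venue in author_venues.get(neighbor, set()):
            scores[venue] += 1
    return scores


# Topic-based: compare a researcher's topic vector with each venue's vector.
researcher = [0.7, 0.2, 0.1]
venue_topics = {"A": [0.6, 0.3, 0.1], "B": [0.1, 0.1, 0.8]}
topic_scores = {v: cosine_similarity(researcher, t) for v, t in venue_topics.items()}

# Friends-based: made-up co-author graph and publication venues.
coauthors = {
    "Alice": {"Bob", "Cindy"},
    "Bob": {"Alice", "Dave"},
    "Cindy": {"Alice"},
    "Dave": {"Bob"},
}
author_venues = {"Bob": {"A"}, "Cindy": {"B"}, "Dave": {"A", "C"}}
friend_scores = friends_based_scores("Alice", coauthors, author_venues)
print(topic_scores, friend_scores.most_common())
```

In both cases the recommendation list would be obtained by sorting venues by
score and keeping the top entries.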
4.2. Data Set
DBLP indexes more than 3.35 million articles in computer science. In these
experiments, the large scale of the data makes it time consuming to process
the data and run the PAVE model. To reduce training time, we use a subset of
DBLP. This subset covers the field of data mining, involving 74 venues (36
journals and 38 conferences) and 70,326 researchers altogether. Researchers
and venues are connected by 163,446 articles in this co-publication network.
Covering most high-quality journals and conferences in the data mining area,
the subset has been used by other related studies with no subjective bias
[36]. The statistics pertaining to the data set are shown in Table 1. The data
set is divided into two parts: the data before the year 2011 are chosen as the
training set, and the rest as the test set.
Figure 4: Detailed statistics of the data set from DBLP.
Table 1: Statistics of the data set from DBLP.
Statistics    Venues    Researchers    Articles
Number        74        70,326         163,446

The detailed statistical characteristics of this co-publication network are
shown in Figure 4. Figure 4(a) describes the number of participants or
contributors for each venue. Almost half of the venues have no more than 500
researchers, while 11 venues are so large that up to 3,000 researchers publish
papers in them. We can also observe from Figure 4(b) that 94.09% of these
70,326 researchers have contacted no more than 3 venues (77.67% for 1 venue,
11.88% for 2 venues, and 4.54% for 3 venues). However, there are also some
excellent researchers with a high academic level (accounting for 0.13%) who
contribute to more than 14 venues. Similarly, Figure 4(c) shows the same trend
for the number of researchers' publications. Most researchers published no
more than five papers, but 1.64% of researchers published more than 14 papers.
Figure 4(d) shows the number of co-authors for each researcher. In general,
the distributions in Figure 4(b), Figure 4(c), and Figure 4(d) follow a
long-tail distribution, which corresponds to the fact that a small number of
researchers contribute the most publications or have the most co-authors. We
can conclude that the degrees (numbers of neighbors) of most researchers are
under 14, which indicates that this data set is very sparse.
All experiments were performed on a 64-bit Linux-based operating system,
Ubuntu 12.04, with a 4-duo 3.2 GHz Intel CPU and 8 GB of memory, and were
implemented in Python.
4.3. Metrics
In our previous work [9], we employed three popular metrics [28], namely
precision, recall, and F1 score, to evaluate the performance of
recommendation. In this work, we additionally propose a new metric,
Ave-Quality, to evaluate the quality of the recommended venues. For academic
recommendation, we usually get a recommendation list as the output. There is
also an accepted list for the target node. So, we can divide the result data
into three parts, as follows.
A: The recommended and collaborated nodes;
B: The recommended and not collaborated nodes;
C: The collaborated and not recommended nodes.
With A, B, and C denoting the numbers of nodes in the respective parts,
precision is defined as:
$$P = \frac{A}{A+B} \tag{10}$$
Recall is defined as:
$$R = \frac{A}{A+C} \tag{11}$$
To get an integrated metric over precision and recall, we measure the model by
the F1 score, usually called F1:
$$F1 = \frac{2PR}{P+R} \tag{12}$$
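Equations (10) to (12) translate directly into set operations on the
recommendation list and the accepted list. The sketch below is illustrative
only: the function name and the example lists are hypothetical, and returning
zero when a denominator is zero is an assumption the paper does not discuss.

```python
def precision_recall_f1(recommended, accepted):
    """Compute precision, recall, and F1 from two collections of venues."""
    recommended, accepted = set(recommended), set(accepted)
    a = len(recommended & accepted)   # part A: recommended and accepted
    b = len(recommended - accepted)   # part B: recommended, not accepted
    c = len(accepted - recommended)   # part C: accepted, not recommended
    precision = a / (a + b) if a + b else 0.0
    recall = a / (a + c) if a + c else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Four venues recommended; the researcher actually published in A and E.
p, r, f1 = precision_recall_f1(["A", "B", "C", "D"], ["A", "E"])
print(p, r, round(f1, 4))
```

Here one of four recommendations is a hit (P = 0.25) and one of two accepted
venues is found (R = 0.5), so F1 = 2(0.25)(0.5)/(0.75) = 1/3.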
In recommender systems, the quality of recommended items is of great concern:
the higher the quality of the items a system recommends, the better the
recommendation performs. It is worth noting that a recommendation could still
be of high quality even if the author's paper was rejected. However, such data
can hardly be obtained, which makes it difficult to consider rejected papers.
As a consequence, in this work we regard a recommendation as high quality when
the author's paper was accepted for publication in the test set. To evaluate
the quality of the recommended venues generated by PAVE, we propose a metric,
Ave-Quality, based on Google's h5-index¹. The h5-index is a well-known and
authoritative metric that represents a venue's academic level. A venue has an
h5-index of h if h is the largest number such that the venue has published h
papers in the most recent 5 years, each of which has been cited at least h
times. The formal definition is shown in Equation (13), where V is the set of
recommended items, M is the length of the recommendation list, and $H5_v$ is
the h5-index of venue v. If the average h5-index of the recommended venues is
high, PAVE performs well in recommending high-quality venues.
$$\text{Ave-Quality} = \frac{\sum_{v \in V} H5_v}{M} \tag{13}$$
¹ https://scholar.google.com/intl/en/scholar/metrics.html#metrics
In this work, we use these four metrics to evaluate the performance of PAVE.
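The Ave-Quality metric of Equation (13) is a plain average of h5-index values
over the recommendation list. The sketch below assumes a lookup table from
venue to h5-index; the function name and the h5-index values are invented for
illustration and are not taken from Google Scholar Metrics.

```python
def ave_quality(recommended, h5_index):
    """Equation (13): mean h5-index of the M recommended venues."""
    return sum(h5_index[venue] for venue in recommended) / len(recommended)


# Hypothetical h5-index values for three venues.
h5_index = {"A": 60, "B": 30, "C": 15}
quality_top3 = ave_quality(["A", "B", "C"], h5_index)
quality_top1 = ave_quality(["A"], h5_index)
print(quality_top3, quality_top1)
```

If a model places high-h5 venues at the top of its list, the value for short
lists is high and tends to fall as lower-ranked venues are added.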
4.4. Results and Analysis
In this section, we first run several experiments for PAVE, basic RWR, and the
topic-based and friends-based recommendation models on the data set discussed
above. We randomly choose 100 researchers as target nodes, run PAVE once for
each target node, and then average the metric values over the 100 runs. We
repeat these experiments with recommendation lists of different lengths to
evaluate the influence of the list length on the results. Additionally, PAVE
and RWR are run with α = 0.8, which is shown to be appropriate in the
experiments below.
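The role of α can be illustrated with a generic random walk with restart
iteration. The sketch below uses a common textbook form, r ← α Sᵀ r + (1 − α) q,
where q puts all mass on the target researcher. This is an assumption: the
exact update rule used by PAVE is defined earlier in the paper and may follow a
different convention, and the function name, graph, and uniform transfer
probabilities (as in the basic RWR baseline) are made up for the example.

```python
def random_walk_with_restart(transfer, target, alpha=0.8, tol=1e-10, max_iter=1000):
    """Iterate r <- alpha * S^T r + (1 - alpha) * q until r stops changing.

    `transfer[u][v]` is the probability of moving from node u to node v.
    """
    nodes = set(transfer)
    for neighbors in transfer.values():
        nodes.update(neighbors)
    rank = {n: 0.0 for n in nodes}
    rank[target] = 1.0  # the walker starts at the target researcher
    for _ in range(max_iter):
        nxt = {n: 0.0 for n in nodes}
        for u, neighbors in transfer.items():
            for v, prob in neighbors.items():
                nxt[v] += alpha * rank[u] * prob
        nxt[target] += 1.0 - alpha  # restart mass returns to the target
        delta = sum(abs(nxt[n] - rank[n]) for n in nodes)
        rank = nxt
        if delta < tol:
            break
    return rank


# Toy network: two researchers and two venues, uniform transfer probabilities.
transfer = {
    "Alice": {"Bob": 0.5, "A": 0.5},
    "Bob": {"Alice": 1 / 3, "A": 1 / 3, "B": 1 / 3},
    "A": {"Alice": 0.5, "Bob": 0.5},
    "B": {"Bob": 1.0},
}
scores = random_walk_with_restart(transfer, "Alice", alpha=0.8)
venues = ["A", "B"]
ranking = sorted(venues, key=scores.get, reverse=True)
print(ranking)
```

Venue A is reachable from Alice directly and through Bob, while venue B is
reachable only through Bob, so A receives the larger score. Replacing the
uniform probabilities with biased ones changes the ranking without changing
the iteration itself.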
Figure 5: Performance of PAVE, basic RWR, topic-based and friends-based recommendation
model.
Figure 6: Impact of researchers’ publications number (PN) on PAVE.
Figure 7: Impact of damping coefficient (Dm) on PAVE.
In recommendation models, higher efficiency generally refers to higher
recommendation accuracy with a shorter recommendation list. Figure 5 shows the
performance of PAVE, basic RWR, and the topic-based and friends-based
recommendation models. The x axis represents the length of the recommendation
list, which ranges from 1 to 25. The y axis represents precision, recall, and
F1 score, respectively. In Figure 5(a), the precision of the topic-based model
decreases as the recommendation list grows, and the other three models decline
with fluctuations. The topic-based and friends-based recommendation models
perform better in precision only when the length of the recommendation list is
1; PAVE and basic RWR perform better in precision overall. Looking closely at
the range 1 to 11 on the x axis, PAVE achieves higher precision, reaching a
peak value of 8.7% when recommending 3 venues. As the recommendation list
grows, the performance of the four recommendation approaches tends to become
similar. In Figure 5(b), the recall curves rise. PAVE and basic RWR show no
significant difference, but their recall is better than that of the
topic-based and friends-based approaches. As the number of recommended venues
approaches the total number of venues, the recall approaches 1. According to
Figure 5(c), the F1 score shows a trend similar to that of precision. The F1
score of PAVE reaches its highest value of 12.95% when recommending 9 venues
for each researcher. The upgrade rate, (F1(PAVE) − F1(RWR)) / F1(RWR), is
11.3% in comparison to basic RWR. It is worth mentioning that PAVE reaches its
peak at point 9, while basic RWR achieves its highest F1 score at point 11.
This means that the recommendation efficiency of PAVE is higher.
These experimental results demonstrate that the RWR-based models achieve more
accurate academic venue recommendation than the topic-based and friends-based
approaches. Furthermore, our transfer matrix with bias improves the
performance of PAVE and makes the recommendation more efficient. Compared with
RWR, the proposed transfer matrix with bias in PAVE makes it possible for the
walker to follow preferred paths rapidly and precisely. Based on the analysis
of the experimental data and the theory of the PAVE model, we conclude that
the PAVE model improves recommendation accuracy and that the modification of
the transfer matrix with bias is appropriate.
We also conducted several further experiments to measure the performance of
PAVE on different researchers. We mainly focused on differences in
researchers' academic level, which is reflected by the number of publications.
To some extent, the number of publications reflects researchers' contributions
and activeness. Generally, in the computer science domain, junior researchers
show a lower academic level with few publications, while senior professors
show a higher academic level with many high-quality publications. We divide
the researchers into three sets: (1) C1 contains researchers with 2 to 8
publications; we ignore researchers with only one publication to ensure that
the target researcher can appear in both the training and test sets; (2) C2
contains researchers with 8 to 15 publications; (3) C3 contains researchers
with more than 15 publications. The experimental results are shown in
Figure 6.
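The grouping above can be written as a small helper. Because the stated ranges
for C1 and C2 both include 8, the sketch has to pick one side: assigning
exactly 8 publications to C1 is an assumption made here, not something the text
settles, and the function name is hypothetical.

```python
def publication_group(num_publications):
    """Map a publication count to C1, C2, or C3 (None if excluded).

    Assumption: a count of exactly 8 goes to C1, since the stated ranges
    "2 to 8" and "8 to 15" overlap at that value.
    """
    if num_publications < 2:
        return None  # researchers with a single publication are ignored
    if num_publications <= 8:
        return "C1"
    if num_publications <= 15:
        return "C2"
    return "C3"


groups = [publication_group(n) for n in (1, 2, 8, 9, 15, 16)]
print(groups)
```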
From Figure 6, we can see significant differences in the effect on different
sets of researchers, even though precision, recall, and F1 score each show
similar trends. In Figure 6(c), PAVE achieves its highest F1 score of 16.37%
at point 5 when making academic venue recommendations for researchers with 2
to 8 publications. The results indicate that PAVE performs better at
recommending academic venues for researchers with fewer publications, i.e.,
junior researchers, which meets our intention of recommending academic venues
for more effective research and collaboration.
We conduct experiments to show the impact of the damping coefficient on PAVE,
as shown in Figure 7. Since the damping coefficient lies between 0 and 1, we
test four different values: 0.2, 0.4, 0.6, and 0.8. The metrics precision,
recall, and F1 again show similar trends. From Figure 7(a), we can see that
precision reaches its highest value of 8.7% when the damping coefficient is
0.8. Recall shows an upward trend and is also higher when the damping
coefficient is 0.8. Similar to precision, F1 reaches its higher value when the
damping coefficient is 0.8, as shown in Figure 7(c). All in all, PAVE shows
the best performance when the damping coefficient is 0.8.
Figure 8: The New Metric Ave-Quality.
Furthermore, we explore the performance of the four models on Ave-Quality,
with α set to 0.8. In Figure 8, we can see that PAVE shows the best
performance on Ave-Quality. In other words, PAVE recommends venues of a higher
academic level for researchers than the other models. Ave-Quality reaches its
peak when the length of the recommendation list is 3. As the recommendation
list grows, Ave-Quality shows a downward trend, but PAVE remains better than
the others. This phenomenon is consistent with the theory that a random walk
model can identify high-level nodes with the biased transfer matrix, which
means that the three academic factors we explored lead the rank value to
transfer along high-level nodes. Therefore, the PAVE model can rank high-level
nodes at the top of the recommendation list and ultimately improve the quality
of the recommended venues. In conclusion, PAVE shows better performance than
the other baseline methods.
5. Conclusion
In this paper, we have focused on academic venue recommendation for
researchers based on big scholarly data, which is necessary in current
academia. To this end, we have proposed a novel academic venue recommendation
model called PAVE, which exploits three academic factors (i.e., co-publication
frequency, relation weight, and researchers' academic level) to define a
transfer matrix with bias that drives a random walk with restart model running
on co-publication networks. We conducted extensive experiments on a subset of
the DBLP data set to evaluate the performance of PAVE in comparison to other
state-of-the-art approaches: basic RWR, topic-based approaches, and
friends-based approaches. The experimental results show that PAVE outperforms
the other approaches in terms of precision, recall, F1 score, and Ave-Quality.
According to the extended experiment, PAVE performs better at recommending
academic venues for researchers with fewer publications, i.e., junior
researchers.
Nonetheless, there is still much work for future study in this direction. We
only exploit three academic factors in co-publication networks. There are many
other features, such as citation relations, that need to be explored in PAVE.
As future work, more experiments will be performed on other academic data
sets.
Acknowledgments
The authors extend their appreciation to the International Scientific
Partnership Program ISPP at King Saud University for funding this research
work through ISPP#0078.
References
[1] Adomavicius, G., Tuzhilin, A.. Toward the next generation of recommender
systems: A survey of the state-of-the-art and possible extensions. IEEE
Transactions on Knowledge and Data Engineering 2005;17(6):734–749.
[2] Backstrom, L., Leskovec, J.. Supervised random walks: predicting and
recommending links in social networks. In: Proceedings of the 4th ACM
international conference on Web search and data mining. ACM; 2011. p.
635–644.
[3] Balog, K., Ramampiaro, H., Takhirov, N., Nørvåg, K.. Multi-step
classification approaches to cumulative citation recommendation. In:
Proceedings of the 10th Conference on Open Research Areas in Information
Retrieval. 2013. p. 121–128.
[4] Beel, J., Gipp, B., Langer, S., Breitinger, C.. Research-paper rec-
ommender systems: a literature survey. International Journal on Digital
Libraries 2016;17(4):305–338.
[5] Beierle, F., Tan, J., Grunert, K.. Analyzing social relations for
recommending academic conferences. In: Proceedings of the 8th ACM
International Workshop on Hot Topics in Planet-scale mObile computing and
online Social neTworking. ACM; 2016. p. 37–42.
[6] Blei, D.M., Ng, A.Y., Jordan, M.I.. Latent dirichlet allocation. The
Journal of Machine Learning Research 2003;3:993–1022.
[7] Caragea, C., Silvescu, A., Mitra, P., Giles, C.L.. Can’t see the forest
for the trees?: a citation recommendation system. In: Proceedings of the
13th ACM/IEEE-CS joint conference on Digital libraries. ACM; 2013. p.
111–114.
[8] Chen, J., Chen, G., Zhang, H., Huang, J., Zhao, G.. Social
recommendation based on multi-relational analysis. In: IEEE/WIC/ACM
International Conferences on Web Intelligence and Intelligent Agent
Technology. IEEE; volume 2; 2012. p. 471–477.
[9] Chen, Z., Xia, F., Jiang, H., Liu, H., Zhang, J.. Aver: Random
walk based academic venue recommendation. In: Proceedings of the 24th
International Conference on World Wide Web Companion. WWW; 2015.
p. 579–584.
[10] Cohen, S., Ebel, L.. Recommending collaborators using keywords. In:
Proceedings of the 22nd international conference on World Wide Web com-
panion. WWW; 2013. p. 959–962.
[11] Dhanda, M., Verma, V.. Recommender system for academic literature
with incremental dataset. Procedia Computer Science 2016;89:483–491.
[12] Fouss, F., Pirotte, A., Renders, J.M., Saerens, M.. Random-walk
computation of similarities between nodes of a graph with application to
collaborative recommendation. IEEE Transactions on Knowledge and Data
Engineering 2007;19(3):355–369.
[13] He, Q., Pei, J., Kifer, D., Mitra, P., Giles, L.. Context-aware citation
recommendation. In: Proceedings of the 19th international conference on
World wide web. ACM; 2010. p. 421–430.
[14] Hornick, M.F., Tamayo, P.. Extending recommender systems for disjoint
user/item sets: The conference recommendation problem. IEEE Transactions
on Knowledge and Data Engineering 2012;24(8):1478–1490.
[15] Huynh, T., Hoang, K.. Modeling collaborative knowledge of publishing
activities for research recommendation. In: International Conference
on Computational Collective Intelligence Technologies and Applications.
Springer; 2012. p. 41–50.
[16] Lee, D.H., Brusilovsky, P., Schleyer, T.. Recommending collaborators
using social features and mesh terms. Proceedings of the American Society
for Information Science and Technology 2011;48(1):1–10.
[17] Lemarchand, G.A.. The long-term dynamics of co-authorship scientific
networks: Iberoamerican countries (1973–2010). Research Policy
2012;41(2):291–305.
[18] Ley, M.. Dblp: some lessons learned. Proceedings of the VLDB Endow-
ment 2009;2(2):1493–1500.
[19] Li, L., Chu, W., Langford, J., Wang, X.. Unbiased offline evaluation
of contextual-bandit-based news article recommendation algorithms. In:
Proceedings of the fourth ACM international conference on Web search
and data mining. ACM; 2011. p. 297–306.
[20] Liang, D., Charlin, L., McInerney, J., Blei, D.M.. Modeling user exposure
in recommendation. In: Proceedings of the 25th International Conference
on World Wide Web. International World Wide Web Conferences Steering
Committee; 2016. p. 951–961.
[21] Lopes, G.R., Moro, M.M., Wives, L.K., De Oliveira, J.P.M..
Collaboration recommendation on academic social networks. In: Advances
in Conceptual Modeling–Applications and Challenges. Springer; 2010. p.
190–199.
[22] Luong, H., Huynh, T., Gauch, S., Do, L., Hoang, K.. Publication venue
recommendation using author network’s publication history. In: Intelligent
Information and Database Systems. Springer; 2012. p. 426–435.
[23] Page, L., Brin, S., Motwani, R., Winograd, T.. The pagerank citation
ranking: bringing order to the web. Stanford Digital Libraries Working
Paper 1999;9(1):1–14.
[24] Pan, C., Li, W.. Research paper recommendation with topic analysis.
In: 2010 International Conference on Computer Design and Applications.
IEEE; volume 4; 2010. p. V4–264–V4–268.
[25] Pham, M.C., Cao, Y., Klamma, R., Jarke, M.. A clustering approach
for collaborative filtering recommendation using social network analysis. J
UCS 2011;17(4):583–604.
[26] Pham, M.C., Kovachev, D., Cao, Y., Mbogos, G.M., Klamma, R..
Enhancing academic event participation with context-aware and social
recommendations. In: IEEE/ACM International Conference on Advances in
Social Networks Analysis and Mining. IEEE Computer Society; 2012. p.
464–471.
[27] Rohani, V.A., Kasirun, Z.M., Kumar, S., Shamshirband, S.. An effective
recommender algorithm for cold start problem in academic social networks.
Mathematical Problems in Engineering 2014;2014(2):505–519.
[28] Shani, G., Gunawardana, A.. Evaluating recommendation systems. In:
Recommender systems handbook. Springer; 2011. p. 257–297.
[29] Stokes, J., Weber, S.. A markov chain model for the search time for max
degree nodes in a graph using a biased random walk. In: 2016 Annual
Conference on Information Science and Systems (CISS). IEEE; 2016. p.
448–453.
[30] Sugiyama, K., Kan, M.Y.. Scholarly paper recommendation via user’s re-
cent research interests. In: Proceedings of the 10th annual joint conference
on Digital libraries. ACM; 2010. p. 29–38.
[31] Sugiyama, K., Kan, M.Y.. Towards higher relevance and serendipity in
scholarly paper recommendation. ACM SIGWEB Newsletter 2015;(Winter):4.
[32] Tang, J., Wu, S., Sun, J., Su, H.. Cross-domain collaboration
recommendation. In: Proceedings of the 18th ACM SIGKDD international
conference on Knowledge discovery and data mining. ACM; 2012. p. 1285–1293.
[33] West, J.D., Wesley-Smith, I., Bergstrom, C.T.. A recommendation
system based on hierarchical clustering of an article-level citation network.
IEEE Transactions on Big Data 2016;2:113–123.
[34] Wongchokprasitti, C., Brusilovsky, P., Parra-Santander, D.. Conference
navigator 2.0: community-based recommendation for academic conferences.
In: Workshop on Social Reminder Systems. ACM; 2010.
[35] Xia, F., Asabere, N.Y., Rodrigues, J.J., Basso, F., Deonauth, N., Wang,
W.. Socially-aware venue recommendation for conference participants. In:
IEEE International Conference on Ubiquitous Intelligence and Computing.
IEEE; 2013. p. 134–141.
[36] Xia, F., Chen, Z., Wang, W., Li, J., Yang, L.T.. Mvcwalker: Ran-
dom walk-based most valuable collaborators recommendation exploiting
academic factors. IEEE Transactions on Emerging Topics in Computing
2014;2(3):364–375.
[37] Xia, F., Liu, H., Lee, I., Cao, L.. Scientific article recommendation:
Exploiting common author relations and historical preferences. IEEE
Transactions on Big Data 2016;2:101–112.
[38] Xia, F., Wang, W., Bekele, T.M., Liu, H.. Big scholarly data: A survey.
IEEE Transactions on Big Data 2017;3(1):18–35.
[39] Yang, Z., Davison, B.D.. Distinguishing venues by writing styles. In:
Proceedings of the 12th ACM/IEEE-CS joint conference on Digital Libraries.
ACM; 2012. p. 371–372.
[40] Yang, Z., Davison, B.D.. Venue recommendation: Submitting your paper
with style. In: International Conference on Machine Learning and
Applications. IEEE; volume 1; 2012. p. 681–686.
[41] Yang, Z., Yin, D., Davison, B.D.. Recommendation in academia: A
joint multi-relational model. In: IEEE/ACM International Conference on
Advances in Social Networks Analysis and Mining. IEEE; 2014. p. 566–571.
[42] Yaw Asabere, N., Xia, F., Wang, W., Rodrigues, J.J., Basso,
F., Ma, J.. Improving smart conference participation through socially
aware recommendation. IEEE Transactions on Human-Machine Systems
2014;44(5):689–700.
[43] Yu, J., Xie, K., Zhao, H., Liu, F.. Prediction of user interest based on
collaborative filtering for personalized academic recommendation. In: 2nd
International Conference on Computer Science and Network Technology.
IEEE; 2012. p. 584–588.
... There are number of articles that have used RS for suggesting publication venues. Some articles uses CBF [29][30][31][32], CF [33][34][35][36][37][38][39][40][41][42], hybrid model [43][44][45][46], ranking algorithm [47], SNA [48][49][50][51][52], or different algorithms [53][54][55][56][57][58][59][60] for identifying venues for their domain to publish their research work. None of them deal with the changing interest of the user along with keywords correlation is used along with the title and abstract to find the publication venue, total citation count to find the most appropriate venue, and researchers' opinion on the factors depends upon, impact factor, eigen factor, total citation, review time, acceptance rate, fees, H-index, SJR, article influence. ...
... This does not deal with the content of the article to be submitted thus can provide irrelevant recommendations. Chen at el. works on a technique that is used for suggesting publication venue using a random walk algorithm with the restart model [49,50]. This model is applied to the co-publication network. ...
... The comparative analysis is performed on PUB-VEN with the baseline algorithms of CBF using TF-IDF [5] and CBF using LSA [18]. The evaluation is performed using precision, recall, F1, DCG, and NDCG at N where N is number of recommendations and N ∈{10, 20,30,40,50,60,70,80,90,100}. The precision, recall, and F1 use values according to the metrics of confusion matrix shown in Table 4. ...
Article
Full-text available
Researchers would like to publish their research articles in reputed journals along with quick review time. However, with the growing number of academic publications, it is becoming more difficult for scholars to find venues that are relevant to their domain. This study aims on the development of a technique that focuses on the priorities of the researchers that are linked to the recommendation of suitable suggestion of publication journal. The developed Recommendation System (RS) takes title, abstract, and keyword of the manuscript to be submitted. The proposed algorithm, named PUB-VEN which is hybridization of Content-Based Filtering (CBF), and Collaborative Filtering (CF), which is integrated with the Multi-Criteria Decision Making (MCDM) process to provide suitable journal recommendations by considering the researcher's point of view about different attributes gathered such as impact factor, eigen factor, average review time, etc. which affect the research process effectively. Our results demonstrate that the PUB-VEN provides better recommendations in comparison with state-of-the-art algorithms such as Term Frequency and Inverse Document Frequency (TF-IDF) and Latent Semantic Analysis (LSA). The study concluded that PUB-VEN is providing better precision, recall, F1 Score, Discounted Cumulative Gain (DCG), and Normalized DCG (NCDG). For precision, the gain ranges from 1% to 16%, the improvement in recall is between 33% and 3%, the betterment of result in F1 is by the ratio which ranges from 27% and 2%, the improvement in the result of DCG lies between 15% and 5% and the result of NDCG gain ranges from 6% to 1%. It is useful for the researchers in finding suitable venue for publication.
... Features such as historical data references (Zhou et al. 2020;White et al. 2013;Zou et al. 2022) enable models to leverage past interactions for future predictions, while algorithm-agnostic models Musto et al. 2019;Mandayam Comar and Sengamedu 2017) offer flexibility in selecting the most suitable algorithms for specific tasks. Model-based features Pradhan et al. 2021;Yu et al. 2018), which rely on statistical methods (Schlaefer et al. 2011;Kim et al. 2017) and semantic analysis (Zhang and Zhong 2016;Xu et al. 2015), are used to provide predictions based on predefined models. ...
Article
Full-text available
User intent modeling in natural language processing deciphers user requests to allow for personalized responses. The substantial volume of research (exceeding 13,000 publications in the last decade) underscores the significance of understanding prevalent models in AI systems, with a focus on conversational recommender systems. We conducted a systematic literature review to identify models frequently employed for intent modeling in conversational recommender systems. From the collected data, we developed a decision model to assist researchers in selecting the most suitable models for their systems. Furthermore, we conducted two case studies to assess the utility of our proposed decision model in guiding research modelers in selecting user intent modeling models for developing their conversational recommender systems. Our study analyzed 59 distinct models and identified 74 commonly used features. We provided insights into potential model combinations, trends in model selection, quality concerns, evaluation measures, and frequently used datasets for training and evaluating these models. The study offers practical insights into the domain of user intent modeling, specifically enhancing the development of conversational recommender systems. The introduced decision model provides a structured framework, enabling researchers to navigate the selection of the most apt intent modeling methods for conversational recommender systems.
... Actually, social network-based approaches frequently make journal recommendations by incorporating other relevant subject matter information, interaction frequencies, and related factors. Yu et al. (2018) proposed PAVE, a personalized academic venue recommendation exploiting copublication networks that contains co-author relations and author-venue relations. Meanwhile, co-publication frequency, academic level and weights between two kinds of associations were combined into the recommendation model to achieve better performance. ...
Preprint
Full-text available
To meet scholars' need to recommend both higher accuracy and diversity when submitting interdisciplinary papers, this paper proposes an improved journal diversity recommendation method based on the attention mechanism in deep learning. This method can retain all key information in long texts by using the attention mechanism. It identifies and stores the research directions and hotspots covered in different papers across journals to extract common research topics for each journal type. Five deep learning models based on attention mechanism are introduced, 104,176 paper abstracts from 111 Web of Science journals are used to fine-tune the models. After learning on training set and model testing on the test set, recommendation accuracy and diversity results are calculated for 9 categories. Finally, the recommendation accuracy and diversity of the 5 attention mechanism based deep learning models are compared with benchmark models across different journal types. The experimental results demonstrate the feasibility and superiority of this method comprehensively considering the metrics of accuracy and diversity at a large scale. It provides theoretical and practical advancements to develop an effective journal recommender system which helps scholars to make wise decision for journal submission.
... Algorithm-agnostic models [79][80][81] provide the flexibility to select the most suitable algorithm for a particular task. Model-based [82][83][84] features leverage statistical methods [85,86] and semantic analysis [87,88] to offer predictions based on specific models. Table 3 illustrates the mapping of features to models in user intent modeling, highlighting the frequency of explicit mentions in relevant publications. ...
Preprint
Context: User intent modeling is a crucial process in Natural Language Processing that aims to identify the underlying purpose behind a user’s request, enabling personalized responses. With a vast array of approaches introduced in the literature (over 13,000 papers in the last decade), understanding the related concepts and commonly used models in AI-based systems is essential. Method: We conducted a systematic literature review to gather data on models typically employed in designing conversational recommender systems. From the collected data, we developed a decision model to assist researchers in selecting the most suitable models for their systems. Additionally, we performed two case studies to evaluate the effectiveness of our proposed decision model. Results: Our study analyzed 59 distinct models and identified 74 commonly used features. We provided insights into potential model combinations, trends in model selection, quality concerns, evaluation measures, and frequently used datasets for training and evaluating these models. Contribution: Our study contributes practical insights and a comprehensive understanding of user intent modeling, empowering the development of more effective and personalized conversational recommender systems. With the Conversational Recommender System, researchers can perform a more systematic and efficient assessment of fitting intent modeling frameworks.
... Other systems exploit the literature cited by the article, and try to determine the best publication venue using bibliometric methods [10,11]. An alternative stream of research focuses more on the article authors, exploring their publication history [12] and copublication networks [13,14]. ...
Article
Finding a suitable open access journal to publish academic work is a complex task: Researchers have to navigate a constantly growing number of journals, institutional agreements with publishers, funders’ conditions and the risk of predatory publishers. To help with these challenges, we introduce a web-based journal recommendation system called B!SON. A systematic requirements analysis was conducted in the form of a survey. The developed tool suggests open access journals based on title, abstract and references provided by the user. The recommendations are built on open data, publisher-independent and work across domains and languages. Transparency is provided by its open source nature, an open application programming interface (API) and by specifying which matches the shown recommendations are based on. The recommendation quality has been evaluated using two different evaluation techniques, including several new recommendation methods. We were able to improve the results from our previous paper with a pre-trained transformer model. The beta version of the tool received positive feedback from the community and in several test sessions. We developed a recommendation system for open access journals to help researchers find a suitable journal. The open tool has been extensively tested, and we found possible improvements for our current recommendation technique. Development by two German academic libraries ensures the longevity and sustainability of the system.
Article
This paper introduces an innovative approach, the LS-SLM (Local Search with Smart Local Moving) technique, for enhancing the efficiency of article recommendation systems based on community detection and topic modeling. The methodology undergoes rigorous evaluation using a comprehensive dataset extracted from the “dblp. v12.json” citation network. Experimental results presented herein provide a clear depiction of the superior performance of the LS-SLM technique when compared to established algorithms, namely the Louvain Algorithm (LA), Stochastic Block Model (SBM), Fast Greedy Algorithm (FGA), and Smart Local Moving (SLM). The evaluation metrics include accuracy, precision, specificity, recall, F-Score, modularity, Normalized Mutual Information (NMI), betweenness centrality (BTC), and community detection time. Notably, the LS-SLM technique outperforms existing solutions across all metrics. For instance, the proposed methodology achieves an accuracy of 96.32%, surpassing LA by 16% and demonstrating a 10.6% improvement over SBM. Precision, a critical measure of relevance, stands at 96.32%, showcasing a significant advancement over GCR-GAN (61.7%) and CR-HBNE (45.9%). Additionally, sensitivity analysis reveals that the LS-SLM technique achieves the highest sensitivity value of 96.5487%, outperforming LA by 14.2%. The LS-SLM also demonstrates superior specificity and recall, with values of 96.5478% and 96.5487%, respectively. The modularity performance is exceptional, with LS-SLM obtaining 95.6119%, significantly outpacing SLM, FGA, SBM, and LA. Furthermore, the LS-SLM technique excels in community detection time, completing the process in 38,652 ms, showcasing efficiency gains over existing techniques. The BTC analysis indicates that LS-SLM achieves a value of 94.6650%, demonstrating its proficiency in controlling information flow within the network.
Article
A scholarly recommendation system is an important tool for identifying prior and related resources such as literature, datasets, grants, and collaborators. A well-designed scholarly recommender significantly saves the time of researchers and can provide information that would not otherwise be considered. The usefulness of scholarly recommendations, especially literature recommendations, has been established by the widespread acceptance of web search engines such as CiteSeerX, Google Scholar, and Semantic Scholar. This article discusses different aspects and developments of scholarly recommendation systems. We searched the ACM Digital Library, DBLP, IEEE Explorer, and Scopus for publications in the domain of scholarly recommendations for literature, collaborators, reviewers, conferences and journals, datasets, and grant funding. In total, 225 publications were identified in these areas. We discuss methodologies used to develop scholarly recommender systems. Content-based filtering is the most commonly applied technique, whereas collaborative filtering is more popular among conference recommenders. The implementation of deep learning algorithms in scholarly recommendation systems is rare among the screened publications. We found fewer publications in the areas of the dataset and grant funding recommenders than in other areas. Furthermore, studies analyzing users’ feedback to improve scholarly recommendation systems are rare for recommenders. This survey provides background knowledge regarding existing research on scholarly recommenders and aids in developing future recommendation systems in this domain.
Article
Finding the right journal for a manuscript is difficult and often time-consuming because authors weigh several criteria while searching for an appropriate journal. One of the most important criteria is the content similarity between the journal and the manuscript: the subject of the manuscript should fall within the scope of the journal, and the manuscript content should be close to the journal's recent trends for a higher chance of acceptance. A second criterion is whether the journal's impact factor, acceptance rate, review time and publishing house suit the author's past publication profile. In this study, a novel method is proposed in which both the content of the article and the profile of the author or authors are considered together to find an appropriate journal. To the best of our knowledge, this is the first effort in this direction. Experimental results on real data sets show that the proposed method is applicable and achieves high accuracy.
Article
With the rapid growth of digital publishing, harvesting, managing, and analyzing scholarly information have become increasingly challenging. The term Big Scholarly Data has been coined for the rapidly growing body of scholarly data, which contains information including millions of authors, papers, citations, figures, tables, as well as scholarly networks and digital libraries. Nowadays, various scholarly data can be easily accessed and powerful data analysis technologies are being developed, which enable us to look into science itself from a new angle. In this paper, we examine the background and state of the art of big scholarly data. We first introduce the background of scholarly data management and relevant technologies. Secondly, we review data analysis methods, such as statistical analysis, social network analysis, and content analysis for dealing with big scholarly data. Finally, we look into representative research issues in this area, including scientific impact evaluation, academic recommendation, and expert finding. For each issue, the background, main challenges, and latest research are covered. These discussions aim to provide a general overview and big picture to scholars interested in this emerging area. This survey paper concludes with a discussion of open issues and promising future directions.
Conference Paper
Academic venues act as the main platform for communities in academia and as a bridge connecting researchers, and they have developed rapidly in recent years. However, information overload in big scholarly data creates tremendous challenges for mining useful and effective information with which to recommend high-quality and fruitful academic venues to researchers, thereby enabling them to participate in relevant academic conferences and contribute to important or influential journals. In this work, we propose AVER, a novel random walk based Academic VEnue Recommendation model. AVER runs a random walk with restart model on a co-publication network which contains two kinds of associations, coauthor relations and author-venue relations. Moreover, we define a transfer matrix with bias to drive the random walk by exploiting three academic factors: co-publication frequency, weight of relations and researchers' academic level. AVER is inspired by the fact that researchers are more likely to contact those who have high co-publication frequency and similar academic levels. Additionally, in AVER, we consider the difference of weights between the two kinds of associations. We conduct extensive experiments on the DBLP data set to evaluate the performance of AVER. The results demonstrate that, in comparison to relevant baseline approaches, AVER performs better in terms of precision, recall and F1.
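The random walk with restart idea in this abstract can be illustrated with a minimal sketch. It is not the AVER or PAVE implementation: the toy graph, the restart probability of 0.15, and the function names `row_normalize`, `random_walk_with_restart` and `recommend_venues` are all assumptions made for illustration. The transition matrix here is built only from edge weights (standing in for co-publication counts) and omits the academic-level and relation-weight biases the abstract describes.

```python
def row_normalize(weights):
    """Turn a weighted adjacency matrix into row-stochastic transition probabilities."""
    out = []
    for row in weights:
        total = sum(row)
        out.append([w / total if total else 0.0 for w in row])
    return out


def random_walk_with_restart(transition, seed, restart=0.15, tol=1e-10, max_iter=1000):
    """Iterate r <- restart * e_seed + (1 - restart) * P^T r until r stops changing.

    transition[j][i] is the probability of stepping from node j to node i.
    """
    n = len(transition)
    restart_vec = [0.0] * n
    restart_vec[seed] = 1.0
    rank = restart_vec[:]
    for _ in range(max_iter):
        nxt = [restart * restart_vec[i] for i in range(n)]
        for j in range(n):
            for i in range(n):
                nxt[i] += (1.0 - restart) * rank[j] * transition[j][i]
        delta = sum(abs(a - b) for a, b in zip(nxt, rank))
        rank = nxt
        if delta < tol:
            break
    return rank


def recommend_venues(names, is_venue, weights, seed, top_k=2):
    """Rank venue nodes by their random-walk score for the seed author."""
    scores = random_walk_with_restart(row_normalize(weights), seed)
    venues = [i for i in range(len(names)) if is_venue[i]]
    venues.sort(key=lambda i: scores[i], reverse=True)
    return [names[i] for i in venues[:top_k]]


# Hypothetical toy network: three authors (A, B, C) and three venues (V1, V2, V3).
# Edge weights are made-up co-publication or publication counts.
names = ["A", "B", "C", "V1", "V2", "V3"]
is_venue = [False, False, False, True, True, True]
weights = [
    [0, 3, 1, 1, 0, 0],  # A: coauthor of B (3 papers) and C (1), published in V1 (1)
    [3, 0, 0, 0, 2, 0],  # B: coauthor of A, published in V2 (2)
    [1, 0, 0, 0, 0, 2],  # C: coauthor of A, published in V3 (2)
    [1, 0, 0, 0, 0, 0],  # V1
    [0, 2, 0, 0, 0, 0],  # V2
    [0, 0, 2, 0, 0, 0],  # V3
]

scores = random_walk_with_restart(row_normalize(weights), seed=0)
top = recommend_venues(names, is_venue, weights, seed=0, top_k=2)
print(top)
```

In this toy setup the walk starting from author A reaches venues through coauthors, so a venue favoured by a frequent coauthor tends to score higher than one favoured by an occasional coauthor. A biased transfer matrix as described in the abstract would replace `row_normalize` with a weighting that also accounts for academic level and relation type.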
Article
Owing to the colossal expansion of research paper repositories, the importance of recommender systems has increased, as they can guide researchers to papers relevant to them in this vast collection. Furthermore, recommendation methods such as collaborative filtering or content-based filtering do not allow users to state their personalized requirements explicitly; hence the focus has shifted towards customized recommender systems that can examine users' preferences by considering their inputs. But the state-of-the-art recommendation techniques that satisfy users' personalized requirements make a strong assumption of a static dataset. So, in this work we present a customized recommender system that can accommodate the ever-growing nature of a research paper repository. To accomplish this, the recently introduced Efficient Incremental High-Utility Itemset Mining algorithm (EIHI), which is specialized to work with dynamic datasets, is used. Experimental results show that the proposed system satisfies researchers' personalized requirements and at the same time handles the incremental nature of the research paper repository efficiently.
Conference Paper
Recommender systems are used to filter through vast amounts of items and recommend those that potentially have the highest relevance for the user. Recently, research dealing with recommendations in academia increased. In this paper, we analyze to what extent social relations from existing data can be utilized to generate academic conference recommendations. We design and implement a social recommender system and show how, without the need for explicit ratings, viable recommendations can be made, while at the same time reducing the cost of kNN-neighborhood selection.
Article
Scientific article recommender systems are playing an increasingly important role for researchers in retrieving scientific articles of interest in the coming era of big scholarly data. Most existing studies have designed unified methods for all target researchers, and hence the same algorithms are run to generate recommendations for all researchers regardless of their situation. However, different researchers may have their own features, and methods tailored to those features might produce better recommendations. In this paper, we propose a novel recommendation method which incorporates information on common author relations between articles (i.e., two articles with the same author(s)). The rationale underlying our method is that researchers often search for articles published by the same author(s). Since not all researchers have such author-based search patterns, we present two features, which are defined based on information about pairwise articles with common author relations and frequently appearing authors, to determine target researchers for recommendation. Extensive experiments we performed on a real-world dataset demonstrate that the defined features are effective in determining relevant target researchers and that the proposed method generates more accurate recommendations for relevant researchers when compared to a baseline method.
Conference Paper
Collaborative filtering analyzes user preferences for items (e.g., books, movies, restaurants, academic papers) by exploiting the similarity patterns across users. In implicit feedback settings, all the items, including the ones that a user did not consume, are taken into consideration. But this assumption does not accord with the common sense understanding that users have a limited scope and awareness of items. For example, a user might not have heard of a certain paper, or might live too far away from a restaurant to experience it. In the language of causal analysis (Imbens & Rubin, 2015), the assignment mechanism (i.e., the items that a user is exposed to) is a latent variable that may change for various user/item combinations. In this paper, we propose a new probabilistic approach that directly incorporates user exposure to items into collaborative filtering. The exposure is modeled as a latent variable and the model infers its value from data. In doing so, we recover one of the most successful state-of-the-art approaches as a special case of our model (Hu et al. 2008), and provide a plug-in method for conditioning exposure on various forms of exposure covariates (e.g., topics in text, venue locations). We show that our scalable inference algorithm outperforms existing benchmarks in four different domains both with and without exposure covariates.
Article
This paper presents an overview of the field of recommender systems and describes the current generation of recommendation methods that are usually classified into the following three main categories: content-based, collaborative, and hybrid recommendation approaches. This paper also describes various limitations of current recommendation methods and discusses possible extensions that can improve recommendation capabilities and make recommender systems applicable to an even broader range of applications. These extensions include, among others, an improvement of understanding of users and items, incorporation of the contextual information into the recommendation process, support for multicriteria ratings, and a provision of more flexible and less intrusive types of recommendations.
Article
The scholarly literature is expanding at a rate that necessitates intelligent algorithms for search and navigation. For the most part, the problem of delivering scholarly articles has been solved. If one knows the title of an article, locating it requires little effort and, paywalls permitting, acquiring a digital copy has become trivial. However, the navigational aspect of scientific search - finding relevant, influential articles that one does not know exist - is in its early development. In this paper, we introduce EigenfactorRecommends - a citation-based method for improving scholarly navigation. The algorithm uses the hierarchical structure of scientific knowledge, making possible multiple scales of relevance for different users. We implement the method and generate more than 300 million recommendations from more than 35 million articles from various bibliographic databases including the AMiner dataset. We find little overlap with co-citation, another well-known citation recommender, which indicates potential complementarity. In an online A-B comparison using SSRN, we find that our approach performs as well as co-citation, but this new approach offers much larger recommendation coverage. We make the code and recommendations freely available at babel.eigenfactor.org and provide an API for others to use for implementing and comparing the recommendations on their own platforms.