
LDICDL: LncRNA-disease association identification based on Collaborative Deep Learning


Abstract

Long noncoding RNAs (lncRNAs) have been shown to play critical roles in many human diseases. Inferring associations between lncRNAs and diseases can therefore contribute to disease diagnosis, prognosis and treatment. Because traditional experimental methods are expensive and time-consuming, several computational methods have been proposed to predict lncRNA-disease associations by fusing different biological data. However, the prediction performance of these methods still needs to be improved. In this study, we propose a computational model, named LDICDL, to identify lncRNA-disease associations based on collaborative deep learning. It uses an autoencoder to denoise multiple sources of lncRNA feature information and of disease feature information, respectively. Matrix factorization is then employed to predict potential lncRNA-disease associations. In addition, to overcome the limitation of matrix factorization, a hybrid model is developed to predict associations between new lncRNAs (or diseases) and diseases (or lncRNAs). Ten-fold cross validation and a de novo test are applied to evaluate the method. The experimental results show that LDICDL outperforms other state-of-the-art methods in prediction performance.
X CLASS FILES, VOL. 14, NO. 8, AUGUST 2015 1
Wei Lan, Dehuan Lai, Qingfeng Chen, Ximin Wu, Baoshan Chen, Jin Liu, Jianxin Wang, Yi-Ping
Phoebe Chen
Index Terms—lncRNA-disease associations, matrix factorization, stacked denoising autoencoder.
It is well known that biological genetic information is
primarily stored in protein-coding genes, and RNA is
the intermediary between DNA sequences and proteins [1].
With the development of human genetic engineering, 2% of
the genes have been confirmed to be protein-coding genes,
and the remaining 98% have little or no protein-coding
ability [2]. These genes are usually transcribed
into non-coding RNAs [3]. Non-coding RNAs have been re-
garded as the noise of genomic transcription for a long time
[4], [5]. However, recent studies have shown that they play
important regulatory roles in many biological processes.
In particular, long non-coding RNAs (lncRNAs),
which are longer than 200 nucleotides, have been
found to be related to a broad range of diseases [6]. For
example, HOTAIR is overexpressed in
breast cancer, colon cancer, liver cancer and gastrointestinal
stromal tumors [7]. Therefore, identifying lncRNA-disease
associations helps biologists not only to understand
the underlying mechanisms of disease, but also to improve
disease prevention, diagnosis and treatment [8], [9].
Many biological experimental studies have been carried
out to discover potential lncRNA-disease associations
[10]. Although these methods can identify lncRNA-disease
associations accurately, they are time-consuming and
expensive. With the development
of high-throughput sequencing technology, a large amount
of lncRNA-related data, such as sequence, structure,
function and expression, has been generated [11], [12]. Thus,
many computational algorithms have been proposed
to overcome these limitations and predict potential
lncRNA-disease associations [13]. These computational methods
can be classified into two categories. (1) Network-based
methods use similarity networks to predict lncRNA-disease
associations. For example, Sun et al. [14] proposed
a computational method, RWRlncD, to identify lncRNA-disease
associations based on an lncRNA functional similarity
network and random walk with restart. Chen
et al. [15] presented an algorithm, IRWRLDA, to predict
lncRNA-disease associations using an lncRNA similarity
network. They used various measures to calculate lncRNA
similarity, and IRWRLDA can be applied to diseases
without any known lncRNA-disease association. Zhou et al. [16]
developed a model, RWRHLD, for lncRNA-disease association
prediction by integrating three networks into one
heterogeneous network. By constructing a multi-level
lncRNA-disease network, Yao et al. [17] proposed an
algorithm, LncPriCNet, to prioritize candidate lncRNA-disease
associations. (2) Machine learning-based methods
prioritize candidate lncRNAs by training on known
disease-related lncRNAs and unknown lncRNAs. Lan et al.
[18] developed an online web server (LDAP) to identify
new associations between lncRNAs and diseases based on
positive-unlabeled (PU) learning. Chen et al. [19] proposed
a method (LRLSLDA) to infer lncRNA-disease associations
based on semi-supervised learning. Wu et al. [20]
presented a computational method (GAMCLDA) to predict
lncRNA-disease associations based on graph autoencoder
matrix completion. Fu et al. [21] developed a computational
model (MFLDA) to predict associations between lncRNAs
and diseases based on multiple data fusion and matrix
factorization (MF). Lu et al. [22] presented a computational
model (SIMCLDA) to prioritize candidate lncRNAs based
on inductive matrix completion. Chen et al. [23] proposed
a computational framework, ILDMSF, for lncRNA-disease
association identification based on multiple kernel fusion
and Support Vector Machine (SVM).
These methods have achieved good performance in
predicting associations between lncRNAs and diseases.
However, they do not make full use of the available lncRNA
and disease feature data, which limits their accuracy and
prediction performance
[24], [25]. This paper proposes a novel computational framework
(LDICDL) to predict lncRNA-disease associations.
It uses an autoencoder to denoise multiple sources of lncRNA
feature information and of disease feature
information. In addition, a matrix factorization algorithm
is employed to predict potential lncRNA-disease
associations. Further, a hybrid model based on a stacked
denoising autoencoder and matrix factorization is developed
to overcome the limitation of matrix factorization in de
novo prediction. The experimental results demonstrate that our
method performs better than other state-of-the-art
methods.
The task of identifying lncRNA-disease associations can be
viewed as learning from implicit feedback, which serves as
both the training and the test data. The lncRNA-disease
associations are represented by a matrix $LD_{m \times n}$,
where $m$ and $n$ denote the numbers of lncRNAs and diseases,
respectively. The element $LD(i,j)$ is equal to 1 if lncRNA $i$
is associated with disease $j$, and 0 otherwise. The lncRNA
information is integrated into the lncRNA feature matrix
$LF_{m \times t}$, where $t$ denotes the number of lncRNA
features. The disease information is merged into the disease
feature matrix $DF_{n \times s}$, where $s$ denotes the number
of disease features.
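As a minimal sketch of this setup (not the authors' code), the binary association matrix can be assembled from a list of known pairs. The function name `build_association_matrix` and the toy indices are illustrative assumptions.

```python
import numpy as np

def build_association_matrix(pairs, m, n):
    """Return an m x n 0/1 matrix with LD[i, j] = 1 for each known (i, j) pair."""
    LD = np.zeros((m, n), dtype=float)
    for i, j in pairs:
        LD[i, j] = 1.0
    return LD

# Toy example: 3 lncRNAs, 4 diseases, 3 known associations (hypothetical indices).
LD = build_association_matrix([(0, 1), (2, 3), (2, 0)], m=3, n=4)
```

Every pair that is not listed stays at 0, which is why such data are treated as implicit feedback: a zero means "not known", not "confirmed absent".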
2.1 Stacked Denoising Autoencoder
The stacked denoising autoencoder (SDAE) is a kind of
feedforward neural network that is widely used in recommender
systems [26]. In LDICDL, SDAEs are employed to
extract feature information for lncRNAs and diseases,
respectively. The original lncRNA and disease features have $t$
and $s$ dimensions, respectively. The SDAE reduces both the
lncRNA and the disease feature information to $k$ dimensions.
Mini-batch gradient descent with a batch size of 60 is used to
train the SDAE.
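The two ingredients named here, input corruption and mini-batches of 60 rows, can be sketched as follows. The paper does not state the noise type or noise level, so masking noise with a drop probability of 0.3 is an assumption, as are the helper names `corrupt` and `minibatches` and the toy feature width.

```python
import numpy as np

def corrupt(X, drop_prob, rng):
    """Masking noise (assumed): zero each entry independently with probability drop_prob."""
    mask = rng.random(X.shape) >= drop_prob
    return X * mask

def minibatches(X, batch_size, rng):
    """Yield shuffled row batches of X; the last batch may be smaller."""
    order = rng.permutation(X.shape[0])
    for start in range(0, X.shape[0], batch_size):
        yield X[order[start:start + batch_size]]

rng = np.random.default_rng(0)
LF = rng.random((240, 50))   # toy stand-in for the m x t lncRNA feature matrix
batch_sizes = [b.shape[0] for b in minibatches(LF, 60, rng)]
noisy = corrupt(LF, 0.3, rng)
```

A denoising autoencoder is trained to reconstruct the clean rows of `LF` from the corrupted rows of `noisy`, one batch at a time.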
2.2 Matrix Factorization for lncRNA-disease prediction
In the lncRNA-disease association matrix $LD$, the element
$LD(i,j)$ is defined as follows:

$$LD(i,j) = \begin{cases} 1, & \text{if lncRNA } i \text{ is related to disease } j \\ 0, & \text{if lncRNA } i \text{ is not related to disease } j \end{cases} \tag{1}$$

Therefore, the loss function of matrix factorization for
lncRNA-disease association prediction is defined as follows:

$$Loss = \sum_{i,j} \alpha_{i,j} \left( LD(i,j) - L(i,:) \cdot D(j,:)^T \right)^2 + \gamma \left( \sum_i \| L(i,:) \|_2^2 + \sum_j \| D(j,:) \|_2^2 \right) \tag{2}$$

where $\gamma$ denotes the regularization parameter. $L(i,:)$ and $D(j,:)$
denote the subspace features of lncRNA $i$ and disease $j$,
respectively. $\alpha_{i,j}$ is the confidence parameter for
lncRNA $i$ and disease $j$, where
$\alpha_{i,j} = 1 + \theta \, LD(i,j)$, and $\| \cdot \|_2$ denotes the 2-norm.
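The loss in Eq. 2 can be evaluated directly with NumPy. This is a sketch under the definitions above, not the authors' implementation; the function name `weighted_mf_loss` and the toy matrices are illustrative.

```python
import numpy as np

def weighted_mf_loss(LD, L, D, theta, gamma):
    """Eq. 2: confidence-weighted squared error plus L2 penalties on L and D."""
    alpha = 1.0 + theta * LD          # alpha_ij = 1 + theta * LD(i, j)
    residual = LD - L @ D.T           # LD(i, j) - L(i, :) . D(j, :)^T
    fit = np.sum(alpha * residual ** 2)
    penalty = gamma * (np.sum(L ** 2) + np.sum(D ** 2))
    return fit + penalty

LD = np.array([[1.0, 0.0], [0.0, 1.0]])
L = np.eye(2)
D = np.eye(2)
loss = weighted_mf_loss(LD, L, D, theta=100.0, gamma=0.5)
```

In this toy case the factors reproduce `LD` exactly, so only the penalty term contributes: 0.5 * (2 + 2) = 2.0.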
2.3 Matrix Factorization with Implicit Feedback for
LncRNA-disease association prediction
Matrix factorization predicts lncRNA-disease associations
poorly for new lncRNAs or diseases, which is known as the
cold start problem
[13], [27]. To address this, a hybrid model is proposed that
predicts lncRNA-disease associations by combining matrix
factorization with a stacked denoising autoencoder. In our
method, predicting associations of a new lncRNA or disease
relies on the biological features of lncRNAs and diseases. The
structure of the hybrid model for lncRNAs, with three hidden
layers, is shown in Figure 1. $X_{input\_l}$ is the input layer
for lncRNA features (i.e., $LF$), $X_{encode\_l}$ is the encoding
of the lncRNA features, and $X_{out\_l}$
is the output layer of lncRNA features.
[Figure 1 diagram: layers $X_{1\_l}$, $X_{encode\_l}(i,:)$ and $X_{3\_l}$; $M(i,j) = X_{encode\_l}(i,:) \, D(j,:)$]
Fig. 1. The overview of hybrid model of lncRNA.
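The loss function of the hybrid model based on lncRNA
features is defined as follows:

$$\begin{aligned} Loss = {} & \sum_{i,j} \alpha_{i,j} \left( LD(i,j) - L(i,:) \cdot D(j,:)^T \right)^2 \\ & + \gamma \left( \sum_i \| L(i,:) \|_2^2 + \sum_j \| D(j,:) \|_2^2 \right) \\ & + \gamma_l \| L - X_{encode\_l} \|_2^2 + \gamma_n \| X_{input\_l} - X_{out\_l} \|_2^2 \\ & + \sum_{layers} \gamma_w \| W \|_2^2 \end{aligned} \tag{3}$$

where $\gamma$, $\gamma_l$, $\gamma_n$ and $\gamma_w$ denote regularization parameters, and $W$
denotes the weight matrix.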
The loss function is minimized by block coordinate descent
[28]. $L(i,:)$ is updated according to Eq. 4 in the
training step:

$$L(i,:) \leftarrow LD(i,:) \, C^{(i)} D \left( \gamma I + D^T C^{(i)} D \right)^{-1} \tag{4}$$

where $C^{(i)}$ is a diagonal matrix with $C^{(i)}(j,j) = \alpha_{i,j}$.
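$D(j,:)$ is updated according to Eq. 5 in the training
step:

$$D(j,:) \leftarrow LD(:,j)^T \, \tilde{C}^{(j)} L \left( \gamma I + L^T \tilde{C}^{(j)} L \right)^{-1} \tag{5}$$

where $\tilde{C}^{(j)}$ is a diagonal matrix with $\tilde{C}^{(j)}(i,i) = \alpha_{i,j}$.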
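The closed-form updates in Eqs. 4 and 5 have the same shape, so one helper can serve both by passing $LD$ or its transpose. The sketch below assumes $\alpha_{i,j} = 1 + \theta \, LD(i,j)$ from Eq. 2; the name `update_rows` and the toy sizes are illustrative, and a linear solve replaces the explicit inverse.

```python
import numpy as np

def update_rows(R, F, theta, gamma):
    """For each row r of R, compute x = r C F (gamma I + F^T C F)^{-1},
    where C = diag(1 + theta * r). F holds the fixed factors."""
    k = F.shape[1]
    out = np.empty((R.shape[0], k))
    for i in range(R.shape[0]):
        c = 1.0 + theta * R[i]                           # diagonal of C
        A = gamma * np.eye(k) + F.T @ (c[:, None] * F)   # gamma I + F^T C F
        b = (R[i] * c) @ F                               # r C F
        out[i] = np.linalg.solve(A, b)                   # A is symmetric, so this equals b A^{-1}
    return out

rng = np.random.default_rng(0)
LD = (rng.random((6, 5)) < 0.3).astype(float)
L = rng.normal(size=(6, 3))
D = rng.normal(size=(5, 3))
L = update_rows(LD, D, theta=10.0, gamma=1.0)      # Eq. 4: lncRNA factors, D fixed
D = update_rows(LD.T, L, theta=10.0, gamma=1.0)    # Eq. 5: disease factors, L fixed
```

Each call minimizes the Eq. 2 loss exactly with respect to one block while the other is held fixed, which is the block coordinate descent pattern described above.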
For diseases, the structure with three hidden layers is
shown in Figure 2. $X_{input\_d}$ is the input layer for disease
features (i.e., $DF$), $X_{encode\_d}$ is the encoding of the disease
features, and $X_{out\_d}$ is the output layer of disease features.
[Figure 2 diagram: layers $X_{1\_d}$, $X_{encode\_d}(j,:)$ and $X_{3\_d}$; $M(i,j) = L(i,:) \cdot X_{encode\_d}(j,:)$]
Fig. 2. The overview of hybrid model of disease.
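The loss function of the hybrid model based on disease
feature information is defined as follows:

$$\begin{aligned} Loss = {} & \sum_{i,j} \alpha_{i,j} \left( LD(i,j) - L(i,:) \cdot D(j,:)^T \right)^2 \\ & + \gamma \left( \sum_i \| L(i,:) \|_2^2 + \sum_j \| D(j,:) \|_2^2 \right) \\ & + \gamma_d \| D - X_{encode\_d} \|_2^2 + \gamma_n \| X_{input\_d} - X_{out\_d} \|_2^2 \\ & + \sum_{layers} \gamma_w \| W \|_2^2 \end{aligned} \tag{6}$$

where $\gamma$, $\gamma_d$, $\gamma_n$ and $\gamma_w$ denote regularization parameters, and
$W$ denotes the weight matrix.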
The final predicted score matrix $S$ is calculated as follows:

$$S(i,j) = M_l(i,j) + M_d(i,j) \tag{7}$$

$$M_l(i,j) = X_{encode\_l}(i,:) \cdot D(j,:)^T \tag{8}$$

$$M_d(i,j) = L(i,:) \cdot X_{encode\_d}(j,:)^T \tag{9}$$

where $S(i,j)$ denotes the score between lncRNA $i$ and
disease $j$. $X_{encode\_l}$ and $X_{encode\_d}$ denote the sub-feature
matrices of lncRNAs and diseases obtained by the SDAE
from lncRNA and disease feature information, respectively.
$L$ and $D$ denote the sub-feature matrices of lncRNAs
and diseases obtained from matrix factorization.
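Eqs. 7 to 9 amount to two matrix products and a sum. A small worked example, with toy values chosen only so the arithmetic is easy to follow, might look like this (the function name `combine_scores` is illustrative):

```python
import numpy as np

def combine_scores(X_encode_l, D, L, X_encode_d):
    """Eqs. 7-9: S = M_l + M_d, with M_l = X_encode_l D^T and M_d = L X_encode_d^T."""
    M_l = X_encode_l @ D.T
    M_d = L @ X_encode_d.T
    return M_l + M_d

# Toy sizes: 2 lncRNAs, 3 diseases, k = 2.
X_encode_l = np.array([[1.0, 0.0], [0.0, 1.0]])
D = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
L = np.array([[1.0, 1.0], [0.0, 2.0]])
X_encode_d = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
S = combine_scores(X_encode_l, D, L, X_encode_d)
# S == [[2, 4, 7], [2, 6, 8]]
```

Because a new lncRNA has an encoding in `X_encode_l` even when it has no known associations, `M_l` can still score it, which is how the hybrid model addresses the cold start case.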
The whole workflow of LDICDL is shown in Figure
3. In the first step, the lncRNA-disease association matrix
is decomposed into the lncRNA feature subspace, and disease
feature information is encoded by an SDAE. Meanwhile,
the lncRNA-disease association matrix is decomposed into the
disease feature subspace, and lncRNA feature information
is encoded by an SDAE. Then, lncRNA-disease association
scores are predicted from the lncRNA feature matrix and the
disease encoding matrix, and from the disease feature matrix
and the lncRNA encoding matrix, respectively. Last, the final
lncRNA-disease association score is calculated by averaging
the scores.
Block coordinate descent is used to minimize the loss
function. First, $L$ and $D$ are updated by Eqs. 4
and 5, respectively. Then, the parameters of the SDAE are
updated using mini-batch gradient descent. The mean
squared errors of the output and the encoding are used to adjust
the gradient. These steps are repeated $t$ times.
Fig. 3. The workflow of LDICDL.
3.1 Datasets
The lncRNA-gene associations are downloaded from lncRNA2target
[29], and lncRNA-gene function associations are
collected from GeneRIF [30]. They are pre-processed using
Open Biomedical Annotator [31]. The lncRNA-miRNA associations
and disease-miRNA associations are downloaded
from starBase v2.0 [32] and HMDD [33], respectively. The
disease-gene associations are downloaded from DisGeNET
[34]. Finally, 2697 associations between 240 lncRNAs and
412 diseases are obtained as the gold-standard dataset. In
addition, a 6066-dimensional feature vector for each lncRNA
is collected from lncRNA-related data, and a 10621-dimensional
feature vector for each disease is collected from
disease-related data.
3.2 Performance evaluation
Ten-fold cross validation and a de novo test are employed
to evaluate the performance of the different methods. In ten-fold
cross validation, all known associations between lncRNAs
and diseases are randomly divided into ten folds. In each
round, one fold is selected as the test samples and the other nine
folds are treated as training samples. The known associations
in the test fold are removed in turn, and the remaining known
associations are used to train the model.
Then, the prediction algorithm is run to predict the
scores of test samples and candidate samples. In the de novo
test, for disease $i$, all of its known associations are removed as
test samples, while all known associations between lncRNAs
and other diseases are used as training samples. Then,
the scores of associations between lncRNAs and disease $i$
are calculated by the prediction method. After that, the test
and candidate samples are ranked in descending
order of score, and the rank of each sample is compared with a
specific threshold. A test sample ranked above
the threshold is considered a true positive, otherwise a
false negative. A candidate sample ranked above
the threshold is considered a false positive, otherwise a
true negative. Further, the true positive rate (TPR) and false
positive rate (FPR) are calculated as follows:
$$TPR = \frac{TP}{TP + FN} \tag{10}$$

$$FPR = \frac{FP}{FP + TN} \tag{11}$$
where $TP$ denotes the number of true positive samples, $TN$
denotes the number of true negative samples, $FP$ denotes
the number of false positive samples, and $FN$ denotes the
number of false negative samples. The receiver operating
characteristic (ROC) curve is drawn from the TPR and FPR
at different thresholds, and the area under the ROC curve (AUC)
is calculated to evaluate the performance of the method. An
AUC of 1 denotes perfect
performance, while an AUC of 0.5 denotes
performance no better than random guessing.
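The ranking procedure and Eqs. 10 and 11 can be illustrated with a short pure-Python sketch. It sweeps the rank threshold over all positions and integrates the resulting curve with the trapezoidal rule; tied scores are not treated specially, and the function names and toy scores are assumptions made for illustration.

```python
def roc_points(scores, labels):
    """Sweep a rank threshold over samples sorted by descending score and
    return the (FPR, TPR) points, starting from (0, 0)."""
    ranked = sorted(zip(scores, labels), key=lambda p: -p[0])
    pos = sum(labels)
    neg = len(labels) - pos
    tp = fp = 0
    points = [(0.0, 0.0)]
    for _, y in ranked:
        if y == 1:
            tp += 1       # a test sample above the threshold: true positive
        else:
            fp += 1       # a candidate sample above the threshold: false positive
        points.append((fp / neg, tp / pos))
    return points

def auc(points):
    """Trapezoidal area under a curve given as (x, y) points sorted by x."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))

scores = [0.9, 0.8, 0.7, 0.6]
labels = [1, 0, 1, 0]     # 1 = held-out known association, 0 = candidate sample
area = auc(roc_points(scores, labels))   # 0.75
```

In this toy ranking, three of the four (positive, candidate) pairs are ordered correctly, which matches the area of 0.75.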
In addition, the precision and recall are also calculated
as follows:
$$Precision = \frac{TP}{TP + FP} \tag{12}$$

$$Recall = \frac{TP}{TP + FN} \tag{13}$$
where precision denotes the proportion of true positive
samples among the samples ranked above the specified
threshold (the predicted positive samples), and recall denotes
the proportion of true positive samples among all positive
samples. Then, the
precision-recall (PR) curve is plotted from precision and
recall. Finally, the area under the PR curve (AUPR) is computed
to evaluate the performance of the method.
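For a single rank threshold, Eqs. 12 and 13 can be computed as below. This is an illustrative sketch: the function name `precision_recall_at_k`, the choice of expressing the threshold as "top k", and the toy data are assumptions.

```python
def precision_recall_at_k(scores, labels, k):
    """Treat the k highest-scoring samples as predicted positives."""
    ranked = sorted(zip(scores, labels), key=lambda p: -p[0])
    tp = sum(y for _, y in ranked[:k])   # known associations inside the top k
    fp = k - tp                          # candidate samples inside the top k
    fn = sum(labels) - tp                # known associations below the top k
    return tp / (tp + fp), tp / (tp + fn)

scores = [0.9, 0.8, 0.7, 0.6, 0.5]
labels = [1, 0, 1, 0, 0]
precision, recall = precision_recall_at_k(scores, labels, k=2)   # (0.5, 0.5)
```

Repeating the computation for every k traces out the PR curve. Because known associations are far rarer than candidate pairs in this kind of data, precision and therefore AUPR tend to be small in absolute terms even for a well-ranked list.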
3.3 Ten-fold cross validation
To evaluate the performance of LDICDL, ten-fold
cross validation is applied in the experiment. We
compare LDICDL with two state-of-the-art methods based
on matrix completion (SIMCLDA [22] and MFLDA [21]).
The performance of the different methods is evaluated in terms
of AUC. Figure 4 shows that LDICDL
achieves an AUC of 0.8651, which is higher
than those of the other methods (SIMCLDA 0.8259 and MFLDA 0.6430).
In addition, the AUPR is also used
to compare the performance of the different methods, as shown
in Figure 5. The AUPR of LDICDL is 0.0306, in contrast to
0.0227 and 0.0051 for SIMCLDA and MFLDA, respectively.
This indicates that our method is more effective than the other two
methods. Figure 6 shows the number of correctly retrieved
known lncRNA-disease associations. LDICDL
outperforms the other methods from top 10 to top 50.
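The "correctly retrieved" count at a given rank cutoff can be sketched as follows. The function name `hits_in_top_k` and the toy scores are illustrative assumptions, not values from the experiments.

```python
def hits_in_top_k(scores, labels, ks):
    """Count held-out known associations among the k best-scored samples."""
    ranked = [y for _, y in sorted(zip(scores, labels), key=lambda p: -p[0])]
    return {k: sum(ranked[:k]) for k in ks}

scores = [0.95, 0.90, 0.40, 0.85, 0.10, 0.70]
labels = [1, 0, 1, 1, 0, 0]    # 1 = held-out known association
hits = hits_in_top_k(scores, labels, ks=[1, 3, 5])   # {1: 1, 3: 2, 5: 3}
```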
To show that our model can obtain a deep latent representation
of features, we compare
LDICDL with three classical feature extraction
methods: Nonnegative Matrix Factorization
(NMF), Principal Component Analysis (PCA) and Latent
Dirichlet Allocation (LDA). The comparison
of the different feature extraction methods is shown in Figure 7.
LDICDL, which is based
on the stacked denoising autoencoder, outperforms the
other methods. Moreover, to show the effect of
combining MF and SDAE, we compare the combination with MF and
SDAE used alone. The result, shown in Figure 8,
demonstrates that the combination of MF and SDAE
outperforms either single method (MF or SDAE). We also
compare different regularization methods (L1, L21 and L2)
for matrix factorization. The results are shown in Figure 9;
the L2 norm obtains the best performance.
Fig. 4. The AUROC of LDICDL, SIMCLDA and MFLDA by using ten-fold
cross validation.
Fig. 5. The AUPR of LDICDL, SIMCLDA and MFLDA by using ten-fold
cross validation.
Fig. 6. Number of correctly retrieved known lncRNA-disease associations
for specified rank thresholds based on ten-fold cross validation.
3.4 De novo test
To validate the performance of LDICDL in identifying
potential associations for new diseases, a de novo test is
conducted. For each disease $i$ in turn, the de novo test removes
all of its known associations with lncRNAs and uses them as the
test set. The potential associations between lncRNAs
and disease $i$ are predicted based on feature information.
The AUROC and AUPR results are shown in Figures
10 and 11, respectively. LDICDL achieves the highest
AUC and AUPR (0.8917 and 0.1666). Its AUC is
more than 0.09 higher than those of the other
methods (SIMCLDA 0.7923 and MFLDA 0.5952),
and its AUPR is about 0.04 higher (SIMCLDA
0.1270 and MFLDA 0.0398). This indicates that our method
is superior to the other methods in the de
novo test. Figure 12 shows the number of correctly retrieved
known lncRNA-disease associations. It can be found that
LDICDL outperforms the other methods for top 10 to top
Fig. 7. The AUROC of LDICDL, PCA, LDA and NMF by using ten-fold
cross validation.
Fig. 8. The AUROC of MF, SDAE and SDAE+MF by using ten-fold cross
validation.
Fig. 9. The AUROC of L1, L21 and L2 in MF by using ten-fold cross
validation.
Fig. 10. The AUROC of LDICDL, SIMCLDA and MFLDA by using de
novo cross validation.
Fig. 11. The AUPR of LDICDL, SIMCLDA and MFLDA by using de novo
cross validation.
3.5 The effects of parameters
In the SDAE, the feature information of lncRNAs and diseases
is reduced to a subspace. To test the effect of the feature
dimension $k$, we conduct ten-fold cross validation while
varying the feature dimension from 50 to 250 in steps of
50. The result is shown in Figure 13.
LDICDL achieves the best performance when the
feature dimension is equal to 100. Therefore, $k = 100$ is used
in the experiments. All three
hidden layers use the non-linear activation function tanh, and
the output layer uses the sigmoid function. The numbers of neurons
in the three hidden layers of the autoencoder are set to 130, 100 and 130, respectively.
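The layer layout described here (tanh hidden layers of 130, 100 and 130 units and a sigmoid output) can be sketched as a plain NumPy forward pass. This is an untrained illustration, not the authors' network: the weight initialization, the helper names and the toy input width of 300 are assumptions.

```python
import numpy as np

def init_layers(sizes, rng):
    """Small random weights and zero biases for consecutive layer sizes."""
    return [(rng.normal(scale=0.01, size=(a, b)), np.zeros(b))
            for a, b in zip(sizes, sizes[1:])]

def forward(X, layers):
    """Apply tanh on hidden layers and sigmoid on the output layer.
    Returns the middle (encoding) activations and the reconstruction."""
    h = X
    activations = []
    for idx, (W, b) in enumerate(layers):
        z = h @ W + b
        is_output = idx == len(layers) - 1
        h = 1.0 / (1.0 + np.exp(-z)) if is_output else np.tanh(z)
        activations.append(h)
    code = activations[1]      # second hidden layer: the k = 100 encoding
    return code, h

rng = np.random.default_rng(0)
t = 300                        # toy input width; the lncRNA features have 6066 dimensions
layers = init_layers([t, 130, 100, 130, t], rng)
X = rng.random((8, t))
code, X_out = forward(X, layers)   # shapes (8, 100) and (8, 300)
```

The 100-unit middle layer plays the role of $X_{encode\_l}$ (or $X_{encode\_d}$ for diseases), and the output has the same width as the input so that a reconstruction error can be computed.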
Fig. 12. The number of correctly retrieved known lncRNA-disease associations
for specified rank thresholds based on de novo validation.
The hyperparameters are selected by random
search as proposed in [35]. $\gamma$ and $\theta$ are chosen from
[0.1, 1, 10, 100, 200, 300, 500, 1000]; the ratios $\gamma_l : \gamma_n$ and $\gamma_d : \gamma_n$
are both chosen from [1:1, 100:1, 200:1, 300:1, 400:1, 500:1,
600:1, 700:1, 800:1, 900:1, 1000:1] [36]; and $\gamma_w$ is chosen from
[0.1, 0.3, 0.5, 0.7, 0.9]. All hyperparameters are sampled
uniformly from their sets of possible values. In
our experiment, we repeat the process 20 times to find the
best parameters. The parameters are set as follows:
$\theta = 100$, $\gamma = 300$, $\gamma_l : \gamma_n = \gamma_d : \gamma_n = 100 : 1$, $\gamma_w = 0.3$.
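A random search over these candidate sets can be sketched with the standard library. The search space is copied from the text; the function names, the dictionary keys and in particular `toy_evaluate` are hypothetical stand-ins, since the real objective would be a cross-validated AUC of the trained model.

```python
import random

SEARCH_SPACE = {
    "gamma":   [0.1, 1, 10, 100, 200, 300, 500, 1000],
    "theta":   [0.1, 1, 10, 100, 200, 300, 500, 1000],
    # Ratios gamma_l : gamma_n and gamma_d : gamma_n, stored as the left-hand number.
    "ratio_l": [1, 100, 200, 300, 400, 500, 600, 700, 800, 900, 1000],
    "ratio_d": [1, 100, 200, 300, 400, 500, 600, 700, 800, 900, 1000],
    "gamma_w": [0.1, 0.3, 0.5, 0.7, 0.9],
}

def random_search(evaluate, n_trials, seed=0):
    """Sample each hyperparameter uniformly from its set and keep the best trial."""
    rng = random.Random(seed)
    best_score, best_params = float("-inf"), None
    for _ in range(n_trials):
        params = {name: rng.choice(values) for name, values in SEARCH_SPACE.items()}
        score = evaluate(params)
        if score > best_score:
            best_score, best_params = score, params
    return best_params, best_score

def toy_evaluate(params):
    """Hypothetical placeholder objective; a real run would train and validate the model."""
    return -abs(params["gamma"] - 300) - abs(params["theta"] - 100)

best_params, best_score = random_search(toy_evaluate, n_trials=20)
```

With 20 trials the search covers only a small part of the grid, so the selected values depend on the random seed.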
Fig. 13. The effect of feature dimension k.
3.6 Case study
To demonstrate the capability of LDICDL in identifying
potential lncRNA-disease associations, osteosarcoma is
selected as a case study. In the case study, all known associations
between lncRNAs and osteosarcoma are treated as positive
samples. Then, the potential associations are predicted by
LDICDL. The predicted lncRNAs for osteosarcoma are
checked against recent publications.
Osteosarcoma (osteogenic sarcoma) is a primary bone
malignancy that often affects children and young adults
(approximately 3.4% of all childhood cancers) [37], [38]. This
cancer is rare (less than 1% of all cancers diagnosed) and its
pathogenesis is unknown. With the development of multi-agent
chemotherapy regimens, the long-term survival rate
has improved from 65% to 70% [39]. Unfortunately, prognosis
and treatment have not improved in several decades.
Table 1 shows the top 10 lncRNAs for osteosarcoma predicted
by LDICDL. As shown in Table 1, 9 out of 10 lncRNAs are
confirmed to be related to osteosarcoma by recent literature.
H19, ranked first, has been shown to be related to
osteosarcoma [40]. The rs217727 in H19 can increase the IGF2
cord blood level, which is significantly associated with
osteosarcoma. The long non-coding RNA
PVT1, ranked second, has been shown to promote cell apoptosis
and inhibit cell proliferation, migration and invasion in
osteosarcoma cells by regulating the expression of miR-195 [41].
GAS5, ranked third, can promote the expression of aplasia
Ras homologue member I (ARHI), which suppresses cell
growth and epithelial-mesenchymal transition in osteosarcoma,
by acting as a molecular sponge to regulate the expression
of miR-221 [42]. Recent research shows that NEAT1,
ranked fourth, is significantly upregulated in osteosarcoma
cell lines, and that this upregulation is closely associated
with higher clinical stage, distant metastasis and poorer
prognosis. In addition, it can inhibit E-cadherin expression
and promote the metastasis of osteosarcoma by associating
with the G9a-DNMT1-Snail complex [43]. The long non-coding
RNA KCNQ1OT1, ranked fifth, has been found to be associated
with cell invasion, migration, growth, proliferation and
apoptosis through enhancing WNT/beta-catenin signaling
pathway activity in osteosarcoma tissue [44]. AFAP1-AS1,
ranked seventh, has been found to be significantly
over-expressed, and knockdown of AFAP1-AS1 strikingly inhibits
cell proliferation in osteosarcoma tissue. This indicates
that AFAP1-AS1 can promote cell proliferation in osteosarcoma
by regulating miR-4695-5p/TCF4-β-catenin signaling
[45]. The long non-coding RNA XIST, ranked eighth, has
been shown to bind to miR-320b and inhibit the
expression of miR-320b in osteosarcoma cells. miR-320b
can target the Ras-related protein RAP2B and inhibit the
expression of RAP2B, which is involved in cell proliferation
and invasion of osteosarcoma [46]. CCAT1, ranked ninth,
has been found to be upregulated in osteosarcoma
tissues and cells, and to be related to the cell proliferation
and migration of osteosarcoma by binding to miR-148a
and regulating the signaling pathway of phosphatidylinositol
3-kinase interacting protein 1 (PIK3IP1) [47]. Recent
evidence shows that the long non-coding RNA SPRY4-IT1, ranked
tenth, is over-expressed in osteosarcoma tissues and that
SPRY4-IT1 knockdown strikingly inhibits cell proliferation
through inhibiting the expression of G1 [48]. In addition,
some interesting lncRNAs such as MIR155HG are found
by our method. The biological functions of these lncRNAs
are still unknown and deserve validation by
biological experiments.
Table 1. Top 10 lncRNAs of osteosarcoma predicted by LDICDL
Rank  LncRNA      Evidence
1     H19         [40]
2     PVT1        [41]
3     GAS5        [42]
4     NEAT1       [43]
5     KCNQ1OT1    [44]
6     MIR155HG    Unknown
7     AFAP1-AS1   [45]
8     XIST        [46]
9     CCAT1       [47]
10    SPRY4-IT1   [48]
It is well known that lncRNA is an important kind of non-coding
RNA with a length of more than 200 nucleotides [49].
Accumulating evidence shows that lncRNAs play critical
roles in various biological processes such as chromosome
dosage compensation, genomic imprinting, epigenetic regulation,
nuclear and cytoplasmic trafficking, cell proliferation,
cell differentiation, cell growth, cell metabolism and cell
apoptosis [50], [51]. In addition, a growing number of studies
demonstrate that lncRNAs are closely related to various
diseases, including cancer [28]. Therefore, identifying
lncRNA-disease associations helps to understand the pathogenesis
of disease and further supports disease treatment and drug
discovery.
In this study, we have proposed a computational
method, called LDICDL, to predict lncRNA-disease associations
based on collaborative deep learning. In this approach,
the lncRNA-disease association matrix is decomposed into
the lncRNA feature subspace, and disease feature information
is encoded by an SDAE. Meanwhile, the lncRNA-disease
association matrix is decomposed into the disease feature
subspace, and lncRNA feature information is encoded by
an SDAE. Then, lncRNA-disease association scores
are predicted from the lncRNA feature matrix and the disease
encoding matrix, and from the disease feature matrix and the
lncRNA encoding matrix, respectively. The final lncRNA-disease
association score is calculated by averaging the scores. The results
demonstrate that LDICDL is competitive and often performs
better than other state-of-the-art methods. In addition, our
method may also be applied to other biological association
prediction tasks, such as miRNA-disease association prediction [52], [53],
[54], drug-target interaction prediction [55] and disease gene
prediction [56].
This work was partially supported by the National Natu-
ral Science Foundation of China (Nos. 61702122, 61963004
and 61972185), the Natural Science Foundation of Guangxi
(Nos. 2017GXNSFDA198033 and 2018GXNSFBA281193),
the Key Research and Development Plan of Guangxi
(No. AB17195055), the foundation of Guangxi University
(Nos. 20190240 and XBZ180479), the Innovation Project
of Guangxi Graduate Education (No. YCSW2020020), the
Natural Science Foundation of Yunnan Province of China
(No. 2019FA024), the Hunan Provincial Science and Technology
Program (No. 2018WK4001), the Scientific Research
Foundation of Hunan Provincial Education Department
