
PubMed-supported clinical term weighting approach for improving inter-patient similarity measure in diagnosis prediction

June 2015
BMC Medical Informatics and Decision Making 15(1):43

DOI:10.1186/s12911-015-0166-2

Source
PubMed

License
CC BY 4.0

Authors:

Lawrence Wing-Chi Chan

The Hong Kong Polytechnic University

Ying Liu

Cardiff University

Helen Law

The Hong Kong Polytechnic University

RESEARCH ARTICLE Open Access

Lawrence WC Chan, Ying Liu, Tao Chan, Helen KW Law, SC Cesar Wong, Andy PH Yeung, KF Lo, SW Yeung, KY Kwok, William YL Chan, Thomas YH Lau and Chi-Ren Shyu

Abstract

Background: Similarity-based retrieval of Electronic Health Records (EHRs) from large clinical information systems provides physicians with evidence to support making diagnoses or referring examinations for suspected cases. Clinical terms in EHRs represent high-level conceptual information, and the similarity measure established on these terms reflects the chance of inter-patient disease co-occurrence. The assumption that clinical terms are equally relevant to a disease is unrealistic and reduces the prediction accuracy. Here we propose a term weighting approach supported by the PubMed search engine to address this issue.

Methods: We collected and studied 112 abdominal computed tomography imaging examination reports from four hospitals in Hong Kong. Clinical terms, which are the image findings related to hepatocellular carcinoma (HCC), were extracted from the reports. Through two systematic PubMed search methods, the generic and specific term weightings were established by estimating the conditional probabilities of clinical terms given HCC. Each report was characterized by an ontological feature vector, giving 6216 vector pairs in total. We optimized the modified direction cosine (mDC) with respect to a regularization constant embedded in the feature vector. Equal, generic and specific term weighting approaches were applied to measure the similarity of each pair, and their performances in predicting inter-patient co-occurrence of HCC diagnoses were compared by using Receiver Operating Characteristic (ROC) analysis.

Results: The areas under the ROC curves (AUROCs) of similarity scores based on equal, generic and specific term weighting approaches were 0.735, 0.728 and 0.743 respectively (p < 0.01). In comparison with equal term weighting, the performance was significantly improved by specific term weighting (p < 0.01) but not by generic term weighting. The clinical terms “Dysplastic nodule”, “nodule of liver” and “equal density (isodense) lesion” were found to be the top three image findings associated with HCC in PubMed.

Conclusions: Our findings suggest that the optimized similarity measure with specific term weighting applied to EHRs can significantly improve the accuracy of predicting the inter-patient co-occurrence of diagnosis when compared with equal and generic term weighting approaches.

* Correspondence: wing.chi.chan@polyu.edu.hk
Department of Health Technology and Informatics, Hong Kong Polytechnic University, Hung Hom, Kowloon, Hong Kong
Full list of author information is available at the end of the article

Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

Chan et al. BMC Medical Informatics and Decision Making (2015) 15:43

DOI 10.1186/s12911-015-0166-2

Background

The huge amount of clinical data managed by electronic health record (EHR) systems enables case-based decision support, in which reference cases are retrieved based on their similarity to the current case of interest [1, 2]. To measure inter-patient similarity consistently, the feature vector model has been established by systematically transforming the clinical information in EHRs, including laboratory test findings, medical images and diagnostic reports, into vector elements [3–6].

The transformation of textual information, such as image findings, into a feature vector requires the support of a medical ontology [5, 6]. Systematized Nomenclature of Medicine (SNOMED) Clinical Terms (CT) is a collection of clinical terms that are organized as concepts and linked in a hierarchy with “is-a” or inverse “is-a” relationships [7–10]. Concepts at a particular level of the hierarchical structure are selected as the feature concepts. The edge count along the path connecting a term in an EHR and a feature concept in the “is-a” hierarchy represents their semantic distance [3–5, 11, 12]. The ontological feature vector contains numerical elements, each of which is inferred by integrating the semantic distances from all the EHR terms to a feature concept. It has been shown that the ontological vector model significantly outperforms simple string matching in predicting inter-patient co-occurrence of subclinical disorder [12].

Euclidean distance and direction cosine are two commonly used similarity measures with different properties. Direction cosine measures similarity according to the angle between two feature vectors only, whereas Euclidean distance considers the magnitudes of the two vectors in addition to the angle. Euclidean distance is therefore more sensitive than direction cosine to the absolute difference between two EHRs. For high-dimensional vector models, the two measures achieve similar accuracy in nearest neighbour queries. However, direction cosine is more computationally efficient than Euclidean distance because ontological vectors in information retrieval applications usually have a large number of zero elements, which expedites the computation of direction cosine. Identifying similar examination reports for diagnosis prediction requires an exhaustive search of the imaging examination database. As the database is assumed to host a huge number of eligible reports, the efficiency of computing the similarity score between an eligible report and the query report becomes crucial.
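The contrast between the two measures can be illustrated with a minimal sketch. The sparse dictionary representation and the function names below are assumptions made for illustration only and are not taken from the paper. The example shows that scaling a vector leaves the direction cosine unchanged while the Euclidean distance grows, and that the inner product only needs to visit non-zero elements.

```python
import math

def sparse_cosine(q, d):
    """Direction cosine of two sparse vectors stored as {index: value} dicts."""
    # Iterate over the shorter dict only: zero elements never contribute.
    small, large = (q, d) if len(q) <= len(d) else (d, q)
    dot = sum(v * large.get(i, 0.0) for i, v in small.items())
    norm_q = math.sqrt(sum(v * v for v in q.values()))
    norm_d = math.sqrt(sum(v * v for v in d.values()))
    return dot / (norm_q * norm_d)

def sparse_euclidean(q, d):
    """Euclidean distance of two sparse vectors stored as {index: value} dicts."""
    keys = set(q) | set(d)
    return math.sqrt(sum((q.get(i, 0.0) - d.get(i, 0.0)) ** 2 for i in keys))

# Hypothetical vectors: d points in the same direction as q but is twice as long.
q = {0: 1.0, 5: 2.0}
d = {0: 2.0, 5: 4.0}
print(sparse_cosine(q, d))     # angle-based: the vectors look identical
print(sparse_euclidean(q, d))  # magnitude-sensitive: the vectors differ
```

Note that this plain cosine is undefined when either vector is all zeros, which is the situation the regularization constant of mDC is meant to handle.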

The modified direction cosine (mDC) was developed by Chan et al. (2011) to preserve the advantageous properties of both Euclidean distance and direction cosine and to extend the applications to low-dimensional vector models [12]. In mDC, the feature vector is augmented by a regularization constant of unity to acquire the property of Euclidean distance while maintaining the computational efficiency of direction cosine [12]. The numerical overflow that can occur for direction cosine is avoided because the inclusion of the regularization constant in mDC keeps the length of the feature vector away from zero. However, it remains an open question whether the performance of mDC can be optimized over different values of this regularization constant.

The feature concepts of the above-mentioned vector model were equally weighted. In fact, clinical terms are unequally associated with a particular disease. For example, hepatic necrosis and cirrhosis are common image findings in the computed tomographic scans of HCC patients. However, “hepatic necrosis” is more spatially associated with the cell death phenomenon in the simultaneous growth of HCC than “cirrhosis”, which reveals a fibrotic condition following cell death in HCC. Thus, term weighting, which has been well established in bioinformatics, should be applied to improve the accuracy of the semantic measure or to remove unrelated terms [13, 14].

The disease of interest in this work is HCC, one of the ten most common cancers in the world [15]. Abdominal tomographic scans play an important role in the diagnosis of HCC because the images can show patterns characterizing the pathophysiology of HCC [16]. Such patterns, once observed by radiologists, are recorded as findings in the image examination report. In this work, two novel weighting approaches, namely generic term weighting and specific term weighting, are proposed to improve the performance of the ontological vector model in predicting inter-patient HCC co-occurrence. The performances of these two approaches were compared against the baseline approach of equal term weighting, in which all feature concepts are equally weighted and the regularization constant has already been optimized.

The generic and specific term weighting approaches were implemented based on a systematic search of PubMed, a huge database indexing biomedical journal articles. We assumed that a term is highly related to a particular disease if the chance of co-mentioning the term, the disease of interest and their synonyms in the abstracts of the articles is high. The highly weighted terms identified in this work can also be used to index the reports, reminding clinicians to follow up with other clinical tests.

Methods

Clinical data collection

Under the criterion that the liver is the region of interest for HCC, 112 image reports of abdominal computed tomography examinations were collected retrospectively from the Radiology Departments of four local hospitals in Hong Kong. HCC or liver metastases were reported in 59 cases and no abnormality detected (NAD) in the other 53 reports. The ages of the patients ranged from 4 to 88 at the time of data retrieval. The patients were de-identified by using a randomly generated unique ID. Personal information, including name, identity card number, telephone number and address, was removed from the reports by third-party clinical personnel before the data were collected by the research team. Human Subject Ethics Approval was obtained from the Hong Kong Polytechnic University (HSEARS20140710002).

Image finding term extraction

The clinical terms of image findings related to HCC were identified and extracted manually from the reports by five practicing radiographers (Authors: APHY, KFL, SWY, KYK, WYLC). They learnt the structure, content and use of SNOMED CT from the Unified Medical Language System (UMLS). The extraction of clinical terms was supervised and validated by a radiologist (Author: TC) and two professorial staff with anatomy and radiography backgrounds (Authors: HKWL, TYHL).

The definitions of clinical terms and their synonyms are standardized by SNOMED CT and unified into concepts by UMLS Terminology Services (license code: NLM-0315126310), where a unique concept ID is assigned to each concept. The concepts for all the extracted image finding terms were identified. For the equal and generic term weighting approaches, the identified concepts were mapped to the corresponding feature concepts according to SNOMED CT, and the mapped feature concepts and their synonyms were used for weighting. For the specific term weighting approach, the extracted terms and their synonyms were used for weighting.

Ontological vector model

The relationship between concepts is defined by the “is-a” hierarchical tree of SNOMED CT, which consists of levels of concepts. As the concepts of the extracted terms exist at different levels, the reports can be consistently compared if the extracted terms are projected to the concepts at a particular level, which are referred to as feature concepts. The level-4 concepts were chosen as feature concepts in this work because level 4 provides an optimal classification granularity for accurate patient matching [12].

Let $f_i$, $m$, $d_j$ and $n$ be the $i$-th feature concept, the number of feature concepts, the $j$-th concept extracted from a report and the number of concepts extracted from the report, respectively. The semantic distance between $f_i$ and $d_j$ is defined as $s_{ij} \in [0, \infty]$. The value of $s_{ij}$ is determined by three rules (see Fig. 1):

1. If $d_j$ is a descendant of $f_i$, then $s_{ij}$ is the number of “is-a” links from $d_j$ to $f_i$.
2. If $d_j$ is not a descendant of $f_i$, then $s_{ij} = \infty$.
3. If $d_j$ is the same as $f_i$, then $s_{ij} = 0$.
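The three rules can be sketched as a simple edge-counting walk up the hierarchy. This is only an illustration under stated assumptions: the toy child-to-parent map below mimics the distances quoted in the Fig. 1 caption, the intermediate node names are hypothetical placeholders rather than actual SNOMED CT concepts, and each concept is given a single parent although SNOMED CT allows several.

```python
import math

# Hypothetical toy fragment of an "is-a" hierarchy (child -> parent).
# Intermediate nodes are placeholders, not actual SNOMED CT concepts.
IS_A = {
    "cirrhosis": "intermediate_c2",
    "intermediate_c2": "intermediate_c1",
    "intermediate_c1": "liver finding",
    "hepatic fibrosis": "intermediate_h1",
    "intermediate_h1": "liver finding",
    "splenomegaly": "intermediate_s1",
    "intermediate_s1": "abdominal organ finding",
}

def semantic_distance(term, feature):
    """Count "is-a" links from term up to feature.

    Returns 0 if term is the feature concept itself (rule 3), the number of
    links if feature is an ancestor of term (rule 1), and infinity otherwise
    (rule 2).
    """
    steps = 0
    node = term
    while node is not None:
        if node == feature:
            return steps
        node = IS_A.get(node)  # None once the top of the toy fragment is reached
        steps += 1
    return math.inf

print(semantic_distance("cirrhosis", "liver finding"))     # 3
print(semantic_distance("splenomegaly", "liver finding"))  # inf
```

A production version would need to handle concepts with multiple parents, for example by taking the shortest path over all ancestors.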

For each report, a feature vector, given by $[a_1, a_2, \dots, a_m, \delta]$, was generated. Here δ represents a regularization constant whose value is equal to $10^{-k}$, where k is a non-negative integer, and $a_i \in [0, 1]$ represents the vector element associated with the $i$-th feature concept, obtained by the following formula.

$$a_i = \frac{\sqrt{p_i}}{1 + \min_{j=1,\dots,n} s_{ij}} \qquad (1)$$

where $p_i$ is the conditional probability of the $i$-th feature concept given the occurrence of HCC. The value of $a_i$ indicates the relatedness of the $i$-th feature concept to the image finding terms of a report. The ability of the feature vector to characterize a report can be modulated by $p_i$. When the value of $p_i$ is zero, the effect of the $i$-th feature concept on the similarity score is fully repressed. When the value of $p_i$ is one, the effect of the $i$-th feature concept on the similarity score is fully promoted.
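A minimal sketch of building the feature vector follows. It assumes that Equation (1) has the form a_i = sqrt(p_i) / (1 + min_j s_ij), which is how the formula reads from the surrounding definitions. The function name and the input layout are illustrative choices, not part of the original method description.

```python
import math

def feature_vector(distances, weights, k):
    """Build [a_1, ..., a_m, delta] for one report.

    distances[i] holds the semantic distances s_ij from every extracted
    concept to feature concept i; weights[i] is p_i; delta = 10 ** -k.
    Assumes a_i = sqrt(p_i) / (1 + min_j s_ij).
    """
    delta = 10.0 ** (-k)
    vector = []
    for s_i, p_i in zip(distances, weights):
        # With no extracted concepts the minimum distance is treated as infinite,
        # so the element becomes 0.
        s_min = min(s_i) if s_i else math.inf
        vector.append(math.sqrt(p_i) / (1.0 + s_min))
    vector.append(delta)
    return vector

# Hypothetical report with two extracted concepts and three feature concepts,
# under equal term weighting (p_i = 1) and k = 2.
inf = math.inf
print(feature_vector([[3, inf], [inf, 2], [inf, inf]], [1.0, 1.0, 1.0], k=2))
```

In this example the first element is 1/4, the second is 1/3, the third is 0 because no extracted concept descends from that feature concept, and the last element is δ = 0.01.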

Similarity measure

The similarity score between two reports was calculated by using the direction cosine of their feature vectors, $Q$ and $D$.

$$\mathrm{sim}(Q, D) = \frac{Q \cdot D}{|Q|\,|D|} \qquad (2)$$

where “⋅” is the inner product of two vectors and $|x|$ is the length of a vector $x$. The similarity score ranges from 0 to 1. As the similarity score tends to 0, the vectors $Q$ and $D$ become more dissimilar; as it tends to 1, they become more similar. To improve the inter-patient similarity measure for HCC co-occurrence prediction, this work aims to establish a PubMed-supported approach for estimating the conditional probability $p_i$ more precisely. The implementation of the inter-patient HCC co-occurrence prediction is illustrated in Fig. 2.
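The similarity score of Equation (2) can be sketched in a few lines. The vectors below are hypothetical and already include the regularization constant δ as their last element; the function name is an illustrative choice.

```python
import math

def direction_cosine(q, d):
    """Inner product of q and d divided by the product of their lengths."""
    dot = sum(x * y for x, y in zip(q, d))
    norm_q = math.sqrt(sum(x * x for x in q))
    norm_d = math.sqrt(sum(y * y for y in d))
    return dot / (norm_q * norm_d)

delta = 1e-10  # regularization constant appended to every feature vector

# Two hypothetical reports with no extracted terms: without delta the lengths
# would be zero and the score undefined; with delta the score is well defined.
print(direction_cosine([0.0, 0.0, delta], [0.0, 0.0, delta]))

# Two hypothetical reports whose findings map to different feature concepts.
print(direction_cosine([0.25, 0.0, delta], [0.0, 0.5, delta]))
```

The first call returns a score of approximately 1 (two empty reports are treated as alike), and the second returns a score close to 0.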

Optimization of similarity measure

The similarity measure was optimized by determining its maximum performance in predicting HCC co-occurrence among different values of k. Equal term weighting (i.e. $p_i = 1$) is considered as the baseline for the optimization. The choice of k is of crucial importance when the extracted terms are particularly few or even absent. If k tends to infinity (δ ≈ 0), the similarity score will be unstable and possibly undefined due to the tiny magnitude of the feature vector. If k is equal to 0, the similarity score will be dominated by the value of δ irrespective of the reports' content. The accuracy of the similarity measure in predicting HCC co-occurrence was plotted against the value of k. We determined the value of k at which the accuracy attains its maximum according to the trend of the plot. In addition to equal term weighting, the optimal value of k was applied to the feature vector for the generic and specific term weighting approaches.
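The effect of k on the score can be seen in a small numerical sketch. It compares a hypothetical report with one finding against a hypothetical report with no findings, using δ = 10^-k as the appended constant; the element values and function names are assumptions for illustration.

```python
import math

def cosine(q, d):
    """Direction cosine of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(q, d))
    return dot / (math.sqrt(sum(x * x for x in q)) * math.sqrt(sum(y * y for y in d)))

def similarity_at_k(a_q, a_d, k):
    """Append delta = 10 ** -k to both element lists and score the pair."""
    delta = 10.0 ** (-k)
    return cosine(list(a_q) + [delta], list(a_d) + [delta])

report_with_finding = [0.25, 0.0]  # hypothetical elements a_1, a_2
empty_report = [0.0, 0.0]          # no extracted terms

for k in (0, 1, 2, 5, 10):
    print(k, similarity_at_k(report_with_finding, empty_report, k))
```

With k = 0 (δ = 1) the two reports score about 0.97 even though one of them has no findings at all, which illustrates how δ can dominate the score. As k grows the score for this pair falls towards 0, so the content of the reports determines the result.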

Generic term weighting

According to Equation (1), the feature concepts are weighted by a panel of $p_i$, each defined as the conditional probability of the $i$-th feature concept given HCC. A literature search was performed by using PubMed, and the numbers of abstracts listed in the search results were used for the estimation of $p_i$. Generic term weighting was implemented by directly applying the following formula.

$$p_{i,\mathrm{generic}} = \frac{\#\text{ of abstracts containing } [(A \text{ OR } A_1 \text{ OR } A_2 \text{ OR} \dots) \text{ AND } (B \text{ OR } B_1 \text{ OR } B_2 \text{ OR} \dots)]}{\#\text{ of abstracts containing } (B \text{ OR } B_1 \text{ OR } B_2 \text{ OR} \dots)} \qquad (3)$$

where $A$ and $A_n$ are the $i$-th feature concept and its $n$-th synonym respectively; $B$ and $B_n$ represent HCC and its $n$-th synonym respectively. With this approach, the calculated weights of the feature concepts are the same across different reports even though the descendant terms extracted from the reports differ.

Fig. 1 - Projection of image finding terms to feature concepts in SNOMED CT “is-a” hierarchy. Part of the “is-a” hierarchical relationships is illustrated with three examples demonstrating the rules to determine the semantic distances. Four image finding terms: “cirrhosis”, “hepatic fibrosis”, “splenomegaly” and “fatty liver” are considered. The level-4 concepts are regarded as feature concepts. In this case, feature concepts: “liver finding”, “abdominal organ finding” and “fatty liver” are involved. a The term “cirrhosis” at level-7 is the descendant of “liver finding”. Their semantic distance is 3 because there are three “is-a” links between them. b The semantic distance between “hepatic fibrosis” and “liver finding” is 2. c The term “splenomegaly” is not a descendant of “liver finding” but the descendant of “abdominal organ finding”. Thus, the semantic distance between “splenomegaly” and “liver finding” is infinity and that with “abdominal organ finding” is 2. Finally, the term “fatty liver” at level 4 is also a feature concept and the semantic distance is 0
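The Boolean structure of the search in Equation (3) and the resulting ratio can be sketched as follows. The helper names, the synonym lists and the abstract counts are hypothetical; the sketch only builds query strings and divides counts that would have to be read from the search results, and it does not call PubMed.

```python
def or_group(terms):
    """Join a term and its synonyms into one quoted OR group."""
    return "(" + " OR ".join(f'"{t}"' for t in terms) + ")"

def co_mention_query(term_synonyms, disease_synonyms):
    """Query for abstracts mentioning the term group AND the disease group."""
    return f"{or_group(term_synonyms)} AND {or_group(disease_synonyms)}"

def conditional_probability(n_joint, n_disease):
    """Ratio of the joint abstract count to the disease-only abstract count."""
    if n_disease <= 0:
        raise ValueError("disease abstract count must be positive")
    return n_joint / n_disease

# Illustrative synonym lists (assumed, not taken from the paper).
feature = ["liver finding"]
disease = ["hepatocellular carcinoma", "HCC"]

print(co_mention_query(feature, disease))
print(or_group(disease))
# Hypothetical counts read off the two searches above.
print(conditional_probability(n_joint=50, n_disease=200))  # 0.25
```

Because the counts depend on the date of the search, storing the query strings alongside the counts makes the weights easier to reproduce.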

Specific term weighting

In the specific term weighting approach, we searched PubMed for the abstracts containing the extracted terms and HCC. The conditional probability of the $m$-th extracted term given HCC, $q_m$, was estimated by the following formula.

$$q_m = \frac{\#\text{ of abstracts containing } [(C \text{ OR } C_1 \text{ OR } C_2 \text{ OR} \dots) \text{ AND } (B \text{ OR } B_1 \text{ OR } B_2 \text{ OR} \dots)]}{\#\text{ of abstracts containing } (B \text{ OR } B_1 \text{ OR } B_2 \text{ OR} \dots)} \qquad (4)$$

where $C$ and $C_n$ are the $m$-th extracted term and its $n$-th synonym respectively; $B$ and $B_n$ represent HCC and its $n$-th synonym respectively. We assumed that the conditional probability of the $i$-th feature concept given HCC is equal to the average of the conditional probabilities given HCC of its $N$ descendant terms extracted from a report. The value of $p_i$ was calculated by the following formula.

$$p_{i,\mathrm{specific}} = \frac{q_1 + q_2 + \dots + q_N}{N} \qquad (5)$$

Note that the weighting of the feature vector elements depends on the report content. In contrast to generic term weighting, where the weights do not change across reports, the weights of the same feature concept estimated by the specific term weighting approach may differ from patient to patient.
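The report-dependent averaging of Equation (5) can be sketched as below. The three probabilities are the values quoted for these terms in the Discussion, but the mapping of all three terms to a single feature concept named "liver finding" is a hypothetical assumption for illustration, as are the function and variable names.

```python
def specific_weights(term_probs, term_to_feature, report_terms):
    """Average q_m over the terms of one report, grouped by feature concept."""
    sums, counts = {}, {}
    for term in report_terms:
        feature = term_to_feature[term]
        sums[feature] = sums.get(feature, 0.0) + term_probs[term]
        counts[feature] = counts.get(feature, 0) + 1
    return {feature: sums[feature] / counts[feature] for feature in sums}

# Conditional probabilities q_m of extracted terms given HCC.
term_probs = {"dysplastic nodule": 0.934, "nodule of liver": 0.513, "hepatic fibrosis": 0.082}
# Hypothetical projection of the terms to a level-4 feature concept.
term_to_feature = {t: "liver finding" for t in term_probs}

report_a = ["dysplastic nodule", "hepatic fibrosis"]
report_b = ["nodule of liver"]
print(specific_weights(term_probs, term_to_feature, report_a))
print(specific_weights(term_probs, term_to_feature, report_b))
```

Under these assumptions the same feature concept receives a weight of about 0.508 in the first report and 0.513 in the second, which is the patient-to-patient variation described above.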

Statistical analysis

Receiver operating characteristic (ROC) analysis was performed on the results of inter-patient HCC co-occurrence predicted by the equal, generic and specific term weighting approaches. For each approach, the ROC curve was plotted and the area under the ROC curve (AUROC) indicated the accuracy of the prediction, i.e. the probability of correctly classifying a pair of reports as having the same diagnosis (both are HCC; both are NAD) or different diagnoses (one is HCC and the other is NAD). In addition to the comparison with the area under the chance diagonal, the AUROCs were compared with each other to determine the approach with the best performance, and the statistical significance of the observed differences was also indicated [17, 18].

Fig. 2 - A schematic view of the method. Step 1: Manual extraction of the image finding terms and their corresponding synonyms from the reports. Step 2: The concepts of the image finding terms defined in SNOMED CT were identified by using UMLS Terminology Services. Step 3: Edge counting of the semantic distances between the extracted terms and the level-4 feature concepts. Step 4: The feature concepts are weighted by (Step 4a) generic term weighting approach and (Step 4b) specific term weighting approach. Step 5: The feature vectors are generated. Step 6: Similarity scores between feature vectors are calculated by modified direction cosine
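The AUROC of a set of similarity scores can be computed directly from its rank interpretation: the probability that a randomly chosen matching pair scores higher than a randomly chosen mismatching pair. The sketch below uses this pairwise definition on hypothetical scores; it does not reproduce the significance tests used to compare AUROCs.

```python
def auroc(scores, labels):
    """AUROC as the fraction of (match, mismatch) pairs ranked correctly.

    labels: 1 for a matching report pair, 0 for a mismatching one.
    Ties between a match and a mismatch count as half a correct ranking.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Hypothetical similarity scores for four report pairs.
scores = [0.9, 0.8, 0.4, 0.3]
labels = [1, 0, 1, 0]
print(auroc(scores, labels))  # 0.75
```

The quadratic loop is adequate for a few thousand pairs; a rank-based implementation would be preferable for much larger databases.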

Results

Feature extraction and report pair formation

We extracted 38 image finding terms from 112 examination reports (59 HCC and 53 NAD cases). These terms are uniquely defined by 38 concepts in UMLS and were projected to 36 feature concepts at level 4 of the SNOMED CT “is-a” hierarchy. The reports were paired up to form 6216 non-redundant pairs, of which 3089 pairs are matches, i.e. (HCC, HCC) or (NAD, NAD), and 3127 pairs are mismatches, i.e. (HCC, NAD) or (NAD, HCC).
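The pair counts follow from elementary combinatorics, as the short check below shows. The variable names are illustrative; the case counts are those reported above.

```python
from math import comb

n_hcc, n_nad = 59, 53

total_pairs = comb(n_hcc + n_nad, 2)         # all unordered report pairs
matches = comb(n_hcc, 2) + comb(n_nad, 2)    # (HCC, HCC) plus (NAD, NAD)
mismatches = n_hcc * n_nad                   # one HCC report with one NAD report

print(total_pairs, matches, mismatches)      # 6216 3089 3127
```

The two classes are nearly balanced (3089 versus 3127), so the AUROC is not driven by a dominant class.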

Optimization of similarity measure

Equal term weighting was considered as the baseline for optimizing the similarity measure. ROC analysis of inter-patient HCC co-occurrence prediction was performed for different values of k. Fig. 3 shows the plot of AUROC against k. The accuracy increases for k between 0 and 2. For k > 2, the AUROC reaches a constant level. Thus, we chose k = 10 for all the term weighting approaches.

Estimation of conditional probabilities

In generic term weighting, abstracts were retrieved by PubMed search for each feature concept and its synonyms. The count of abstracts containing a feature concept or its synonyms ranges from 1 to 427154. Incorporating HCC and its synonyms into the search criteria further reduced the abstract count. The conditional probability of a feature concept for generic term weighting is defined as the ratio of these two counts.

In specific term weighting, abstracts were retrieved by PubMed search for each extracted term and its synonyms. The count of abstracts containing an extracted term or its synonyms ranges from 1 to 195708. The abstract count is further reduced by adding HCC and its synonyms to the search criteria. The ratio of these two counts was projected to the corresponding feature concepts. The conditional probability of a feature concept for specific term weighting is defined as the average of the ratios across all of its descendant terms extracted from a report. The values of the conditional probabilities were computed and saved in Excel files (see Additional files 1, 2, 3, 4).

Comparison of term weighting approaches

The AUROCs and the 95 % confidence intervals (95 % CIs) of the equal, generic and specific term weighting approaches are shown in Table 1. All three approaches significantly outperformed the random rater (p < 0.01). Compared with the equal term weighting approach (AUROC = 0.735), the performance was significantly improved by the specific term weighting approach (AUROC = 0.743, p < 0.01) but significantly worsened by the generic term weighting approach (AUROC = 0.728, p < 0.01). The conditional probabilities of the extracted image finding terms given HCC, derived by the specific term weighting approach, were sorted in descending order. The top ten image finding terms are listed together with their conditional probabilities in Table 2.

Discussion

Health records that are ontologically similar to a new suspected case support clinical decisions with evidence of the disease. The reliability of such an ontology-similarity-based case retrieval algorithm depends on the choices of inter-patient similarity measure and ontological vector model. It has been shown that the modified direction cosine (mDC) avoids the problem of numerical overflow and preserves the same properties as Euclidean distance [12]. However, weighting of the ontological vector was not considered in the previous studies, and it remained unknown whether the performance of mDC can be improved by adjusting the weights associated with the feature concepts. It was shown that the performance of the similarity measure was substantially improved by setting an extremely small regularization constant, $10^{-10}$. Such a setting helps keep the similarity scores discriminative when comparing health records that have very few or even no extracted clinical terms.

Fig. 3 - Plot of AUROC against the value of k. The accuracy of inter-patient HCC co-occurrence prediction increases when k is between 0 and 2 and saturates at the level of 0.735 when k further increases

Table 1 - Comparison of term weighting approaches. AUROCs and the 95 % CIs of the equal, generic and specific term weighting approaches are summarized here

Term weighting approach    AUROC    95 % CI
Equal term weighting       0.735    (0.724, 0.746)
Generic term weighting     0.728    (0.717, 0.739)
Specific term weighting    0.743    (0.732, 0.754)

In generic term weighting, the median and the 10th percentile of the counts of retrieved abstracts containing the feature concepts or their synonyms are 4204 and 82.9 respectively. In specific term weighting, the median and the 10th percentile of the counts of retrieved abstracts containing the extracted terms or their synonyms are 2689.5 and 46.5 respectively. The sample sizes of the retrieved abstracts are large enough to support the estimation of conditional probabilities.

In comparison with equal term weighting, the performance was improved by the specific term weighting approach but worsened by the generic term weighting approach. This implies that weighting the feature vector elements does not necessarily give better performance; the way in which the weights are derived is crucial for improving performance. In generic term weighting, the feature concepts at level 4, instead of the clinical terms extracted from the reports, were used for the PubMed search. The weights are associated with the level-4 concepts only and remain unchanged across different reports. Moreover, the level-4 concepts are not specific enough to provide reliable PubMed search results for estimating the conditional probabilities. Specific term weighting used the extracted terms directly for the PubMed search. The search results are more reliable for estimating the conditional probabilities because of the finer granularity of the concepts provided by the extracted terms. Although the conditional probabilities of the descendant extracted terms are averaged to generate the weights of the feature vector elements, the keywords for the PubMed search depend dynamically on the report contents, and the weights become more specific.

The high weights of the feature concepts that dominate the similarity score are attributable to their descendant terms extracted from the reports. In Table 2, the top three image finding terms (conditional probabilities) are “dysplastic nodule” (0.934), “nodule of liver” (0.513) and “equal density (isodense) lesion” (0.438). The association of these image findings with HCC is supported by Sakamoto [19], who states that small equivocal lesions, i.e. dysplastic nodules, detected by imaging examination of the liver are regarded as precursors of HCC. For cases with such image findings but no abnormality detected, we suggest indexing them as “high risk” so that close follow-up can be recommended to those patients.

As the conditional probability of the most relevant image finding “Dysplastic nodule” (0.934) is more than ten times that of the eighth image finding “Hepatic fibrosis” (0.082), only the top seven features contribute significantly to the diagnosis prediction performance. The remaining features are associated with negligible weights and have a negligible effect on the prediction. Therefore, the generic and specific term weighting approaches are analogous to feature selection, which makes the vector model more parsimonious with respect to the number of available cases.
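
The ratio quoted above can be checked directly against the Table 2 values. The short sketch below expresses each conditional probability relative to the top-ranked term and lists the terms that fall below one tenth of it. The one-tenth cut-off mirrors the "ten times" comparison in the text; it is an illustrative threshold, not a selection rule stated by the authors.

```python
# Conditional probabilities as listed in Table 2.
TABLE_2 = [
    ("Dysplastic nodule", 0.934),
    ("Nodule of liver", 0.513),
    ("Equal density (isodense) lesion", 0.438),
    ("Nodular hyperplasia of liver", 0.329),
    ("Solitary necrotic liver nodule", 0.259),
    ("Portal vein thrombosis", 0.209),
    ("Space occupying lesion of liver", 0.175),
    ("Cirrhosis of liver", 0.170),
    ("Hepatic fibrosis", 0.082),
    ("Nontraumatic hemoperitoneum", 0.064),
]

top = TABLE_2[0][1]

# Ratio of the top probability to that of "Hepatic fibrosis".
ratio = top / dict(TABLE_2)["Hepatic fibrosis"]

# Terms whose probability is less than one tenth of the top value
# (illustrative cut-off, see the lead-in).
below_tenth = [name for name, p in TABLE_2 if p / top < 0.1]

print(f"top / hepatic fibrosis = {ratio:.2f}")
print("below one tenth of the top value:", below_tenth)
```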

As the numbers of PubMed abstracts are dynamic, the term weighting results may change over time. In future studies, we suggest enhancing the ontological vector model by incorporating more algorithmic elements from the information content model, which spans an essential dimension of assessing semantic similarity [20].

Besides the image examination report, laboratory test findings, such as alpha-fetoprotein (AFP) level and Child-Pugh score, are important features for the diagnosis of HCC. As the electronic health record (EHR) integrates the image examination report and the laboratory test report, the feature vector can be augmented to cover laboratory test findings [21]. The same weighting approach and similarity measure can also be applied to such an augmented feature vector.
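
As a rough illustration of how a weighted similarity carries over to an augmented vector, the sketch below uses a plain weighted cosine similarity. This is a stand-in, not the modified direction cosine (mDC) with its regularization constant used in this study. The laboratory feature, its weight and the two example patients are hypothetical.

```python
import math
from typing import Sequence


def weighted_cosine(x: Sequence[float], y: Sequence[float], w: Sequence[float]) -> float:
    """Cosine similarity in which element i contributes in proportion to weight w[i]."""
    num = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
    norm_x = math.sqrt(sum(wi * xi * xi for wi, xi in zip(w, x)))
    norm_y = math.sqrt(sum(wi * yi * yi for wi, yi in zip(w, y)))
    if norm_x == 0.0 or norm_y == 0.0:
        return 0.0  # a report with no weighted findings matches nothing
    return num / (norm_x * norm_y)


# Feature order: dysplastic nodule, portal vein thrombosis, elevated AFP.
# The first two weights come from Table 2; the AFP weight is hypothetical.
weights = [0.934, 0.209, 0.5]

patient_a = [1.0, 0.0, 1.0]  # hypothetical findings
patient_b = [1.0, 1.0, 0.0]  # hypothetical findings

print(round(weighted_cosine(patient_a, patient_b, weights), 3))
```

Appending laboratory features only lengthens the vectors and the weight list; the similarity function itself does not change.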

Our clinical term weighting approach not only improved the inter-patient similarity measures for diagnosis prediction; it may also be used to identify large cohorts of patients with similar disease presentation for retrospective treatment efficacy analysis, and it may facilitate the identification of targeted patient cohorts for prospective interventional studies.

Conclusions

The performance of the inter-patient similarity measure was significantly improved by specifically weighting the elements of the ontological feature vector. PubMed search was applied to estimate the weights. Early HCC markers, including dysplastic nodule, nodule of liver, and equal density lesion, were identified by PubMed search as image findings that are strongly associated with HCC.

Table 2 - Top ten image finding terms. The PubMed search results indicated that some image finding terms were co-mentioned with HCC very frequently in the abstracts of biomedical journal articles. The conditional probability of “Dysplastic nodule” (0.934) is the highest among all the extracted terms.

| Rank | Image finding | Conditional probability |
|------|---------------|-------------------------|
| 1 | Dysplastic nodule | 0.934 |
| 2 | Nodule of liver | 0.513 |
| 3 | Equal density (isodense) lesion | 0.438 |
| 4 | Nodular hyperplasia of liver | 0.329 |
| 5 | Solitary necrotic liver nodule | 0.259 |
| 6 | Portal vein thrombosis | 0.209 |
| 7 | Space occupying lesion of liver | 0.175 |
| 8 | Cirrhosis of liver | 0.170 |
| 9 | Hepatic fibrosis | 0.082 |
| 10 | Nontraumatic hemoperitoneum | 0.064 |


Additional files

Additional file 1: Generic Term Weighting Computation.

Additional file 2: Specific Term Weighting Computation.

Additional file 3: Similarity Scores with Generic Term Weighting.

Additional file 4: Similarity Scores with Specific Term Weighting.

Competing interests

The authors declare that they have no competing interests.

Authors’ contributions

LWCC and YL have made substantial contributions to conception and design

of the study and have prepared and revised the manuscript. APHY, KFL,

SWY, KYK and WYLC have made substantial contributions to the acquisition,

image finding extraction, analysis and interpretation of data. TC, HKWL,

SCCW and TYHL have contributed the domain knowledge to approving the

image findings and their relationship to liver diseases. CRS has contributed

to verifying the ontological method and revising critically the manuscript.

All authors read and approved the final manuscript.

Acknowledgements

This research was supported by the RGC General Research Fund “PolyU

5118/11E: Clinical Decision Support using Biomedical Ontology and

Literature Supported Patient Similarity for Diagnostic and Prognostic

Pattern Discovery from Electronic Health Records”.

Author details

Department of Health Technology and Informatics, Hong Kong Polytechnic University, Hung Hom, Kowloon, Hong Kong.

Institute of Mechanical and Manufacturing Engineering, School of Engineering, Cardiff University, Cardiff CF24 3AA, UK.

Department of Diagnostic Radiology, University of Hong Kong, Pokfulam, Hong Kong.

Informatics Institute and Department of Computer Science, University of Missouri, Columbia, MO, USA.

Received: 24 December 2014 Accepted: 22 May 2015

References

1. Jensen PB, Jensen LJ, Brunak S. Mining electronic health records: towards better

research applications and clinical care. Nat Rev Genet. 2012;13(6):395–405.

2. Ceusters W, Smith B. Strategies for referent tracking in electronic health

records. J Biomed Inform. 2006;39:362–78.

3. Chan LWC, Benzie IFF, Liu Y, et al.: Is the inter-patient coincidence of a

subclinical disorder related to EHR similarity? 2011 IEEE 13th International

Conference on e-Health Networking, Applications and Services 2011:177–180

doi:10.1109/HEALTH.2011.6026738.

4. Sánchez D, Batet M, Isern D, Valls A. Ontology-based semantic similarity:

a new feature-based approach. Expert Systems With Applications.

2012;39(9):7718–28.

5. Batet M, Sánchez D, Valls A. An ontology-based measure to compute

semantic similarity in biomedicine. J Biomed Inform. 2011;44:118–25.

6. Richesson RL, Andrews JE, Krischer JP. Use of SNOMED CT to represent clinical

research data: a semantic characterization of data items on case report

forms in vasculitis research. J Am Med Inform Assoc. 2006;13(5):536–46.

7. Melton GB, Parsons S, Morrison FP, Rothschild AS, Markatou M, Hripcsak G.

Inter-patient distance metrics using SNOMED CT defining relationships.

J Biomed Inform. 2006;39(6):697–705.

8. Pedersen T, Pakhomov SVS, Patwardhan S, Chute CG. Measures of semantic

similarity and relatedness in the biomedical domain. J Biomed Inform.

2007;40(3):288–99.

9. Wasserman H, Wang J. An applied evaluation of SNOMED CT as a clinical

vocabulary for the computerized diagnosis and problem list. AMIA

Symposium. 2003;699–703.

10. Lieberman MI, Ricciardi TN, Masarie FE, Spackman KA. The use of SNOMED

CT simplifies querying of a clinical data warehouse. AMIA Symposium.

2003;910.

11. Lord PW, Stevens RD, Brass A, Goble CA. Investigating semantic similarity

measures across the gene ontology: the relationship between sequence

and annotation. Bioinformatics. 2003;19(10):1275–83.

12. Chan LWC, Liu Y, Shyu CR, Benzie IFF. A SNOMED supported ontological

vector model for subclinical disorder detection using EHR similarity. Eng

Appl Artif Intell. 2011;24:1398–409.

13. Falda M, Toppo S, Pescarolo A, Lavezzo E, Camillo BD, Facchinetti A, et al.

Argot2: a large scale function prediction tool relying on semantic similarity

of weighted Gene Ontology terms. BMC Bioinformatics. 2012;13:1–9.

14. Pesquita C, Faria D, Falcao AO, Lord P, Couto FM. Semantic similarity in

biomedical ontologies. PLoS Comput Biol. 2009;5(7):e1000443.

15. Page AJ, Cosgrove DC, Philosophe B, Pawlik TM. Hepatocellular carcinoma:

diagnosis, management, and prognosis. Surg Oncol Clin N Am.

2014;23(2):289–311.

16. Kamel IR, Liapi E, Fishman EK. Multidetector CT of hepatocellular carcinoma.

Best Pract Res Clin Gastroenterol. 2005;19(1):63–89.

17. Hanley JA, Mcneil BJ. The meaning and use of the area under a receiver

operating characteristic (ROC) curve. Radiology. 1982;143:29–36.

18. Hanley JA, Mcneil BJ. A method of comparing the areas under receiver

operating characteristic curves derived from the same cases. Radiology.

1983;148:839–43.

19. Sakamoto M. Early HCC: diagnosis and molecular markers. J Gastroenterol.

2009;44:108–11.

20. Zhou Z, Wang Y, Gu J. A new model of information content for semantic

similarity in WordNet. Second International Conference on Future

Generation Communication and Networking Symposia. 2008;2008:85–9.

21. Gottlieb A, Stein GY, Ruppin E, Altman RB, Sharan R. A method for inferring

medical diagnoses from patient similarities. BMC Medicine. 2013;11:194.

