Answering Engineers’ Questions Using Semantic Annotations
Paper Number: 06
Question-Answering (QA) systems have proven to be helpful especially to those who feel
uncomfortable entering keywords, sometimes extended with search symbols such as +, *, etc. In
developing such systems, the main focus has been on the enhanced retrieval performance of searches,
and recent trends in QA systems centre on the extraction of exact answers. However, when their
usability was evaluated, some users indicated that they found it difficult to accept the answers due to
the absence of supporting context and rationale. Current approaches to address this problem include
providing answers with linking paragraphs or with summarising extensions. Both methods are
believed to be sufficient to answer questions seeking the names of objects or quantities that have only
a single answer. However, neither method addresses the situation when an answer requires the
comparison and integration of information appearing in multiple documents or in several places in a
single document. This paper argues that coherent answer generation is crucial for such questions and
that the key to this coherence is to analyse texts to a level beyond sentence annotations. To
demonstrate this idea, a prototype has been developed based on Rhetorical Structure Theory and a
preliminary evaluation has been carried out. The evaluation indicates that users prefer to see the
extended answers that can be generated using such semantic annotations, provided that additional
context and rationale information are made available.
Keywords: Information retrieval, question-answering, semantic annotations, natural language
processing, Rhetorical Structure Theory
1 Introduction
Electronic documents are one of the most common information sources in organisations and
approximately 90% of organisational memory exists in the form of text-based documents. It has been
reported that 35% of users find it difficult to access information contained in these documents and at
least 60% of the information that is critical to these organisations is not accessible using typical search
tools (80-20 software, 2003). There are two main problems. The first is that there is simply too much
information to be searched. The second is that differences exist between the indexing approaches used
in search engines and the way people perceive and access the contents of documents. This means that
most users find searching for relevant information difficult, since they cannot enter keywords in a
form precise enough to be used effectively by current search engines.
Current retrieval systems accept queries from users in the form of a few keywords and retrieve a long
list of matching documents. Users then have to sift through the documents to locate the information
they are looking for. For simple fact-based queries, e.g. What material should be used for this turbine
blade?, most users can enter satisfactory keywords that rapidly find the required answers. However,
keyword-based systems cannot cope with questions involving: (1) comparing, e.g. What are the
advantages and disadvantages of using aluminium compared with steel for this saucepan? (2)
reasoning, e.g. How safe are commercial flights? and (3) extracting answers from different documents
and fusing them into a complete answer. In order to obtain useful answers to these types of question,
users currently have to expend considerable time and effort.
In previous research in this domain, automatic query expansion and taxonomy-based searches have
been proposed. Query expansion improves on keyword searching for short questions that require only
a few documents to be located to provide the answers. Taxonomy-based searches require the
hierarchical organisation of domain concepts. This relieves the user of having to enter accurate
keywords as information is searchable by selecting concepts. However, considerable effort is required
to create the classifications and maintain the hierarchy. For example, Yahoo (http://www.yahoo.com)
employs around 50 subject experts to maintain its directories and indexes. Not many organisations can afford to adopt
such a strategy, and current automatic classifiers only achieve 60-80% accuracy (Mukherjee & Mao,
2004). This means that in automatic classification around a third of the documents will be missed or
misclassified. Manual classifications are subjective, and often based on the few sentences deemed
important by individual indexers. Clearly, when information is sought that does not match the
indexing, such a taxonomy-based approach does not help.
Offering users the facility to enter their queries in natural language might greatly enhance current
search engine interfaces and be particularly helpful for less experienced users who are not adept at
advanced keyword searches. Recent research into natural language-based retrieval systems has mainly
pursued a Question-Answering (QA) approach. QA systems have successfully retrieved short answers
to natural language questions instead of directing users to a number of documents that might contain
the answers. Typically, QA uses a combination of Information Retrieval (IR) and Natural Language
Processing (NLP) techniques. IR techniques are used to pinpoint a subset of documents and to locate
parts of those documents that are related to the questions. NLP techniques are used for extracting brief
answers. There is great interest in developing robust and reliable QA systems to exploit the enormous
quantity of information available on-line in order to answer simple questions such as: Who is the
president of the USA? This is relatively easy since straightforward NLP techniques, such as
pattern-matching, are sufficient to answer it. The numerous occurrences and multiple reformulations of the
same information available on the Web greatly increase the chance of finding answers that are
syntactically similar to the question (Brill et al., 2001). On the intranets run by organisations, the
quantity of information, although large, is much less than on the Web, and the occurrences and
reformulations are far less numerous.
Unlike users searching on the Web, those in organisations are likely to ask questions that are not easily
answered by simply looking up syntactic similarities in databases. Such answers can be considered
complex and may need to be inferred from different parts of a single text or from multiple texts. The
initial question posed by a user may be ill-formed, i.e. too broad or too specific, making it difficult for
the retrieval system to interpret, and hence further interaction with the user is often necessary.
Answering such complex questions has received little attention within QA research. To answer such
questions, the issues of correctly interpreting the question and presenting the answer must both be
addressed. When presenting answers to complex questions, it is not sufficient just to present the
answer, i.e. the user needs additional supporting information with which to assess the trustworthiness
of the answer. For example, hemlock poisoning and drinking hemlock can both be considered correct
answers to the question: How did Socrates die? (Burger et al., 2001). Users who have some
background knowledge about poisoning might appreciate a brief answer, as they do not want to read
through a long text to extract the answer themselves. Other users with less background knowledge
might prefer to see where the answers came from and want to read more text explaining the answers
before accepting them. Searching precisely for How did Socrates die? on the Web using Google
produces around 748,000 results. It is clear that, for a question such as this, answers with varying
formulations appear in numerous documents or in many parts of a single document. Answers
appearing as multiple instances need to be fused efficiently in order to reduce repeated information.
Apart from what is simply stated in the question, the user’s real intention might have been to know the
reason why Socrates chose to die by poisoning. For questions that are ill-formed, it is important that
answers are extended with related information that increases a user’s understanding of the answers. It
is therefore necessary to research suitable ways of presenting answers in a clear and coherent manner,
and providing sufficient supporting information to allow users to decide whether or not to trust the
answers.
A combination of two approaches, both using semantic relations, is therefore proposed for presenting
clear and coherent answers. First, duplicate information is removed. Second, answers are synthesised
from multiple occurrences, and then justified by adding supporting information. In order to achieve
this, semantic analysis is necessary of both the questions and the texts from which the answers are to
be extracted. Figure 1 shows an example of how these ideas might be implemented; the example text is taken from http://www.tsb.gc.ca/en/reports/air/1996/a96o0125/a96o0125.asp. The initial
question posed by the user, i.e. What triggered the engine fire alarm in Boeing 727-217?, was aimed
at understanding the cause or causes of the fire alarm going off. The question itself is ambiguous
since it does not specify the engine nor the date of the flight. Assuming that the system correctly
understands the question, it can return the failure of the number 2 engine starter as the cause.
However, for complex incidents such as this one, it is difficult to pinpoint particular causes and regard
them as independent of the remaining information. That is, there might be more than one cause for a
single incident, and some causes may depend on other causes. For example, in the example above
there are other contributing causes, e.g. the start valve had re-opened because of a short circuit or the
engine starter had failed due to over-speeding. There could also be consequent effects, e.g. residual
smoke and fire damage to the structure surrounding the number 2 engine. For presenting answers like
this, it is necessary to consider the actual information needs of the user at a knowledge level. For
example, when faced with an unexpected observation (problem), engineers first assess whether or not
the problem is serious and requires diagnosis. Diagnosis normally proceeds by finding reasons or
causes that impact on the observation. Once the causes are identified, then it is likely that solutions
are required to prevent recurrences. The impact of making various hypothetical changes is likely to be
assessed, along with the advantages and disadvantages of the various solutions proposed. This
example demonstrates that the information needs for users in specific organisations are complex,
requiring not only sophisticated retrieval processing but also the presentation of retrieval results in as
natural a form as possible. Successful synthesis and presentation of such answers depend on the
ability to compare information at a semantic level such that it produces a chain of semantic relations.
Question: What triggered the engine fire alarm in Boeing 727-217?

Answer: The failure of the number 2 engine starter caused a fire, as the investigation revealed residual smoke and fire damage to the structure surrounding the number 2 engine.

Extended answer: The engine start valve master switch did not protect the complete circuit, → {causing a short circuit in the engine wiring harness, a new voltage must have been subsequently available}, → allowed the number 2 engine start valve to re-open, → {causing the number 2 engine starter to over speed, because it was being rotated by the air turbine with no load on the starter}, → causing the failure of the starter as evidenced by a two- by three-inch hole in the side of the starter gear case, and the air turbine had come out through the retaining screen.

Figure 1. An example of generating a coherent and justified answer
A prototype semantic-based QA system implementing these ideas has been developed. The
underlying approach is based on identifying various discourse relationships between two spans, such
as cause-effect and elaboration. These types of relationship are derived from a computational
linguistic theory known as Rhetorical Structure Theory (RST). This theory defines a set of rhetorical
relations and uses them to describe how sentences are combined to form a coherent text (Mann &
Thompson, 1988). As such, RST analysis discovers relationships within a sentence or among
sentences. Since sentences are not usually comprehensible when isolated, this approach provides a
more sophisticated content analysis. These annotations are then used to remove duplicate information
and synthesise answers from multiple occurrences. Finally these answers are justified by adding
supporting information. As information is compared at the semantic level rather than at the string
level, it is possible to determine whether a causal link exists between two events. This paper mainly
addresses questions related to causal inference and describes a prototype system to test the ideas. The
proposed system is targeted at the engineering domain; however, the methodology is generic and can be
applied to other domains.
2 Literature Review
Users find QA systems helpful as they do not need to go through each retrieved document to extract
the information they need. Until recently, most QA systems only functioned on specifically created
collections and on limited types of questions, but some attempts have been made to scale systems to
open domains like the Web (Kwok et al., 2001). Experiments show that in comparison to search
engines, e.g. Google, QA systems significantly reduce the effort to obtain answers. AskJeeves (www.ask.com) and
Brainboost (www.brainboost.com) are examples of Internet search engines with a QA interface, but neither provides
fully-fledged QA capabilities. AskJeeves relies on hand-crafted question templates that enable
automatic answer searches, and returns lists of documents instead of intelligently extracting brief
answers. Brainboost supplies answers in plain English, but the correctness of its answers is limited to
specific questions only, and for many questions neither relevant texts nor exact answers are found.
Currently, developments in QA have focused on improving system performance through more
advanced algorithms for extracting exact answers (Voorhees, 2002). A project organised by the US
National Institute of Standards and Technology (NIST) has established benchmarks for evaluating QA
systems. Two new QA system response requirements were introduced in 2002: (1) to return an exact
answer; and (2) to return only one answer. Previous requirements had allowed systems to return five
candidate answers, and the answers could be between 50 and 250 bytes in length. This demonstrates
that current QA systems are focusing on retrieving exact answers to factual questions. For these
systems, performances of over 80% correct answers have been reported. However, user evaluations
consistently highlight the fact that usability is hindered by the absence of context information that
would allow users to evaluate the trustworthiness of an answer. For example, user studies conducted
by Lin et al. (2003) suggest that users prefer to see the answer in a paragraph rather than as an exact
answer, even for a simple question like: Who was the first man on the Moon?
In comparison to open-domain QA systems, e.g. those on the Web, domain-specific QA systems have the
following additional characteristics (Diekema et al., 2004; Hickl et al., 2004; Nyberg et al., 2005):
- a limited amount of data is available in most cases;
- domain-specific terminologies have to be dealt with;
- user questions are complex.
Shallow text processing methods are mostly used for QA systems on the Web due to Web redundancy,
which means that similar information is stated in a variety of ways and repeated in different locations.
However, in the engineering domain, suitable data can be scarce and answers to some questions might
only be found in a few documents and these may exhibit linguistic variations from the questions.
Therefore, intensive NLP techniques that can analyse unstructured texts using semantics and domain
models are more appropriate. Domain ontologies and thesauri are required to define domain-specific
terminologies. Hai and Kosseim (2004) used information in a manually created thesaurus to rank
candidate answers by annotating the special terms occurring both in the queries and candidate
answers. They also used a concept hierarchy for measuring similarities between a document and a
query. Ontologies have also been used for expanding terms in the questions and clarifying ambiguous
terms (Nyberg et al., 2004). Since ontologies can be regarded as storing information as triples e.g.
person work-for organisation, users can submit questions linked to such classes and relations in
natural language (Lopez et al., 2005). For example, the question: Is John an employee of IBM? can be
answered by recognising: (1) John is a person; (2) IBM is an organisation; and (3) employee is
inferred from ‘someone who works for an organisation’. Questions other than factual ones need
special attention, and a profile of the user can help to improve system performance (Diekema et al.,
2004). Suitable ways of presenting answers and how much information should be provided must also
be determined. To address these problems, some researchers have proposed interactive QA. To that end,
some QA systems rephrase the questions submitted to confirm whether or not users’ information needs
have been correctly identified (Lin et al., 2003). Advanced dialog implementations have also been
suggested. However, Hickl et al. (2004) argue that the decomposition of user questions into simpler
ones with which answer types are associated could be a more practical solution than a dialog
interaction.
Generally, semantic annotations are treated as a similar task to named-entity recognition that identifies
domain concepts and their associations in a single sentence (Aunimo & Kuuskoski, 2005). This paper
extends the notion of semantic annotation to include discourse relations that identify what information
is generated from the extended sequences of sentences. This goes beyond the meanings of individual
sentences by using their context to explain how the meaning conveyed by one sentence relates to the
meaning conveyed by another. A discourse model is essential for constructing computer systems
capable of interpreting and generating natural language texts. Such models have been used: to assess
student essays with respect to their writing skills; to summarise scientific papers; to extend the
answers to a user's question with important sentences; and to generate personalised texts customised
for individual reading abilities (Bosma, 2005; Burstein et al., 2003; Teufel, 2001; Williams & Reiter,
2003).
3 Engineering Taxonomy
Retrieval systems in engineering need to employ domain-specific terminologies that differentiate
between specific and general terms. Specific terms are essential to understand users’ questions and
characterise documents in relation to those questions. Some general terms have specific meanings in
engineering. For example the term shoulder has multiple meanings in a dictionary, and in most cases,
it means the part of the body between the neck and the upper arm. However, in engineering, it can
refer to a locating upstand on a shaft. Domain taxonomies arrange such terms into a hierarchy. An
example of an engineering taxonomy is the Engineering Design Integrated Taxonomy (EDIT) and this
taxonomy is used throughout this paper. It consists of four root concepts (Ahmed, 2005):
- The design process, i.e. a description of the different tasks undertaken at each stage of product development, e.g. conceptual design, detail design, brainstorming.
- The physical product to be produced, e.g. assemblies, sub-assemblies and components, using part-of relations. For example, a motor and the shaft of a motor.
- The functions that must be fulfilled by the particular component or assembly. For example, one of the functions of a compressor disc is to secure the compressor blade and one of the functions of a cup is to contain liquid.
- The issues, namely the considerations that a designer must take into account when carrying out a design process, e.g. considering the unit costs or production processes.
A detailed description of the generic methodology used to develop engineering design taxonomies,
including EDIT, can be found in (Ahmed et al., 2005).
4 The Proposed Method
In general, a document can be encoded with various semantics, e.g. customer reviews or causal
accounts of engineering failures, and accessed by users who have very different interests. For
example in the case of product reviews by customers, negative and positive customer opinions are the
main messages for market researchers. On the other hand, designers are more interested in design-
related issues, comments and problems associated with engineering failures. It would therefore be
beneficial to include those semantics that facilitate searching for information in a way that reflects the
interests of the users. For example, for a designer whose task is to reduce fan noise, guidance on how
to minimise aerodynamic noise should be retrieved. On the other hand, if that designer is more
interested in using a specific method for noise reduction, then documents describing the methods
along with their advantages or disadvantages are more useful. With keyword-based indexing, it is not
feasible to extract such semantics since most natural language texts have annotations that are too basic
and no explicit descriptions of the concepts are available. Annotations are formal notes attached to
specific spans of text. Their complexity and representation depend on the mark-up language used.
The proposed method works as follows: (1) a document is annotated with a set of relations derived
from RST; (2) the document is classified with EDIT indexes; (3) the document is parsed using NLP
indexing techniques; (4) the RST-annotated document is converted into predicate-argument forms for
effective answer extraction; and (5) a user question is analysed using the same NLP technique. Steps
(1), (2), and (3) can proceed independently, but step (1) must precede step (4).
4.1 Semantic annotations based on RST
Discourse Analysis (DA) is crucial for constructing computer systems capable of interpreting and
generating natural language texts. DA studies the structure of texts beyond sentence and clause levels,
and structures the information extracted from the texts with semantic relations. It is based on the idea
that well-formed texts exhibit some degree of coherence that can be demonstrated through discourse
connectivity, i.e. logical consistency and semantic continuity between events or concepts. This is in
contrast with most keyword-based indexing that exclusively addresses the sub-sentence level, omitting
the fact that sentences are inter-connected to create a whole text. In order to establish a more robust
and linguistically informed approach to identify important entities and their relations, a deeper
understanding is necessary.
Annotating a text with a discourse structure requires advanced text processing, linguistic resources
such as taxonomies, and, possibly, manual intervention by experts. It certainly increases the work
required to develop QA systems. However, if QA systems are only targeted at certain domains, where
a limited number of texts has to be searched, and experts are available to assist, then detailed linguistic
analysis is feasible. DA generates a discourse structure by defining discourse units, either at sentence
or clause level, and assigning discourse relations between the units. Discourse structures can reveal
various text features and attempts have been made to use them to identify important sentences that are
key to understanding the contents of documents (Kim et al., 2006b; Marcu, 1999). Discourse
structures can be used to compare units in multiple documents in order to evaluate similarities and
differences in their meanings, as well as to detect anomalies, duplications and contradictions.
Rhetorical relations are central constructs in RST and convey an author’s intended meaning by
presenting two text spans side by side. These relations are used to indicate why each of the spans was
included by the author and to identify the nucleus spans that are central to the purpose of the
communication. Satellite spans depend on the nucleus spans and provide supporting information.
Nucleus spans are comprehensible independently of the satellites. For example, consider the
following two text spans: (1) Given that the clutch was functional, and (2) it is unlikely that the engine
was driving the starter. A condition relation is identified, with span (2) being the nucleus. Satellite
span (1) is only used to define the condition in which the situation in span (2) occurs. These two spans
are coherent since the person who reads them can establish their relationship.
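To make the nucleus-satellite structure concrete, the following sketch shows one plausible in-memory representation of such a relation in Python; the class and field names are illustrative and are not those used by the prototype.

from dataclasses import dataclass

@dataclass
class RSTRelation:
    """One rhetorical relation between two text spans (illustrative only)."""
    relation_type: str  # e.g. 'condition', 'cause-effect'
    nucleus: str        # the span central to the author's purpose
    satellite: str      # the span providing supporting information

# The condition example discussed above:
rel = RSTRelation(
    relation_type="condition",
    nucleus="it is unlikely that the engine was driving the starter",
    satellite="Given that the clutch was functional",
)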
Rhetorical relations between spans are constrained in three ways: (1) constraints on a nucleus; (2)
constraints on a satellite; and (3) constraints on the link between a nucleus and a satellite. They are
elaborated in terms of the intended effect on the text reader. If an author presents an argument in a
text that is identified as an evidence relation, then it is clear that the author was intending to increase a
reader’s belief in the claim represented in a nucleus span by presenting supporting evidence in a
satellite span. Such relations are identified by applying a recursive procedure to a text until all
relevant units are represented in an RST structure (Taboada & Mann, 2006). The procedure has to be
recursive because the intended communication effect may need to be expressed in a complex unit that
includes other relations. The results of such analyses are RST structures typically represented as
trees, with one top-level relation encompassing other relations at lower levels.
It is difficult to determine the correct number of relations to be used and their types. In the simplest
domains only two relation types may be required, whereas some complex domains may require over
400 (Hovy, 1993). Hovy argued that taxonomies with numerous relation types represent sub-types of
taxonomies with fewer types. Some relation types are difficult to distinguish, e.g. elaboration and
example. If there are too many types, inconsistencies of annotation are likely. If there are too few, it
may not be possible to capture all the different types of discourse. Mann and Thompson (1988), for
example, listed 33 relation types to annotate a wide range of English texts. To reduce inconsistencies
of annotation, our method combines similar relation types and eliminates those that do not appear
frequently. A preliminary examination with sample engineering domain data from aircraft incident
reports (see Section 5.1) resulted in the following nine types: background, cause-effect, condition,
contrast, elaboration, evaluation, means, purpose, and solutionhood. Each of them is described
below, along with an example taken from the sample domain data, i.e. aircraft incident reports, using
(N) to indicate a nucleus span and (S) a satellite span.
Background
This type of relation is used to increase a reader’s background understanding (S) of the nucleus span
(N).
(S) While the helicopter was approximately 25 feet above ground level en route to Tobin Lake,
Saskatchewan, to pick up a bucket of water
(N) the engine fire warning light came on and the pilot saw smoke coming out of the engine
cowling.
Cause-Effect
This type of relation is used to link the cause in the nucleus span to the effect in the satellite
span or vice versa.
(N-S) Analysis of the fuel hose indicates that the steel braid strands failed
(S-N) as a result of chafing.
13
Condition
This type of relation is used to show the condition (S) under which a hypothetical situation (N) might
be realised.
(S) Given that the clutch was functional,
(N) it is unlikely that the engine was driving the starter.
Contrast
This type of relation is used to contrast incompatibilities between situations, opinions, or events and
there is no distinction between (N) and (S).
(N-S) It is considered likely that the fire was momentarily suppressed
(N-S) but because of the constant supply of fuel and ignition, it re-ignited after the retardant was
spent.
Elaboration
This type of relation is used to elaborate (S) on the situation in (N).
(N) The variable inlet guide vane actuator (VIGVA) hose, which provides fuel pressure to open
the variable guide vanes, was found pinched between the top of the starter/generator and the
impeller housing assembly.
(S) Further inspection of the pinched fuel hose revealed a hole through the steel braiding and
inner lining.
Evaluation
This type of relation is used to provide an evaluation (S) of the statement in (N).
(N) The second option, the engine start valve master switch,
(S) does not provide a positive indication to the flight crew of the start valve operation.
14
Means
This type of relation is used to explain the means (S) by which (N) is realised.
(N) The rest of the fire was extinguished
(S) using a fire truck that arrived on the site.
Purpose
This type of relation is used to describe the purpose (S) achieved through (N).
(N) At 13.5 flight hours prior to the occurrence, the starter/generator had been removed
(S) to accommodate the replacement of the starter/generator seal, then re-installed.
Solutionhood
This type of relation is used to link the problem (S) with the solution (N).
(N) The number 2 engine start control valve and starter were replaced, and the aircraft was
returned to service.
(S) It was determined that the number 2 engine starter had failed.
Figure 2 shows a screenshot of the RST annotation of the sample domain data. A software tool,
RSTTool, is used to complete the annotation (O’Donnell, 2000). RSTTool offers a graphical interface
with which annotators segment a given text into text spans and specify relation types between them. A
computer program, written in Perl by the first author, automatically extracts the RST annotations
stored by RSTTool. In the box at the bottom of Figure 2 can be seen the
decomposed text spans with the individual spans identified by square brackets. Above can be seen the
corresponding RST analysis tree.
[C-GRYC had been modified by the previous owner,] [DA Services Ltd,] [to incorporate an engine start valve master switch]. [The modification was accepted by Transport Canada when the aircraft was imported into Canada in 1992]. [The engine start valve master switch was put into the electrical circuit between the engine start switches and the start valve cutout switches on the engine starter]. [It provides protection for the start circuit up to the start valve cutout switch].

Figure 2. Screenshot of RST analysis using RSTTool (O'Donnell, 2000)
It is common to use discourse connectives (or cue phrases) for automatic discovery of discourse
relations from texts. For example, by detecting the word but, a contrast relation between two adjacent
texts can be identified. This approach is easy to implement but can lead to a low coverage, i.e. the
ratio of correctly discovered discourse relations to the total number of discourse relations. A study by
Taboada and Mann (2006) showed that the levels of success using cue phrases ranged from 4% for the
‘summary’ relation to over 90% for the ‘concession’ relation. In order to improve the coverage,
machine learning methods have been used. Marcu et al. (2002) used Naive Bayesian probability to
generate lexical pairs that can identify relation types without relying on cue phrases. For example, the
approach can extract a ‘contrast’ when one text contains good words and another bad words, even
when but does not appear. Whereas this approach produces a good performance, the assumption that
the lexical pairs are independent of each other can lead to a considerable number of training sentences
being required, sometimes over 1 000 000. Although a low presence of cue phrases can lead to many
undiscovered relations, they can serve as a reference for annotators. Discourse text spans are inserted:
(1) at every period, semicolon, colon, or comma; and (2) at every cue phrase listed in Table 1.
Annotators first refer to the cue phrases to test whether the corresponding relation types can be used
for a given text. If no direct match is identified, then they select the closest one using their judgement.
Table 1 summarises cue phrases extracted from Knott and Dale (1995) and Williams and Reiter
(2003).
RST-annotated texts are converted into a predicate-argument structure, i.e. predicate(tag_1:argument_1, ..., tag_n:argument_n). Predicates represent the main verbs in sentences and tags include subjects, objects and prepositions. For example, consider the following sentence: Analysis of the fuel hose indicates that the steel braid strands failed as a result of chafing. For this sentence the 'evidence' relation type is used to annotate it as follows: evidence((indicate(subject:analysis of the fuel hose)), (fail(subject:the steel braid strand, pp:as a result of chafing))).
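As a sketch, such a predicate-argument structure could be held in code as a nested dictionary; the representation below is illustrative and is not the prototype's actual storage format.

# Predicate-argument form of: 'Analysis of the fuel hose indicates that
# the steel braid strands failed as a result of chafing.'
annotation = {
    "relation": "evidence",
    "spans": [
        {"predicate": "indicate",
         "arguments": {"subject": "analysis of the fuel hose"}},
        {"predicate": "fail",
         "arguments": {"subject": "the steel braid strand",
                       "pp": "as a result of chafing"}},
    ],
}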
Table 1. Cue phrases for identifying relation types

Relation type    Cue phrases
Background       with, probably
Cause-Effect     because, since, as, as a consequence, as a result, thus, therefore, due to, lead to, consequently
Condition        as long as, if…then, if, so long as, unless, until
Contrast         although, by contrast, even though, however, though, whereas, while
Elaboration      also, in addition, in particular, for example, in general
Evaluation       with, so, but, which, even so
Means            by, with, using
Purpose          in order to, for the purpose of
Solutionhood     proposed solution, options
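As a rough sketch of how Table 1 might drive automatic relation hypotheses, the following Python fragment applies naive substring matching over an abbreviated cue list; as the Taboada and Mann figures above suggest, real coverage would be considerably lower, and short ambiguous cues such as 'as' or 'with' would need more careful handling than is shown here.

CUE_PHRASES = {  # abbreviated version of Table 1
    "cause-effect": ["because", "since", "as a result", "due to", "therefore"],
    "condition": ["as long as", "if", "unless", "until"],
    "contrast": ["although", "even though", "however", "whereas", "while"],
    "elaboration": ["also", "in addition", "in particular", "for example"],
    "purpose": ["in order to", "for the purpose of"],
}

def suggest_relations(text_span):
    """Return candidate relation types whose cue phrases occur in the span."""
    lowered = text_span.lower()
    return [rel for rel, cues in CUE_PHRASES.items()
            if any(cue in lowered for cue in cues)]

print(suggest_relations("The steel braid strands failed as a result of chafing."))
# -> ['cause-effect']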
4.2 Semantic-based QA Description
4.2.1 Term indexing
In general, it is difficult to extract good index terms due to inherent ambiguity in natural language
texts. A term in a text, i.e. an alpha-numeric expression, can have different meanings depending on
the domain in which it is being used, and a term can appear more frequently in one domain than in
another. Publicly accessible dictionaries, e.g. WordNet (Miller et al., 1993), are good resources for
obtaining the meanings of the terms, both manually and automatically. For example, according to
WordNet, blade has nine meanings. One definition is: especially a leaf of grass or the broad portion
of a leaf as distinct from the petiole. However, another in the engineering domain is: flat surface that
rotates and pushes against air or water. Terms can also be used in different domains with the same
meaning. For example, certification does not have a different meaning in the engineering domain.
Most keyword-based search systems index a document with a list of keywords ranked with relevance
weightings. Whereas these keywords might be sufficient to describe superficially the contents of a
document, it is difficult to interpret the true message if their precise meanings are not established.
NLP, on the other hand, produces a rich representation of a document at a conceptual level. To
achieve human-like language processing, NLP includes a range of computational techniques for
analysing and representing natural texts at one or more levels of linguistic analysis (Liddy, 1998). It is
common to categorise such techniques into the six levels listed below, each of which has a different
analysis capability and implementation complexity (Allen, 1987). The application of NLP on a text
can be implemented at the simplest level, e.g. the morphological level, and then extended into a
fully-fledged pragmatic analysis that shows superior understanding but requires large resources and
extensive background information. In this paper, NLP processing includes the first five levels, i.e. it
excludes the pragmatic level.
- Morphological level: component analysis of words, including prefixes, suffixes and roots, e.g. using is stemmed into use.
- Lexical level: word-level analysis including lexical meaning and Part-Of-Speech (POS) analysis, e.g. apple is a kind of fruit and is tagged as a noun.
- Syntactic level: analysis of words in a sentence in order to determine the grammatical structure of the sentence.
- Semantic level: interpretation of the possible meanings of a sentence, including the customisation of the meanings for given domains.
- Discourse level: interpretation of the structure and the meaning conveyed by a group of sentences.
- Pragmatic level: understanding the purposeful use of language in situations, particularly those aspects of language which require world knowledge.
Figure 3 shows the steps of the indexing process. The text in the box at the bottom of Figure 2 is used
as an example.

Figure 3. Steps of the indexing process: Step 1, pre-processing (paragraph identification, sentence decomposition, term identification); Step 2, syntactic parse (POS tagging, phrase identification); Step 3, lexical look-up (term normalisation, acronym identification); Step 4, term weighting (Okapi method)
Step 1: Pre-processing
One paragraph is identified in the example text, which is then decomposed into four sentences. The
first sentence is:
C-GRYC had been modified by the previous owner, DA Services Ltd, to incorporate an engine start
valve master switch.
Terms are identified as character strings lying between delimiters, i.e. spaces or full stops.
Step 2: Syntactic parse
The Apple Pie Parser (Sekine & Grishman, 2001) is used for a syntactic parse that tags POS and
identifies phrases. POS identifies not what a word is, but how it is used. It is useful to extract the
meanings of words since the same word can be used as a verb or a noun in a single sentence or in
different sentences. In traditional grammar, POS classifies a word into eight categories: verb, noun,
adjective, adverb, conjunction, pronoun, preposition and interjection. The Apple Pie Parser refers to
the grammars defined in the Penn Treebank to determine the POS tags (Marcus et al., 1993). For
example, the first word C-GRYC is tagged as NNPX, i.e. singular proper noun. The remaining POS tags for the
sentence above are shown below:
POS taggings: C-GRYC/NNPX had/VBD been/VBN modified/VBN by/IN the/DT previous/JJ
owner/NN DA/NNPX Services/NNPS Ltd/NNP to/TOINF incorporate/VB an/DT engine/NN start/NN
valve/NN master/NN switch/NN.
Phrase identification groups words grammatically, e.g. into Noun Phrases (NPs) such as {the previous
owner DA Services Ltd} and {an engine start valve master switch}.
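The Apple Pie Parser may be difficult to obtain today; a comparable POS-tagging and chunking step can be sketched with NLTK, which also uses the Penn Treebank tag set (though an off-the-shelf tagger will tag C-GRYC as a plain NNP rather than the NNPX shown above).

import nltk  # first run: nltk.download('punkt'); nltk.download('averaged_perceptron_tagger')

sentence = ("C-GRYC had been modified by the previous owner, DA Services Ltd, "
            "to incorporate an engine start valve master switch.")
tokens = nltk.word_tokenize(sentence)
tagged = nltk.pos_tag(tokens)  # Penn Treebank tags, e.g. ('modified', 'VBN')

# Simple noun-phrase identification with a regular-expression chunk grammar:
grammar = "NP: {<DT>?<JJ>*<NN.*>+}"
print(nltk.RegexpParser(grammar).parse(tagged))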
Step 3: Lexical look-up
Each POS-tagged word is compared with WordNet definitions to achieve term normalisation.
Acronym identification extends an acronym found in a text fragment with its full definition. An
example of term normalisation is:

modified → modify

and of acronym identification is:

DA → Dan-Air.
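The term normalisation step can be approximated with WordNet's morphological processing in NLTK; the acronym table below is a hand-built, illustrative stand-in for whatever resource the system actually uses.

from nltk.stem import WordNetLemmatizer  # first run: nltk.download('wordnet')

lemmatizer = WordNetLemmatizer()
print(lemmatizer.lemmatize("modified", pos="v"))  # -> 'modify'

ACRONYMS = {"DA": "Dan-Air"}  # illustrative lookup table
print(ACRONYMS.get("DA", "DA"))  # -> 'Dan-Air'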
Step 4: Term weighting
Although it is possible to analyse the full contents of a document, this becomes computationally
expensive when the documents are large. For an effective retrieval, it is desirable to extract only those
portions of a document that are useful and to transform them into special formats. Text indexing
determines the central properties of the content of a text in order to differentiate relevant portions of
text from irrelevant ones. The quality of each index term is evaluated to determine if it is an effective
identifier of the text content. A relative importance weighting is then assigned to each index term. A
common approach is to index a document divided into paragraph-sized units. In this paper, the Okapi
algorithm is used (Franz & Roukos, 1994; Robertson et al., 1995). It weights a term $t_k$ in a
paragraph $p_j$ as follows:

$$w_{jk} = \frac{c_{jk}}{0.5 + 1.5 \cdot \frac{len(p_j)}{ave\_len} + c_{jk}} \cdot \log\frac{N - n + 0.5}{n + 0.5} \qquad \text{Equation (1)}$$

where $c_{jk}$ is the frequency of the term $t_k$ in the paragraph $p_j$, $N$ is the total number of
paragraphs in the dataset, $n$ is the number of paragraphs whose contents contain the term $t_k$,
$len(p_j)$ is the total number of occurrences of all terms in the paragraph $p_j$, and $ave\_len$ is
the average number of terms per paragraph. Using this term weighting method, the example sentence is
stored in a vector model, i.e. each term is associated with its calculated weighting.
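A direct transcription of Equation (1), assuming the term statistics have already been gathered; this is a sketch, not the authors' implementation.

import math

def okapi_weight(c_jk, len_pj, ave_len, N, n):
    """Okapi weight of term t_k in paragraph p_j, per Equation (1).

    c_jk    -- frequency of t_k in p_j
    len_pj  -- total term occurrences in p_j
    ave_len -- average number of terms per paragraph
    N       -- total number of paragraphs in the dataset
    n       -- number of paragraphs containing t_k
    """
    tf = c_jk / (0.5 + 1.5 * len_pj / ave_len + c_jk)
    idf = math.log((N - n + 0.5) / (n + 0.5))
    return tf * idf

# e.g. a term occurring twice in an average-length paragraph, present in 3 of 72 paragraphs:
print(okapi_weight(c_jk=2, len_pj=75, ave_len=75, N=72, n=3))  # about 1.49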
4.2.2 Domain knowledge in QA
An engineering taxonomy such as EDIT is a useful means to identify domain-specific terms in a
document. The successful extraction of domain-specific terms can improve the accuracy of QA. For
example, the answer to the question: What material should be used for this turbine blade? is more easily
identified if Titanium is marked up as a type of material. Among the four root concepts defined in
EDIT, only two are used: Issue and Product. These two concepts exhibit different characteristics.
According to Ahmed (Ahmed, 2005), issue categories are considerations designers must take into
account when carrying out a design process. These can be the descriptions of problems arising during
a product’s lifecycle or new design requirements to be satisfied. In contrast, product categories
comprise a hierarchy list of product names, decomposing an overall technical product or system into
smaller and smaller elements. Different techniques are therefore needed to handle them in the
documents. For issue categories, any technique that automatically classifies a document into pre-
defined categories is suitable. For product categories, the technique of Named-Entity (NE)
recognition is used. In the QA method proposed in this paper, the techniques developed by Kim et al.
(2006a; 2006b) are used. The technique for classifying issue categories is described in (Kim et al.,
2006b) and the one for classifying product categories, using probability-based NE identifiers, is
described in (Kim et al., 2006a).
4.2.3 QA overview
Figure 4 shows the overall architecture of the proposed QA system. State-of-the-art QA systems can
achieve an accuracy of up to 80%, as demonstrated by recent tests undertaken using TREC datasets,
which mainly consist of newspaper documents (Voorhees, 2002). However, this level of performance
is not expected to be repeated in other environments. The questions in the above tests were carefully
constructed, i.e. no misspellings, and they were mostly factual and based on a single interaction with a
user, i.e. no dialogue.
The prototype system proposed in this paper does not aim at achieving better accuracy in question
analysis or in finding answers. Instead, its main objective is to demonstrate the efficiency of RST-
based annotations for coherent answer generation, i.e. Answer Generation, see step (5) in Figure 4.
Figure 4. Overall architecture of the proposed QA system
Each of the steps in Figure 4 will now be described.
Step 1: Question Analysis
The Question Analysis Module decomposes a question into three parts: (1) Question Word; (2)
Question Focus; and (3) Question Attribute. The Question Word indicates a potential answer type,
e.g. where, when, etc. The Question Focus is a word, or a sequence of words, that describes the user's
information needs that are expressed in the question. The Question Attribute is the part of the question
that remains after removing the Question Word and the Question Focus. It is used to rank candidate
answers in a decreasing order of relevance. An example is given below.
Question: What were the consequences of the vibration of the starter/generator?
(1.1) Syntactic parse
POS: What|WP were|VBD the|DT consequences|NNS of|IN the|DT vibration|NN of|IN the|DT
starter|NN /|SYM generator|NN
Phrase identification: NPL{the consequences} PP{the vibration of the starter/generator}
(1.2) Question Word {what}, Question Focus {consequences}, Question Attribute {the vibration of
the starter/generator}
(1.3) EDIT indexes: <Product category=‘Starter_Ducting’> starter <Product
category=‘Electrical_Generator’> generator <Issue category=‘Vibrations’> vibration
(1.4) Relation type: effect
(1.5) Answer format: cause-effect(Question Attribute, <Answer>)
The Question Focus, i.e. consequences, for the example question above is matched with the effect in
the cause-effect relation type. Therefore, the possible answers should be the effects of the events
described in the Question Attribute. For an automatic matching, a semantic similarity between the
Question Focus and the relation type is computed using the method proposed by Resnik (1995). This
method is based on the number of edges in a semantic hierarchy, e.g. WordNet, encountered between
two terms when locating them in the hierarchy.
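NLTK exposes both formulations: path_similarity follows the edge-count description above, while res_similarity implements Resnik's information-content measure. A hedged sketch of matching a Question Focus word against a relation-type keyword, with an illustrative threshold:

from nltk.corpus import wordnet as wn  # first run: nltk.download('wordnet')

def focus_matches_relation(focus_word, relation_word, threshold=0.2):
    """Best edge-based WordNet similarity between two nouns (sketch only)."""
    best = 0.0
    for s1 in wn.synsets(focus_word, pos=wn.NOUN):
        for s2 in wn.synsets(relation_word, pos=wn.NOUN):
            best = max(best, s1.path_similarity(s2) or 0.0)
    return best >= threshold, best

print(focus_matches_relation("consequence", "effect"))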
Step 2: Answer Retrieval
The Answer Retrieval Module uses the Question Attribute identified by the Question Analysis Module
to select paragraphs that might contain candidate answers. A cosine-based similarity calculation is
used for ranking the selected paragraphs in order of relevance to the keywords that appear in the
Question Attribute.
$$sim(q, p_j) = \frac{2 \sum_{i=1}^{t} w_i \cdot w_{ij}}{\sum_{i=1}^{t} w_i^2 + \sum_{i=1}^{t} w_{ij}^2} \qquad \text{Equation (2)}$$

where $p_j$ is a given paragraph, $q$ is a Question Attribute, $w_i$ is the weight of the term $t_i$
in the Question Attribute, and $w_{ij}$ is the weight of the term $t_i$ in the paragraph $p_j$. The
similarity value is normalised by the total weights of the common words.
Step 3: Answer Extraction and Step 4: Answer Scoring
The Answer Extraction Module examines the paragraphs selected in the Answer Retrieval Module in
order to select text spans from which candidate answers can be extracted. The EDIT indexes,
Question Focus, and Question Attribute are used to determine whether the text spans contain the
answer. In doing so, it is necessary to measure how well they are related to the question. An overall
similarity between the text spans and the question is computed by summing the following three similarity
scores: (1) the score reflecting whether a given text span is classified with the EDIT indexes; (2) the
score reflecting whether a given text span contains the Question Focus; and (3) the score reflecting the
degree of similarity between a given text span and the Question Attribute. They are summed as
follows:
$$s(ts_{ij}) = \alpha \cdot s\_edit(ts_i) + \beta \cdot s\_rst(ts_i) + (1 - \alpha - \beta) \cdot s\_attr(ts_i) \qquad \text{Equation (3)}$$

where $s(ts_{ij})$ is the score of the text span $ts_i$ in the paragraph $p_j$, and $\alpha$
$(0 \le \alpha \le 1)$ and $\beta$ $(0 \le \beta \le 1)$ are used to normalise the score $s(ts_{ij})$
to lie between 0 and 1. $s\_edit(ts_i)$ is defined as:

$$s\_edit(ts_i) = \frac{1}{N} \sum_{k \in positions} ind_s(ind_{ik}) \qquad \text{Equation (4)}$$

where $ind_s(ind_{ik})$ is a Boolean indicating whether or not a given text span is classified with
the EDIT index number $ind_{ik}$, $positions$ is the set of matches against the EDIT indexes returned
by the Question Analysis Module, and $N$ is the total number of elements in this set.
$s\_rst(ts_i)$ is a Boolean variable that is true if the RST annotation for the text span matches the
annotation for the question. $s\_attr(ts_i)$ is defined as:

$$s\_attr(ts_i) = \frac{1}{M} \sum_{m=1}^{n} ts(t_{im}) \qquad \text{Equation (5)}$$

where $ts(t_{im})$ is a Boolean indicating whether or not a given term $t_{im}$ in the text span
$ts_i$ is matched with a term in the Question Attribute, $n$ is the number of terms in the text span,
and $M$ is the total number of terms in the Question Attribute. The scored text spans are sorted in
decreasing order by value and those above a pre-defined threshold are selected.
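Equations (3)-(5) combined into a single scoring routine, following the reconstruction of Equation (3) above; the Boolean counts are assumed to have been computed upstream, and this sketch is not the authors' implementation.

def score_text_span(edit_matches, n_edit_positions, rst_matches_question,
                    attr_term_matches, n_attr_terms, alpha=0.3, beta=0.2):
    """Overall score of a text span per Equations (3)-(5).

    edit_matches         -- EDIT indexes of the span matching those from the question
    n_edit_positions     -- total EDIT indexes from the question (N in Equation (4))
    rst_matches_question -- True if the span's RST annotation matches the question's
    attr_term_matches    -- span terms matching Question Attribute terms
    n_attr_terms         -- total terms in the Question Attribute (M in Equation (5))
    """
    s_edit = edit_matches / n_edit_positions if n_edit_positions else 0.0
    s_rst = 1.0 if rst_matches_question else 0.0
    s_attr = attr_term_matches / n_attr_terms if n_attr_terms else 0.0
    return alpha * s_edit + beta * s_rst + (1 - alpha - beta) * s_attr

print(score_text_span(2, 3, True, 4, 5))  # 0.3*(2/3) + 0.2*1.0 + 0.5*(4/5) = 0.8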
Step 5: Answer Generation
For coherent answer generation, duplicate sentences or clauses should be removed. Two sentences or
clauses are recurrent if they are exactly equivalent or if they differ only in the level of generality. For
example, the following two clauses are equivalent: the starter had failed and the failure of the number
2 engine starter. On the other hand, sentence 1 below can be replaced by sentence 2 without loss of
information because sentence 2 subsumes the information in sentence 1.
Sentence 1: It is probable that a short circuit in the engine wiring harness allowed the number 2
engine start valve to re-open, causing the number 2 engine starter to over speed and subsequently fail.
Sentence 2: It is probable that a short circuit in the engine wiring harness allowed the number 2
engine start valve to re-open, causing the number 2 engine starter to over speed and subsequently fail,
resulting in an engine fire.
Automatic text summarisation systems employ various approaches to compare similar sentences that
have different wordings (Mani & Maybury, 1999). In general, these systems use the following two
steps to produce summaries from a document:
Step 1 identifies and extracts important sentences to be included in a summary.
Step 2 synthesises the extracted sentences to form a summary.
There are two common methods of synthesis in step 2. The non-extractive summary method
suppresses repeated sentences either by extracting a subset of the repetitions or by selecting common
terms. It then reformulates the reduced number of sentences to produce the summary (Barzilay et al.,
1999; Jing & McKeown, 2000). The extractive summary method focuses on the extraction of
important sentences and assembles them sequentially to produce the summary. The objective of these
automatic summarisation systems is to create a shortened version of an original text in order to reduce
the time spent reading and comprehending it. The objective of our proposed approach, on the other
hand, is to extend and synthesize text spans to allow the generation of coherent answers.
In the proposed approach, two text spans are compared to determine whether or not both are similar
using Equation (2). Text spans that have higher similarity values than the pre-defined threshold are
excluded. The algorithm of the Answer Generation Module is shown below as pseudo-code.
Variable definitions:
answerList: the chains of 'cause and effect' to be generated by the Answer Generation Module
textspanList: a list of text spans returned by the Answer Scoring Module
relationLinkedList(t): the text spans linked to t through the 'cause and effect' relation, obtained from the RST annotations
t: the text span currently being examined, extracted from textspanList
thresh: a pre-defined threshold used to compare the similarity between two text spans

Algorithm:
answerList ← empty
thresh ← 0.8
repeat
    t ← retrieve(textspanList)
    if answerList does not contain t then
        # build chains of 'cause and effect' for t and merge them into answerList
        for each t1 in relationLinkedList(t)
            duplicate ← false
            for each t2 in answerList
                if similarity(t1, t2) > thresh then   # Equation (2)
                    duplicate ← true
            if not duplicate then
                update(answerList, t1)
    remove(textspanList, t)
until textspanList is empty
5 Pilot study
This section presents a preliminary evaluation of the prototype QA system. The evaluation tests the
following two hypotheses: (1) the proposed system is efficient at extracting and presenting answers to
causal questions using relation types, and (2) the presentation of the synthesised answers helps users to
understand the retrieved results. The first hypothesis was evaluated by comparing the performance of
the prototype system against that of a standard QA system. The standard system is different from the
prototype system in the following ways:
Step 1: the Answer Format is revised as <Answer> Question Focus Question Attribute. This format
indicates that the system should find text spans that have syntactic variations of the Question Focus
and semantic similarities with the Question Attribute. Using the example question in Section 4.2.3, a
potential answer text span can be <Answer> is the consequence of <Question Attribute>.

Step 2: is not revised.

Step 3 and Step 4: Equation (3) is revised as:

$$s(ts_{ij}) = s\_attr(ts_i)$$

Step 5: is not used.
With the standard system, multiple instances were extracted without synthesising them. This
comparison examined whether the answer generation method described in Section 4.2 avoids repeated
information and generates coherent answers. The second hypothesis was evaluated by measuring user
performance in a simple trial.
5.1 Trial dataset
For the trial, three official aircraft incident investigation reports were downloaded from official websites:
(1) http://www.tsb.gc.ca/en/reports/air/1996/a96o0125/a96o0125.asp
(2) http://www.tsb.gc.ca/en/reports/air/2002/A02C0114/A02C0114.asp
(3) http://www.aaib.gov.uk/sites/aaib/cms_resources/dft_avsafety_pdf_029538.pdf
Although the incidents happened to three different aircraft types (Boeing 727-217, Bell 205A-1
helicopter, and Boeing 737-8AS), the incidents share a common cause, i.e. an in-flight engine fire.
Although the reports were written by different incident investigation teams, they share a broadly
similar terminology, e.g. emergency landing, engine starter valve, etc. After removing embedded
HTML tags and images, the average document length was 1820 words, or 24 paragraphs. Each
document was first indexed as described in Section 4.2.1, and RST annotations were applied by the
first author using RSTTool. A total of 194 relations were annotated. The total numbers for each
relation type were: Background = 20, Cause-Effect = 45, Condition = 16, Contrast = 30, Elaboration =
32, Evaluation = 24, Means = 10, Purpose = 12, and Solutionhood = 5.
5.2 The trial
Six Engineering graduate students and two members of the Engineering Department staff of
Cambridge University participated in the trial. A brief introduction to the trial and trial dataset was
given to the participants. Each participant was asked to answer multiple questions and their
performance and accuracy were measured. The trial consisted of reviewing the answers to three
questions, i.e. one for each incident report. These answers were split into two groups. The first group
was extracted and synthesized using the prototype QA system and the second was extracted using a
standard QA system. For a fixed period of time, the participants were instructed to read the answers
on-line for both systems (see Figure 5), and follow links to the original document if desired. After
this, a further list of questions related to the answers they had just read was given to the participants.
Their answers to these questions were used to test their understanding of the answers they had just
seen. In order to avoid the evaluation problems caused by the inclusion of incorrect answers, both
groups of answers were examined in order to verify that they were all true.
Figure 5. An example screenshot of the proposed system
Table 2 shows the three initial questions, along with the associated questions that were used to test the
users’ understanding.
Table 2: The three original questions along with their associated questions
Question 1: What triggered the engine fire alarm on the Boeing 727-217?
1. On which engine of the Boeing 727-217 was the fire alarm observed?
2. What were the consequences of the failure of the number 2 engine starter?
3. Why did the number 2 engine starter overspeed?
4. Did the starter valve of the number 2 engine close after the engine was started?
5. Why did the number 2 engine starter valve re-open?
6. How can we determine if the starter valve is open?
Question 2: What triggered the engine fire alarm on the Bell 205A-1 helicopter?
1. Why did the starter/generator start to vibrate?
2. What were the consequences of the vibration of the starter/generator?
3. Why was the hold-down nut at the 12 o’clock position left out?
4. Was the engine fire alarm activated due to the abrasion of the cooling fan?
Question 3: What triggered the shut-down of the number 2 engine on the Boeing 737-8AS?
1. Does this incident have the same engineering problem as the Boeing 727-217?
2. Did the failure of the No. 4 bearing in the number 2 engine contribute to the event?
3. What were the consequences of the presence of engine vibration?
5.3 Trial results
Answers to the three original questions shown in Table 2 were extracted and synthesized using the
method described in Section 4.2. The answers were then compared with the set of answers prepared in
Section 5.2. The threshold for the Answer Retrieval Module, i.e. the value for Equation (2), was set at
0.5, meaning that only the paragraphs with cosine-similarity values over 0.5 were selected. The values
for α and β specified in Equation (3) were set at 0.3 and 0.2, respectively. The threshold for the
Answer Scoring Module, i.e. the value computed by Equation (3), was set at 0.2.
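To make the retrieval thresholding concrete, the sketch below selects paragraphs whose cosine similarity to the question exceeds 0.5. It assumes simple TF-IDF term weighting via scikit-learn; the prototype's own weighting and indexing scheme is the one described in Section 4.2.1, so this is only an approximation.

```python
# Minimal sketch of the Answer Retrieval Module's thresholding step,
# assuming TF-IDF weighting (an assumption, not the prototype's own
# indexing, which is described in Section 4.2.1).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def retrieve_paragraphs(question, paragraphs, threshold=0.5):
    vectorizer = TfidfVectorizer(stop_words="english")
    # Fit the vector space on the paragraphs, then map the question into it.
    paragraph_vectors = vectorizer.fit_transform(paragraphs)
    question_vector = vectorizer.transform([question])
    similarities = cosine_similarity(question_vector, paragraph_vectors)[0]
    # Keep only the paragraphs whose similarity exceeds the threshold.
    return [(round(sim, 2), para)
            for sim, para in zip(similarities, paragraphs)
            if sim > threshold]

# Hypothetical usage:
# hits = retrieve_paragraphs(
#     "What triggered the engine fire alarm on the Boeing 727-217?",
#     report_paragraphs)
```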
The results are shown using two tables, i.e. Table 3 and Table 4. Table 3 summarises the results of the
‘cause and effect’ chains generated by the proposed QA. Table 4 compares the performance of the
proposed QA on retrieving correct text spans with that of the standard QA. Precision and recall were
used to measure the performance. In this paper, precision is defined as the proportion of the retrieved
text spans that are correct, and recall as the proportion of the correct text spans that are retrieved.
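Writing R for the set of retrieved text spans and C for the set of correct (‘right’) text spans, these are the standard definitions:

```latex
\mathrm{precision} = \frac{|R \cap C|}{|R|}, \qquad
\mathrm{recall} = \frac{|R \cap C|}{|C|}
```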
Table 3. The overview of ‘cause and effect’ chains generated by the proposed QA

             Num. of paragraphs   Num. of text spans   Depth of the chains
Question 1           17                   12                     4
Question 2            8                    9                     4
Question 3           11                    6                     2
The second column in Table 3 specifies the number of paragraphs returned by the Answer Retrieval
Module and the third one specifies the number of text spans returned by the Answer Extraction and
Scoring Modules. The fourth one specifies the depth of the chains, i.e. the number of cause and effect
nodes along the longest path from the root node down to the farthest leaf node in the chains. For
example, for Question 1, one ‘cause and effect’ chain with a depth of 4 was generated by
synthesising and extending the 12 text spans extracted from the 17 paragraphs.
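To illustrate how the depth of a chain is counted, the sketch below uses a hypothetical tree representation of a ‘cause and effect’ chain (not the prototype's internal data structure) and computes the longest root-to-leaf path:

```python
# Hypothetical tree representation of a 'cause and effect' chain;
# depth is the number of nodes on the longest root-to-leaf path.
from dataclasses import dataclass, field

@dataclass
class CauseEffectNode:
    text_span: str                                # an extracted text span
    effects: list = field(default_factory=list)   # downstream consequences

def chain_depth(node: CauseEffectNode) -> int:
    if not node.effects:
        return 1
    return 1 + max(chain_depth(child) for child in node.effects)

# A fragment of the Question 1 chain (depth 3 for this fragment):
chain = CauseEffectNode(
    "a short circuit allowed the number 2 engine start valve to re-open",
    effects=[CauseEffectNode(
        "the number 2 engine starter oversped and failed",
        effects=[CauseEffectNode("an engine fire resulted")])])

assert chain_depth(chain) == 3
```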
The following shows examples of correct and incorrect text spans for Question 1.
Correct text spans:
(1) The failure of the number 2 engine starter resulted in an engine fire.
(2) The hazard associated with an engine fire caused by a starter failure was recognized and
addressed in AWD 83-01-05 R2.
(3) It is probable that a short circuit in the engine wiring harness allowed the number 2 engine start
valve to re-open, causing the number 2 engine starter to overspeed and subsequently fail, resulting
in an engine fire.
Incorrect text spans:
(1) Because of the engine’s proximity to the elevator and rudder control systems, a severe in-flight fire
in the number 2 engine is potentially more serious than a fire in either the number 1 or 3 engine.
(2) Fire damage to the engine component wiring precluded any significant testing of the wiring
harness.
(3) Two fire bottles were discharged into the number 2 engine compartment; however, the fire
warning light remained on.
Table 4. Comparison of two QA systems for the task of retrieving correct text spans

               Standard QA             Proposed QA
             Precision   Recall      Precision   Recall
Question 1     0.45       0.50         0.67       1.00
Question 2     0.60       0.67         0.78       0.78
Question 3     0.67       0.57         0.83       0.79
Average        0.57       0.58         0.76       0.86
As shown in Table 4, on average, the proposed QA achieved 76% precision and 86% recall when
retrieving text spans for three questions. On the other hand, the standard QA achieved 57% precision
and 58% recall. This suggests that the proposed QA has considerable potential for extracting and
synthesizing answers to causal questions. The task of retrieving text spans is similar to the sentence
selection task in automatic text summarisation systems.
The text summarisation systems referred to earlier in this paper are by Barzilay et al. (1999) and by
Jing and McKeown (2000). In the context of multi-document summarisation, Barzilay et al. (1999)
focused on the generation of paraphrasing rules that were used to compare the semantic similarity of
two sentences. They tested the rules on the task of identifying common phrases among multiple
sentences, and the automatically generated common phrases were then reviewed by human judges. The
judges identified 39 common phrases, of which the system correctly identified 29. In addition, the
identified phrases contained 69% of the correct subjects and 74% of the correct main verbs.
On average, the system achieved 72% accuracy.
Jing and McKeown (2000) carried out three evaluations. The first tested whether the automatic
summarisation system could identify a phrase in the original text that corresponds to the selected
phrase in a human-written abstract. When tested with 10 documents, the automatic system achieved
82% precision and 79% recall on average. The second evaluation tested whether the automatic system
could remove extraneous sentences, i.e. sentence reduction. The result showed that 81% of the
reduction decisions made by the system agreed with those of humans. The third evaluation tested
whether the automatic system could generate coherent summaries. The system achieved 6.1 points out
of 10, i.e. 61% accuracy, for generating coherent summaries. Only the first of these evaluations
focused on sentence selection.
The performance of the proposed QA when retrieving correct text spans, i.e. 76% precision and 86%
recall, is slightly better than that reported by Barzilay et al. (1999), i.e. 72% accuracy, and comparable
to that of Jing & McKeown (2000), i.e. 82% precision and 79% recall.
On average, the users in the first group, i.e. those who read the answers given by the proposed QA,
incorrectly answered two of the 13 questions, whereas the users in the second group incorrectly
answered five. On average, the users in the first group completed the trial within 19 minutes, and the
users in the second group within 25 minutes. Five of the 13 questions were correctly answered by all
the users in the first group, whereas just one question
was correctly answered by all the users in the second group. All the users in the second group
incorrectly answered question 6, ‘How can we determine if the starter valve is open?’.
Although the preliminary results are encouraging, it is difficult to draw firm conclusions from this trial
for two reasons: (1) the low number of users in the two groups; and (2) the small number of causal
relations in the trial dataset. Nevertheless, the users in the first group expressed the opinion that the
synthesized chains of ‘cause and effect’ descriptions were helpful in understanding the causes of the
three incidents.
6 Conclusion and further work
Researchers in computational linguistics have speculated that the relation types defined in RST can
improve the performance of QA systems when answering complex questions. The class of causal
reasoning questions, either predictive or diagnostic, is one that we have shown might be better
answered using these relation types. The reason for this is that the majority of causal questions can be
answered in multiple ways, i.e. it is difficult to pinpoint particular causes and regard them as
independent of the remaining information. Generally, identifying the causes of a specific event
involves creating chains of ‘cause and effect’ relations. Without a deep understanding of all the
relevant information contained in a document, it is not possible to derive such causal chains
automatically. It is still not known how users would like such causal chains to be presented, and it is
not suggested that the interface proposed in this paper is necessarily the best. The contribution of this
paper is the demonstration of a method for synthesizing causal information into coherent answers.
The source information can be scattered over different parts of a single document or over multiple
documents. The pilot study indicated that the proposed QA was more effective at extracting and
synthesizing answers than the standard QA, i.e. 19 percentage points higher precision and 28
percentage points higher recall. The pilot study also indicated that the synthesized chains
of ‘cause and effect’ descriptions were helpful not only for quickly understanding the direct causes of
the three incidents but also for being aware of related contexts along with the rationales for the causes
of the incidents.
The main objective is to improve the understanding of the answers generated by QA systems. An
answer is considered to be coherent if duplicate expressions are eliminated and if it is appropriately
extended with additional information. This additional information should help users verify the
answers and increase their awareness of relevant domain information. Using RST annotations, it has
been shown that it is feasible to compare and integrate the information at a semantic level. This leads
to a way of presenting answers in a more natural manner. A pilot trial demonstrated that the answers
generated by the prototype QA system led to more rapid and improved understanding of those
answers.
Further work is planned with the aim of improving the performance of the prototype system in three
ways. First, since engineers have varying levels of domain expertise, the system should consider the
preferences and profiles of individuals. Inexperienced engineers might have very broad information
requests and prefer to explore the domain, whereas experienced engineers might have detailed
information requests aimed at refining their existing knowledge. Novice engineers require more
background information, probably assembled using ‘elaboration’ or ‘background’ relation types.
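As an illustration of how such profiles might steer answer generation, the hypothetical mapping below selects which relation types to include in the extended answer according to the user's expertise; the category names and the fallback rule are assumptions, not part of the prototype:

```python
# Hypothetical mapping from user expertise to the RST relation types whose
# satellites would be appended when extending an answer.
PROFILE_RELATIONS = {
    "novice": ["Cause-Effect", "Background", "Elaboration", "Purpose"],
    "expert": ["Cause-Effect", "Contrast", "Evaluation"],
}

def relations_for(profile: str) -> list:
    # Fall back to the richest extension when the profile is unknown.
    return PROFILE_RELATIONS.get(profile, PROFILE_RELATIONS["novice"])

print(relations_for("novice"))
# -> ['Cause-Effect', 'Background', 'Elaboration', 'Purpose']
```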
Second, synthesising sentences extracted from different documents is crucial to generate answers that
are longer than one sentence. When writing a sequence of linked sentences, authors often replace
noun phrases by pronouns, or by shortened forms of the phrase, in subsequent sentences, e.g. ‘the
number 2 engine starter’ is replaced by ‘it’ or ‘the starter’. Coreference (anaphora) resolution is the
process of identifying the multiple expressions that refer to the same entity, and it is a key issue in
computational sentence synthesis. However, the main focus of research in this area has been on the
resolution of personal
pronouns, e.g. he, him, his, etc. Various techniques have been proposed for automatic coreference
identification, and it is planned to extend the prototype QA system by adapting these techniques.
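As a rough sketch of the kind of resolution required (a naive string-matching heuristic, not one of the published techniques mentioned above), the function below links a shortened definite noun phrase back to the most recent full noun phrase that contains it:

```python
# Naive heuristic: resolve a shortened definite NP (e.g. 'the starter') to
# the most recently seen full NP that contains it (e.g. 'the number 2
# engine starter'). Real coreference resolution must also handle pronouns
# and exploit syntactic and semantic constraints. (Requires Python 3.9+
# for str.removeprefix.)
def resolve_shortened_np(shortened: str, preceding_nps: list) -> str:
    head = shortened.lower().removeprefix("the ").strip()
    # Scan the preceding noun phrases from the most recent to the oldest.
    for np in reversed(preceding_nps):
        if head in np.lower() and np.lower() != shortened.lower():
            return np
    return shortened  # nothing to resolve against

seen = ["the number 2 engine starter", "an engine fire"]
print(resolve_shortened_np("the starter", seen))
# -> 'the number 2 engine starter'
```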
Third, the crucial issue of automatic RST annotation will be addressed since this is essential for the
practical application of the system. Kim et al. (2004) have applied a machine learning algorithm, i.e.
Inductive Logic Programming (ILP), to analyse documents created using the Design Rationale editor
(DRed) (Bracewell & Wallace, 2003; Bracewell et al., 2004). This enabled the automatic identification
of the relation types. Tests have demonstrated approximately 80% accuracy. This high figure can
be attributed partly to the structure of the DRed documents in the dataset. These documents are
carefully structured using an argumentation model derived from that of IBIS (Kunz & Rittel, 1970).
The documents comprise linked textual elements of a predefined set of types. These element types
include ‘issue’, ‘answer’ and ‘argument’. The links between them are directed but untyped. This
algorithm will be extended to deal with other types of documents, e.g. Web pages and unstructured
texts.
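Until then, a simple cue-phrase baseline conveys what automatic relation identification involves; the sketch below is an illustrative heuristic only, not the ILP method of Kim et al. (2004):

```python
# Illustrative cue-phrase baseline for labelling the RST relation holding
# between two clauses; an ILP approach learns such rules from annotated
# examples instead of hard-coding them.
import re

CUE_PHRASES = {
    "because": "Cause-Effect",
    "resulting in": "Cause-Effect",
    "in order to": "Purpose",
    "however": "Contrast",
    "if": "Condition",
    "by means of": "Means",
}

def guess_relation(sentence: str) -> str:
    lowered = sentence.lower()
    for cue, relation in CUE_PHRASES.items():
        # Match whole words only, so 'if' does not fire inside 'identified'.
        if re.search(r"\b" + re.escape(cue) + r"\b", lowered):
            return relation
    return "Elaboration"  # default when no cue phrase is found

print(guess_relation("The starter failed because the start valve re-opened."))
# -> 'Cause-Effect'
```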
The main objective of this research was to answer more complex questions than current QA systems
are capable of answering. There are five modules in the architecture of the proposed QA system:
Question Analysis; Answer Retrieval; Answer Extraction; Answer Scoring; and Answer Generation.
The Question Analysis Module analyses the question in terms of the Question Word, Question Focus
and Question Attribute. The next three modules retrieve, extract and score answers from documents
that have been manually annotated and semi-manually indexed. The manual annotation is based on
nine of the 33 relation types defined in RST. The semi-manual indexing uses the issue and product
categories of the EDIT engineering taxonomy. The main contribution of this research lies in the fifth
module. This module synthesises causal information into coherent answers, drawing information from
both different parts of a single document and from multiple documents. A prototype implementation
shows promise, but additional testing is required. Further developments are proposed that will: (1)
allow the system to take into account the preferences and profiles of users; (2) extend the system to
include coreference identification; and (3) eliminate the manual annotation of documents. As with all
computer support systems, the interface is critical, and further empirical research is needed here.
Acknowledgements
This work was funded by the University Technology Partnership for Design, which is a collaboration
between Rolls-Royce, BAE SYSTEMS and the Universities of Cambridge, Sheffield and
Southampton. We also thank S. Banerjee and T. Pedersen for software implementing the similarity
measurement method proposed by Resnik.
References
80-20 Software. (2003). 80-20 Retriever Enterprise Edition. http://www.80-20.com/brochures/Personal
Email Search Solution.pdf
Ahmed, S. (2005). Encouraging Reuse of Design Knowledge: A Method to Index Knowledge. Design
Studies Journal, 26(6), 565-592.
Ahmed, S., Kim, S., & Wallace, K. M. (2005). A methodology for creating ontologies for engineering
design. Proc. ASME 2005 Int. Design Engineering Technical Conf. on Computers and Information in
Engineering, DETC 2005-84729. U.S.A.
Allen, J. (1987). Natural Language Understanding. Benjamin/Cummings Publishing Company, Inc.
Aunimo, L., & Kuuskoski, R. (2005). Question Answering Using Semantic Annotation. Proc. Cross
Language Evaluation Forum (CLEF). Austria.
Barzilay, R., McKeown, K. R., & Elhadad, M. (1999). Information Fusion in the Context of Multi-
Document Summarization. Proc. 37th Annual Meeting of the Association for Computational
Linguistics (ACL), pp. 550-557. U.S.A.
Bosma, W. (2005). Extending Answers using Discourse Structure. Proc. Workshop on Crossing
Barriers in Text Summarization Research in RANLP, pp. 2-9. Bulgaria.
Bracewell, R. H., Ahmed, S., & Wallace, K. M. (2004). DRed and design folders: a way of capturing,
storing and passing on knowledge generated during design projects. Proc. Design Automation Conf.,
ASME. USA.
Bracewell, R. H., & Wallace, K. M. (2003). A tool for capturing design rationale. Proc. 14th Int. Conf.
on Engineering Design, pp. 185-186. Stockholm.
Brill, E., Lin, J., Banko, M., Dumais, S. T., & Ng, A.Y. (2001). Data-intensive question answering.
Proc. Tenth Text REtrieval Conf. (TREC 2001), pp. 183-189. U.S.A.
Burger, J., Cardie, C., Chaudhri, V., Gaizauskas, R., et al. (2001). Issues, Tasks and Program
Structures to Roadmap Research in Question & Answering (QA). NIST.
Burstein, J., Marcu, D., & Knight, K. (2003). Finding the write stuff: Automatic identification of
discourse structure in student essays. IEEE Intelligent Systems, Jan/Feb, 32-39.
Diekema, A. R., Yilmazel, O., Chen, J., Harwell, S., He, L., & Liddy, E. D. (2004). Finding Answers
to Complex Questions. In New Directions in Question Answering (Maybury, M. T., Ed), pp. 141-152.
AAAI-MIT Press.
Franz, M., & Roukos, S. (1994). TREC-6 Ad-Hoc Retrieval. Proc. of the Sixth Text REtrieval Conf.
(TREC-6), pp. 511-516.
Hai, D., & Kosseim, L. (2004). The Problem of Precision in Restricted-Domain Question-Answering:
Some Proposed Methods of Improvement. Proc. Workshop on Question Answering in Restricted
Domains in ACL, pp. 8-15. Barcelona.
Hickl, A., Lehmann, J., Williams, J., & Harabagiu, S. (2004). Experiments with Interactive Question
Answering in Complex Scenarios. Proc. North American Chapter of the Association for
Computational Linguistics annual meeting (HLT-NAACL), U.S.A.
Hovy, E. H. (1993). Automated Discourse Generation Using Discourse Structure Relations. Artificial
Intelligence, 63(1-2), 341-385.
Jing, H., & McKeown, K. R. (2000). Cut and Paste Based Text Summarization. Proc. 1st Meeting of
the North American Chapter of the Association for Computational Linguistics (HLT-NAACL), pp.
178-185. U.S.A.
Kim, S., Bracewell, R.H., & Wallace, K.M. (2004). From discourse analysis to answering design
questions. Proc. Int. Workshop on the Application of Language and Semantic Technologies to support
Knowledge Management Processes, pp. 43-49. U.K.
Kim, S., Ahmed, S., & Wallace, K. M. (2006a). Improving document accessibility through ontology-
based information sharing. Proc. Int. Symposium series on Tools and Methods of Competitive
Engineering, pp. 923-934. Slovenia.
Kim, S., Bracewell, R.H., Ahmed, S., & Wallace, K. M. (2006b). Semantic Annotation to Support
Automatic Taxonomy Classification. Proc. Int. Design Conference (Design 2006), pp. 1171-1178.
Croatia.
Knott, A., & Dale, R. (1995). Using linguistic phenomena to motivate a set of coherence relations.
Discourse Processes, 18(1), 35-62.
Kunz, W., & Rittel, H. W. J. (1970). Issues as Elements of Information Systems. Working Paper 131.
Center for Planning and Development Research, Berkeley, USA. Elsevier Scientific Publishing
Company, 55-169, Inc. Amsterdam.
Kwok, C., Etzioni, O., & Weld, D. S. (2001). Scaling Question Answering to the Web. Proc. of the
10th Int. Conf. on World Wide Web, pp. 150-161. Hong Kong.
Liddy, E. D. (1998). Enhanced Text Retrieval Using Natural Language Processing. Bulletin of the
American Society for Information Science and Technology, 24(4), 14-16.
Lin, J., Quan, D., Sinha, V., Bakshi, K., Huynh, D., Katz, B., & Karger, D. R. (2003). What Makes a
Good Answer? The Role of Context in Question Answering. Proc. of the IFIP TC13 Ninth Int. Conf.
On Human-Computer Interaction, Switzerland.
Lopez, V., Pasin, M., & Motta, E. (2005). AquaLog: An Ontology-Portable Question Answering
System for the Semantic Web. Proc. of the Second European Semantic Web Conference (ESWC), pp.
546-562. Greece.
Mani, I., & Maybury, M. (1999). Advances in Automatic Text Summarisation. The MIT Press.
Mann, W., & Thompson, S. (1988). Rhetorical structure theory: Toward a functional theory of text
organization. Text, 8(3), 243-281.
Marcu, D., & Echihabi, A. (2002). An unsupervised approach to recognising discourse relations. Proc.
the 40th Annual Meeting of the Association for Computational Linguistics, pp. 368-375. U.S.A.
Marcu, D. (1999). Discourse trees are good indicators of importance in text. In Advances in Automatic
Text Summarization (Mani, I., Maybury, M., Eds.), MIT Press
Marsh, J.R., & Wallace, K. (1997). Observations on the Role of Design Experience. Proc. WDK
Annual Workshop, Switzerland.
Miller, G. A., Beckwith, R. W., Fellbaum, C., Gross, D., & Miller, K. (1993). Introduction to WordNet:
An on-line lexical database. International Journal of Lexicography, 3(4), 235-312.
Mukherjee, R., & Mao, J. (2004). Enterprise Search: Tough Stuff. ACM Queue, 2(2), 36-46.
Nyberg, E., Mitamura, T., Frederking, R., Pedro, V., Bilotti, M., Schlaikjer, A., & Hannan, K. (2005).
Extending the JAVELIN QA System with Domain Semantics. Proc. of the Workshop on Question
Answering in Restricted Domains at AAAI, U.S.A.
O'Donnell, M. (2000). RSTTool 2.4 -- A Markup Tool for Rhetorical Structure Theory. Proc. of the
Int. Natural Language Generation Conference (INLG'2000), pp. 253-256. Israel.
Resnik, P. (1995). Using Information Content to Evaluate Semantic Similarity in Taxonomy. Proc.
14th Int. Joint Conf. on Artificial Intelligence, pp. 448-453.
Robertson, S. E., Walker, S., Jones, S., & Hancock-Beaulieu, M. G. (1995). Okapi at TREC-3. Proc. of
the Third Text REtrieval Conference (TREC-3), NIST Special Publication 500-225.
Salton, G. (1989). Advanced Information-Retrieval Models. In Automatic Text Processing (Salton, G.
Ed.), chapter 10. Addison-Wesley Publishing Company.
Sekine, S., & Grishman, R. (2001). A Corpus-Based Probabilistic Grammar with Only Two Non-
Terminals. Proc. Fourth Int. Workshop on Parsing Technologies, pp. 216-223. Czech Republic.
Taboada, M., & Mann, W. (2006). Rhetorical Structure Theory: Looking back and moving ahead.
Discourse Studies, 8(3) (to appear).
Teufel, S. (2001). Task-based evaluation of summary quality: Describing relationships between
scientific papers. Proc. Int. Workshop on Automatic Summarization at NAACL, U.S.A.
Voorhees, E. M. (2002). Overview of the TREC 2002 Question Answering Track. Proc. of the Text
Retrieval Conference. (TREC).
Williams, S., & Reiter, E. (2003). A corpus analysis of discourse relations for natural language
generation. Proc. of Corpus Linguistics, pp. 899-908, U.K.