European Journal of Operational Research 202 (2010) 789–801
On the predictive ability of narrative disclosures in annual reports
Ramji Balakrishnan a,*, Xin Ying Qiu b, Padmini Srinivasan c

a The University of Iowa, Tippie College of Business, Iowa City, IA 52246, USA
b Christopher Newport University, Luter School of Business, Newport News, VA 23606, USA
c The University of Iowa, Computer Science Department and Tippie College of Business, Iowa City, IA 52246, USA

* Corresponding author. Tel.: +1 (319) 335 0958; fax: +1 (319) 335 1956. E-mail addresses: Ramji-Balakrishnan@uiowa.edu, ramji.balakrishnan@gmail.com (R. Balakrishnan).
Article history: Received 6 February 2009; Accepted 18 June 2009; Available online 30 June 2009.

Keywords: Economics; Finance; Text mining; Capital markets
Abstract

We investigate whether narrative disclosures in 10-K and 10-K405 filings contain value-relevant information for predicting market performance. We apply text classification techniques from computer science to machine-code text disclosures in a sample of 4280 filings by 1236 firms over five years. Our methodology develops a model using documents and actual performance for a training sample. This model, when applied to documents from a test set, leads to performance prediction. We find that a portfolio based on model predictions earns significantly positive size-adjusted returns, indicating that narrative disclosures contain value-relevant information. Supplementary analyses show that the text classification model captures information not contained in the document-level features of clarity, tone, and risk sentiment considered in prior research. However, we find that the narrative score does not provide information incremental to traditional predictors such as size, market-to-book, and momentum, but rather affects investors' use of price momentum as a factor that predicts excess returns.

© 2009 Elsevier B.V. All rights reserved.
1. Introduction
A primary use of accounting reports is to help investors evaluate an organization's financial prospects. Narratives are an important information source for analysts and a critical component of annual reports (Rogers and Grant, 1997). A majority of the financial analysts surveyed by the AIMR (2000) indicate that management discussion is a very or extremely important item when evaluating firm value. However, perhaps because of the relative costs of gathering and analyzing numerical versus textual data, most academic research has focused on the quantitative disclosures in annual reports. Moreover, because of the flexibility in framing these disclosures with respect to the choice of words and tone in addition to content, it is likely that the information in narratives is not fully impounded into contemporaneous prices (see Li, 2006 for additional observations in this regard). In this study, we modify and apply techniques from the text classification branch of computer science to the narrative disclosures in 10-K and 10-K405 filings in order to predict market returns.
In a training sample, we pair the narrative disclosure in the 10-K documents with the subsequent performance and use standard text classification techniques to build a predictive model. In particular, we define out- and under-performing firms as the top (bottom) 25% of all firms, and group firms into three classes (out-performing, average, and under-performing) based on their actual performance from period t to t+1. We then use the text disclosed in period t (which relates to performance for the period t−1 to t) and the performance class to build a model that associates the text in a 10-K report for a period with the next period's performance. This automated text-classification exercise, which employs many features such as the number, frequency, and count of words that are similar (dissimilar) across documents, yields a model that can classify the text for an arbitrary firm as to its predicted performance. We test the model's predictive ability by applying it to the documents for period t+1 (this testing sample of documents relates to performance for t to t+1) to predict performance for the period t+1 to t+2. We use these predictions to form and maintain a portfolio. Specifically, for each year, our equally weighted portfolio buys stock in firms we predict to out-perform the market and sells predicted under-performers. The magnitude of the size-adjusted returns for the portfolio is then a joint test of the presence of value-relevant information in narrative disclosures and our ability to systematically extract it. (Of course, like the anomalies literature, our analysis also assumes that the information is not impounded immediately into prices.)
On average, our portfolio yields a size-adjusted return of 12.16% per year. Our classifier is word-based, i.e., it extracts key words from the texts and uses these as features to build predictive models. We conduct additional tests to examine the extent to which
our (word-level) model adds to models built using (document-level) meta-features such as clarity and tone. Motivated by prior research, we add the following three meta-features to our model: the fog index (Li, 2006), risk sentiment (Henry, 2006a), and optimistic tone (Davis et al., 2006). (As a check, we replicated the association between changes in clarity (the fog index) and market returns. A portfolio based on changes in clarity leads to a size-adjusted return of 10%, a magnitude similar to that reported in Li, 2006.) Adding these three document-level meta-features to the text classification model, however, leads to statistically similar returns (11.74% for the augmented model versus 12.16% for the raw model). In contrast, a model that contains only the three textual meta-features does not generate any excess return whatsoever. We also find that while the meta-features distinguish between average and extreme performance (as also predicted by our model), they are less able to distinguish the direction of the difference (i.e., between out- and under-perform), particularly for sub-samples of firms. That is, while the scores on the meta-features of out- and under-performing firms differ from those of the average firm (i.e., their reports are denser and reflect greater risk sentiment), the two groups do not differ between themselves. We conclude that the word-level text classifier model we employ captures more information than is represented in the meta-features of clarity, risk sentiment, and tone, suggesting a fruitful role for word-based text classification methods in accounting and finance research.
Next, to gain insight into the source of the information content, we examine quantitative properties of the firms in the predicted classes. These univariate comparisons indicate that the text-based disclosure score we develop is correlated with attributes such as size and market-to-book. Indeed, sub-sample analyses indicate a greater portfolio return for glamour versus value firms, and for small versus large firms. Thus, one explanation for our results is that our text classification captures firm attributes that may be readily computed using financial data. While the correlation between text disclosure and financial characteristics is an interesting finding in itself, it also is possible that the text disclosures affect the association between numeric dimensions and market performance. We examine these conjectures by regressing excess returns on known factors such as earnings surprise, price momentum, firm size, and the market-to-book ratio. We find no main effect for the score on narrative disclosures, suggesting that the disclosure score provides no new information over that provided by known financial factors. However, we find that the coefficient on the interaction term with price momentum reliably differs from zero. (The interaction with the market-to-book ratio is weakly significant.) We infer that differences in disclosures across firms affect confidence in the numerical estimates, a finding consistent with firms with differing profiles following differing text disclosure strategies.
Our study makes both methodological and economic contributions. Recent literature (e.g., Li, 2006; Davis et al., 2006; Henry, 2006a,b; Tetlock et al., 2006) that conducts large-sample studies of the characteristics of narrative disclosures considers specific dimensions of narrative disclosure.[1] Li (2006) examines how changes in a firm's fog index (a measure of readability) correlate with earnings prospects and persistence, thereby shedding light on managerial incentives to alter readability. In a levels study, Davis et al. (2006) consider the association between the tone (optimistic/pessimistic) of current reports and future ROA. Henry (2006a) conducts an event study that links the tone of press releases with market reactions.[2] In contrast, our text-classification algorithm offers three key methodological advantages. First, it simultaneously considers all aspects of the disclosure, such as length and word choice, thereby avoiding the need to impose an external model to generate meta-features such as optimism or readability. Allowing an unconstrained relation lets the predictive model capture complex interactions among features. This attribute is particularly important for the analysis of text because the relations can occur at the word, sentence, and/or document level. And yet (the second advantage), our approach is open to including meta-features such as the fog index, thereby helping us understand the information captured by meta-features. (We perform such an extension.) Third, our approach can be readily extended to include other text sources, such as analyst forecasts, economic reports, or industry analyses, that also might be relevant for firm valuation and for predicting performance. Indeed, it is possible to differentially weight these sources in terms of their credibility, freshness, and so on, extensions that are not possible with current approaches that rely on features developed from external models.[3]

Economically, we show that current-period disclosure quality is associated with future returns and that the disclosures affect the confidence in estimates.[4] Our results indicate considerable benefits from research that refines such predictive models by enlarging the dataset (e.g., adding economic forecasts) and by conditioning the model on parameters such as industry and product life cycle. Overall, the techniques we explore in this paper point toward a rich set of questions that parallel the use of numeric disclosures and examine the use of narrative disclosures by market participants as well as the management incentives connected with such disclosures.
The rest of this paper proceeds as follows: Section 2 describes our research question and Section 3 provides an overview of the methodology. We discuss the sample selection process and provide sample descriptions in Section 4. We report results in Section 5 and offer concluding remarks in Section 6.
2. Background
Beginning with the seminal work of Ball and Brown (1968), a vast literature examines whether and how market participants employ financial reports to evaluate a firm's future performance and, thus, its value. Fields et al. (2001), Kothari (2001), and Healy and Palepu (2001) provide recent surveys. In contrast to the attention paid to the properties of, and the information contained in, the financial data disclosed by firms, there is a paucity of research examining narrative disclosures. However, such narratives are an important information source for analysts and a critical component of annual reports. Rogers and Grant (1997) found that the management discussion and analysis (MD&A) section of annual reports constituted the largest proportion of information cited by analysts. They state (p. 17), "[I]n total, the narrative portions of the annual report provide almost twice as much information as the basic financial statements."
[1] There is also a research stream (Barron et al., 1999; Clarkson et al., 1999; Subramanian et al., 1993; Smith and Taffler, 2000) that primarily relies on hand-coded classification of a small sample of firms when investigating its research question.
[2] Henry (2006b) considers a partitioning algorithm (CART) and shows that including data about key words and document style improves classification accuracy. She performs a 10-fold analysis on contemporaneous data. That is, the model is trained with 90% of the observations and tested on the remaining 10%. Thus, the model is not implementable because it uses data from the same period to predict returns. That is, she uses actual data from 1998 for 90% of firms to predict returns in 1998 for the remaining 10% of firms. In contrast, our approach and tests lead to an implementable approach in that we use actual data from 1998 to predict returns for 1999.
[3] The disadvantage is that the underlying model is not transparent because it might be non-linear. Although not our focus, with additional structure and analyses, it is possible to determine the relative "weights" of the attributes.
[4] Botosan (1997) and Botosan and Plumlee (2000), who follow the convention of using the AIMR ranking of corporate disclosure as a measure of disclosure quality, are notable exceptions.
Similarly, a survey of financial analysts by the Association for Investment Management and Research found that "86% of surveyed financial analysts indicated that management's discussion of corporate performance was an 'extremely' or 'very' important factor in assessing firm value" (AIMR, 2000).
Research corroborates practitioners' claims that the narrative in an annual report contains value-relevant information. For instance, using quality scores provided by analysts, Clarkson et al. (1999) find that the quality of forward-looking information in the MD&A directly relates to the firm's upcoming performance. Botosan (1997) studies the association between disclosure level and the cost of equity capital, and finds that voluntary disclosures substitute for analyst following in lowering the cost of capital. Bryan (1997) shows that discussions of future operations and planned capital expenditures are associated with one-period-ahead changes in sales, earnings per share, and capital expenditures. Barron et al. (1999) find that high MD&A quality (in terms of compliance with the disclosure requirements) reliably reduces errors in analysts' earnings forecasts. We also interpret the SEC's plain English disclosure rules as acknowledging the importance of narrative disclosures when evaluating earnings and cash flow (Firtel, 1999).
The source of the information content in narrative disclosures is subtle and hard to measure. Subramanian et al. (1993) find that well-performing firms used "strong" writing in their reports, while poor performers' reports contained significantly more jargon and modifiers and were hard to read. Smith and Taffler (2000) identify thematic keywords from chairmen's statements and generate discriminant functions to predict company failure. Kohut and Segars (1992) study presidents' letters in annual reports and suggest that, as a communications stratagem, poorly performing firms tend to emphasize future opportunities over poor past financial performance. Lang and Lundholm (2000) find that "optimistic" pre-announcement disclosures of equity offerings lower the cost of equity capital.
Because of the difficulty in data collection and measurement, early studies that examine the qualitative aspects of disclosure usually employ hand-collected data and examine small samples. They also typically rely on experts to code the quality of disclosure (e.g., AIMR scores). Recognizing these limitations, Core (2001) suggests that measuring disclosure quality could greatly benefit from the techniques of other research areas such as computer science, computational linguistics, and artificial intelligence. There also is interest in developing analyses that test the information content and the predictive ability of narrative disclosures in large-sample studies with automatic coding of data.
Recent research (Li, 2006; Henry, 2006a,b; Davis et al., 2006) has responded to this call. Typical examples include Li (2006), who shows that changes in the readability of the MD&A section are predictive of future returns, and Davis et al. (2006), who show that tone (a count of pessimistic versus optimistic words) is associated with future ROA. Note that, like our study, Li (2006) assumes that the market price does not instantaneously impound the information contained in narrative disclosures. We view these papers as positing a relation between some dimension(s) of textual data and future performance. Thus, these papers construct a measure (e.g., fog index, count of positive words) of the single dimension typically studied (readability, optimism), and use traditional statistical methodology such as OLS regressions to test the association between the measure and performance. The values of, and relations among, the parameter estimates form the basis for inferences about patterns in the data.
Our innovation is the use of an algorithmic approach (see also Henry, 2006b) to develop a predictive model.[5] Our approach, which draws from foundations in computer science, focuses on predictive accuracy and treats the data structure or pattern as an unknown. The goal is to let the algorithm "learn" the underlying model using the most relevant information from the entire set available. Thus, the focus is not on generating model parameters but on fitting the best possible model. Such an approach confers at least three advantages:

• We can simultaneously consider many different aspects of the disclosure, such as length, readability, and word choice, thereby avoiding the need to specify ex ante the meta-features of interest, such as optimism or readability. Such an unconstrained relation lets the predictive model discover and capture complex interactions among features. This attribute is particularly important for the analysis of text because the relations can occur at the word, sentence, and/or document level. Indeed, we can (and, in our extensions, do) include document-level meta-features such as the fog index, thereby helping us understand the information captured by meta-features.
• The approach can be easily extended to include other information sources, such as economic reports. Including such data is particularly useful because market participants parse the annual report in the broader economic context and in light of the other information available to them.[6] Indeed, current developments in computer science allow for models that differentially weight information sources in terms of their credibility, freshness, and so on.
• We can use the model to identify sub-sets of the population that systematically differ in terms of the information content of their disclosure.[7]
Because of these advantages, the use of algorithmic text classification models is widespread in diverse areas such as marketing, biomedicine, music, law, and web crawlers (Dave et al., 2003; Popa et al., 2007; Pérez-Sancho et al., 2005; Thompson, 2001; Pant and Srinivasan, 2005), although their use in finance and accounting is nascent. The primary disadvantage is that the method does not readily yield parameters that we could use to assess the statistical/economic significance of individual dimensions and/or sources. While possible, such analyses require the researcher to impose considerably more structure and are left open for future research.[8]
[5] Our method differs from the CART method in Henry (2006b) in that we do not sequentially add measures of constructs to partition the data. Rather, the entire set of words is used to construct a model.
[6] For instance, Asquith et al. (2006) examine the information content of the qualitative analysis provided by equity analysts.
[7] As an example, consider a model that tests the ability of film reviews to predict box office receipts. We can then identify the reviewers whose reviews consistently out-perform reviews by other reviewers. Studying this sub-sample of reviews can help us understand the features that make a review more predictive of box office success. Similarly, we can use this methodology to find sub-sets of firms whose narrative disclosures are more informative regarding market and/or accounting performance. We can then study these disclosures to glean the reason why.
[8] The two approaches are complements. The algorithmic approach can potentially help identify the constructs and an outline of the model. We can then employ traditional statistical methodology to fit the model and identify parameters.
3. Methodology
We focus on whether we can use narrative disclosures to construct measures that predict firms' performance. Constructing such an algorithm requires that we define (1) a method to quantitatively represent a document's narrative disclosure, (2) measures of a firm's performance, and (3) a model that enables the use of the disclosure measure from step (1) to predict performance as defined in step (2). We address these issues next. (See Appendix A for a non-technical description of the text classification problem; Mahinovs and Tiwari (2007) provide an accessible review of the literature. See http://videolectures.net/mlas06_cohen_tc/ (accessed on 7/8/08) and Sebastiani (2002) for an in-depth review of the area.)
Briefly, text classification is the task of automatically putting documents into predefined categories. (A ready example is assigning news articles to topics such as politics, sports, or culture.) This classification task comprises several steps, the first of which is text representation. For this step, we employ standard text representation techniques used in computer science, with suitable modifications for financial reports. Consistent with the literature (Sebastiani, 2005), we first stem the words in a document to their morphological roots (e.g., running is stemmed to run) and eliminate common words such as a and the. We then represent the document as a vector of stems using the "bag of words" approach. The approach is so named because it uses all the terms (stems) in a document regardless of the order or position of the terms. Loosely, the set of "independent variables" in the model is the set of all stemmed words. We can then map a document into n-space, treating each term as a dimension and using a numerical weight for each stem. This weight is usually a function of the frequency of the stem in the document and in the full collection of documents (Hand, 2001).[9] Naturally, because the method treats each unique term as a separate dimension, this step leads to a large term space. Accordingly, the next step is to reduce the term space and generate a smaller vocabulary (loosely, to identify the words that have the greatest ability to distinguish among documents). This step is particularly important in our study because the term space generated from 10-K reports is of extremely high dimension. We employ the document frequency (DF) method to reduce the term space. This method ranks words by the number of documents that contain the word and uses a threshold level to reduce the number of words considered. Yang and Pedersen (1997) show that the DF method produces an overall efficiency gain by eliminating less informative terms and reducing the vocabulary size without sacrificing classification accuracy. Finally, we use the term frequency * inverse document frequency (TF*IDF) method (Singhal et al., 1996), the most commonly used weighting scheme, to estimate the term weights for the individual terms identified by the DF method. Intuitively, this weighting scheme assumes that the best descriptive terms for a given document are those that occur very often in the given document (term frequency) but not much in other documents (inverse document frequency) (Salton and Buckley, 1988). Note that the document frequency is calculated in the context of our collection of 10-K filing documents. Thus, these words will do well in separating the considered document from other documents. In this way, we represent each document as a point in the n-dimensional term space.
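The representation pipeline described above (stemming, stop-word removal, DF thresholding, and TF*IDF weighting) maps directly onto standard library calls. The following is a minimal sketch in Python using scikit-learn and NLTK, not the authors' actual implementation; the toy corpus and the min_df threshold are illustrative assumptions.

```python
from nltk.stem import PorterStemmer
from sklearn.feature_extraction.text import TfidfVectorizer

train_texts = [
    "The company expanded operations and revenues increased.",
    "Operating losses increased and the company restructured.",
    "Revenues increased while the company reduced expenses.",
]  # toy stand-ins for 10-K narrative sections

stemmer = PorterStemmer()
# Reuse the default tokenization and stop-word removal, then stem each token.
base_analyzer = TfidfVectorizer(stop_words="english").build_analyzer()

def stemmed_analyzer(doc):
    return [stemmer.stem(token) for token in base_analyzer(doc)]

# min_df implements the DF cut: drop stems appearing in too few filings
# (a paper-scale run would use a much higher threshold than this toy value).
vectorizer = TfidfVectorizer(analyzer=stemmed_analyzer, min_df=2)
X_train = vectorizer.fit_transform(train_texts)  # one row per filing, one column per stem
```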
Step 2 in our method is to identify the predictive attribute of interest. We focus our analysis on size-adjusted returns because the market return is the metric of most interest to shareholders, analysts, and other users.[10] This performance measure becomes the (n+1)th dimension associated with a document. In this context, note that predicting a specific value of a certain performance measure is a harder task than predicting a category of a performance measure, because a real-valued prediction is more granular than a category prediction. As an exploratory study, we start with a coarser approach and classify firms into three classes relative to their peers: under-performing, average, and out-performing. Each year, we rank the firms by their actual performance for the next year, and use the 25th and 75th percentiles to define the cutoffs for the three classes.
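Concretely, the three-class labeling can be expressed in a few lines. Here is a sketch (NumPy assumed), where next_year_sar is a hypothetical array holding each firm's realized size-adjusted return for the following year:

```python
import numpy as np

def label_performance(next_year_sar, lo=25, hi=75):
    """Classify firms relative to peers: -1 under-, 0 average, +1 out-performing."""
    next_year_sar = np.asarray(next_year_sar, dtype=float)
    lo_cut, hi_cut = np.percentile(next_year_sar, [lo, hi])
    labels = np.zeros(len(next_year_sar), dtype=int)
    labels[next_year_sar <= lo_cut] = -1   # bottom 25%: under-performing
    labels[next_year_sar >= hi_cut] = +1   # top 25%: out-performing
    return labels
```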
For step 3 in our method, ideally, we would develop a mapping between a firm's disclosure vector (as developed in step 1) and the performance measure (as defined in step 2). The classical statistical approach (which includes studies that examine one or more specified aspects of the text) finds parameters that fit a specified model to the data. Our approach differs in that we do not adopt a model or specify the attributes of interest. Rather, akin to a neural net, we let the data-driven text-classification algorithm "learn" the potentially non-linear and multi-faceted relation between the text attributes and future returns. Essentially, the model seeks to construct a hyperplane in the n-dimensional term space that best separates the data points according to their categories.[11]
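The paper does not name its classifier here, but footnote [11] describes three two-way classifications whose predictions are combined. A linear SVM in a one-vs-rest arrangement matches that description; the following is a sketch under that assumption (scikit-learn assumed; y_train and test_texts are hypothetical, continuing the earlier snippets):

```python
from sklearn.svm import LinearSVC

# LinearSVC fits one separating hyperplane per class against the rest --
# effectively the three two-way classifications described in footnote [11].
classifier = LinearSVC()
classifier.fit(X_train, y_train)           # y_train: labels from label_performance
X_test = vectorizer.transform(test_texts)  # next year's filings, mapped to the same term space
predicted = classifier.predict(X_test)     # -1, 0, or +1 for each firm
```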
Once we "train" the model, we apply it to a hold-out sample (in our case, to the annual reports for the next year). The output from this analysis is a prediction for each firm in the hold-out sample as to its category: out-perform, average, or under-perform. We then construct equally weighted portfolios based on these predictions. That is, we allocate the same dollar amount to two sets of firms: we buy firms predicted to out-perform and sell firms expected to under-perform. The size-adjusted return earned by the portfolio is our measure of incremental value and of the predictive ability of narrative disclosures.
3.1. Design
For our design, a data point represents the results from a particular measure and year. We draw the training document set and the test document set from adjacent years. We use documents that report performance for the period t−1 to t (available at time t) to build the predictive model.[12] We then apply the model to the documents reporting results for year t (available at time t+1) to predict the performance category for the period t+1 to t+2. (Notice that the standard 10-fold validation used in text classification, as in Witten and Frank (2000), is not sensible in this context, as we build a single model for each year in the sample.) Based on the classification, we examine whether an implementable trading strategy based on predictions from our model earns a positive size-adjusted return. Such a test is interesting because predictive accuracy is a relatively coarse performance metric. Further, portfolio returns attach an endogenous cost to prediction successes and failures. Finally, a returns test is the appropriate measure to examine whether there is incremental information content in the narrative disclosures relative to the information impounded into contemporaneous prices.

[9] In accounting, Smith and Taffler (2000) and Hussainey et al. (2003) show that counts of keywords are related to bankruptcy and to the association between current earnings and future stock returns.
[10] Unlike the two accounting metrics, the return metric impounds other information not reflected in the firm's financial statements because market prices are based on forward-looking information (Kothari, 2001). Thus, the market return is the hardest to predict. On the other side, a firm's management exercises greater control over accounting data. Even though there is ongoing debate on whether earnings management is generally opportunistic or strategic (e.g., Arya et al., 1998, 2003), there is broad consensus that firms employ discretionary accruals to manage reported income. Such practices add noise to the accounting measures we consider.
[11] The method is ideally suited to binary classifications. Because we have three classes, we perform three two-way classifications and combine the predictions to generate an overall classification. See the last paragraph of Appendix A for details.
[12] The number of years to consider when building a predictive model is an interesting question. We could use all available data to construct the model, weighting recent years more. We use a conservative approach and employ only the most recently available information. In essence, our approach assumes that the patterns unearthed in last year's annual report hold for the current year's annual report and can help predict performance in the forthcoming year.
We calculate a portfolio return as the average size-adjusted return difference between the out-performing firms and the under-performing firms for each year. We report results for a 25–50–25% cut-off for defining the three classes of out-performing, average, and under-performing firms. (We verified robustness with a 10–80–10 cut-off.) We calculate a portfolio-level return for a buy-and-hold strategy (see Fig. 1). Specifically, consider the model constructed using documents for the year ending 12/31/1999 (data available March 2000) and calendar-year 2000 performance (data available in March 2001). We apply this model to documents available in March 2001, make predictions, and measure the cumulative size-adjusted return for the portfolio from April 1, 2001 to April 1, 2002. We verified that such a strategy is implementable in that the documents are available before 3/31. Further, because the SAR for a random portfolio is zero by construction, this return is the incremental return relative to a random portfolio.
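Putting the design together, the yearly walk-forward and the long–short portfolio return can be sketched as follows. This is a schematic under the assumptions of the earlier snippets; docs[t] and sar[t] are hypothetical per-year containers of filing texts and next-year size-adjusted returns, and fit_year_model is a hypothetical helper wrapping steps 1–3 above.

```python
import numpy as np

def yearly_portfolio_return(sar_next, predicted_class):
    """Equal-weighted long-short return: long predicted out-performers (+1),
    short predicted under-performers (-1)."""
    sar_next = np.asarray(sar_next, dtype=float)
    long_leg = sar_next[predicted_class == +1].mean()
    short_leg = sar_next[predicted_class == -1].mean()
    return long_leg - short_leg

returns = []
for t in sorted(docs)[:-1]:                      # train on year t, test on year t+1
    vec, clf = fit_year_model(docs[t], sar[t])   # hypothetical: steps 1-3 above
    pred = clf.predict(vec.transform(docs[t + 1]))
    returns.append(yearly_portfolio_return(sar[t + 1], pred))
print(np.mean(returns))                          # average annual excess return
```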
We perform two analyses to understand the source of any excess return. Our first approach checks whether our disclosure score is picking up known document-level features such as the fog index or risk sentiment. For each document, we add these features to the term space and construct a new model, and we use the predictions of the augmented model to construct portfolios. If these meta-features are incrementally useful, the predictions of the augmented model should yield greater returns relative to the base model; a sketch of this augmentation follows.
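Mechanically, augmenting the term space amounts to appending three extra columns to the TF*IDF matrix. A sketch (SciPy assumed; fog, risk, and tone are hypothetical per-document arrays; the log transformation follows note 2 of Table 3, with a shift assumed for tone, which can be negative):

```python
import numpy as np
from scipy.sparse import csr_matrix, hstack

# Stack the three document-level meta-features as extra columns of the term space.
meta = np.column_stack([
    np.log1p(fog),         # readability
    np.log1p(risk),        # risk sentiment (a count, so log1p is safe)
    np.log1p(tone + 1.0),  # tone lies in [-1, 1]; shift before the log (assumption)
])
X_augmented = hstack([X_train, csr_matrix(meta)])
```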
Our second approach employs cross-sectional regressions. We estimate:

SAR = α + β1·Dummy + β2·Size + β3·MTB + β4·PM + β5·Earnings Surprise + β6·Size×Dummy + β7·MTB×Dummy + β8·PM×Dummy + β9·Earnings Surprise×Dummy + error,

where: SAR = size-adjusted buy-and-hold return for the year; Dummy = 1 if the firm is classified as out-performing and 0 for predicted under-performing firms (average firms are excluded from this analysis); Size = the size of the firm, measured as the natural logarithm of total assets; MTB = market-to-book ratio (a valuation proxy), using the closing market price as of the start of the holding period; PM = price momentum, measured as the SAR for the six months preceding the start of the holding period; Surprise = actual EPS − forecast EPS, where the forecast is the latest available consensus analyst forecast.
Our choice of regressors stems from studies (e.g., Jegadeesh et al., 2004) that examine the incremental information content of analyst forecast revisions after controlling for factors known to affect returns. In the above regression, a positive coefficient β1 is consistent with the narrative disclosures providing incremental value-relevant information to market participants. A non-zero interaction term is consistent with the narrative disclosure altering the confidence market participants place in the numeric estimates.
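The regression, including the interaction terms and the firm-clustered standard errors mentioned in the notes to Table 4, could be estimated as sketched below with statsmodels; the DataFrame df and its column names are hypothetical, not the authors' code.

```python
import statsmodels.formula.api as smf

# df columns (hypothetical names): sar, dummy, size, mtb, pm, surprise, firm_id
model = smf.ols(
    "sar ~ dummy + size + mtb + pm + surprise"
    " + dummy:size + dummy:mtb + dummy:pm + dummy:surprise",
    data=df,
).fit(cov_type="cluster", cov_kwds={"groups": df["firm_id"]})
print(model.summary())
```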
4. Data and descriptive statistics
The primary data for our experiments include the firms' financial data, size-adjusted returns, and the firms' annual reports. To increase the homogeneity of firms in the sample, we restrict the sample to firms in the manufacturing industry (SIC codes 2000 to 3999) with December as the fiscal year-end month. The sample period is from 1997 to 2002 (we include return data for 2003 as well).

We ensure data integrity and accuracy by using the values of gvkey from the COMPUSTAT database, permno from the CRSP database, and OFTIC from I/B/E/S to identify a unique firm. We collect financial data for 1236 unique firms.
Fig. 1. Overview of design. (The original figure shows a timeline across years t−1, t, and t+1, pairing each year's annual report, Doc_t−1 and Doc_t, with the size-adjusted returns SAR_t−1, SAR_t, and SAR_t+1.)

where:
SAR_t = size-adjusted return cumulated from April 1 of year t to March 31 of year t+1.
Doc_t−1 = annual report for year t−1, usually available in March of year t.

A. For firms in year t−1, build a predictive model of year t using the firms' SAR in year t (i.e., the size-adjusted return cumulated from April of year t to March of year t+1) and the annual reports for year t−1, which are usually published in March of year t.
B. For firms in year t, apply the predictive model built in step A to the annual reports for year t, which are published in March of year t+1, and predict the class of SAR performance of these firms in year t+1, i.e., the 3-class of SAR (size-adjusted return cumulated from April of year t+1 to March of year t+2).
C. On March 31 of year t+1, given a set of predicted out-performing firms and a set of predicted under-performing firms from step B, sell the under-performing firms' stocks at a total value of (for example) 10 million dollars and buy the out-performing firms' stocks at a total value of 10 million dollars. In both the buying and the selling transactions, allocate equal values of stock among the firms. On March 31 of year t+2, sell the stocks of the out-performing firms and buy back the stocks of the under-performing firms. If our prediction was correct, this transaction should generate a non-negative profit.
Each annual report has an accession code as its unique identifier. We manually download from Mergent Online the accession codes of the annual reports for each firm from 1997 to 2002. We then automatically retrieve the annual reports from the EdgarScan website using the downloaded accession codes.

There are 10 different submission types for annual reports: 10K (10K filings), 10K405 (10K filings where the Regulation S-K Item 405 box on the cover page is checked), 10K405A (amendments to 10K405), 10KA (amendments to 10K filings), 10KSB (10K filings for small businesses), 10KSBA (amendments to 10KSB), 10KSB40 (optional form for small businesses where the Regulation S-B Item 405 box on the cover page is checked), 10KSB40A (amendments to 10KSB40), 10KT (10K transition report), and 10KTA (amendments to 10KT). We focus on the major submission types of 10K and 10K405. Our final usable sample, with matching financial performance measures, comprises 4280 annual reports from 1236 firms published in 1997 to 2002.
Using the CRSP database, we calculate the size-adjusted cumulative return as the size-adjusted buy-and-hold return cumulated for 12 months from April 1 following the fiscal year-end to the next April. We verify that the relevant documents are available and that the strategy is implementable.
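As a sketch, the buy-and-hold size-adjusted return can be computed by compounding monthly firm returns over the April-to-March window and subtracting the compounded return of the firm's size-matched benchmark; the monthly return arrays are assumed inputs.

```python
import numpy as np

def buy_and_hold_sar(firm_monthly_returns, benchmark_monthly_returns):
    """Compound twelve monthly returns (April through March) for the firm and for
    its size-matched benchmark, and return the difference."""
    firm = np.prod(1.0 + np.asarray(firm_monthly_returns)) - 1.0
    bench = np.prod(1.0 + np.asarray(benchmark_monthly_returns)) - 1.0
    return firm - bench
```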
4.1. Sample description
Table 1, Panel A provides the number of observations considered for each analysis. We begin with 4755 documents but make only 3529 predictions because of missing data and because of the lagging nature of the predictive model. We also trim the top and bottom 1% of observations on size-adjusted returns and other classification variables to reduce the influence of outlying observations.[13] We use 3070 observations in the sub-sample analysis because we could not collect the classification data required for 1997.
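The trimming step is a one-liner in pandas; a sketch, with a hypothetical column name:

```python
def trim_extremes(df, col, pct=0.01):
    """Drop observations below the 1st or above the 99th percentile of `col`."""
    lo, hi = df[col].quantile([pct, 1.0 - pct])
    return df[(df[col] > lo) & (df[col] < hi)]

sample = trim_extremes(sample, "size_adjusted_return")
```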
Panel B provides descriptive data for our sample, with each observation representing a firm-year. Over all years, sample firms have mean sales of $2749 million and median sales of $336 million, indicating the presence of several large firms in the sample. The average ROE is 3.48%, while the median is 7.77%. The mean and median values for the market-to-book ratio are 1.95 and 1.18, respectively.

Panel C of Table 1 provides the industry breakdown for sample firms. We do not find any significant clustering of industries specific to our sample. Tests (not reported) do not reveal any systematic difference between the spread of firms in our sample and the distribution of all COMPUSTAT firms from the relevant SIC codes. We also do not find systematic differences in key firm characteristics. Thus, our results appear generalizable, albeit only to manufacturing firms.
Table 1
Descriptive statistics.

Panel A: Sample selection
  Item                                               Firm-years   Notes
  Total number of documents (1997–2002)                   4755
  Do not have SAR data                                    (295)
  Potentially useful in SAR exercise                       4460
  Truncated extreme observations                           (180)  Trimmed top and bottom 1% of observations
  Documents used to develop SAR prediction model           4280   Statistics presented in Table 1
  Loss due to 1-year-ahead prediction                      (751)  We do not have predictions for 1997, the first year with documents
  Documents with portfolio experiment results              3529   Results presented in Tables 2 and 3 and Panel A of Table 4
  No data for sub-sample classification                    (459)
  Available for sub-sample analysis                        3070
  Trimmed for extreme observations                         (124)  Top and bottom 1% of observations removed for each variable
  Net available for sub-sample analysis                    2946   Results presented in Panel A of Table 3
  Used only extreme quintiles in regressions               1473
  Lost due to missing data                                 (406)
  Used in regression                                       1007   Results presented in Panel B of Table 4

Panel B: Sample characteristics
  Item                     N (firm-years)    Mean       Median     25th pctile   75th pctile
  Sales (millions)                   4099    $2749.66   $336.44    $65.93        $1666.91
  Net assets (millions)              4099    $3153.43   $385.90    $95.52        $1735.89
  ROE                                3073    3.48%      7.77%      −10.64%       18.09%
  EPS                                4255    −$0.248    $0.63      −$0.18        $1.39
  Size-adjusted return               4280    −2.23%     −12.91%    −41.11%       18.48%
  Market-to-book ratio               4098    1.95       1.18       0.64          2.36

Panel C: Industry composition
  SIC codes    Number of firms in sample
  20–25        99
  26           38
  27           32
  28           276
  29–32        59
  33           43
  34           38
  35           171
  36           185
  37           55
  38           199
  39           23
  Total        1236

[13] In the accounting and finance literatures, such trimming is standard when dealing with security returns. The average return in the bottom (top) 1% is close to −100% (well over +100%), which is not representative of average returns.
We use the size-adjusted cumulative return as the key metric of firms’ financial performance. As noted earlier, this metric contains the
market response information that is generally not reflected in the financial statements.
5. Results
The dependent variable in our analysis is the portfolio size-adjusted return, rebalanced each year. We construct an equally weighted buy-and-hold portfolio that sells the predicted under-performing firms and buys the predicted out-performing firms. We ensure that we employ an implementable strategy by verifying that all of the documents were available before April. We calculate annual returns for the prediction period (April to April). For robustness, we replicate the analysis both for the 25–50–25 partition (reported) and for the 10–80–10 partition of the sample for identifying out- and under-performing firms.
Panel A of Table 2 presents results on the cumulative size-adjusted return by year. For both partitions, we find a significant return for every year except 1998 and 2000 (when we find a significantly negative portfolio return). One reason for this anomaly might be the considerable turbulence experienced by financial markets during 2000 (see, for example, Barber et al., 2003). On average, we find an annual excess return of 12.16% using the 25–50–25 partition and 6.59% using the 10–80–10 partition for developing the model. These estimates are consistent with earlier research that hints at the considerable information content of narrative disclosures. These results also suggest that the market has difficulty immediately parsing the information content of the disclosures, so that this information shows up in the return for the next year.[14]
In Panel B, we report the number of firms classified as out- and under-performing, by year, for each of our partitions. These data show that, while the predictive model was constructed using a 25–50–25 partition of actual performance, the number of firms predicted to out- or under-perform is not 25% of the hold-out sample. For instance, only 608 firm-years are predicted to out-perform when the naïve expectation is 882 (= 3529 total observations classified × 0.25). (Using a proportions test, this difference is statistically significant.) Thus, as is intuitive, our predictive model is better able to pick up "extreme" differences from the average firm relative to smaller differences.
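The proportions test referenced above compares the observed share of out-perform predictions with the naïve 25% benchmark. A sketch using statsmodels (the paper does not specify the exact test variant; a one-sample z-test is assumed):

```python
from statsmodels.stats.proportion import proportions_ztest

# 608 of 3529 hold-out firm-years predicted to out-perform vs. a naive 25% benchmark.
z_stat, p_value = proportions_ztest(count=608, nobs=3529, value=0.25)
print(z_stat, p_value)
```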
Panel C of Table 2 presents results for sub-samples of firms. We investigate two partitions, based on market-to-book and on firm size. For the first set of results, we determined the median market-to-book value for each year. We then partitioned sample firm-years into the value or glamour categories based on their value relative to the median for the relevant year.
Table 2
Average return difference between predicted out-performing and under-performing firms. Model based on documents for year t and performance for year t+1; tested on documents for year t+1 and performance for year t+2.

Panel A: Portfolio returns
  Year      25–50–25% performance class definition    10–80–10 performance class definition
  1998        −2.46%                                     4.69%
  1999        63.68%                                    42.62%
  2000       −36.84%                                   −45.73%
  2001        19.60%                                    19.11%
  2002        16.82%                                    12.25%
  Average     12.16%                                     6.59%

Panel B: Number of out-performing and under-performing firms predicted, as in Panel A
            25–50–25% class definition                 10–80–10 class definition
  Year      Predicted out   Predicted under            Predicted out   Predicted under
  1998       98              126                        45              54
  1999      110               83                        61              41
  2000      206              110                        78              46
  2001       97              170                        55              54
  2002       97              112                        56              37
  Total     608              601                       295             232

Panel C: Portfolio returns (sub-sample analysis)
  Year      Large size    Small size    Value       Glamour
  1999       18.08%        88.40%        7.76%       83.36%
  2000      −18.78%       −41.08%      −17.49%      −46.66%
  2001       16.61%         0.00%        3.68%       15.04%
  2002       11.08%         9.58%       11.90%       13.85%
  Average     6.75%        14.23%        1.46%       16.39%

Notes:
1. Cell entries represent the portfolio-level buy-and-hold size-adjusted return for a year beginning April 1 and ending March 31. The portfolio is long the predicted out-perform firms and short the predicted under-performers.
2. The performance class specification relates to the performance cutoffs used to define classes in the training sample.
[14] Inspection suggests that general market volatility affects participants' ability to gainfully use narrative disclosures to predict future performance. Systematic analysis of this inference is hampered because we have only 5 observations. Extending the analysis to more years (and/or using quarterly reports) is one way to obtain enough data to test this conjecture.
We then re-estimated the textual model for each of the sub-samples separately. We repeated the exercise for size, using total assets as the measure.
Our textual model indicates greater value relevance in the disclosures made by glamour firms and by small firms. Portfolios based on the model predictions have a size-adjusted return of 14.23% on average for small firms but only 6.75% for large firms. Similarly, expectations about future growth drive the valuations of glamour firms more than those of value firms. Again, we find size-adjusted returns of 16.39% (1.46%) when we form portfolios for glamour (value) firms. In other words, our findings show that firms grouped on readily observable metrics such as size and market-to-book ratio follow detectably different text disclosure strategies. (However, our analysis does not speak to the dimensions in which the disclosures differ, a matter for additional research in this area.)
Table 3
Supplementary analysis.

Panel A: Semantic values by firm partitions (mean values of each attribute). Glamour/value columns partition firms by market-to-book; large/small columns partition firms by size; each partition is analyzed separately.

                                   All firms   Glamour   Value    Large    Small
All firms
  N (firm-years)                       3529       1473     1472     1473     1472
  Fog index                           18.41      18.41    17.97    17.82    18.57
  Risk sentiment                      28.81      29.23    24.37    27.30    26.20
  Tone                                0.401      0.404    0.392    0.401    0.391
Firms predicted to out-perform
  N (firm-years)                        608        249      129      217      192
  Fog index                          18.515      18.59    17.49    18.02    18.65
  Risk sentiment                     31.939      33.76    30.92    37.21    30.92
  Tone                                0.399      0.406    0.384    0.404    0.394
Firms predicted to under-perform
  N (firm-years)                        601        141      181      203      167
  Fog index                          18.740      18.86    17.96    18.30    18.81
  Risk sentiment                     34.600      37.95    32.41    35.77    34.70
  Tone                                0.393      0.400    0.383    0.383    0.387

t-tests of differences
Firms predicted to out-perform versus all firms
  Fog index                           2.34*       1.06     0.68     0.75     0.02
  Risk sentiment                     4.65***    3.01***  3.32***  3.31***   2.84**
  Tone                                1.99        0.34     2.46     0.22     1.42
Firms predicted to under-perform versus all firms
  Fog index                          6.19***    3.44***    1.02    2.75**    1.83
  Risk sentiment                     7.40***    3.69***  4.25***  5.06***  4.38***
  Tone                               4.25***      1.19     1.02    2.73**   2.65**
Predicted out-performers versus predicted under-performers
  Fog index                           2.46*       1.95     1.19     1.53     1.29
  Risk sentiment                      1.57        1.82     0.65     1.30     1.80
  Tone                                2.78**      0.80     0.15     1.47     0.87

Panel B: Average return difference between predicted out-performing and under-performing firms. Model (augmented with three document-level meta-features) based on documents for year t and performance for year t+1; tested on documents for year t+1 and performance for year t+2.

  Year      25–50–25% performance class definition
  1998        −1.85%
  1999        64.65%
  2000       −37.48%
  2001        17.01%
  2002        16.39%
  Average     11.74%

Notes:
1. Variable definitions are as follows:
   Risk = sum of risk term frequencies.
   Tone = (optimism term frequency − pessimism term frequency)/(optimism term frequency + pessimism term frequency).
   Fog index = a measure of readability, calculated as 0.4 × [(words/sentences) + 100 × (words with more than two syllables/words)].
2. Entries in Panel A are raw values. We performed a log transformation when including the three textual features in the model.
3. Cell entries in Panel B represent the portfolio-level buy-and-hold size-adjusted return for a year beginning April 1 and ending March 31. The portfolio is long the predicted out-perform firms and short the predicted under-performers.
4. The performance class specification relates to the performance cutoffs used to define classes in the training sample.
* p < 0.05. ** p < 0.01. *** p < 0.001.
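For concreteness, the meta-features defined in the notes above can be computed as sketched below. The syllable counter is a crude vowel-group heuristic (an assumption; production code would use a pronunciation dictionary), and the optimism/pessimism counts are assumed to come from given word lists.

```python
import re

def count_syllables(word):
    # Crude vowel-group heuristic; a real implementation would use a dictionary.
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def fog_index(text):
    """Gunning fog index: 0.4 * (words/sentences + 100 * complex_words/words)."""
    words = re.findall(r"[A-Za-z]+", text)
    if not words:
        return 0.0
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    complex_words = [w for w in words if count_syllables(w) > 2]
    return 0.4 * (len(words) / sentences + 100.0 * len(complex_words) / len(words))

def tone(optimism_count, pessimism_count):
    """(optimism - pessimism) / (optimism + pessimism); 0 when neither occurs."""
    total = optimism_count + pessimism_count
    return 0.0 if total == 0 else (optimism_count - pessimism_count) / total
```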
5.1. Link to meta-features
It is possible that our text classification model is merely replicating the previously known association between document-level meta-features (e.g., clarity, tone) and future performance. Panel A of Table 3 presents descriptive data for the three features we study, both for the full sample and for the sub-samples we consider. (In this table, each firm-year is a separate observation.) We report relevant t-statistics at the bottom of the panel.

The first column of the panel reports data for the entire sample for which we obtain performance predictions. Relative to the average firm, we find that firms predicted to out-perform have denser text (fog index of 18.515 versus 18.41, t = 2.34) and more words expressing risk (risk sentiment of 31.94 versus 28.81, t = 4.65), but a similar tone. We find a similar pattern for firms predicted to under-perform, with even the tone turning more pessimistic. Thus, firms in the "tails" of the distribution of predicted performance differ from the average firm. A first conclusion is that our classification model picks up firms that differ systematically on the meta-features of tone, clarity, and risk sentiment. However, we have weaker evidence for this conclusion when we compare the feature scores of firms predicted to out-perform with those predicted to under-perform. We find that predicted under-performers have a marginally less readable document and slightly greater pessimism. The two groups express similar risk profiles. These comparisons suggest that text disclosures do have information content relating to performance, and that meta-features can help us identify the extremes.
Table 4
Sample characteristics and incremental returns. Model based on documents for year t and performance for year t+1; tested on documents for year t+1 and performance for year t+2.

Panel A: Sample characteristics (firm-years from the SAR implementable experiment; mean with median in parentheses)
  Item                            Predicted under-perform   Predicted average-perform   Predicted out-perform
                                        (N = 601)                 (N = 2320)                 (N = 608)
  Assets ($ million)                $1736.93 (170.88)          $3625.59 (488.60)          $3715.88 (482.53)
  Sales ($ million)                 $1389.62 (91.72)           $3136.83 (473.51)          $3125.95 (376.17)
  EPS                                $0.305 (0.231)             $0.746 (0.74)              $0.699 (0.72)
  Market-to-book                      1.636 (1.042)              1.718 (1.012)              3.124 (1.953)
  Leverage                            0.441 (0.396)              0.493 (0.505)              0.447 (0.445)
  Earnings surprise                   0.153 (0.175)              0.249 (0.1)                0.099 (0)
  Price momentum                      0.332 (0.39)               0.017 (0.067)              0.696 (0.315)
  Size-adjusted return (annual)       0.057 (0.173)              0.009 (0.097)              0.019 (0.126)

SAR = α + β1·Dummy + β2·Size + β3·MTB + β4·PM + β5·Earnings Surprise + β6·Size×Dummy + β7·MTB×Dummy + β8·PM×Dummy + β9·Earnings Surprise×Dummy + error

Panel B: Incremental information content
                                      Regression model 1          Regression model 2
  Item                               Estimate    t-value          Estimate    t-value
  Intercept                            0.068       0.87             0.042       0.36
  Dummy for model prediction           0.054       1.08             0.043       0.26
  Log (total assets)                   0.014       1.17             0.011       0.56
  Log (market-to-book)                 0.003       0.12             0.027       0.75
  Earnings surprise                    0.001       0.36             0.004       0.68
  Price momentum                      −0.083      −3.11***          0.035       1.23
  Dummy × log (total assets)                                        0.002       0.08
  Dummy × log (market-to-book)                                      0.071       1.65
  Dummy × earnings surprise                                         0.011       1.39
  Dummy × price momentum                                           −0.291      −4.00***
  N                                     1007                         1007
  Adjusted R-square                    0.009                        0.023
  F-value                              2.71**                       3.54***

Notes:
1. Variable definitions are as follows:
   SAR = size-adjusted buy-and-hold return for the year.
   Dummy = 1 if the firm is classified as out-performing and 0 for predicted under-performing firms. Average firms are excluded from this analysis.
   Size = the size of the firm, measured as the natural logarithm of total assets.
   MTB = market-to-book ratio (a valuation proxy), using the closing market price as of the start of the holding period.
   PM = price momentum, measured as the SAR for the six months preceding the start of the holding period.
   Surprise = actual EPS − forecast EPS, where the forecast is the latest available consensus analyst forecast.
2. Test statistics employ cluster-adjusted standard errors to control for multiple observations from the same firm.
** p < 0.01. *** p < 0.001.
The comparisons in the columns highlight that we cannot simply use the meta-features to replace the model. This inability arises because the meta-features are of less use in distinguishing the direction of the performance differential, the key attribute of interest. Data in the next four columns (for the sub-samples of glamour, value, large, and small firms) provide additional evidence supporting our inference. For all four sub-samples, the predicted under- and out-performing firms have greater risk-sentiment scores relative to the average firm. However, we do not find differences between the sets of firms predicted to out- and under-perform for every sub-sample and for every measure. We conclude that while the meta-features pick up differences in style and tone that are systematically related to the performance differentials predicted by our model, they seem unable to distinguish the direction of the performance differential.[15]
For an additional test of whether our predictive model captures more than the meta-features, we refit the predictive model including the meta-features in the term space. As shown in Panel B of Table 3, we obtain a similar (11.74%) return from the augmented model. More importantly, we also fit a model using only the meta-features. Such a model should, in theory, produce the same predictive ability if the meta-features contained all of the information in the documents. However, we find that such a parsimonious representation of a document (as three meta-features of clarity, tone, and risk sentiment) has no explanatory power at all (results not tabled). Overall, we conclude that our text classification model captures features not picked up by the selected meta-features.
5.2. Incremental information content
Panel A of Table 4 returns to the full-sample analysis. This panel provides descriptive data on the firms in the three predicted classes, for the 25–50–25 classification. Relative to the average firm in the out-perform sample, the firms in the under-perform sample are reliably smaller, have lower market-to-book ratios, and are less profitable (as measured by EPS). This distinction provides additional evidence about the information content of disclosures, because the classification does not use any numeric item. The text in the annual reports is enough to identify distinct samples of under- and out-performing firms.
Panel B of Table 4 provides results that speak to the relation between the information in the narratives and the information in quantitative disclosures. In particular, it is of interest to know whether the information in the narrative disclosure is subsumed by, or is incremental to, the information in the quantitative disclosure.
The first column reports results for a model that considers main effects only. We find that the coefficient for the model prediction (for "dummy") is not reliably different from zero. Thus, the text disclosure does not appear to provide value-relevant information incremental to that provided by known factors but rather captures known features. We find that large firms earn smaller returns (Reinganum, 1981), and that a high market-to-book ratio presages lower returns as well (Fama and French, 1992). However, once we account for all other factors, our data do not show the expected relation between price momentum and excess returns (Jegadeesh and Titman, 1993). Interestingly,
we note that the univariate comparison in Panel A is significant at the 5% level (the price momentum is 0.019 for predicted out-performers
versus 0.332 for the predicted under-performing firms). The regression estimate, however, indicates that the incremental effect (after
accounting for other factors such as market-to-book, size and earnings momentum) is negative.
The second column in this panel reports results for a complete model that includes interaction terms for the model prediction (a binary variable) with the established predictors. We continue to find an insignificant main effect for our model's prediction. However, as indicated by the significant interaction terms, the disclosure score is weakly informative as to whether the glamour/value partition will continue to yield excess returns in the next period. Moreover, the interaction with price momentum is significant. That is, the disclosure score indicates that the effect of price momentum for firms predicted to out-perform is reliably smaller than for the average firm. Thus, while Jegadeesh and Titman (1993) show abnormal returns to buying winners and selling losers, our results suggest the possibility of finer partitions.16 One interpretation of our results is that narrative disclosures can help identify firms with negative price momentum that reverses over the next year. Stated differently, narrative disclosures could help identify whether the price momentum will sustain into the next period or will reverse.
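To make the structure of these tests concrete, the following sketch fits both the main-effects and the interaction specifications with firm-clustered standard errors (consistent with note 2 of the table). It is a minimal illustration under stated assumptions, not the paper's code: the column names (sar, dummy, size, mtb, pm, surprise, firm_id) and the input file are hypothetical.

import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical panel of firm-year observations; column names are assumed.
df = pd.read_csv("panel_data.csv")

# Main-effects model, then the complete model interacting the text-based
# prediction (dummy) with the established predictors.
main = smf.ols("sar ~ dummy + size + mtb + pm + surprise", data=df)
full = smf.ols("sar ~ dummy * (size + mtb + pm + surprise)", data=df)

# Cluster-adjusted standard errors control for multiple observations
# from the same firm.
for model in (main, full):
    result = model.fit(cov_type="cluster", cov_kwds={"groups": df["firm_id"]})
    print(result.summary())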
6. Conclusions
This paper is part of a nascent literature that explores the narrative disclosures made by firms and complements the established literature that considers the ability of numeric data to predict market performance (e.g., the accrual or post-earnings-announcement drift anomalies). Most prior studies of textual disclosures have relied on expert classification, thereby limiting sample sizes and the kinds of questions that could be asked. This study demonstrates a methodology for large-scale text mining of the narrative disclosures in annual reports. Even a relatively simple model, when applied to the narrative data alone, successfully predicts future accounting and market performance.
There are several limitations to our approach. Our methodology allows only limited economic insight into which characteristics of the disclosure lead to particular predictions (see, e.g., Li, 2006). We also employ a simple "bag of words" approach, without paying attention to the context in which specific words are used. Further, we limit ourselves to the disclosures in the annual report and thus restrict the information set relative to what market participants would employ. However, these limitations can be addressed using some of the emerging techniques in text mining (see, e.g., Pant and Srinivasan, 2005).
We could expand this study along several dimensions. The first is the use of text mining models that consider attributes such as tone, phrasing and so on. The second avenue is to augment the annual report disclosures with additional disclosures such as press releases. We also could overweight economic predictions such as news releases from the Federal Reserve and sector-specific forecasts by trade associations.
15. We note that, in a changes analysis, Li (2006) shows the predictive ability of the fog index (we replicate this finding as well). The other studies do not focus on market performance in an implementable way. We also note that, considering all firms, the group of glamour firms has greater risk sentiment and a more optimistic tone (p < 0.001 for both comparisons) relative to the scores recorded for value firms. We also find that larger firms tend to have greater readability, but their count of risk-related words is also higher.
16. These results hold even if we discard the data for the year 2000 from the analysis. We also note that portfolios formed on price momentum generate abnormal negative returns after an initial holding period (Jegadeesh and Titman, 1993).
A third avenue is to identify the nature of the differences in disclosures by large versus small firms, and by glamour versus value firms, as our results demonstrate that these sub-sets follow differing strategies. We also could study extreme observations (e.g., a high positive score but a highly negative return) to identify features that diminish the informativeness of text disclosures. Finally, it is of interest to examine the time it takes market participants to impound textual information. While we have focused on annual returns, we conjecture that studies examining shorter time frames might find sharper differences, whereas the additional economic noise would wipe out the effect over longer time frames. However, the relation is not likely monotonic, because quantitative data likely dominate returns over very short (intra-day or a few days) return intervals.
Acknowledgements
We thank Mort Pincus, Cristi Gleason, Paul Hribar, the editor, two anonymous reviewers, and workshop participants at the University of
Iowa and Christopher Newport University for helpful comments. Xin Ying Qiu also acknowledges contributions from members of her thesis
committee.
Appendix A. Building text classifiers for prediction
Text classification is a core activity in information science. The goal is to assign each text to one (or more) of a given set of categories. As
an example, we may be interested in classifying a news article using the categories of sports, health, famous persons, entertainment, gardening, real estate or finance. An article might belong to sports alone, or to both the famous persons and sports categories. Trained individuals may perform text classification manually. Alternatively, classification may be accomplished using computational tools called text classifiers.
The design and evaluation of algorithms for automatic text classification have formed the basis of a highly active field of research for several decades. The field is now mature to the point that text classifiers are used not only to decide conceptual categories (as in the above example) but also to capture more subtle human phenomena such as sentiment; classifiers are being used to identify sentences that are speculative (versus presenting ideas with confidence), to identify sentence tone as positive, negative or neutral, and so on. Developments in these more subtle realms in part motivate our current research on text classifiers to predict market performance.
The automatic methods employed in text classification derive predominantly from research in machine learning, a subfield of artificial intelligence. Major examples include text classification algorithms based on support vector machines (as in this paper), neural nets, decision trees and association rules. Of these, the Support Vector Machine (SVM) based algorithms are amongst the most effective (Sebastiani, 2002, p. 49).
A given classification problem generally (but not always) starts with some training data that has been classified by some reliable mechanism, such as an expert, into one of two classes. Alternatively, we can use a known outcome, such as next period's return, to classify the text. An SVM represents each example in the training data as a vector in an n-dimensional space and proceeds to find an (n-1)-dimensional hyperplane that separates the two classes. This strategy produces a linear classifier. Here, the parameter n represents the number of features considered. Thus, in text classification problems, n can be fairly large, consisting of every nontrivial word in the collection of texts being classified. Because many candidate hyperplanes are likely to exist, SVMs are additionally designed to achieve the best or maximum separation (also called the margin) between the two classes of the training data. That is, the distance between the separating hyperplane and the nearest training points of either class is maximized. The "trained" classifier may then be applied to new data, classifying each new text into one of the two classes.
Several key extensions have been made to the basic linear SVM. For instance, when a clean separation between the two classes of points
is not possible, soft margins allow for some amount of classification error through the use of slack variables. The SVM then aims to maximize the margin while minimizing error. In addition, researchers often employ one of several functions to transform the initial n-dimensional space.
The classifier then looks for a separating hyperplane in this transformed space, a hyperplane that may be non-linear in the original space.
This strategy may be useful in cases where linear classifiers are not sufficiently effective. Several such "kernel functions" to transform the initial space are available in implementations of SVM tools, including polynomial and sigmoid functions. In this paper we use the base linear SVM classifier.17

17. We build our classifiers using the SVM-Light implementation of Support Vector Machines with default parameter settings and a linear kernel function (see http://svmlight.joachims.org/).
SVMs are designed mainly for solving binary or two-class classification problems. Since our research problem is to classify documents
into three classes, we consider some options to extend SVMs to multi-class problems. We perform one-against-rest classification for each
class, and combine the results to make a final decision. The computing time for this option is linear in the number of classes. That is, we produce a total of three binary (one-against-rest) SVM models and use the highest predictive score generated by the three models to assign a class label to the document.
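As a concrete illustration, the following sketch implements the one-against-rest scheme just described. It uses scikit-learn's linear SVM rather than the SVM-Light tool employed in the paper, and the feature matrices and class labels are placeholders.

import numpy as np
from sklearn.svm import LinearSVC

def one_vs_rest_predict(X_train, y_train, X_test, classes):
    # Fit one binary (class-versus-rest) linear SVM per class and assign
    # each test document the class with the highest decision score.
    scores = []
    for c in classes:
        clf = LinearSVC()
        clf.fit(X_train, (y_train == c).astype(int))
        scores.append(clf.decision_function(X_test))
    return np.array(classes)[np.argmax(np.vstack(scores), axis=0)]

# Example call with three performance classes, as in our setting:
# labels = one_vs_rest_predict(X_train, y_train, X_test,
#                              ["under", "middle", "out"])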
A.1. Document representation
In information retrieval and text-classification research, the most common approach to encode (or represent) a text document is to
model a document as a vector of weighted terms. There are generally three aspects to consider when constructing such a document
model:
(1) What are the terms in the vector? Are they all the words from the document set, or phrases, or some transformation of the words or
phrases?
(2) How many terms do we need to construct the document representation? Do we use all the defined terms, or a subset of the terms?
And, if we want only a subset of the terms, how do we select this subset and why?
(3) How do we construct a weighting scheme for the terms in the document vector, to best indicate the terms’ relative informativeness
and importance with respect to representing the document?
In addressing the first question of defining the terms to represent a document, the most widely used "bag of words" approach starts with the complete vocabulary in the training corpus (the set of words used as "independent variables" in the model). Functional or connective words, such as "a, hence, and, the," are considered stop words and are generally removed, since they are assumed to have no information content. Stemming (e.g., treating connecting or connected as the same as connect) is sometimes performed to remove suffixes and to map words to their morphological roots. Researchers have explored other, more complex textual representations (e.g., Peng and Schuurmans, 2003; Dumais et al., 1998; Apte et al., 1994). While each method has its strengths and weaknesses, more complex definitions have not been shown to be superior to the basic "bag of words" approach in solving classification problems. In this study, we use the stemmed words of the document corpus to construct the document vector representation.
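For concreteness, the following is a minimal sketch of this preprocessing (stop-word removal followed by Porter stemming). The sample sentence and the NLTK stop-word list are illustrative choices, not details from the paper.

import re
from nltk.corpus import stopwords          # requires nltk.download("stopwords")
from nltk.stem.porter import PorterStemmer

stemmer = PorterStemmer()
stop_words = set(stopwords.words("english"))

def to_terms(text):
    # Tokenize, drop stop words, and map words to their morphological roots.
    tokens = re.findall(r"[a-z]+", text.lower())
    return [stemmer.stem(t) for t in tokens if t not in stop_words]

print(to_terms("The company is connecting its reported earnings to growth"))
# e.g. ['compani', 'connect', 'report', 'earn', 'growth']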
Since the term space generated from our 10-K report collection is of extremely high dimension, we need to reduce the term space and generate smaller vocabularies. The benefits of such a reduced term space include better generalization ability of the model, savings in computing time, and possibly better interpretation and understanding of the predictive features. Most term selection methods either compute statistical feature scores to select high-scoring terms or apply simpler feature selection algorithms from machine learning research (e.g., Yang and Liu, 1999; Larkey, 1998; Yang and Pedersen, 1997).
We use the document frequency (DF) threshold method to reduce the term space. Relative to other methods, this method (which counts the number of 10-K filing documents in our collection that use a given word) is efficient at eliminating less informative terms and reducing the vocabulary size without sacrificing classification accuracy.
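The following sketch shows the idea; the threshold of five documents is an arbitrary illustrative value, not the cutoff used in the paper.

from collections import Counter

def df_threshold_vocabulary(tokenized_docs, min_df=5):
    # Keep only terms whose document frequency (the number of documents
    # in which the term appears) meets the threshold.
    df = Counter()
    for doc in tokenized_docs:
        df.update(set(doc))  # count each term at most once per document
    return {term for term, count in df.items() if count >= min_df}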
Researchers have used many ways to calculate term weights in document vectors. The term frequency * inverse document frequency (TF*IDF) is the most commonly used weighting scheme for estimating the usefulness of a given term as a descriptor of a document. Its interpretation is that the best descriptive terms of a given document are those that occur very often in the given document (high term frequency, or TF) but rarely in the other documents (high inverse document frequency, or IDF). In our previous study, we explored several constructions of TF*IDF weights. The best performer is the atn weight, formulated as:

atn = (0.5 + 0.5 * tf / max tf) * ln(N / n),

where tf is the raw term frequency of the given term; max tf is the maximum term frequency in the document collection; N is the total number of documents in the collection; and n is the number of documents containing the given term. Therefore, we report results only using atn as our weighting scheme for the terms in the document vector.
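A minimal sketch of this weighting applied to a raw term-frequency matrix follows. The small matrix is an illustrative placeholder and, following the definition above, max tf is taken over the whole collection.

import numpy as np

def atn_weights(tf):
    # tf: (documents x terms) matrix of raw term frequencies.
    N = tf.shape[0]                     # total number of documents
    n = (tf > 0).sum(axis=0)            # documents containing each term
    max_tf = tf.max()                   # maximum term frequency in the collection
    idf = np.log(N / np.maximum(n, 1))  # guard against all-zero columns
    return (0.5 + 0.5 * tf / max_tf) * idf

tf = np.array([[3.0, 0.0, 1.0],
               [1.0, 2.0, 0.0],
               [0.0, 1.0, 4.0]])
print(atn_weights(tf))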
References
Apte, C., Damerau, F.J., Weiss, S.M., 1994. Automated learning of decision rules for text categorization. ACM Transaction on Information Systems 12 (3), 233–251.
Arya, A., Glover, J., Sunder, S., 1998. Earnings management and the revelation principle. Review of Accounting Studies, 7–34.
Arya, A., Glover, J., Sunder, S., 2003. Are unmanaged earnings always better for shareholders? Accounting Horizons 17, 111–116.
Association for Investment Management and Research (AIMR), 2000. AIMR Corporate Disclosure Survey: A Report to AIMR. Fleishman-Hillard Research, St. Louis, MO.
Asquith, P., Mikhail, M., Au, A., 2006. Information content of equity analyst’s reports. Journal of Financial Economics 75, 245–282.
Ball, R., Brown, P., 1968. An empirical evaluation of accounting income numbers. Journal of Accounting Research 6 (2), 159–178.
Barber, B., Lehavy, R., McNichols, M., Trueman, B., 2003. Reassessing the returns to analysts' stock recommendations. Financial Analysts Journal 59 (2), 16–18.
Barron, O., Kile, C., O’Keefe, T., 1999. MD&A quality as measured by the SEC and analysts’ earnings forecasts. Contemporary Accounting Research 16 (Spring), 75–109.
Botosan, C., 1997. Disclosure level and the cost of equity capital. The Accounting Review 72, 323–349.
Botosan, C., Plumlee, M., 2000. Disclosure level and expected cost of equity capital: An examination of analysts’ rankings of corporate disclosures and alternative methods for
estimating the cost of capital. Working paper, The University of Utah.
Bryan, S.H., 1997. Incremental information content of required disclosures contained in management discussion and analysis. The Accounting Review 72 (2), 285–301.
Clarkson, P., Kao, J., Richardson, G., 1999. Evidence that management discussion and analysis (MD&A) is a part of a firm's overall disclosure package. Contemporary Accounting Research 16, 111–134.
Core, J.E., 2001. Firm’s disclosure and their cost of capital: A discussion of a review of the empirical disclosure literature. Journal of Accounting and Economics 31, 441–456.
Dave, K., Lawrence, S., Pennock, D.M., 2003. Mining the peanut gallery: Opinion extraction and semantic classification of product reviews. In: Proceedings of the 12th International World Wide Web Conference (WWW 2003), ACM, pp. 519–528.
Davis, A., Piger, J., Sedor, L., 2006. Beyond the numbers: An analysis of optimistic and pessimistic language in earnings press releases. Working paper, Washington University
in St. Louis.
Dumais, S.T., Platt, J., Heckerman, D., Sahami, M., 1998. Inductive learning algorithms and representations for text categorization. In: Proceedings of CIKM-98, Seventh ACM
International Conference on Information and Knowledge Management, pp. 148–155.
Fama, E., French, K., 1992. The cross section of expected stock returns. Journal of Finance 47, 427–465.
Fields, T., Lys, T., Vincent, L., 2001. Empirical research in accounting choice. Journal of Accounting and Economics 31 (1–3).
Firtel, K., 1999. Plain English: A reappraisal of the intended audience of disclosure under the Securities Act of 1933. Southern California Law Review 72, 851–889.
Hand, D., Mannila, H., Smyth, P., 2001. Principles of Data Mining. MIT Press, Cambridge, MA.
Healy, P., Palepu, K.G., 2001. Information asymmetry, corporate disclosure, and the capital markets: A review of the empirical disclosure literature. Journal of Accounting and
Economics 31 (1–3), 405–440.
Henry, E., 2006a. Market reaction to verbal components of earnings press releases: Event study using a predictive algorithm. Journal of Emerging Technologies in Accounting 3, 1–19.
Henry, E., 2006b. Are investors influenced by how earnings releases are written? Working paper, University of Miami.
Hussainey, K., Schleicher, T., Walker, M., 2003. Undertaking large-scale disclosure studies when AIMR-FAF ratings are not available: The case for prices leading earnings.
Accounting and Business Research 33 (4), 275–294.
Jegadeesh, N., Kim, J., Krische, S.D., Lee, C.M.C., 2004. Analyzing the analysts: When do recommendations add value? The Journal of Finance 59 (3), 1083–1124.
Jegadeesh, N., Titman, S., 1993. Returns to buying winners and selling losers: Implications for stock market efficiency. Journal of Finance 48, 65–91.
Kohut, G., Segars, A., 1992. The president's letter to stockholders: An examination of corporate communication strategy. Journal of Business Communication 29 (1), 7–21.
Kothari, S.P., 2001. Capital markets research in accounting. Journal of Accounting and Economics 31 (1–3).
Lang, M., Lundholm, R., 2000. Voluntary disclosure during equity offerings: Reducing information asymmetry or hyping the stock? Contemporary Accounting Research 17,
623–662.
Larkey, L.S., 1998. Automatic essay grading using text categorization techniques. In: Proceedings of SIGIR-98, 21st ACM International Conference on Research and Development in Information Retrieval, pp. 90–95.
Li, F., 2006. Annual Report readability, current earnings and earnings persistence. Working paper, University of Michigan, Ann Arbor.
Mahinovs, A., Tiwari, A., 2007. Text classification method review. In: Roy, R., Baxter, D. (Eds.), Decision Engineering Report Series. Mimeo, Cranfield University, UK.
Pant, G., Srinivasan, P., 2005. Learning to crawl: Comparing classifier schemes. ACM Transactions on Information Systems 23 (4), 430–462.
Peng, F., Schuurmans, D., 2003. Combining naive Bayes and n-gram language models for text categorization. In: Proceedings of the 25th European Conference on Information Retrieval Research (ECIR 2003).
Pérez-Sancho, C., Iñesta, J.M., Calera-Rubio, J., 2005. A text categorization approach for music style recognition: Pattern recognition and image analysis. Lecture Notes in
Computer Science 3523, 649–657.
Popa, S., Zeitouni, K., Gardarin, G., Nakache, D., Métais, E., 2007. Text categorization for multi-label documents and many categories. In: Proceedings of the 12th IEEE International Symposium on Computer-Based Medical Systems (CBMS'07), IEEE Computer Society, Washington, DC, pp. 421–426.
Reinganum, M., 1981. Misspecification of the capital asset pricing: Empirical anomalies based on earnings’ yield and market values. Journal of Financial Economics 9, 19–46.
Rogers, K., Grant, J., 1997. Content analysis of information cited in reports of sell-side financial analysts. Journal of Financial Statement Analysis 3, 17–30.
Salton, G., Buckley, C., 1988. Term-weighting approaches in automatic text retrieval. Information Processing & Management 24 (5), 513–523.
Sebastiani, F., 2002. Machine learning in automated text categorization. ACM Computing Surveys 34 (1), 1–47.
Sebastiani, F., 2005. Text categorization. In: Zanasi, Alessandro (Ed.), Text Mining and its Applications to Intelligence. CRM and Knowledge Management, WIT Press,
Southampton, UK, pp. 109–129.
Singhal, A., Buckley, C., Mitra, M., 1996. Pivoted document length normalization. In: Proceedings of the 1996 ACM SIGIR Conference on Research and Development in
Information Retrieval, pp. 21–29.
Smith, M., Taffler, R.J., 2000. The chairman’s statement: A content analysis of discretionary narrative disclosures. Accounting Auditing & Accountability Journal 13 (5), 624–
646.
Subramanian, R., Insley, R.G., Blackwell, R.D., 1993. Performance and readability: A comparison of annual reports of profitable and unprofitable corporations. Journal of
Business Communication 30, 50–61.
Tetlock, P., Saar-Tsechansky, M., Macskassy, S., 2006. More than words: Quantifying language to measure firms' fundamentals. Working paper, University of Texas at Austin.
Thompson, P., 2001. Automatic categorization of case law. In: Proceedings of the 8th International Conference on Artificial Intelligence and Law, ACM, pp. 70–71.
Witten, I., Frank, E., 2000. Data Mining. Morgan Kaufmann Publishers, San Francisco.
Yang, Y., Liu, X., 1999. A re-examination of text categorization methods. In: Proceedings of SIGIR-99, 22nd ACM International Conference on Research and Development in
Information Retrieval, pp. 42–49.
Yang, Y., Pedersen, J.O., 1997. A comparative study on feature selection in text categorization. In: Proceedings of the 14th International Conference on Machine Learning, pp. 412–420.