Engineering, 2013, 5, 84-87
doi:10.4236/eng.2013.55B017 Published Online May 2013 (http://www.scirp.org/journal/eng)
Handwriting Classification Based on Support Vector
Machine with Cross Validation
Anith Adibah Hasseim, Rubita Sudirman, Puspa Inayat Khalid
Faculty of Electrical Engineering, Universiti Teknologi Malaysia, Johor Bahru, Johor, Malaysia
Email: rubita@fke.utm.my
Received 2013
ABSTRACT
Support vector machine (SVM) is successfully applied for classification in this paper. The paper first discusses the basic principle of the SVM; SVM classifiers with a polynomial kernel and with the Gaussian radial basis function (RBF) kernel are then chosen to identify pupils who have difficulties in writing. The 10-fold cross-validation method for training and validation is introduced. The aim of this paper is to compare the performance of support vector machines with RBF and polynomial kernels when used for classifying pupils with or without handwriting difficulties. Experimental results showed that the performance of the SVM with RBF kernel is better than that of the one with polynomial kernel.
Keywords: Support Vector Machine; Handwriting Difficulties; Cross-Validation
1. Introduction
The field of handwriting has been of interest from a variety of aspects: its entity, indications and aesthetics. In the beginning, the development of handwriting and the factors that affect handwriting performance were investigated [1,2]; later, whole words were addressed. Most
of the systems reported in the literature until today in-
volved screening measures in identifying pupils who are
at risk of handwriting difficulties and also addressed the
absence of an appropriate tool for monitoring beginning
handwriting development. More importantly, automated
handwriting analysis has been given more attention in the
hunt for quantitative features and key indicators in
monitoring beginning handwriting skill development.
Such automated handwriting analyses include recognizing the writer (e.g. [3]), the text written (e.g. [4]), movement and procedure (e.g. [5,6]), or even the semantic content of the text (e.g. [7]). Nearly all of these issues can be, and have been, investigated either offline or online, depending on the available data.
Up to sixty percent of children’s typical school day is
allocated to fine motor activities, with writing being the
predominant task during these time periods [8]. These
tasks all require the foundational skill of basic handwrit-
ing proficiency to allow teachers to accurately assess
students’ understanding and comprehension of instruc-
tional material. If students do not possess basic hand-
writing proficiency, it can limit their ability to success-
fully complete a majority of classroom tasks. In addition,
it has also been suggested that students with handwriting
problems need to focus more attention on the physical
process of writing, thus limiting use of higher order cog-
nitive skills, planning and generation of content [9]. Thus,
handwriting proficiency is an important foundation upon
which success with later writing tasks depends. Given the number of everyday school tasks that involve writing, unsuccessful mastery of handwriting skill can negatively influence later success in school.
1.1. Support Vector Machine
Support Vector Machine (SVM) is a classification technique based on the statistical learning theory proposed by Vapnik in 1995 [10]. It can successfully cope with over-fitting and local-optimum problems and is especially suitable for small-sample, high-dimensional, nonlinear cases. It has already shown good results in medical diagnostics, optical character recognition, electric load forecasting and other fields.
Kernel Function
In general, the radial basis function is one of the most popular kernels and a reasonable first choice, because this kernel nonlinearly maps samples into a higher-dimensional space.
Given a linearly separable sample set (x_i, y_i), i = 1, …, n, and taking the simplest case of two-class classification, x ∈ R^n and y ∈ {+1, −1} is the class label. The common form of the linear decision function is:
f(x) = w . x + b (1)
Sometimes linear classifiers are not complex enough; therefore SVM maps the data into a higher dimensional space, unlike the linear kernel, which cannot handle the case
Copyright © 2013 SciRes. ENG
A. A. HASSEIM ET AL. 85
when the relation between class labels and attributes is
nonlinear [11]. Formally, pre-process the data with:
x →φ(x) (2)
and then learn the map from φ(x) to y:
f(x) = w. φ(x) + b (3)
However, the dimensionality of φ(x) can be very large,
making w hard to represent explicitly in memory, and
hard to solve. The Representer theorem (Kimeldorf &
Wahba, 1971) shows that (for SVMs as a special case):
w = Σ_{i=1}^{m} α_i φ(x_i) (4)
for some variables α. Instead of optimizing w directly, we can thus optimize α. The decision rule is now:
f(x) = Σ_{i=1}^{m} α_i (φ(x_i) . φ(x)) + b (5)
If the dot product φ(x_i) . φ(x) is replaced by the kernel function K(x, x_i), the optimal decision function is as follows:
f(x) = Σ_{i=1}^{m} α_i K(x, x_i) + b (6)
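A minimal numeric sketch of the kernelized decision rule in Eq. (6) follows; the coefficients α_i, the bias b, the support vectors and the query points are all hypothetical illustrations, not values from the paper's experiments:

```python
import math

def decision(x, support_vectors, alphas, b, g):
    # Eq. (6): f(x) = sum_i alpha_i * K(x, x_i) + b, here with a Gaussian RBF kernel
    total = b
    for xi, ai in zip(support_vectors, alphas):
        sq_dist = sum((p - q) ** 2 for p, q in zip(x, xi))
        total += ai * math.exp(-sq_dist / g ** 2)
    return total

# Hypothetical two-support-vector model; the alphas carry the class signs
svs = [[0.0, 0.0], [1.0, 1.0]]
alphas = [1.0, -1.0]
# A point near the positive support vector gets a positive decision value
label = 1 if decision([0.1, 0.1], svs, alphas, b=0.0, g=1.0) > 0 else -1
print(label)  # 1
```

The sign of f(x) gives the predicted class, which is how a two-class SVM turns the real-valued decision function into a {+1, −1} label.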
In this project, two kinds of common kernel function are used. The first one is the Gaussian radial basis function (RBF):
K(x, x_i) = exp(−‖x − x_i‖^2 / g^2) (7)
and the other one is the polynomial kernel:
K(x, x_i) = (x . x_i + 1)^d (8)
Classical techniques utilizing radial basis functions employ some method of determining a subset of centres; typically, a clustering method is first employed to select this subset. An attractive feature of the SVM is that this selection is implicit, with each support vector contributing one local Gaussian function, centered at that data point.
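The two kernels in Eqs. (7) and (8) can be written directly in code. The following is a small sketch (function names and the sample vectors are illustrative, not from the paper):

```python
import math

def rbf_kernel(x, xi, g):
    # Gaussian RBF kernel, Eq. (7): K(x, x_i) = exp(-||x - x_i||^2 / g^2)
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, xi))
    return math.exp(-sq_dist / g ** 2)

def poly_kernel(x, xi, d):
    # Polynomial kernel, Eq. (8): K(x, x_i) = (x . x_i + 1)^d
    dot = sum(a * b for a, b in zip(x, xi))
    return (dot + 1.0) ** d

# Identical points: the RBF kernel equals exp(0) = 1 regardless of g
print(rbf_kernel([0.5, 1.0], [0.5, 1.0], g=0.1))  # 1.0
# (0.5*0.5 + 1.0*1.0 + 1)^2 = 2.25^2 = 5.0625
print(poly_kernel([0.5, 1.0], [0.5, 1.0], d=2))   # 5.0625
```

Note how each kernel takes the role of the implicit dot product φ(x_i) . φ(x), so the mapping φ itself never has to be computed.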
1.2. Cross Validation (CV)
Currently, cross-validation is widely used for estimating the performance of neural networks and of other methods such as support vector machines and k-nearest neighbors. Cross-validation is a statistical method of evaluating and comparing learning algorithms. Its basic idea is to split the available training data into two sets: the first set is used to train the model, while the other is used to evaluate the performance of the trained model. In typical cross-validation, the training and validation sets cross over in successive rounds such that each data point has a chance of being validated against. The basic form of cross-validation is k-fold cross-validation; other forms are special cases of k-fold cross-validation or involve repeated rounds of it.
Advantages of this method are as follows: 1) the average classification accuracy of the k SVM classifiers is used to evaluate the performance of the SVM classifier parameters, which can improve the generalization ability of the SVM classifier with the optimized parameters; 2) the k-fold cross-validation method ensures that all the sample data are involved in SVM classifier training and validation, making full use of the limited sample data; 3) no matter how the data gets divided, every data point is used in a test set exactly once and appears in a training set k − 1 times. The disadvantage of this method is that the training algorithm has to be rerun from scratch k times, which means it takes k times as much computation to make an evaluation.
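The splitting procedure just described can be sketched as follows; the function name and the sequential, unshuffled split are assumptions for illustration, not the authors' implementation:

```python
def k_fold_indices(n_samples, k=10):
    # Partition sample indices into k roughly equal folds; in round r,
    # fold r is the validation set and the remaining folds form the training set.
    indices = list(range(n_samples))
    fold_size, remainder = divmod(n_samples, k)
    folds, start = [], 0
    for r in range(k):
        size = fold_size + (1 if r < remainder else 0)
        folds.append(indices[start:start + size])
        start += size
    splits = []
    for r in range(k):
        valid = folds[r]
        train = [i for f in folds[:r] + folds[r + 1:] for i in f]
        splits.append((train, valid))
    return splits

splits = k_fold_indices(120, k=10)           # 120 samples, as in the paper's dataset
print(len(splits))                           # 10 rounds
print(len(splits[0][0]), len(splits[0][1]))  # 108 train, 12 validation
# Every sample is validated exactly once across the 10 rounds
assert sorted(i for _, v in splits for i in v) == list(range(120))
```

With 120 samples and k = 10, each round trains on 108 samples and validates on the remaining 12, and each sample appears in a training set exactly 9 (k − 1) times.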
2. Methodology
The data were obtained from Khalid et al. [13]. The dataset is composed of 120 samples, each described by two features (the standard deviation of pen pressure when drawing RU, with p-value < 0.0001 and z-value = −4.319, and the ratio of time taken to draw HR to that taken to draw HL, with p-value < 0.0001 and z-value = −5.205) and divided into two groups of writers (below-average printers, the test group, and above-average printers, the control group).
Firstly, the data is partitioned into k equal-sized segments, or folds. In this project, we used 10-fold cross validation (k = 10), as it is the most commonly used setting in data mining and machine learning. As shown in Figure 1, the darker sections of the data are used for training, while the remaining, lighter sections are used to validate the model. This process is repeated 10 times until all sections have been validated.
Model Parameter Selection
Two models, an SVM with polynomial kernel and an SVM with RBF kernel, are chosen for performance comparison. The performance of the SVM depends on the choice of parameters, and the optimal selection of these parameters is a nontrivial issue. For the RBF kernel, the important task is to find the parameters C and g; the SVM with polynomial kernel instead requires choosing the parameters C and d. The penalty factor C is used to improve generalization capability as it increases, while g and d are the adjustable parameters of the learning machine in the experiment and are used to adjust the empirical error. The parameters only slightly influence the classification result when a smaller amount of training samples is used [12].
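Selecting three values per parameter, as done below in Tables 1 and 2, amounts to a small grid search. A sketch follows; the `evaluate` function here is only a placeholder standing in for the 10-fold cross-validated accuracy of a trained SVM, and its form is an assumption for illustration:

```python
from itertools import product

def grid_search(evaluate, grid):
    # Try every combination of parameter values and keep the best-scoring one.
    best_params, best_score = None, float("-inf")
    for combo in product(*grid.values()):
        params = dict(zip(grid.keys(), combo))
        score = evaluate(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

# RBF-kernel grid from the experiments: g in {0.01, 0.1, 1}, C in {1, 10, 100}
rbf_grid = {"g": [0.01, 0.1, 1], "C": [1, 10, 100]}

# Placeholder objective; in practice this would run 10-fold CV with an SVM
def evaluate(params):
    return -abs(params["g"] - 0.1) - 0.001 * params["C"]

best, score = grid_search(evaluate, rbf_grid)
print(best)  # {'g': 0.1, 'C': 1}
```

The same routine covers the polynomial kernel by passing a grid over C and d instead of C and g.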
After training the SVM, the best values of C and g can be used to classify children with handwriting problems. For
Figure 1. Procedure of 10-fold cross validation.
Table 1. Accuracy of Prediction based on SVM with RBF
Kernel.
the SVM with polynomial kernel, there are two parameters: C and d. The SVM with RBF kernel also has two parameters: g and C. In order to see the different performance each parameter produces in the output, we select three values for each parameter, much as one would choose the number of hidden nodes in a neural network.
Accuracy of Predictions (%)

Feature 1:
  C \ g    0.01     0.1      1
  1        83.33    91.67    83.33
  10       93.33    91.67    91.67
  100      91.67    91.67    91.67

Feature 2:
  C \ g    0.01     0.1      1
  1        91.67    92.80    91.67
  10       91.67    91.67    83.33
  100      91.67    86.67    83.33
3. Results and Discussion
Table 1 and Table 2 present the recognition results using the SVM with RBF kernel and with polynomial kernel, respectively. A classification was considered correct if the output of the model matched the judgement made by the teachers (using the Handwriting Proficiency Screening Questionnaire (HPSQ)). In this paper, we used the classification error (rejection of the genuine category) as the metric.
Table 2. Accuracy of Prediction based on SVM with Poly-
nomial Kernel.
As can be seen in Table 1, the percentage of correct predictions for feature 1 increases as g varies from 0.01 to 0.1, and moves in the reverse direction as g varies from 0.1 to 1. The results confirm that the best value of g is near 0.1. When the penalty coefficient C is increased, the accuracy of prediction tends to decrease. Different from feature 1, feature 2 shows a decreasing percentage of correct predictions both when g varies from 0.01 to 1 and when C increases in value.
Accuracy of Predictions (%)

Feature 1:
  d \ g    0.01     0.1      1
  3        86.67    86.67    91.67
  5        83.33    86.67    83.33
  10       66.67    71.67    83.33

Feature 2:
  d \ g    0.01     0.1      1
  3        86.67    86.67    86.67
  5        83.33    83.33    86.67
  10       61.67    71.67    86.67
On the other hand, the results in Table 2 differ from those in Table 1. When the parameter varies from 0.01 to 1, the percentage of correct predictions for both feature 1 and feature 2 tends to increase, while when d varies from 3 to 10, the accuracy of prediction decreases. This exhibits the good generalization performance of the SVM.
The results reported here have shown that the performance of the SVM with RBF kernel is better than that of the SVM with polynomial kernel. We use the SVM (RBF kernel) with varying C and g to simulate and to classify children with and without handwriting problems based on drawing tasks.
4. Conclusions
SVMs with RBF and polynomial kernels have been used in this study to select those who are at risk of handwriting difficulty due to the improper use of graphic rules. The cross-validation method is adopted to choose the parameters in order to obtain a preferable classification result. In this paper, we have verified that the performance of the SVM with RBF kernel is better than that of the one with polynomial kernel. Simulation results indicate that the average testing classification accuracy of the SVM-RBF algorithm reaches more than 93%, noticeably higher than that of the SVM polynomial algorithm.
5. Acknowledgements
This work was supported by the Malaysia Ministry of
Higher Education and Universiti Teknologi Malaysia
under Vote Q.J130000.2623.09J28.
REFERENCES
[1] V. Berninger, A. Cartwright, C. Yates, H. L. Swanson
and R. Abbott, “Developmental Skills Related to Writing
and Reading Acquisition in the Intermediate Grades:
Shared and Unique Variance,” Reading and Writing: An
Interdisciplinary Journal, Vol. 6, 1994, pp. 161-196.
doi:10.1007/BF01026911
[2] S. Graham, V. W. Berninger, N. Weintraub and W.
Schafer, “Development of Handwriting Speed and Legi-
bility in Grades 1-9,” Journal of Educational Research,
Vol. 92, 1997, pp. 42-52.
doi:10.1080/00220679809597574
[3] Z. Yong, T. Tan and Y. Wang, “Biometric Personal Iden-
tification Based on Handwriting,” Pattern Recognition,
Proceedings. 15th International Conference on, Vol. 2,
2000, pp. 797-800.
[4] L. M. Lorigo and V. Govindaraju, “Offline Arabic Hand-
Writing Recognition: A Survey,” Pattern Analysis and
Machine Intelligence, IEEE Transactions on, Vol. 28, 2006,
pp. 712-724. doi:10.1109/TPAMI.2006.102
[5] H. Ishida, et al., “A Hilbert Warping Method for Hand-
Writing Gesture Recognition,” Pattern Recogn., Vol. 43,
2010, pp. 2799-2806. doi:10.1016/j.patcog.2010.02.021
[6] H. Bezine, A. D. Alimi and N. Sherkat, “Generation and
Analysis of Handwriting Script with the Beta-Elliptic
Model,” Proceedings of the Ninth International Work-
shop on Frontiers in Handwriting Recognition, 2004, pp.
515-520. doi:10.1109/IWFHR.2004.45
[7] S. Srihari, J. Collins, R. Srihari, H. Srinivasan, S. Shetty
and J. B. Griffler, “Automatic Scoring of Short Hand-
Written Essays in Reading Comprehension Tests,” Artifi-
cial Intelligence, Vol. 172, 2008, pp. 300-324.
doi:10.1016/j.artint.2007.06.005
[8] K. McHale and S. A. Cermak, “Fine Motor Activities in
Elementary School: Preliminary Findings and Provisional
Implications for Children with Fine Motor Problems,”
American Journal of Occupational Therapy, Vol. 46,
No.10, 1992, pp. 898-903. doi:10.5014/ajot.46.10.898
[9] V. Berninger, “Coordinating Transcription and Text Gen-
eration in Working Memory During Composing: Auto-
matic and Constructive Processes,” Learning Disability
Quarterly, Vol. 22, 1999, pp. 99-112.
doi:10.2307/1511269
[10] V. N. Vapnik, “The Nature of Statistical Learning The-
ory,” Springer, New York, 1995.
doi:10.1007/978-1-4757-2440-0
[11] M. Hong, G. Yanchun, W. Yujie and L. Xiaoying, “Study
on Classification Method Based on Support Vector Ma-
chine,” 2009 First International Workshop on Education
Technology and Computer Science, Wuhan, China, 7-8
March 2009, pp.369-373.
[12] S. S. Keerthi and C. J. Lin, “Asymptotic Behaviors of
Support Vector Machines with Gaussian Kernel,” Neural
Computation, Vol. 15, No. 7, 2003, pp. 1667-1689.
doi:10.1162/089976603321891855
[13] S. S. Keerthi and C. J. Lin, “Asymptotic Behaviors of
Support Vector Machines with Gaussian Kernel,” Neural
Computation, Vol. 15, No. 7, 2003, pp. 1667-1689.
doi:10.1162/089976603321891855