Domain specific word embedding matrix for
training neural networks
Đorđe Petrović
Computer Science
Faculty of Electronic Engineering
Niš, Serbia
petrovicdj@gmail.com
Stefana Janićijević
Information Technology School
Comtrade
Belgrade, Serbia
stefana.janicijevic@its.edu.rs
Abstract—Text is one of the most widespread forms of sequential data and as such is well suited to the application of deep learning models for sequential data. Deep learning applied to natural language processing is pattern recognition applied to words, sentences, and paragraphs. This study describes the process of creating a pre-trained word embedding matrix and its subsequent use in various neural network models for the purpose of domain-specific text classification. Word embeddings are one of the popular ways to associate vectors with words. A word embedding matrix maps the semantic relationships between words well, and these relationships can vary from task to task.
Keywords— embedding matrix, word embeddings, text mining,
neural networks, deep learning
I. INTRODUCTION
Deep learning models have achieved remarkable results in the field of Computer Vision in recent years. Computer Vision is a field of research whose primary task is to develop techniques that help computers see and understand the content of digital images or video recordings. Deep learning has brought similar advances to the related fields of Speech Recognition and Natural Language Processing. Much of the work on deep learning models has involved learning vector representations of words through neural language models and performing composition over the learned word vectors for classification [3]. The author states that word vectors, projected onto a lower-dimensional vector space via a hidden layer, are essentially feature extractors that encode the semantic properties of words in their dimensions.
Text is one of the most widely used forms of sequential data and as such is well suited to the application of deep learning models for sequential data. In Natural Language Processing, deep learning performs pattern recognition applied to words, sentences, and paragraphs in much the same way as pattern recognition is applied to pixels [2]. The same author explains that, like all other neural networks, deep learning models do not take raw text as input but only work with numerical tensors. Text vectorization is the process of converting text into numerical tensors and can be done in several ways [2]:
• Segmenting the text into words and transforming each word into a vector
• Segmenting the text into characters and transforming each character into a vector
• Extracting groups of several consecutive words or characters (n-grams) and transforming each group into a vector
The common term for all of these units into which text can be divided is tokens, and the division process is called tokenization.
This study describes the use of neural network models for the classification of texts using pre-trained word embeddings. The aim is to classify texts automatically into one or more predefined classes using several different approaches, where the implementation is based on the Keras open source library (https://keras.io/). In order to use the Keras library on textual data, the data must first be processed. For the purpose of tokenization, the Keras class "Tokenizer" is used. This object takes as an argument the maximum number of words that are kept after tokenization, based on their frequency:
from keras.preprocessing.text import Tokenizer

MAX_NB_WORDS = 50000
tokenizer = Tokenizer(num_words=MAX_NB_WORDS)
tokenizer.fit_on_texts(texts)
Once the tokenizer is applied to the data, it can be used to convert texts into sequences of integers. These integers represent the position of each word in the dictionary.
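For illustration, a minimal sketch of this conversion step, assuming texts is a list of strings and MAX_SEQUENCE_LENGTH is a chosen padding length (the value 1000 below is an illustrative assumption, not taken from the original code):
from keras.preprocessing.sequence import pad_sequences

# Convert each text into a list of word indices from the tokenizer's dictionary
sequences = tokenizer.texts_to_sequences(texts)

# Pad/truncate all sequences to the same length so they fit into one tensor
MAX_SEQUENCE_LENGTH = 1000  # assumed value for illustration
data = pad_sequences(sequences, maxlen=MAX_SEQUENCE_LENGTH)

# word_index maps each word (string) to its integer position in the dictionary
word_index = tokenizer.word_index
The resulting word_index dictionary is the one referenced later when loading the pre-trained embedding matrix into a model.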
II. WORD EMBEDDING MATRIX
One popular way of associating vectors with words is to use dense word vectors, called word embeddings. These are low-dimensional floating-point vectors; unlike sparse vectors, they pack more information into fewer dimensions, and they are learned from the data [2]. The same author states that there are two ways to obtain these vectors:
• Learning the word embeddings together with the main task. In this setup, one starts with random word vectors and then learns the word vectors in the same way as the weights of the neural network are learned.
• Loading into the model word embeddings that were computed previously, using some other machine learning task; these are called pre-trained word embeddings.
The simplest way to associate a dense vector with a word is to choose a vector with random values. The problem with this approach is that the resulting embedding space has no structure, and it is difficult for a deep neural network to make sense of such a noisy, unstructured space [2]. The same author goes on to state that what makes a good word embedding space depends largely on the task at hand. The perfect word embedding space for sentiment analysis of movie reviews may look different from the perfect embedding space for, say, a legal-document model, because the importance of certain semantic relationships varies from task to task, so it is reasonable to learn a new embedding space for each new task [2]. Fortunately, backpropagation and the Keras library make this easy; the task amounts to learning the weights of an embedding layer, for example:
from keras.layers import Embedding
embedding_layer = Embedding(1000, 64)
The Embedding layer in the previous example has two arguments:
• 1000 – the number of possible tokens
• 64 – the dimensionality of the embeddings
According to [2], the embedding layer takes as input a 2D integer tensor of shape (samples, sequence_length), where each entry is a sequence of integers. All sequences in a batch must have the same length, because they must be packed into a single tensor, so sequences that are shorter than the others are padded with zeros and longer sequences are truncated. This layer returns a 3D floating-point tensor of shape (samples, sequence_length, embedding_dimensionality). Such a 3D tensor can then be processed by a recurrent neural network layer or by a 1D convolutional layer.
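A minimal sketch of this shape transformation (the batch of eight sequences and the sequence length of ten are arbitrary illustrative values):
import numpy as np
from keras.models import Sequential
from keras.layers import Embedding

model = Sequential()
model.add(Embedding(1000, 64, input_length=10))

# A batch of 8 sequences, each a series of 10 integer token indices
batch = np.random.randint(0, 1000, size=(8, 10))

# The 2D integer input of shape (8, 10) becomes a 3D float tensor of shape (8, 10, 64)
print(model.predict(batch).shape)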
Initially, the weights of the embedding layer are random, just like those of any other layer. During training, these word vectors are gradually adjusted, shaping the space into something that the downstream layers can exploit. After training is complete, the embedding space will show a lot of structure – the kind of structure specialized for the specific problem the model was trained on [2]. The same author further explains that in situations where only a small training set is available, which cannot be used to learn a suitable vocabulary representation for a particular task, embedding vectors from a pre-computed, highly structured embedding space can be loaded instead. Such a space captures useful, generic aspects of language structure. In such cases, it is possible to reuse features learned on a different problem. Such word embeddings are usually computed from word-occurrence statistics, using various techniques, some of which involve neural networks and some of which do not [2]. There are various databases of pre-computed word embeddings that can be downloaded and used to create an index mapping words (as strings) to their vector representations (as numerical vectors) [2]. An embedding matrix is then created, which can be loaded into the embedding layer. It must be a matrix whose dimensions are the maximum number of words and the embedding dimension. Entry i contains the embedding vector for the word with index i in the reference word index [2]. In addition, the embedding layer can be "frozen" by setting its trainable attribute to False [2]. This prevents the weights of that layer from being updated during model training; if this is not done, the previously learned weights will be modified during training. Alternatively, the model can be trained without loading pre-trained word embeddings and without freezing the embedding layer. In this case, embeddings specific to the particular task are learned during training. When a lot of data is available, this approach is generally more powerful than using pre-trained word embeddings [2].
There are several reasons for creating a word embedding matrix, some of which are:
• When a large amount of textual data specific to the studied domain is available, it becomes possible to create such a matrix
• A word embedding matrix maps the semantic relationships between words well, and these relationships can vary from task to task
• When training neural networks with a loaded word embedding matrix and a frozen embedding layer, the number of trainable parameters is significantly reduced, which speeds up model training
A. Methodology of Word Embedding Matrix creation
The process of creating a word embedding matrix consists of the following stages:
1. Loading the text data
2. Creating word vectors
3. Converting the word vectors into a numerical matrix suitable for TensorFlow and Keras models
4. Saving the created matrix
Domain-specific texts abound with specific language forms and links between them. This study used a set of legal texts in Serbian, which served as the training data. It is a labeled data set made up of larger texts segmented into smaller units, each of which is represented by a single record in the database. Each of these segments was assigned a corresponding label, i.e., the segments were classified into 5 classes.
Creating the word vectors could be done in the following way, using the gensim implementation of Word2Vec (the parameter values used in our experiment are listed at the end of this subsection):
from gensim.models import Word2Vec
import multiprocessing

w2v = Word2Vec(data, size=emb_dim, window=window,
               min_count=min_count, negative=negative,
               iter=iterations,
               workers=multiprocessing.cpu_count())
word_vectors = w2v.wv
Converting the word vectors into a numerical matrix suitable for TensorFlow and Keras models could be performed as follows:
import numpy as np

embedding_matrix = np.zeros((len(w2v.wv.vocab), emb_dim))
for i in range(len(w2v.wv.vocab)):
    embedding_vector = w2v.wv[w2v.wv.index2word[i]]
    if embedding_vector is not None:
        embedding_matrix[i] = embedding_vector
The created matrix could be saved with the following instruction:
np.savetxt('embedding_matrix.txt', embedding_matrix,
           delimiter=' ', encoding='utf-8-sig')
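Note that np.savetxt stores only the numeric values, while the loading procedure shown in subsection B below expects each line to begin with the word itself followed by its coefficients. A minimal sketch of saving the matrix together with the words, under that assumption, could look as follows:
# Write one line per word: the word followed by its embedding coefficients
with open('embedding_matrix.txt', 'w', encoding='utf-8-sig') as f:
    for i in range(len(w2v.wv.vocab)):
        word = w2v.wv.index2word[i]
        coefs = ' '.join(str(x) for x in embedding_matrix[i])
        f.write(word + ' ' + coefs + '\n')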
In the experiment we conducted, the following parameters were used when creating the word embedding matrix:
• The dimension of the embedding vectors is emb_dim = 400
• The number of words observed before and after the indexed word (the window) is 5
• The embedding matrix only contains words that appear at least 5 times in the texts (min_count = 5)
• For each observed word, a maximum of 15 negative samples are used (negative = 15)
• The number of iterations is 5
• The number of worker processes equals the number of processors
As a result, on the dataset that was the subject of this research, a pre-trained embedding matrix of dimensions 43654x400 was obtained, which means that a vector of dimension 400 was created for each of the 43654 words.
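With these values, the Word2Vec call shown above could be instantiated as follows (a sketch using the same gensim parameter names):
import multiprocessing
from gensim.models import Word2Vec

emb_dim = 400
w2v = Word2Vec(data,
               size=emb_dim,   # dimension of the embedding vectors
               window=5,       # words observed before and after the indexed word
               min_count=5,    # ignore words appearing fewer than 5 times
               negative=15,    # negative samples per observed word
               iter=5,         # number of training iterations
               workers=multiprocessing.cpu_count())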
B. The process of loading a pre-trained embedding matrix
The process of loading a pre-trained word embedding matrix into a neural network could be performed as follows:
embeddings_index = {}
f = open('embedding_matrix.txt', encoding='utf-8-sig')
for line in f:
    values = line.split()
    word = values[0]
    coefs = np.asarray(values[1:], dtype='float32')
    embeddings_index[word] = coefs
f.close()

embedding_matrix = np.random.random((len(word_index) + 1,
                                     EMBEDDING_DIM))
for word, i in word_index.items():
    embedding_vector = embeddings_index.get(word)
    if embedding_vector is not None:
        # words not found in the embeddings index keep
        # their random initialization
        embedding_matrix[i] = embedding_vector
III. APPLICATION OF EMBEDDING MATRICES FOR NEURAL
NETWORK TRAINING PURPOSES
Recurrent Neural Networks (RNNs) are designed for sequential data such as text sentences, time series, and other discrete sequences, such as biological sequences. Working with textual data is also among the most common use cases of recurrent neural networks; some examples of their application are [1]:
• The input can be a sequence of words and the output can be the same sequence shifted by one word, which allows the next word to be predicted at any point in the text. This is the classic language model, in which one tries to predict the next word based on the sequential history of words;
• The input can be a sentence in one language and the output can be a sentence in another language. In this case, two recurrent neural networks can be coupled to learn a translation model between the two languages. Furthermore, a recurrent neural network can be combined with another type of network (e.g., a convolutional neural network) to learn image captions;
• The input can be a sequence of words (e.g., a sentence) and the output can be a vector of class-membership probabilities. This approach is used for classification purposes, such as sentiment analysis. This is the model used in this research.
According to the same author [1], the input to these networks is a sequence x_1, ..., x_n, where x_t is a d-dimensional point received at time t. When working with text, the vector x_t contains the one-hot encoding of the word at time t. This term refers to a vector whose length is equal to the size of the dictionary, in which the component corresponding to the relevant word is 1 and all other components are 0. The key point of these neural networks is the existence of a self-loop that causes the hidden state of the neural network to change after each input.
The weight matrices of the connections are shared across the time-layered network, which ensures that the same function is used at every time step. This sharing is key to the domain-specific insights that the network learns. The backpropagation algorithm takes the sharing and the temporal length into account when updating the weights during the learning process and is used to determine whether each weight should be increased or decreased. Due to this recursive nature, recurrent neural networks can compute functions of variable-length inputs.
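For reference, the hidden-state update that this weight sharing implies can be sketched as follows, where W_xh, W_hh and W_hy denote the shared input-to-hidden, hidden-to-hidden and hidden-to-output weight matrices (the symbols are ours, chosen for illustration):
$$\bar{h}_t = \tanh\left(W_{xh}\,\bar{x}_t + W_{hh}\,\bar{h}_{t-1}\right), \qquad \bar{y}_t = W_{hy}\,\bar{h}_t$$
The same matrices are applied at every time step t, which is exactly the sharing described above.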
A general way to apply a word embedding matrix in a recurrent neural network, through an embedding layer, is the following:
embedding_layer = Embedding(len(word_index) + 1,
                            EMBEDDING_DIM,
                            weights=[embedding_matrix],
                            input_length=MAX_SEQUENCE_LENGTH,
                            trainable=False)
sequence_input = Input(shape=(MAX_SEQUENCE_LENGTH,),
                       dtype='int32')
embedded_sequences = embedding_layer(sequence_input)
l_lstm = Bidirectional(LSTM(units))(embedded_sequences)
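To obtain a complete classifier, the bidirectional LSTM output can be fed into a softmax layer over the 5 classes used in this study; a minimal sketch, where units is assumed to have been set beforehand and the choice of optimizer and loss is ours, not taken from the paper:
from keras.models import Model
from keras.layers import Dense

# Softmax output over the 5 classes of the labeled legal-text dataset
preds = Dense(5, activation='softmax')(l_lstm)
model = Model(sequence_input, preds)
model.compile(loss='categorical_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])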
Although recurrent neural networks are the recommended approach to machine learning from text sequences, the use of convolutional neural networks has become increasingly popular in recent years [1]. The same author goes on to explain why convolutional neural networks do not, at first glance, seem naturally suited to text:
• When convolutional neural networks are used to work with images, the shapes found in the images are interpreted in the same way regardless of where they appear in the image. This is not the case with text, because the position of words in sentences is quite important.
• Issues such as translation invariance cannot be treated in the same way in textual data as they are with pictures. Adjacent pixels in an image are usually very similar, while neighboring words in a text are almost never identical.
Despite these differences, systems based on convolutional neural networks have shown improved performance in recent years. Just as an image is represented as a two-dimensional object with an additional depth dimension defined by the number of color channels, a text sequence is represented as a one-dimensional object whose depth is determined by the dimensionality of its representation [1]. The same author further explains that, when working with text, instead of the three-dimensional "boxes" used for images, the filters are two-dimensional "boxes" whose width is the length of the word window that slides along the text and whose depth is defined by the lexicon. A challenge in this approach is that the number of channels is large and, consequently, the number of parameters in the filters of the first layer increases.
A general way to apply a word embedding matrix in a convolutional neural network, through an embedding layer, is the following:
embedding_layer = Embedding(len(word_index) + 1,
                            EMBEDDING_DIM,
                            weights=[embedding_matrix],
                            input_length=MAX_SEQUENCE_LENGTH,
                            trainable=False)
sequence_input = Input(shape=(MAX_SEQUENCE_LENGTH,),
                       dtype='int32')
embedded_sequences = embedding_layer(sequence_input)
l_conv1 = Conv1D(filters, kernel_size,
                 activation='relu')(embedded_sequences)
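As with the recurrent model, a complete convolutional classifier can be obtained by pooling the convolutional features and adding a softmax layer; a minimal sketch, where the pooling choice and the optimizer are our assumptions:
from keras.models import Model
from keras.layers import GlobalMaxPooling1D, Dense

# Collapse the sequence dimension, keeping the strongest response of each filter
l_pool = GlobalMaxPooling1D()(l_conv1)
preds = Dense(5, activation='softmax')(l_pool)
model = Model(sequence_input, preds)
model.compile(loss='categorical_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])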
In addition to recurrent and convolutional neural networks, Hierarchical Attention Networks have achieved outstanding performance for classifying documents in a given language [4]. This type of neural network was proposed by [5] as a model with the following characteristics:
• It has a hierarchical structure, which reflects the hierarchical structure of documents;
• It has two levels of attention mechanisms, applied at the word and at the sentence level, which allow it to attend differentially to more and less important content when constructing the document representation.
The intuition underlying this model is that not all parts of a document are equally relevant for answering a query, and that determining the relevant parts involves modeling the interactions between words, not just their presence [5]. The designed architecture captures two basic insights about document structure. First, since documents have a hierarchical structure (words form sentences and sentences form a document), the document representation is constructed by first building sentence representations and then aggregating those into a document representation.
Second, it is observed that different words and sentences in a document carry different amounts of information. Moreover, the importance of words and sentences depends on the context, i.e., the same word or sentence may have a different importance in a different context [5]. To capture this, the model includes two levels of attention mechanisms, one at the word level and the other at the sentence level. This allows the model to devote more or less attention to particular words and sentences when constructing the document representation.
The key novelty of this approach is that the system uses context to determine when a sequence of tokens is relevant, rather than simply filtering token sequences taken out of context. Experiments conducted by the authors of [5] have shown that the proposed architecture substantially outperforms previous methods. Visualization of the attention layers illustrates that the model selects qualitatively informative words and sentences.
The general way to apply the word embedding matrix in a hierarchical attention network, through an embedding layer, is the following:
embedding_layer = Embedding(len(word_index) + 1,
                            EMBEDDING_DIM,
                            weights=[embedding_matrix],
                            input_length=MAX_SENT_LENGTH,
                            trainable=False)
sentence_input = Input(shape=(MAX_SENT_LENGTH,),
                       dtype='int32')
embedded_sequences = embedding_layer(sentence_input)
l_lstm = Bidirectional(LSTM(units))(embedded_sequences)
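The code above builds only the word-level (sentence) encoder. A minimal sketch of how it could be composed into a document-level model, with the attention layers of [5] omitted for brevity (MAX_SENTS, the assumed number of sentences per document, and the use of TimeDistributed are our illustrative choices):
from keras.models import Model
from keras.layers import Input, LSTM, Bidirectional, Dense, TimeDistributed

# Sentence encoder: maps one sentence (a sequence of word indices) to a vector
sent_encoder = Model(sentence_input, l_lstm)

# Document input: MAX_SENTS sentences, each of MAX_SENT_LENGTH word indices
document_input = Input(shape=(MAX_SENTS, MAX_SENT_LENGTH), dtype='int32')

# Apply the sentence encoder to every sentence of the document
encoded_sentences = TimeDistributed(sent_encoder)(document_input)

# Sentence-level encoder over the sequence of sentence vectors
l_lstm_sent = Bidirectional(LSTM(units))(encoded_sentences)
preds = Dense(5, activation='softmax')(l_lstm_sent)
han_model = Model(document_input, preds)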
IV. EVALUATION
In the recurrent neural network model that we trained in our experiment using the word embedding matrix, the total number of model parameters was 48,565,705 and the number of trainable parameters was only 180,905, or 0.372%. In this way, the number of trainable parameters was reduced by over 99.627%.
In the convolutional neural network model that we trained in our experiment using the word embedding matrix, the total number of model parameters was 48,822,181 and the number of trainable parameters was only 437,381, or 0.896%. In this way, the number of trainable parameters was reduced by over 99.104%.
In the Hierarchical Attention Network model that we trained in our experiment using the word embedding matrix, the total number of model parameters was 48,584,765 and the number of trainable parameters was only 199,965, or 0.412%. In this way, the number of trainable parameters was reduced by over 99.588%.
Based on the examples above, it can be seen that the number of trainable parameters of all these neural network models was reduced by over 99%, so the training of these models is also accelerated.
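These figures correspond to the trainable and non-trainable parameter counts that Keras reports; a minimal sketch of how they can be obtained for any of the compiled models above (the helper name is ours):
import numpy as np
import keras.backend as K

def report_parameters(model):
    # Keras distinguishes trainable weights from frozen (non-trainable) ones
    trainable = int(np.sum([K.count_params(w) for w in model.trainable_weights]))
    total = model.count_params()
    print('total: %d, trainable: %d (%.3f%%)'
          % (total, trainable, 100.0 * trainable / total))

report_parameters(model)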
V. CONCLUSION
The goal of machine learning is the ability to later apply
trained models for prediction purposes on unlabeled data, and
in areas for which training has been performed. Domain-
specific texts abound with language forms and links between
them, which is why special attention has been given in this
research to the process of associating vectors with words.
As the "Placing the words" one of the popular ways to do
this, creating a matrix for embedding words can be mapped to
a good relationship between words in texts that are specific to
a domain. Subsequent application of the matrix to embed
words in other models of machine learning from text data from
the same domain, significantly reduces the number of
parameters for training and thus can accelerate the training of
these models.
VI. REFERENCES
[1] C. C. Aggarwal, Neural Networks and Deep Learning, Springer International Publishing, 2018.
[2] F. Chollet, Deep Learning with Python, Manning Publications Co., 2018.
[3] Y. Kim, "Convolutional Neural Networks for Sentence Classification", Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), Doha, Qatar, 2014.
[4] N. Pappas and A. Popescu-Belis, "Multilingual Hierarchical Attention Networks for Document Classification", Proceedings of the 8th International Joint Conference on Natural Language Processing (IJCNLP), Taipei, Taiwan, 2017.
[5] Z. Yang, D. Yang, C. Dyer, X. He, A. Smola, and E. Hovy, "Hierarchical Attention Networks for Document Classification", Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, San Diego, California, 2016.