Domain specific word embedding matrix for
training neural networks
Đorđe Petrović
Computer Science
Faculty of Electronic Engineering
Niš, Serbia
petrovicdj@gmail.com
Stefana Janićijević
Information Technology School
Comtrade
Belgrade, Serbia
stefana.janicijevic@its.edu.rs
Abstract—Text is one of the most widespread forms of sequential data and as such is well suited to the application of deep learning models for sequential data. Deep learning applied to natural language processing is pattern recognition applied to words, sentences, and paragraphs. This study describes the process of creating a pre-trained word embedding matrix and its subsequent use in various neural network models for the purpose of domain-specific text classification. Word embeddings are one of the popular ways to associate vectors with words. A word embedding matrix maps the semantic relationships between words well, and these relationships can vary from task to task.
Keywords— embedding matrix, word embeddings, text mining,
neural networks, deep learning
I. INTRODUCTION
Deep learning models have achieved remarkable results in the field of Computer Vision in recent years. Computer Vision is a field of research whose primary task is to develop techniques that help computers see and understand the content of digital images or video recordings. Deep learning has brought similar advances to the related fields of Speech Recognition and Natural Language Processing. Much of the work on deep learning models has involved learning vector representations of words through neural language models and performing composition over the learned word vectors for classification [3]. The author states that word vectors, projected onto a lower-dimensional vector space via a hidden layer, are essentially feature extractors that encode the semantic properties of words in their dimensions.
Text is one of the most widely used forms of sequential data and as such is well suited to the application of deep learning models for sequential data. In Natural Language Processing, deep learning performs pattern recognition applied to words, sentences, and paragraphs in much the same way as pattern recognition is applied to pixels [2]. The same author explains that, like all other neural networks, deep learning models do not take raw text as input but only work with numerical tensors. Text vectorization is the process of converting text into numerical tensors and can be done in several ways [2]:
• Segmenting the text into words and transforming each word into a vector
• Segmenting the text into characters and transforming each character into a vector
• Extracting groups of several consecutive words or characters (n-grams) and transforming each group into a vector
The common term for all of these units into which text can be divided is tokens, and the division process is called tokenization.
This study describes the use of neural network models for the classification of texts using pre-trained word embeddings. The aim is to classify texts automatically into one or more predefined classes using several different approaches, where the implementation is based on the Keras open source library (https://keras.io/). In order to use the Keras library on textual data, the data must first be processed. For the purpose of tokenization, the Keras class "Tokenizer" is used. This object takes as an argument the maximum number of words that are kept after tokenization, based on their frequency:
from keras.preprocessing.text import Tokenizer

MAX_NB_WORDS = 50000
tokenizer = Tokenizer(num_words=MAX_NB_WORDS)
tokenizer.fit_on_texts(texts)
Once the tokenizer is applied to the data, it can be used to convert texts into sequences of integers. These integers represent the position of each word in the dictionary.
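For illustration, a minimal sketch of this conversion step, assuming texts is a list of strings and MAX_SEQUENCE_LENGTH is a chosen padding length (the value 1000 below is an illustrative assumption, not taken from the original code):
from keras.preprocessing.sequence import pad_sequences

# Convert each text into a list of word indices from the tokenizer's dictionary
sequences = tokenizer.texts_to_sequences(texts)

# Pad/truncate all sequences to the same length so they fit into one tensor
MAX_SEQUENCE_LENGTH = 1000  # assumed value for illustration
data = pad_sequences(sequences, maxlen=MAX_SEQUENCE_LENGTH)

# word_index maps each word (string) to its integer position in the dictionary
word_index = tokenizer.word_index
The resulting word_index dictionary is the one referenced later when loading the pre-trained embedding matrix into a model.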
II. WORD EMBEDDING MATRIX
One popular way of associating vectors with words is to use dense word vectors, called word embeddings. These are low-dimensional floating-point vectors; unlike sparse vectors, they pack more information into fewer dimensions, and they are learned from the data [2]. The same author states that there are two ways to obtain these vectors:
• Learning the word embeddings together with the main task. In this setup, one starts with random word vectors and then learns the word vectors in the same way as the weights of the neural network are learned.
• Loading into the model word embeddings that were computed previously, using some other machine learning task; these are called pre-trained word embeddings.
The simplest way to associate a dense vector with a word is to choose a vector with random values. The problem with this approach is that the resulting embedding space has no structure, and it is difficult for a deep neural network to make sense of such a noisy, unstructured space [2]. The same author goes on to state that what makes a good word embedding space depends largely on the task at hand. The perfect word embedding space for sentiment analysis of movie reviews may look different from the perfect embedding space for, say, a legal-document model, because the importance of certain semantic relationships varies from task to task, so it is reasonable to learn a new embedding space for each new task [2]. Fortunately, backpropagation and the Keras library make this easy; the task amounts to learning the weights of an embedding layer, for example:
from keras.layers import Embedding
embedding_layer = Embedding(1000, 64)
The Embedding layer in the previous example has two arguments:
• 1000 – the number of possible tokens
• 64 – the dimensionality of the embeddings
According to [2], the embedding layer takes as input a 2D integer tensor of shape (samples, sequence_length), where each entry is a sequence of integers. All sequences in a batch must have the same length, because they must be packed into a single tensor, so sequences that are shorter than the others are padded with zeros and longer sequences are truncated. This layer returns a 3D floating-point tensor of shape (samples, sequence_length, embedding_dimensionality). Such a 3D tensor can then be processed by a recurrent neural network layer or by a 1D convolutional layer.
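A minimal sketch of this shape transformation (the batch of eight sequences and the sequence length of ten are arbitrary illustrative values):
import numpy as np
from keras.models import Sequential
from keras.layers import Embedding

model = Sequential()
model.add(Embedding(1000, 64, input_length=10))

# A batch of 8 sequences, each a series of 10 integer token indices
batch = np.random.randint(0, 1000, size=(8, 10))

# The 2D integer input of shape (8, 10) becomes a 3D float tensor of shape (8, 10, 64)
print(model.predict(batch).shape)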
Initially, the weights of the embedding layer are random, just like those of any other layer. During training, these word vectors are gradually adjusted, shaping the space into something that the downstream layers can exploit. After training is complete, the embedding space will show a lot of structure – the kind of structure specialized for the specific problem the model was trained on [2]. The same author further explains that in situations where only a small training set is available, which cannot be used to learn a suitable vocabulary representation for a particular task, embedding vectors from a pre-computed, highly structured embedding space can be loaded instead. Such a space captures useful, generic aspects of language structure. In such cases, it is possible to reuse features learned on a different problem. Such word embeddings are usually computed from word-occurrence statistics, using various techniques, some of which involve neural networks and some of which do not [2]. There are various databases of pre-computed word embeddings that can be downloaded and used to create an index mapping words (as strings) to their vector representations (as numerical vectors) [2]. An embedding matrix is then created, which can be loaded into the embedding layer. It must be a matrix whose dimensions are the maximum number of words and the embedding dimension. Entry i contains the embedding vector for the word with index i in the reference word index [2]. In addition, the embedding layer can be "frozen" by setting its trainable attribute to False [2]. This prevents the weights of that layer from being updated during model training; if this is not done, the previously learned weights will be modified during training. Alternatively, the model can be trained without loading pre-trained word embeddings and without freezing the embedding layer. In this case, embeddings specific to the particular task are learned during training. When a lot of data is available, this approach is generally more powerful than using pre-trained word embeddings [2].
There are several reasons for creating a word embedding matrix, some of which are:
• When a large amount of textual data specific to the studied domain is available, it becomes possible to create such a matrix
• A word embedding matrix maps the semantic relationships between words well, and these relationships can vary from task to task
• When training neural networks with a loaded word embedding matrix and a frozen embedding layer, the number of trainable parameters is significantly reduced, which speeds up model training
A. Methodology of Word Embedding Matrix creation
The process of creating a word embedding matrix consists of the following stages:
1. Loading the text data
2. Creating word vectors
3. Converting the word vectors into a numerical matrix suitable for TensorFlow and Keras models
4. Saving the created matrix
Domain-specific texts abound with specific language forms and links between them. This study used a set of legal texts in Serbian, which served as the training data. It is a labeled data set made up of larger texts segmented into smaller units, each of which is represented by a single record in the database. Each of these segments was assigned a corresponding label, i.e., the segments were classified into 5 classes.
Creating the word vectors could be done in the following way, using the gensim implementation of Word2Vec (the parameter values used in our experiment are listed at the end of this subsection):
from gensim.models import Word2Vec
import multiprocessing

w2v = Word2Vec(data, size=emb_dim, window=window,
               min_count=min_count, negative=negative,
               iter=iterations,
               workers=multiprocessing.cpu_count())
word_vectors = w2v.wv
Converting the word vectors into a numerical matrix suitable for TensorFlow and Keras models could be performed as follows:
import numpy as np

embedding_matrix = np.zeros((len(w2v.wv.vocab), emb_dim))
for i in range(len(w2v.wv.vocab)):
    embedding_vector = w2v.wv[w2v.wv.index2word[i]]
    if embedding_vector is not None:
        embedding_matrix[i] = embedding_vector
The created matrix could be saved with the following instruction:
np.savetxt('embedding_matrix.txt', embedding_matrix,
           delimiter=' ', encoding='utf-8-sig')
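Note that np.savetxt stores only the numeric values, while the loading procedure shown in subsection B below expects each line to begin with the word itself followed by its coefficients. A minimal sketch of saving the matrix together with the words, under that assumption, could look as follows:
# Write one line per word: the word followed by its embedding coefficients
with open('embedding_matrix.txt', 'w', encoding='utf-8-sig') as f:
    for i in range(len(w2v.wv.vocab)):
        word = w2v.wv.index2word[i]
        coefs = ' '.join(str(x) for x in embedding_matrix[i])
        f.write(word + ' ' + coefs + '\n')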
In the experiment we conducted, the following parameters were used when creating the word embedding matrix:
• The dimension of the embedding vectors is emb_dim = 400
• The number of words observed before and after the indexed word (the window) is 5
• The embedding matrix only contains words that appear at least 5 times in the texts (min_count = 5)
• For each observed word, a maximum of 15 negative samples are used (negative = 15)
• The number of iterations is 5
• The number of worker processes equals the number of processors
As a result, on the dataset that was the subject of this research, a pre-trained embedding matrix of dimensions 43654x400 was obtained, which means that a vector of dimension 400 was created for each of the 43654 words.
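With these values, the Word2Vec call shown above could be instantiated as follows (a sketch using the same gensim parameter names):
import multiprocessing
from gensim.models import Word2Vec

emb_dim = 400
w2v = Word2Vec(data,
               size=emb_dim,   # dimension of the embedding vectors
               window=5,       # words observed before and after the indexed word
               min_count=5,    # ignore words appearing fewer than 5 times
               negative=15,    # negative samples per observed word
               iter=5,         # number of training iterations
               workers=multiprocessing.cpu_count())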
B. The process of loading a pre-trained embedding matrix
The process of loading a pre-trained word embedding matrix into a neural network could be performed as follows:
embeddings_index = {}
f = open('embedding_matrix.txt', encoding='utf-8-sig')
for line in f:
    values = line.split()
    word = values[0]
    coefs = np.asarray(values[1:], dtype='float32')
    embeddings_index[word] = coefs
f.close()

embedding_matrix = np.random.random((len(word_index) + 1,
                                     EMBEDDING_DIM))
for word, i in word_index.items():
    embedding_vector = embeddings_index.get(word)
    if embedding_vector is not None:
        # words not found in the embeddings index keep
        # their random initialization
        embedding_matrix[i] = embedding_vector
III. APPLICATION OF EMBEDDING MATRICES FOR NEURAL
NETWORK TRAINING PURPOSES
Recurrent Neural Networks (RNNs) are designed for sequential data such as text sentences, time series, and other discrete sequences, such as biological sequences. Working with textual data is also among the most common use cases of recurrent neural networks; some examples of their application are [1]:
• The input can be a sequence of words and the output can be the same sequence shifted by one word, which allows the next word to be predicted at any point in the text. This is the classic language model, in which one tries to predict the next word based on the sequential history of words;
• The input can be a sentence in one language and the output can be a sentence in another language. In this case, two recurrent neural networks can be coupled to learn a translation model between the two languages. Furthermore, a recurrent neural network can be combined with another type of network (e.g., a convolutional neural network) to learn image captions;
• The input can be a sequence of words (e.g., a sentence) and the output can be a vector of class-membership probabilities. This approach is used for classification purposes, such as sentiment analysis. This is the model used in this research.
According to the same author [1], the input to these networks is a sequence x_1, ..., x_n, where x_t is a d-dimensional point received at time t. When working with text, the vector x_t contains the one-hot encoding of the word at time t. This term refers to a vector whose length is equal to the size of the dictionary, in which the component corresponding to the relevant word is 1 and all other components are 0. The key point of these neural networks is the existence of a self-loop that causes the hidden state of the neural network to change after each input.
The weight matrices of the connections are shared across the time-layered network, which ensures that the same function is used at every time step. This sharing is key to the domain-specific insights that the network learns. The backpropagation algorithm takes the sharing and the temporal length into account when updating the weights during the learning process and is used to determine whether each weight should be increased or decreased. Due to this recursive nature, recurrent neural networks can compute functions of variable-length inputs.
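For reference, the hidden-state update that this weight sharing implies can be sketched as follows, where W_xh, W_hh and W_hy denote the shared input-to-hidden, hidden-to-hidden and hidden-to-output weight matrices (the symbols are ours, chosen for illustration):
$$\bar{h}_t = \tanh\left(W_{xh}\,\bar{x}_t + W_{hh}\,\bar{h}_{t-1}\right), \qquad \bar{y}_t = W_{hy}\,\bar{h}_t$$
The same matrices are applied at every time step t, which is exactly the sharing described above.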
A general way to apply a word embedding matrix in a recurrent neural network, through an embedding layer, is the following:
embedding_layer = Embedding(len(word_index) + 1,
                            EMBEDDING_DIM,
                            weights=[embedding_matrix],
                            input_length=MAX_SEQUENCE_LENGTH,
                            trainable=False)
sequence_input = Input(shape=(MAX_SEQUENCE_LENGTH,),
                       dtype='int32')
embedded_sequences = embedding_layer(sequence_input)
l_lstm = Bidirectional(LSTM(units))(embedded_sequences)
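To obtain a complete classifier, the bidirectional LSTM output can be fed into a softmax layer over the 5 classes used in this study; a minimal sketch, where units is assumed to have been set beforehand and the choice of optimizer and loss is ours, not taken from the paper:
from keras.models import Model
from keras.layers import Dense

# Softmax output over the 5 classes of the labeled legal-text dataset
preds = Dense(5, activation='softmax')(l_lstm)
model = Model(sequence_input, preds)
model.compile(loss='categorical_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])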
Although recurrent neural networks are the recommended approach to machine learning from text sequences, the use of convolutional neural networks has become increasingly popular in recent years [1]. The same author goes on to explain why convolutional neural networks do not, at first glance, seem naturally suited to text:
• When convolutional neural networks are used to work with images, the shapes found in the images are interpreted in the same way regardless of where they appear in the image. This is not the case with text, because the position of words in sentences is quite important.
• Issues such as translation invariance cannot be treated in the same way in textual data as they are with pictures. Adjacent pixels in an image are usually very similar, while neighboring words in a text are almost never identical.
Despite these differences, systems based on convolutional neural networks have shown improved performance in recent years. Just as an image is represented as a two-dimensional object with an additional depth dimension defined by the number of color channels, a text sequence is represented as a one-dimensional object whose depth is determined by the dimensionality of its representation [1]. The same author further explains that, when working with text, instead of the three-dimensional "boxes" used for images, the filters are two-dimensional "boxes" whose width is the length of the word window that slides along the text and whose depth is defined by the lexicon. A challenge in this approach is that the number of channels is large and, consequently, the number of parameters in the filters of the first layer increases.
A general way to apply a word embedding matrix in a convolutional neural network, through an embedding layer, is the following:
embedding_layer = Embedding(len(word_index) + 1,
                            EMBEDDING_DIM,
                            weights=[embedding_matrix],
                            input_length=MAX_SEQUENCE_LENGTH,
                            trainable=False)
sequence_input = Input(shape=(MAX_SEQUENCE_LENGTH,),
                       dtype='int32')
embedded_sequences = embedding_layer(sequence_input)
l_conv1 = Conv1D(filters, kernel_size,
                 activation='relu')(embedded_sequences)
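As with the recurrent model, a complete convolutional classifier can be obtained by pooling the convolutional features and adding a softmax layer; a minimal sketch, where the pooling choice and the optimizer are our assumptions:
from keras.models import Model
from keras.layers import GlobalMaxPooling1D, Dense

# Collapse the sequence dimension, keeping the strongest response of each filter
l_pool = GlobalMaxPooling1D()(l_conv1)
preds = Dense(5, activation='softmax')(l_pool)
model = Model(sequence_input, preds)
model.compile(loss='categorical_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])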
In addition to recurrent and convolutional neural networks, Hierarchical Attention Networks have achieved outstanding performance for classifying documents in a given language [4]. This type of neural network was proposed by [5] as a model with the following characteristics:
• It has a hierarchical structure, which reflects the hierarchical structure of documents;
• It has two levels of attention mechanisms, applied at the word and at the sentence level, which allow it to attend differentially to more and less important content when constructing the document representation.
The intuition underlying this model is that not all parts of a document are equally relevant for answering a query, and that determining the relevant parts involves modeling the interactions between words, not just their presence [5]. The designed architecture captures two basic insights about document structure. First, since documents have a hierarchical structure (words form sentences and sentences form a document), the document representation is constructed by first building sentence representations and then aggregating those into a document representation.
Second, it is observed that different words and sentences in a document carry different amounts of information. Moreover, the importance of words and sentences depends on the context, i.e., the same word or sentence may have a different importance in a different context [5]. To capture this, the model includes two levels of attention mechanisms, one at the word level and the other at the sentence level. This allows the model to devote more or less attention to particular words and sentences when constructing the document representation.
The key novelty of this approach is that the system uses context to determine when a sequence of tokens is relevant, rather than simply filtering token sequences taken out of context. Experiments conducted by the authors of [5] have shown that the proposed architecture substantially outperforms previous methods. Visualization of the attention layers illustrates that the model selects qualitatively informative words and sentences.
The general way to apply the word embedding matrix in a hierarchical attention network, through an embedding layer, is the following:
embedding_layer = Embedding(len(word_index) + 1,
                            EMBEDDING_DIM,
                            weights=[embedding_matrix],
                            input_length=MAX_SENT_LENGTH,
                            trainable=False)
sentence_input = Input(shape=(MAX_SENT_LENGTH,),
                       dtype='int32')
embedded_sequences = embedding_layer(sentence_input)
l_lstm = Bidirectional(LSTM(units))(embedded_sequences)
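The code above builds only the word-level (sentence) encoder. A minimal sketch of how it could be composed into a document-level model, with the attention layers of [5] omitted for brevity (MAX_SENTS, the assumed number of sentences per document, and the use of TimeDistributed are our illustrative choices):
from keras.models import Model
from keras.layers import Input, LSTM, Bidirectional, Dense, TimeDistributed

# Sentence encoder: maps one sentence (a sequence of word indices) to a vector
sent_encoder = Model(sentence_input, l_lstm)

# Document input: MAX_SENTS sentences, each of MAX_SENT_LENGTH word indices
document_input = Input(shape=(MAX_SENTS, MAX_SENT_LENGTH), dtype='int32')

# Apply the sentence encoder to every sentence of the document
encoded_sentences = TimeDistributed(sent_encoder)(document_input)

# Sentence-level encoder over the sequence of sentence vectors
l_lstm_sent = Bidirectional(LSTM(units))(encoded_sentences)
preds = Dense(5, activation='softmax')(l_lstm_sent)
han_model = Model(document_input, preds)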
IV. EVALUATION
In the recurrent neural network model that we trained in our experiment using the word embedding matrix, the total number of model parameters was 48,565,705 and the number of trainable parameters was only 180,905, or 0.372%. In this way, the number of trainable parameters was reduced by over 99.627%.
In the convolutional neural network model that we trained in our experiment using the word embedding matrix, the total number of model parameters was 48,822,181 and the number of trainable parameters was only 437,381, or 0.896%. In this way, the number of trainable parameters was reduced by over 99.104%.
In the Hierarchical Attention Network model that we trained in our experiment using the word embedding matrix, the total number of model parameters was 48,584,765 and the number of trainable parameters was only 199,965, or 0.412%. In this way, the number of trainable parameters was reduced by over 99.588%.
Based on the examples above, it can be seen that the number of trainable parameters of all these neural network models was reduced by over 99%, so the training of these models is also accelerated.
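These figures correspond to the trainable and non-trainable parameter counts that Keras reports; a minimal sketch of how they can be obtained for any of the compiled models above (the helper name is ours):
import numpy as np
import keras.backend as K

def report_parameters(model):
    # Keras distinguishes trainable weights from frozen (non-trainable) ones
    trainable = int(np.sum([K.count_params(w) for w in model.trainable_weights]))
    total = model.count_params()
    print('total: %d, trainable: %d (%.3f%%)'
          % (total, trainable, 100.0 * trainable / total))

report_parameters(model)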
V. CONCLUSION
The goal of machine learning is the ability to later apply
trained models for prediction purposes on unlabeled data, and
in areas for which training has been performed. Domain-
specific texts abound with language forms and links between
them, which is why special attention has been given in this
research to the process of associating vectors with words.
As the "Placing the words" one of the popular ways to do
this, creating a matrix for embedding words can be mapped to
a good relationship between words in texts that are specific to
a domain. Subsequent application of the matrix to embed
words in other models of machine learning from text data from
the same domain, significantly reduces the number of
parameters for training and thus can accelerate the training of
these models.
VI. REFERENCES
[1] C. C. Aggarwal, Neural Networks and Deep Learning, Springer International Publishing, 2018.
[2] F. Chollet, Deep Learning with Python, Manning Publications Co., 2018.
[3] Y. Kim, "Convolutional Neural Networks for Sentence Classification", Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), Doha, Qatar, 2014.
[4] N. Pappas and A. Popescu-Belis, "Multilingual Hierarchical Attention Networks for Document Classification", Proceedings of the 8th International Joint Conference on Natural Language Processing (IJCNLP), Taipei, Taiwan, 2017.
[5] Z. Yang, D. Yang, C. Dyer, X. He, A. Smola, and E. Hovy, "Hierarchical Attention Networks for Document Classification", Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, San Diego, California, 2016.