Word Embeddings
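Whatever the language, asking linguists to describe it through grammar and rules is expensive in both labor and time, because every language has its own grammar, and rule-based translation systems are hard to write. A well-known quip, usually attributed to Frederick Jelinek of IBM's speech recognition group, sums this up: every time he fired a linguist, the recognizer's performance went up. Although languages differ in grammar, they share more abstract and more general structure that can be learned. In most languages, characters form morphemes and morphemes form words (un+fortunate+ly; in Chinese, Japanese, and Korean, basic strokes may form characters), words form sentences, sentences form paragraphs, and paragraphs form documents. Semantically, words with the same meaning in different languages tend to appear in contexts with the same meaning. For example, compare "机器学习是实现人工智能的最好的方法之一" with "machine learning is one of the best ways of achieving artificial intelligence". In either language, 机器学习 (machine learning) tends to appear in the context of 人工智能 (artificial intelligence). These are implicit patterns that can be learned. Once you have a model that can learn them, you have a more general solution: for a new language you train the model on a corpus of that language, rather than relearning the rules and rewriting the system each time, as rule-based and grammar-based programming requires.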
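Why represent a word as a vector at all? Does it mean anything? It does. A vector is an ordered set of components. The first vectors most of us meet are the vectors of physics, quantities with a direction, written (x, y, z) to give the component along each of three axes. A general vector simply has more dimensions, and each dimension can carry its own meaning. A word likewise has many attributes: how likely it is to be a negative word, how frequently it is used, how likely it is to be an adverb or a verb, and so on. A vector can therefore capture a word's meaning and grammatical role reasonably completely.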
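As a rough illustration of the idea above, the sketch below hand-writes a few toy word vectors and compares them with cosine similarity. The feature names, the numbers, and the `cosine_similarity` helper are all assumptions made up for this example; in learned embeddings the dimensions are generally not labeled like this. It uses plain Python lists rather than NumPy only to stay dependency-free.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Hypothetical hand-written features per word:
# [negativity, usage frequency, adverb-ness, verb-ness]
toy_vectors = {
    "terrible": [0.9, 0.3, 0.0, 0.0],
    "awful":    [0.8, 0.2, 0.1, 0.0],
    "run":      [0.1, 0.6, 0.0, 0.9],
}

# Two negative adjectives should point in a similar direction ...
sim_close = cosine_similarity(toy_vectors["terrible"], toy_vectors["awful"])
# ... while a neutral verb should point somewhere else.
sim_far = cosine_similarity(toy_vectors["terrible"], toy_vectors["run"])

print("terrible vs awful:", round(sim_close, 3))
print("terrible vs run:  ", round(sim_far, 3))
```

Words with similar attributes end up with a higher cosine similarity, which is the property that makes vector representations useful for downstream models.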
word representation
embedding matrix
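Why doesn't Word2Vec use a regularization term?
Why doesn't the Word2Vec hidden layer use an activation function?
Can the two Word2Vec matrices be the same one? That is, W.T = W'
I have found little discussion of this question online. The usual explanation is that the mapping from the one-hot input to the hidden layer encodes the word into the embedding space: the first layer simply looks up the actual word vector, so the first matrix is the one that learns the word vectors we want. The mapping from the hidden layer to the output decodes back into one-hot space, but the goal of this second mapping is not to recover the one-hot vector of the input word; it is to match the one-hot vectors of the context words as closely as possible. In other words, the second mapping performs a distance or similarity computation, and the second matrix learns how close each word vector is to its contexts.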
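The encode/decode view described above can be sketched as a single forward pass. This is a minimal illustration, not the reference Word2Vec implementation: the four-word vocabulary, the 3-dimensional embeddings, the random initial values, and the names `W`, `W_prime`, and `softmax` are all assumptions for the example, and `W_prime` is stored with one row per word for convenience.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


random.seed(0)
vocab = ["machine", "learning", "artificial", "intelligence"]
vocab_size, emb_dim = len(vocab), 3

# W: input (word) embeddings, one row per vocabulary word.
W = [[random.uniform(-0.5, 0.5) for _ in range(emb_dim)]
     for _ in range(vocab_size)]
# W_prime: output (context) embeddings, also one row per vocabulary word.
W_prime = [[random.uniform(-0.5, 0.5) for _ in range(emb_dim)]
           for _ in range(vocab_size)]

center = vocab.index("learning")
one_hot = [1.0 if i == center else 0.0 for i in range(vocab_size)]

# Encoding: one_hot times W. With no activation function this is
# exactly a lookup of row `center` of W.
hidden = [sum(one_hot[i] * W[i][d] for i in range(vocab_size))
          for d in range(emb_dim)]

# Decoding: dot product of the hidden vector with every output vector,
# i.e. a similarity score between the center word and each candidate context.
scores = [sum(h * w for h, w in zip(hidden, row)) for row in W_prime]
probs = softmax(scores)

for word, p in zip(vocab, probs):
    print(word, round(p, 3))
```

Seen this way, `W` holds the vectors we keep after training, while `W_prime` only serves to score candidate context words against the looked-up vector.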
Learning Word Embeddings
hierarchical softmax & negative sampling for speeding up training
GloVe word vectors
sentiment classification
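1 - Simple model
Ignore word order: average the word vectors and feed the result into an ordinary neural network for multi-class classification. The approach is crude and cannot account for negation or word order within the sentence. For example, "they are lacking good taste, good service and good position." is easily classified as positive by such a network because "good" appears many times; since order is ignored, the effect of "lacking" on the "good" that follows it is lost.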
$$ z^{(i)} = W \cdot \mathit{avg}^{(i)} + b $$
$$ a^{(i)} = \mathrm{softmax}(z^{(i)}) $$
$$ \mathcal{L}^{(i)} = - \sum_{k = 0}^{n_y - 1} \mathit{Yoh}^{(i)}_k \log(a^{(i)}_k) $$
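The three equations above can be traced on a toy example. This is only a sketch under assumed values: the 2-dimensional word vectors, the 3-class weights `W` and `b`, and the helper names are invented for illustration and are not trained parameters. It also shows why the model is blind to word order: two sentences with the same words in a different order produce the same average.

```python
import math


def softmax(z):
    """Numerically stable softmax over a list of scores."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def average(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def predict(avg, W, b):
    """z = W . avg + b followed by a = softmax(z)."""
    z = [sum(w * x for w, x in zip(row, avg)) + b_k
         for row, b_k in zip(W, b)]
    return softmax(z)


def cross_entropy(y_one_hot, a):
    """L = -sum_k Yoh_k * log(a_k)."""
    return -sum(y * math.log(p) for y, p in zip(y_one_hot, a))


# Hypothetical 2-d word vectors and an untrained 3-class classifier.
word_to_vec = {"good": [1.0, 0.0], "lacking": [0.0, 1.0], "food": [0.5, 0.5]}
W = [[1.0, -1.0], [-1.0, 1.0], [0.0, 0.0]]
b = [0.0, 0.0, 0.0]

sentence_1 = ["lacking", "good", "food"]
sentence_2 = ["good", "food", "lacking"]  # same words, different order

avg_1 = average([word_to_vec[w] for w in sentence_1])
avg_2 = average([word_to_vec[w] for w in sentence_2])

a_1 = predict(avg_1, W, b)
loss_1 = cross_entropy([1.0, 0.0, 0.0], a_1)

print("same average for both orders:", avg_1 == avg_2)
print("class probabilities:", a_1)
print("loss for true class 0:", loss_1)
```

Because the classifier only ever sees the average, no choice of `W` and `b` can distinguish the two sentences.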
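2 - RNN-based sequence model
Take the effect of word order on the sentence into account by using a many-to-one RNN. An RNN models the relationship between neighboring words, so it is the better choice here.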
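To make "many-to-one" concrete, here is a minimal sketch of a plain tanh RNN that reads a sequence of word vectors and returns only its final hidden state. It is deliberately simpler than an LSTM, and the weights `Wx`, `Wh`, `b` and the 2-dimensional word vectors are hypothetical values chosen for illustration, not trained parameters.

```python
import math


def rnn_many_to_one(inputs, Wx, Wh, b):
    """Run a simple tanh RNN over `inputs`; return only the last hidden state."""
    hidden = [0.0] * len(b)
    for x in inputs:
        # h_t = tanh(Wx . x_t + Wh . h_{t-1} + b)
        hidden = [
            math.tanh(
                sum(wx * xi for wx, xi in zip(Wx[k], x))
                + sum(wh * hj for wh, hj in zip(Wh[k], hidden))
                + b[k]
            )
            for k in range(len(b))
        ]
    return hidden


# Hypothetical 2-d word vectors and a 2-unit RNN.
word_to_vec = {"lacking": [0.0, 1.0], "good": [1.0, 0.0]}
Wx = [[1.0, 0.0], [0.0, 1.0]]
Wh = [[0.5, -1.0], [0.3, 0.2]]
b = [0.0, 0.0]

h_a = rnn_many_to_one([word_to_vec["lacking"], word_to_vec["good"]], Wx, Wh, b)
h_b = rnn_many_to_one([word_to_vec["good"], word_to_vec["lacking"]], Wx, Wh, b)

print("lacking good ->", h_a)
print("good lacking ->", h_b)
```

Unlike an average of word vectors, the final hidden state depends on the order in which the words arrive, so a classifier placed on top of it can in principle tell "lacking good" apart from "good lacking".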
Input sentences have different lengths, so without special handling the sentences cannot be processed in parallel as one batch. The approach used here is padding.
    import numpy as np

    # GRADED FUNCTION: sentences_to_indices

    def sentences_to_indices(X, word_to_index, max_len):
        """
        Converts an array of sentences (strings) into an array of indices corresponding to words in the sentences.
        The output shape should be such that it can be given to `Embedding()` (described in Figure 4).

        Arguments:
        X -- array of sentences (strings), of shape (m,)
        word_to_index -- a dictionary containing each word mapped to its index
        max_len -- maximum number of words in a sentence. You can assume every sentence in X is no longer than this.

        Returns:
        X_indices -- array of indices corresponding to words in the sentences from X, of shape (m, max_len)
        """

        m = X.shape[0]  # number of training examples

        ### START CODE HERE ###
        # Initialize X_indices as a numpy matrix of zeros and the correct shape.
        # Positions past the end of a sentence keep the value 0, which acts as padding.
        X_indices = np.zeros((m, max_len))

        for i in range(m):  # loop over training examples

            # Convert the ith training sentence to lower case and split it into a list of words.
            sentence_words = [word.lower() for word in X[i].strip().split(' ')]

            # Loop over the words of sentence_words, tracking the position j of each word
            for j, w in enumerate(sentence_words):
                # Set the (i, j)th entry of X_indices to the index of the word.
                # Empty tokens (from repeated spaces) are mapped to 0.
                X_indices[i, j] = word_to_index[w] if w != '' else 0
        ### END CODE HERE ###

        return X_indices
Implementing the word embedding layer with Keras
    import numpy as np
    from keras.models import Model
    from keras.layers import Input, Embedding, LSTM, Dropout, Dense, Activation

    # GRADED FUNCTION: pretrained_embedding_layer

    def pretrained_embedding_layer(word_to_vec_map, word_to_index):
        """
        Creates a Keras Embedding() layer and loads in pre-trained GloVe 50-dimensional vectors.

        Arguments:
        word_to_vec_map -- dictionary mapping words to their GloVe vector representation.
        word_to_index -- dictionary mapping from words to their indices in the vocabulary (400,001 words)

        Returns:
        embedding_layer -- pretrained layer Keras instance
        """

        vocab_len = len(word_to_index) + 1                  # adding 1 to fit Keras embedding (requirement)
        emb_dim = word_to_vec_map["cucumber"].shape[0]      # dimensionality of the GloVe word vectors (= 50)

        ### START CODE HERE ###
        # Initialize the embedding matrix as a numpy array of zeros of shape (vocab_len, emb_dim)
        emb_matrix = np.zeros((vocab_len, emb_dim))

        # Set row "index" of the embedding matrix to the vector of the "index"th word of the vocabulary
        for word, index in word_to_index.items():
            emb_matrix[index, :] = word_to_vec_map[word]

        # Define the Keras embedding layer with the correct input/output sizes.
        # trainable=False freezes it so the pretrained vectors are not updated during training.
        embedding_layer = Embedding(vocab_len, emb_dim, trainable=False)
        ### END CODE HERE ###

        # Build the embedding layer; this is required before setting its weights. Do not modify the "None".
        embedding_layer.build((None,))

        # Set the weights of the embedding layer to the embedding matrix. The layer is now pretrained.
        embedding_layer.set_weights([emb_matrix])

        return embedding_layer

    # GRADED FUNCTION: Emojify_V2

    def Emojify_V2(input_shape, word_to_vec_map, word_to_index):
        """
        Function creating the Emojify-v2 model's graph.

        Arguments:
        input_shape -- shape of the input, usually (max_len,)
        word_to_vec_map -- dictionary mapping every word in a vocabulary into its 50-dimensional vector representation
        word_to_index -- dictionary mapping from words to their indices in the vocabulary (400,001 words)

        Returns:
        model -- a model instance in Keras
        """

        ### START CODE HERE ###
        # Define sentence_indices as the input of the graph, of shape input_shape and dtype 'int32' (it contains indices).
        sentence_indices = Input(shape=input_shape, dtype='int32')

        # Create the embedding layer pretrained with GloVe vectors
        embedding_layer = pretrained_embedding_layer(word_to_vec_map, word_to_index)

        # Propagate sentence_indices through the embedding layer to get the embeddings
        embeddings = embedding_layer(sentence_indices)

        # Propagate the embeddings through an LSTM layer with a 128-dimensional hidden state.
        # The returned output must be a batch of sequences, because another LSTM follows.
        X = LSTM(128, return_sequences=True)(embeddings)
        # Add dropout with a probability of 0.5
        X = Dropout(0.5)(X)
        # Propagate X through another LSTM layer with a 128-dimensional hidden state.
        # The returned output must be a single hidden state, not a batch of sequences (many-to-one).
        X = LSTM(128)(X)
        # Add dropout with a probability of 0.5
        X = Dropout(0.5)(X)
        # Propagate X through a Dense layer to get back a batch of 5-dimensional vectors.
        X = Dense(5)(X)
        # Add a softmax activation
        X = Activation('softmax')(X)

        # Create the Model instance which converts sentence_indices into X.
        model = Model(inputs=sentence_indices, outputs=X)
        ### END CODE HERE ###

        return model

    '''
    maxLen = 20
    model = Emojify_V2((maxLen,), word_to_vec_map, word_to_index)
    model.summary()
    model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

    X_train_indices = sentences_to_indices(X_train, word_to_index, maxLen)
    Y_train_oh = convert_to_one_hot(Y_train, C = 5)
    model.fit(X_train_indices, Y_train_oh, epochs = 50, batch_size = 32, shuffle=True)

    X_test_indices = sentences_to_indices(X_test, word_to_index, max_len = maxLen)
    Y_test_oh = convert_to_one_hot(Y_test, C = 5)
    loss, acc = model.evaluate(X_test_indices, Y_test_oh)
    print()
    print("Test accuracy = ", acc)
    '''
