.. _word2vec:

Word2Vec
########

Review: Bag-of-words
====================

* 详情参见 :ref:`bag-of-words <bag-of-words>`

Word2Vec
========

* Word2Vec 是一种广泛使用的基于神经网络的算法，通常称为 “深度学习”（尽管 word2vec 本身相当浅薄）。 word2vec 使用大量未注释的纯文本，自动学习单词之间的关系。输出是向量，每个单词一个向量，具有显着的线性关系，使我们能够执行以下操作::

    vec(“king”) - vec(“man”) + vec(“woman”) =~ vec(“queen”)
    vec("中国北京") – vec("中国") + vec("日本") =~ vec("日本东京").


* Word2Vec 是一种更新的模型，它使用浅层神经网络将单词嵌入到低维向量空间中。
* 结果是一组词向量，其中向量空间中靠近的向量根据上下文具有相似的含义，而彼此远离的词向量具有不同的含义。例如， strong 和 powerful 会很近， strong 和 Paris 会相对较远。
* The are two versions of this model and Word2Vec class implements them both::

    1. Skip-grams (SG)
    2. Continuous-bag-of-words (CBOW)


参考
====

* Original C toolkit and word2vec papers by Google: https://code.google.com/archive/p/word2vec/
* API docs: gensim.models.word2vec: https://radimrehurek.com/gensim/models/word2vec.html#module-gensim.models.word2vec

* Introduces Gensim’s Word2Vec model and demonstrates its use on the `Lee Evaluation Corpus <https://hekyll.services.adelaide.edu.au/dspace/bitstream/2440/28910/1/hdl_28910.pdf>`_