golang implement of Tomas Mikolov's word/document embedding. You may want to feel the basic idea from Mikolov's two orignal papers, word2vec and doc2vec. More recently, Andrew M. Dai etc from Google reported its power in more detail
[@bjsjs_11_83 doc2vec-golang]$ ./control build
traning Exec build ok
build ok
# The training data(data/zhihu_data.1w) is one document per line, two columns divided by tab, 
# the first column is id, and the second column is the segmented document separated by spaces.
[@bjsjs_11_83 doc2vec-golang]$ ./train  data/zhihu_data.1w          
Skip-Gram Iter:48 Alpha: 0.000796  Progress: 96.81%  Words/sec: 24.27k  
2018-03-30 14:53:00.218536235 +0800 CST training end, 1342521 26861
[@bjsjs_11_83 doc2vec-golang]$ ./knn 2.model 
please select operation type:
        0:word2words
        1:doc_likelihood
        2:leave one out key words
        3:sen2words
        4:sen2docs
        5:word2docs
        6:doc2docs
        7:doc2words
0
Enter text:网页
        1       网页
        0.7823723719117796      不让
        0.7651260773728028      浏览
        0.7642516944020028      邮件
        0.7601415883811553      近
        0.7517607921006224      迷恋
        0.7492900066365179      等同
        0.7485966355448261      传说
        0.7463299535930537      基于
        0.7447865182221745      版
please select operation type:
        0:word2words
        1:doc_likelihood
        2:leave one out key words
        3:sen2words
        4:sen2docs
        5:word2docs
        6:doc2docs
        7:doc2words- doc2vec支持CBOW和Skip-Gram两种模型,Negative Sampling和Hierarchical Softmax优化均已实现
- online infer document
- likelihood of document
- doc2words
- doc2docs
- word2words
- word2docs
- wmd
- doc2vec添加同义词语义约束
- 句子提取核心词
- google word2vec 实现
- hiyijian/doc2vec
- word2vec语义约束
- doc2vec添加同义词语义约束