This folder contains several word embedding files in a text format (one token per line, the token followed by its vector). A loading sketch is given at the end of this file.

:: Word Embeddings ::

glove.6B.300d.txt.gz - GloVe embeddings from https://nlp.stanford.edu/projects/glove/
glove.840B.300d.txt.gz - GloVe embeddings from https://nlp.stanford.edu/projects/glove/
GoogleNews-vectors-negative300.txt.gz - word2vec embeddings trained on Google News, from https://code.google.com/archive/p/word2vec/
komninos_english_embeddings.gz - Embeddings from https://www.cs.york.ac.uk/nlp/extvec/, cleaned to only include embeddings for words
levy_english_dependency_embeddings.gz - Dependency-based embeddings by Levy & Goldberg - https://levyomer.wordpress.com/2014/04/25/dependency-based-word-embeddings/
reimers_german_embeddings.gz - German word2vec embeddings - https://www.informatik.tu-darmstadt.de/ukp/research_6/ukp_in_challenges/germeval_2014/index.en.jsp

:: Word Frequency Files ::

wikipedia_doc_frequencies.txt - Estimated document frequencies for words in the English Wikipedia. The first line is the number of documents; the remaining lines list words in decreasing order of frequency, keeping only words with a frequency of at least 2. Can be used for tf-idf computation (see the sketch at the end of this file).
wikipedia_word_frequencies.txt - Word frequencies in the English Wikipedia. The first line is the total number of words; the remaining lines list words in decreasing order of frequency, keeping only words with a frequency of at least 2.
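
Below is a minimal sketch of how an embedding file in this format could be loaded, assuming whitespace-separated values and UTF-8 encoding. The function name and the handling of a possible header line are illustrative, not part of the files' documented format.

import gzip
import numpy as np

def load_embeddings(path, max_words=None):
    """Load embeddings from a gzipped text file:
    one token per line, the token followed by its vector components."""
    embeddings = {}
    with gzip.open(path, "rt", encoding="utf-8") as f:
        for i, line in enumerate(f):
            if max_words is not None and len(embeddings) >= max_words:
                break
            parts = line.rstrip().split(" ")
            # Assumption: a line with fewer than 3 fields is a header
            # (e.g. "vocab_size dim") or malformed, so it is skipped.
            if len(parts) < 3:
                continue
            token = parts[0]
            embeddings[token] = np.asarray(parts[1:], dtype=np.float32)
    return embeddings

# Example usage (path is illustrative):
# vectors = load_embeddings("glove.6B.300d.txt.gz")
# print(vectors["the"].shape)  # -> (300,)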
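
And a sketch of how wikipedia_doc_frequencies.txt could be turned into idf weights for tf-idf, under the assumption that each line after the first contains a word and its document frequency, whitespace-separated. The exact per-line layout is not specified above, so adjust the parsing as needed.

import math

def load_idf(path):
    """Build inverse document frequency weights.
    Assumes: first line = number of documents,
    each following line = word and its document frequency."""
    idf = {}
    with open(path, encoding="utf-8") as f:
        num_docs = int(f.readline().strip())
        for line in f:
            parts = line.split()
            if len(parts) < 2:
                continue
            word, df = parts[0], int(parts[1])
            idf[word] = math.log(num_docs / df)
    return idf

# The tf-idf weight of a word in a document is then:
#   term_count_in_document * idf[word]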