This folder contains several word embedding files in a text format (one token per line, the token followed by its vector). A loading sketch is given at the end of this file.

:: Word Embeddings ::

glove.6B.300d.txt.gz - GloVe embeddings from https://nlp.stanford.edu/projects/glove/
glove.840B.300d.txt.gz - GloVe embeddings from https://nlp.stanford.edu/projects/glove/
GoogleNews-vectors-negative300.txt.gz - word2vec embeddings trained on Google News, from https://code.google.com/archive/p/word2vec/
komninos_english_embeddings.gz - Embeddings from https://www.cs.york.ac.uk/nlp/extvec/, cleaned to only include embeddings for words
levy_english_dependency_embeddings.gz - Dependency-based embeddings by Levy & Goldberg - https://levyomer.wordpress.com/2014/04/25/dependency-based-word-embeddings/
reimers_german_embeddings.gz - German word2vec embeddings - https://www.informatik.tu-darmstadt.de/ukp/research_6/ukp_in_challenges/germeval_2014/index.en.jsp

:: Word Frequency Files ::

wikipedia_doc_frequencies.txt - Estimated document frequencies for words in the English Wikipedia. The first line is the number of documents; the remaining lines list words in decreasing order of frequency, keeping only words with a frequency of at least 2. Can be used for tf-idf computation (see the sketch at the end of this file).
wikipedia_word_frequencies.txt - Word frequencies in the English Wikipedia. The first line is the total number of words; the remaining lines list words in decreasing order of frequency, keeping only words with a frequency of at least 2.
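
Below is a minimal sketch of how an embedding file in this format could be loaded, assuming whitespace-separated values and UTF-8 encoding. The function name and the handling of a possible header line are illustrative, not part of the files' documented format.

import gzip
import numpy as np

def load_embeddings(path, max_words=None):
    """Load embeddings from a gzipped text file:
    one token per line, the token followed by its vector components."""
    embeddings = {}
    with gzip.open(path, "rt", encoding="utf-8") as f:
        for i, line in enumerate(f):
            if max_words is not None and len(embeddings) >= max_words:
                break
            parts = line.rstrip().split(" ")
            # Assumption: a line with fewer than 3 fields is a header
            # (e.g. "vocab_size dim") or malformed, so it is skipped.
            if len(parts) < 3:
                continue
            token = parts[0]
            embeddings[token] = np.asarray(parts[1:], dtype=np.float32)
    return embeddings

# Example usage (path is illustrative):
# vectors = load_embeddings("glove.6B.300d.txt.gz")
# print(vectors["the"].shape)  # -> (300,)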
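
And a sketch of how wikipedia_doc_frequencies.txt could be turned into idf weights for tf-idf, under the assumption that each line after the first contains a word and its document frequency, whitespace-separated. The exact per-line layout is not specified above, so adjust the parsing as needed.

import math

def load_idf(path):
    """Build inverse document frequency weights.
    Assumes: first line = number of documents,
    each following line = word and its document frequency."""
    idf = {}
    with open(path, encoding="utf-8") as f:
        num_docs = int(f.readline().strip())
        for line in f:
            parts = line.split()
            if len(parts) < 2:
                continue
            word, df = parts[0], int(parts[1])
            idf[word] = math.log(num_docs / df)
    return idf

# The tf-idf weight of a word in a document is then:
#   term_count_in_document * idf[word]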