== Description ==

------------------ METHODS ------------------

Obtaining pre-trained embeddings:

Textual embeddings
  openly available pre-trained GloVe representations (Pennington et al., 2014)
  http://aclweb.org/anthology/D14-1162

Visual embeddings
  obtained by applying the pre-trained VGG19 neural network for image classification (Simonyan and Zisserman, 2014)
  https://arxiv.org/pdf/1409.1556.pdf

Mapped embeddings
  obtained by applying the "imagined" method of Collell et al. (2017) to the textual and visual embeddings
  http://www.aaai.org/ocs/index.php/AAAI/AAAI17/paper/download/14811/14042

Two visual datasets provide images for verbs:

Google dataset (Kiela et al., 2016)
  paper: https://aclweb.org/anthology/D/D16/D16-1043.pdf
  data: http://www.cl.cam.ac.uk/~dk427/cnnexpts.html

imSitu dataset (Yatskar et al., 2016)
  paper: https://www.cv-foundation.org/openaccess/content_cvpr_2016/papers/Yatskar_Situation_Recognition_Visual_CVPR_2016_paper.pdf
  data: http://imsitu.org/

------------------ FILES ------------------

SimVerb-3500.txt
  verb similarity ratings from the SimVerb dataset (Gerz et al., 2016)
  http://www.aclweb.org/anthology/D16-1235

kwan_et_al_verb_embodiment_ratings_Verb.txt and kwan_et_al_verb_embodiment_ratings_SCOREMEAN.txt
  embodiment ratings for verbs from Sidhu et al. (2014)
  http://iranarze.ir/wp-content/uploads/2017/08/7298-English-IranArze.pdf

out_google-googlenet.w2vt
  visual embeddings for the Google dataset (1024 dimensions)

weights_imagined_vanilia_glove_googlenet.w2vt
  mapped embeddings for the Google dataset (1024 dimensions)

imSitu_verbs_averagedEmbeddings.w2vt
  visual embeddings for the imSitu dataset (4096 dimensions)

weights_imagined_vanilia_glove_imSitu.w2vt
  mapped embeddings for the imSitu dataset (4096 dimensions)

== Format ==

Each .w2vt file conforms to the word2vec text format: a header line states the number of rows (vocabulary size) and the dimensionality of the embeddings, followed by one embedding per line as text. Each line after the header has the token in the first
position, followed by floating-point numbers. The token and the individual vector entries are separated by a single space.

Using Python, you can read in the data via gensim:

  from gensim.models.keyedvectors import KeyedVectors

  vsm = KeyedVectors.load_word2vec_format('path/file.w2vt', binary=False, unicode_errors='ignore')
  vsm.word_vec('your_test_word')

== License ==

Feel free to distribute these word embeddings under the CC-BY 4.0 License (http://creativecommons.org/licenses/by/4.0/).

If you use these word embeddings in your research, please cite:

Lisa Beinborn, Teresa Botschen and Iryna Gurevych. 2018. Multimodal Grounding for Language Processing.

@inproceedings{beinborn2018multimodal,
  title = {{Multimodal Grounding for Language Processing}},
  author = {Beinborn, Lisa and Botschen, Teresa and Gurevych, Iryna},
  publisher = {Association for Computational Linguistics},
  booktitle = {Proceedings of COLING 2018, the 27th International Conference on Computational Linguistics: Technical Papers},
  pages = {to appear},
  month = {aug},
  year = {2018},
  location = {Santa Fe, USA},
}
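If gensim is not available, the .w2vt text format described above is simple enough to parse directly. The sketch below is a minimal, illustrative reader using only the standard library; the toy header, tokens, and values are invented for demonstration (the real files have 1024- or 4096-dimensional vectors), and the helper names are not part of this release.

```python
import math

def load_w2vt(lines):
    """Parse word2vec text format: a 'vocab_size dim' header line,
    then one 'token v1 v2 ... vd' line per embedding."""
    vocab_size, dim = (int(x) for x in lines[0].split())
    vectors = {}
    for line in lines[1:]:
        parts = line.rstrip().split(' ')
        vec = [float(x) for x in parts[1:]]
        assert len(vec) == dim, "dimensionality mismatch with header"
        vectors[parts[0]] = vec
    assert len(vectors) == vocab_size, "row count mismatch with header"
    return vectors

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Toy stand-in for a .w2vt file read with open(...).readlines():
toy = ["2 3", "run 0.1 0.2 0.3", "jump 0.4 0.5 0.6"]
vsm = load_w2vt(toy)
sim = cosine(vsm["run"], vsm["jump"])
```

The same two functions could be used, for example, to compare embedding similarities against the SimVerb-3500 ratings by looking up each verb pair in the parsed dictionary.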