== Description ==

------------------ METHODS ------------------

Obtaining pre-trained embeddings:

Textual embeddings
  openly available pre-trained GloVe representations (Pennington et al., 2014)
  http://aclweb.org/anthology/D14-1162

Visual embeddings
  obtained by applying the pre-trained VGG19 neural network for image classification (Simonyan and Zisserman, 2014)
  https://arxiv.org/pdf/1409.1556.pdf

Mapped embeddings
  obtained by applying the "imagined" method of Collell et al. (2017) to the textual and visual embeddings
  http://www.aaai.org/ocs/index.php/AAAI/AAAI17/paper/download/14811/14042

Two visual datasets provide images for verbs:

Google dataset (Kiela et al., 2016)
  paper: https://aclweb.org/anthology/D/D16/D16-1043.pdf
  data: http://www.cl.cam.ac.uk/~dk427/cnnexpts.html

imSitu dataset (Yatskar et al., 2016)
  paper: https://www.cv-foundation.org/openaccess/content_cvpr_2016/papers/Yatskar_Situation_Recognition_Visual_CVPR_2016_paper.pdf
  data: http://imsitu.org/

------------------ FILES ------------------

SimVerb-3500.txt
  verb similarity ratings from the SimVerb dataset (Gerz et al., 2016)
  http://www.aclweb.org/anthology/D16-1235

kwan_et_al_verb_embodiment_ratings_Verb.txt and kwan_et_al_verb_embodiment_ratings_SCOREMEAN.txt
  embodiment ratings for verbs from Sidhu et al. (2014)
  http://iranarze.ir/wp-content/uploads/2017/08/7298-English-IranArze.pdf

out_google-googlenet.w2vt
  visual embeddings for the Google dataset (1024 dimensions)

weights_imagined_vanilia_glove_googlenet.w2vt
  mapped embeddings for the Google dataset (1024 dimensions)

imSitu_verbs_averagedEmbeddings.w2vt
  visual embeddings for the imSitu dataset (4096 dimensions)

weights_imagined_vanilia_glove_imSitu.w2vt
  mapped embeddings for the imSitu dataset (4096 dimensions)

== Format ==

Each .w2vt file conforms to the word2vec text format: a header line states the number of rows (vocabulary size) and the dimensionality of the embeddings, followed by one embedding per line as text. Each line after the header has the token in the first
position, followed by floating-point numbers. The token and the individual vector entries are separated by a single space.

Using Python, you can read in the data via gensim:

  from gensim.models.keyedvectors import KeyedVectors

  vsm = KeyedVectors.load_word2vec_format('path/file.w2vt', binary=False, unicode_errors='ignore')
  vsm.word_vec('your_test_word')

== License ==

Feel free to distribute these word embeddings under the CC-BY 4.0 License (http://creativecommons.org/licenses/by/4.0/).

If you use these word embeddings in your research, please cite:

Lisa Beinborn, Teresa Botschen and Iryna Gurevych. 2018. Multimodal Grounding for Language Processing.

@inproceedings{beinborn2018multimodal,
  title = {{Multimodal Grounding for Language Processing}},
  author = {Beinborn, Lisa and Botschen, Teresa and Gurevych, Iryna},
  publisher = {Association for Computational Linguistics},
  booktitle = {Proceedings of COLING 2018, the 27th International Conference on Computational Linguistics: Technical Papers},
  pages = {to appear},
  month = {aug},
  year = {2018},
  location = {Santa Fe, USA},
}
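If gensim is not available, the .w2vt text format described above is simple enough to parse directly. The sketch below is a minimal, illustrative reader using only the standard library; the toy header, tokens, and values are invented for demonstration (the real files have 1024- or 4096-dimensional vectors), and the helper names are not part of this release.

```python
import math

def load_w2vt(lines):
    """Parse word2vec text format: a 'vocab_size dim' header line,
    then one 'token v1 v2 ... vd' line per embedding."""
    vocab_size, dim = (int(x) for x in lines[0].split())
    vectors = {}
    for line in lines[1:]:
        parts = line.rstrip().split(' ')
        vec = [float(x) for x in parts[1:]]
        assert len(vec) == dim, "dimensionality mismatch with header"
        vectors[parts[0]] = vec
    assert len(vectors) == vocab_size, "row count mismatch with header"
    return vectors

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Toy stand-in for a .w2vt file read with open(...).readlines():
toy = ["2 3", "run 0.1 0.2 0.3", "jump 0.4 0.5 0.6"]
vsm = load_w2vt(toy)
sim = cosine(vsm["run"], vsm["jump"])
```

The same two functions could be used, for example, to compare embedding similarities against the SimVerb-3500 ratings by looking up each verb pair in the parsed dictionary.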