
technicolor-research/dsve-loc: Deep semantic-visual embedding with localization


Repository name: technicolor-research/dsve-loc

Repository URL: https://github.com/technicolor-research/dsve-loc

Language: Python 100.0%

Deep semantic-visual embedding with localization

Training and evaluation code for the paper Finding beans in burgers: Deep semantic-visual embedding with localization

This code allows training of new models, reproduction of the paper's experiments, and feature extraction for both images and texts.

Author and contact: Martin Engilberge

Main dependencies

This code is written in Python. To use it you will need:

  • Python 3.7
  • PyTorch 1.0
  • SRU[cuda]
  • NumPy
  • SciPy
  • torchvision
  • MS COCO API (pycocotools)
  • Visual Genome API
  • NLTK
  • OpenCV

An environment file for conda is available in the repository (environment.yml).
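
For example, with a standard conda installation, the environment can be created and activated as follows (the environment name is defined by the name field in environment.yml; dsve-loc is assumed here):

conda env create -f environment.yml
conda activate dsve-loc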

Getting started

You will first need to set the paths to the datasets and word embeddings in the file misc/config.py. Comments in the config file contain links where you can download the data.

To train models and reproduce the experiments in the paper, first make sure the required paths are set in the config file; you can then start training with the following command:

python train.py

By default all scripts run on the GPU; you can switch to CPU mode by uncommenting device = torch.device("cpu") at the beginning of each script.
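
A minimal sketch of that device-selection pattern (the exact placement of the device variable varies from script to script):

import torch

# Default: run on the GPU
device = torch.device("cuda")
# Uncomment the following line to run on the CPU instead
# device = torch.device("cpu")

# Models and tensors are then moved to the selected device with .to(device)
x = torch.randn(2, 3).to(device)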

Model evaluation

Models can be evaluated on three tasks:

  • cross-modal retrieval (a scoring sketch follows this list):
python eval_retrieval.py -p "path/to/model/model.pth.tar" -te
  • pointing game:
python pointing_game.py -p "path/to/model/model.pth.tar"
  • semantic segmentation:
python semantic_seg.py -p "path/to/model/model.pth.tar"
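
For context, cross-modal retrieval is commonly scored with recall@K on the similarity matrix between image and caption embeddings. A minimal sketch of that metric, not the repository's implementation (function and variable names are illustrative):

import numpy as np

def recall_at_k(img_emb, txt_emb, k=5):
    """Fraction of images whose paired caption ranks in the top k.

    Assumes row i of txt_emb is the caption paired with row i of img_emb,
    and that both embedding matrices are L2-normalized.
    """
    sims = img_emb @ txt_emb.T           # (N, N) cosine similarities
    ranks = (-sims).argsort(axis=1)      # caption indices, most similar first
    hits = (ranks[:, :k] == np.arange(len(sims))[:, None]).any(axis=1)
    return hits.mean()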

Feature extraction

The feature space produced by the joint embedding captures semantic properties. Two scripts can be used to extract features from that space for images and texts.

For images, the script takes a folder as input and produces the embedding representation of every JPEG image in the folder.

python image_features_extraction.py -p "path/to/model/model.pth.tar" -d "path/to/image/folder/" -o "path/to/output/file"

For text, the script takes a text file as input and produces the embedding representation of each line.

python text_features_extraction.py -p "path/to/model/model.pth.tar" -d "path/to/text/file" -o "path/to/output/file"
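
Since image and text features live in the same embedding space, cross-modal similarity reduces to a dot product. A minimal sketch of using the extracted features, assuming they were saved in a NumPy-loadable format (file names are hypothetical; the scripts' actual output format may differ):

import numpy as np

img_feats = np.load("image_features.npy")  # (N_img, D); hypothetical file name
txt_feats = np.load("text_features.npy")   # (N_txt, D); hypothetical file name

# L2-normalize so the dot product equals cosine similarity
img_feats /= np.linalg.norm(img_feats, axis=1, keepdims=True)
txt_feats /= np.linalg.norm(txt_feats, axis=1, keepdims=True)

best_caption = (img_feats @ txt_feats.T).argmax(axis=1)  # best text match per image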

Reference

If you found this code useful, please cite the following paper:

@inproceedings{engilberge2018finding,
  title={Finding beans in burgers: Deep semantic-visual embedding with localization},
  author={Engilberge, Martin and Chevallier, Louis and P{\'e}rez, Patrick and Cord, Matthieu},
  booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition},
  pages={3984--3993},
  year={2018}
}

License

By downloading this program, you commit to complying with the license as stated in the LICENSE.md file.



