evaluation code #82

Open

wants to merge 109 commits into base: master

Changes from all commits (109 commits)
7e3435a
Delete FLAGS.md
chenxinpeng Feb 14, 2017
3e8be1c
Delete INSTALL.md
chenxinpeng Feb 14, 2017
a8af4ae
Delete .gitignore
chenxinpeng Feb 14, 2017
f010818
Delete train_opts.lua
chenxinpeng Feb 14, 2017
ad981aa
Delete train.lua
chenxinpeng Feb 14, 2017
ecd45c4
Delete run_model.lua
chenxinpeng Feb 14, 2017
b2d12e2
Delete preprocess.py
chenxinpeng Feb 14, 2017
3992aca
Delete LICENSE.md
chenxinpeng Feb 14, 2017
b05249d
Delete README.md
chenxinpeng Feb 14, 2017
90f0995
Delete models.lua
chenxinpeng Feb 14, 2017
3db44c6
Delete evaluate_model.lua
chenxinpeng Feb 14, 2017
76a6c44
Delete .gitignore
chenxinpeng Feb 14, 2017
3392eee
Delete README.md
chenxinpeng Feb 14, 2017
9396520
Delete eval_utils.lua
chenxinpeng Feb 14, 2017
62738cc
Delete meteor_bridge.py
chenxinpeng Feb 14, 2017
39232b7
Delete chrome_ssl_screen.png
chenxinpeng Feb 14, 2017
aa07462
Delete elephant.jpg
chenxinpeng Feb 14, 2017
3776ad1
Delete resultsfig.png
chenxinpeng Feb 14, 2017
dacf714
Delete README.md
chenxinpeng Feb 14, 2017
b81f840
Delete README.md
chenxinpeng Feb 14, 2017
356ca42
Delete .gitignore
chenxinpeng Feb 14, 2017
b864fac
Delete README.md
chenxinpeng Feb 14, 2017
91b76b7
Delete daemon.lua
chenxinpeng Feb 14, 2017
d6406f1
Delete requirements.txt
chenxinpeng Feb 14, 2017
6b309a8
Delete server.py
chenxinpeng Feb 14, 2017
d1bc212
Delete simple_https_server.py
chenxinpeng Feb 14, 2017
d91c980
Delete single_machine_demo.lua
chenxinpeng Feb 14, 2017
66a28a0
Delete web-client.html
chenxinpeng Feb 14, 2017
044c8f1
Delete web-client.js
chenxinpeng Feb 14, 2017
17c5479
Delete README.md
chenxinpeng Feb 14, 2017
b5b5292
Delete README.md
chenxinpeng Feb 14, 2017
2eddb2a
Delete d3.min.js
chenxinpeng Feb 14, 2017
0a840a6
Delete jquery-1.8.3.min.js
chenxinpeng Feb 14, 2017
5b2f59c
Delete style.css
chenxinpeng Feb 14, 2017
158e654
Delete utils.js
chenxinpeng Feb 14, 2017
066aae8
Delete view_results.html
chenxinpeng Feb 14, 2017
226620e
Delete .gitignore
chenxinpeng Feb 14, 2017
45368ca
Delete ApplyBoxTransform_test.lua
chenxinpeng Feb 14, 2017
b03bf41
Delete BatchBilinearSamplerBHWD_test.lua
chenxinpeng Feb 14, 2017
3c4b594
Delete BilinearRoiPooling_test.lua
chenxinpeng Feb 14, 2017
64c4e7d
Delete BoxIoU_test.lua
chenxinpeng Feb 14, 2017
f58992c
Delete BoxRegressionCriterion_test.lua
chenxinpeng Feb 14, 2017
10ad896
Delete BoxSamplerHelper_test.lua
chenxinpeng Feb 14, 2017
9f50885
Delete BoxSampler_test.lua
chenxinpeng Feb 14, 2017
95de843
Delete BoxToAffine_test.lua
chenxinpeng Feb 14, 2017
17cdc84
Delete BoxToAffine_visual_test.ipynb
chenxinpeng Feb 14, 2017
e9f09fe
Delete DenseCapModel_test.lua
chenxinpeng Feb 14, 2017
f4507b5
Delete InvertBoxTransform_test.lua
chenxinpeng Feb 14, 2017
602f583
Delete LanguageModel_test.lua
chenxinpeng Feb 14, 2017
914a88a
Delete LocalizationLayer_test.lua
chenxinpeng Feb 14, 2017
d4768f0
Delete MakeAnchors_test.lua
chenxinpeng Feb 14, 2017
2ffecb1
Delete MakeBoxes_test.lua
chenxinpeng Feb 14, 2017
c93908b
Delete ReshapeBoxFeatures_test.lua
chenxinpeng Feb 14, 2017
bc40081
Delete box_conversion_test.lua
chenxinpeng Feb 14, 2017
3a354ab
Delete clip_boxes_test.lua
chenxinpeng Feb 14, 2017
c749e4f
Delete evaluation_test.lua
chenxinpeng Feb 14, 2017
7f0fa35
Delete nms_test.lua
chenxinpeng Feb 14, 2017
47ba7bf
Delete run_all.lua
chenxinpeng Feb 14, 2017
75a6c13
Delete setup_eval.sh
chenxinpeng Feb 14, 2017
ec4c02f
Delete download_models.sh
chenxinpeng Feb 14, 2017
62947bf
Delete download_pretrained_model.sh
chenxinpeng Feb 14, 2017
0dc2b3e
Delete densecap_splits.json
chenxinpeng Feb 14, 2017
a4ec086
Create README.md
chenxinpeng Feb 14, 2017
cac1b4e
Create split_dataset.py
chenxinpeng Feb 14, 2017
bef63f7
Update split_dataset.py
chenxinpeng Feb 14, 2017
536b221
Create README.md
chenxinpeng Feb 14, 2017
b0d8a23
Update README.md
chenxinpeng Feb 14, 2017
7ff32e9
Update README.md
chenxinpeng Feb 14, 2017
fe1443a
Update README.md
chenxinpeng Feb 14, 2017
54bbe6e
Update README.md
chenxinpeng Feb 14, 2017
28a11cf
Create parse_json.py
chenxinpeng Feb 14, 2017
5bec63c
Update parse_json.py
chenxinpeng Feb 14, 2017
4b74361
Add files via upload
chenxinpeng Feb 14, 2017
d52c4f8
Update README.md
chenxinpeng Feb 14, 2017
defa8a4
Update README.md
chenxinpeng Feb 14, 2017
e56c194
Delete paragraphs_v1.json
chenxinpeng Feb 14, 2017
0ef1801
Add files via upload
chenxinpeng Feb 14, 2017
bbb5e6a
Update README.md
chenxinpeng Feb 14, 2017
0286216
Create README.md
chenxinpeng Feb 14, 2017
d9e75e0
Add files via upload
chenxinpeng Feb 14, 2017
3bf0cbb
Update README.md
chenxinpeng Feb 14, 2017
75b842f
Update README.md
chenxinpeng Feb 14, 2017
4806349
Add files via upload
chenxinpeng Feb 14, 2017
8a91980
Update README.md
chenxinpeng Feb 14, 2017
1dd751e
Add files via upload
chenxinpeng Feb 14, 2017
45aa345
Update README.md
chenxinpeng Feb 14, 2017
ded9610
Create README.md
chenxinpeng Feb 14, 2017
f513d7a
Add files via upload
chenxinpeng Feb 14, 2017
1d6b652
Add files via upload
chenxinpeng Feb 14, 2017
a344d7b
Create download_pretrained_model.sh
chenxinpeng Feb 14, 2017
679c39b
Update README.md
chenxinpeng Feb 14, 2017
c14fa9b
Update README.md
chenxinpeng Feb 14, 2017
f274bbd
Create get_imgs_val_path.py
chenxinpeng Feb 14, 2017
f475525
Create get_imgs_train_path.py
chenxinpeng Feb 14, 2017
da94041
Create get_imgs_test_path.py
chenxinpeng Feb 14, 2017
75cace8
Add files via upload
chenxinpeng Feb 14, 2017
fe5bdc0
Update README.md
chenxinpeng Feb 14, 2017
4b669c1
Add files via upload
chenxinpeng Feb 14, 2017
110d37f
Create README.md
chenxinpeng Feb 15, 2017
756daf8
Add files via upload
chenxinpeng Feb 15, 2017
8de3f07
Add files via upload
chenxinpeng Feb 15, 2017
77deae5
Add files via upload
chenxinpeng Feb 15, 2017
4bc16f5
Add files via upload
chenxinpeng Feb 15, 2017
9fa1204
Update README.md
chenxinpeng Feb 16, 2017
44ba2ca
fixed bugs
May 5, 2018
f940bda
update
Jul 31, 2018
7acc585
update
Jul 31, 2018
10e1376
updatee
Jul 31, 2018
727c7a7
updatee
Jul 31, 2018
4 changes: 0 additions & 4 deletions .gitignore

This file was deleted.

651 changes: 651 additions & 0 deletions HRNN_paragraph_batch.py

Large diffs are not rendered by default.

21 changes: 0 additions & 21 deletions LICENSE.md

This file was deleted.

271 changes: 44 additions & 227 deletions README.md
@@ -1,252 +1,69 @@
#DenseCap
# im2p
## Note
This repository is not being actively maintained due to lack of time and interest. My sincerest apologies to the open source community for allowing this project to stagnate. I hope it was useful for some of you as a jumping-off point.

This is the code for the paper
## Introduction
A TensorFlow implementation of the paper: [A Hierarchical Approach for Generating Descriptive Image Paragraphs](http://cs.stanford.edu/people/ranjaykrishna/im2p/index.html)

**[DenseCap: Fully Convolutional Localization Networks for Dense Captioning](http://cs.stanford.edu/people/karpathy/densecap/)**,
<br>
[Justin Johnson](http://cs.stanford.edu/people/jcjohns/)\*,
[Andrej Karpathy](http://cs.stanford.edu/people/karpathy/)\*,
[Li Fei-Fei](http://vision.stanford.edu/feifeili/),
<br>
(\* equal contribution)
<br>
Presented at [CVPR 2016](http://cvpr2016.thecvf.com/) (oral)
We did not fine-tune the parameters, but the model still achieves the following scores:
![metric scores](https://github.com/chenxinpeng/im2p/blob/master/img/metric_scores.png)

The paper addresses the problem of **dense captioning**, where a computer detects objects in images and describes them in natural language. Here are a few example outputs:

<img src='imgs/resultsfig.png'>

The model is a deep convolutional neural network trained in an end-to-end fashion on the [Visual Genome](https://visualgenome.org/) dataset.

We provide:

- A [pretrained model](#pretrained-model)
- Code to [run the model on new images](#running-on-new-images), on either CPU or GPU
- Code to run a [live demo with a webcam](#webcam-demos)
- [Evaluation code](#evaluation) for dense captioning
- Instructions for [training the model](#training)

If you find this code useful in your research, please cite:

```
@inproceedings{densecap,
title={DenseCap: Fully Convolutional Localization Networks for Dense Captioning},
author={Johnson, Justin and Karpathy, Andrej and Fei-Fei, Li},
booktitle={Proceedings of the IEEE Conference on Computer Vision and
Pattern Recognition},
year={2016}
}
```

## Installation

DenseCap is implemented in [Torch](http://torch.ch/), and depends on the following packages: [torch/torch7](https://github.com/torch/torch7), [torch/nn](https://github.com/torch/nn), [torch/nngraph](https://github.com/torch/nngraph), [torch/image](https://github.com/torch/image), [lua-cjson](https://luarocks.org/modules/luarocks/lua-cjson), [qassemoquab/stnbhwd](https://github.com/qassemoquab/stnbhwd), [jcjohnson/torch-rnn](https://github.com/jcjohnson/torch-rnn)

After installing torch, you can install / update these dependencies by running the following:

```bash
luarocks install torch
luarocks install nn
luarocks install image
luarocks install lua-cjson
luarocks install https://raw.githubusercontent.com/qassemoquab/stnbhwd/master/stnbhwd-scm-1.rockspec
luarocks install https://raw.githubusercontent.com/jcjohnson/torch-rnn/master/torch-rnn-scm-1.rockspec
```

### (Optional) GPU acceleration

If you have an NVIDIA GPU and want to accelerate the model with CUDA, you'll also need to install
[torch/cutorch](https://github.com/torch/cutorch) and [torch/cunn](https://github.com/torch/cunn);
you can install / update these by running:

```bash
luarocks install cutorch
luarocks install cunn
luarocks install cudnn
```

### (Optional) cuDNN

If you want to use NVIDIA's cuDNN library, you'll need to register for the CUDA Developer Program (it's free)
and download the library from [NVIDIA's website](https://developer.nvidia.com/cudnn); you'll also need to install
the [cuDNN bindings for Torch](https://github.com/soumith/cudnn.torch) by running

```bash
luarocks install cudnn
```

## Pretrained model

You can download a pretrained DenseCap model by running the following script:

```bash
sh scripts/download_pretrained_model.sh
```

This will download a zipped version of the model (about 1.1 GB) to `data/models/densecap/densecap-pretrained-vgg16.t7.zip`, unpack
it to `data/models/densecap/densecap-pretrained-vgg16.t7` (about 1.2 GB) and then delete the zipped version.

This is not the exact model that was used in the paper, but it has comparable performance; using 1000 region proposals per image,
it achieves a mAP of 5.70 on the test set which is slightly better than the mAP of 5.39 that we report in the paper.

## Running on new images

To run the model on new images, use the script `run_model.lua`. To run the pretrained model on the provided `elephant.jpg` image,
use the following command:

```bash
th run_model.lua -input_image imgs/elephant.jpg
```

By default this will run in GPU mode; to run in CPU only mode, simply add the flag `-gpu -1`.

This command will write results into the folder `vis/data`. We have provided a web-based visualizer to view these
results; to use it, change to the `vis` directory and start a local HTTP server:

```bash
cd vis
python -m SimpleHTTPServer 8181
```

Then point your web browser to [http://localhost:8181/view_results.html](http://localhost:8181/view_results.html).

If you have an entire directory of images on which you want to run the model, use the `-input_dir` flag instead:
## Step 1
Download the [VisualGenome dataset](http://visualgenome.org/); this gives us the two image sets VG_100K and VG_100K_2. Following the paper, also download the train, val, and test split JSON files. These three JSON files store the image names of the training, validation, and test data.

Running the script:
```bash
th run_model.lua -input_dir /path/to/my/image/folder
$ python split_dataset.py
```
This gives us the images from the [Visual Genome dataset](http://visualgenome.org/) that the authors used in the paper.
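
For reference, here is a minimal sketch of what a split script such as `split_dataset.py` might do; it assumes each of `train_split.json`, `val_split.json`, and `test_split.json` is a flat list of image IDs and that images are stored as `<id>.jpg` inside VG_100K or VG_100K_2 (the actual script in this PR may differ):

```python
import json
import os
import shutil

SRC_DIRS = ["VG_100K", "VG_100K_2"]  # assumed locations of the downloaded images

def collect_split(split_name):
    # Read the list of image IDs for this split (assumed to be a flat JSON list).
    with open(os.path.join("data", "%s_split.json" % split_name)) as f:
        image_ids = json.load(f)
    dst_dir = os.path.join("images", split_name)
    if not os.path.isdir(dst_dir):
        os.makedirs(dst_dir)
    # Copy each image from whichever Visual Genome folder contains it.
    for image_id in image_ids:
        fname = "%s.jpg" % image_id
        for src_dir in SRC_DIRS:
            src_path = os.path.join(src_dir, fname)
            if os.path.exists(src_path):
                shutil.copy(src_path, os.path.join(dst_dir, fname))
                break

for split in ("train", "val", "test"):
    collect_split(split)
```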

This runs the model on all files in the folder `/path/to/my/image/folder/` whose filenames do not start with `.`.

The web-based visualizer is the preferred way to view results, but if you don't want to use it then you can instead
render an image with the detection boxes and captions "baked in"; add the flag `-output_dir` to specify a directory
where output images should be written:

## Step 2
Run the scripts:
```bash
th run_model.lua -input_dir /path/to/my/image/folder -output_dir /path/to/output/folder/
$ python get_imgs_train_path.py
$ python get_imgs_val_path.py
$ python get_imgs_test_path.py
```
This produces three txt files: imgs_train_path.txt, imgs_val_path.txt, and imgs_test_path.txt, which store the paths of the train, val, and test images.
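
As a rough idea of how such path lists can be generated, here is a hedged sketch (not the actual `get_imgs_*_path.py` scripts) that writes one absolute image path per line, assuming the per-split image folders produced in Step 1:

```python
import os

def write_split_paths(split_name):
    img_dir = os.path.abspath(os.path.join("images", split_name))
    out_file = "imgs_%s_path.txt" % split_name
    with open(out_file, "w") as f:
        for fname in sorted(os.listdir(img_dir)):
            if fname.endswith(".jpg"):
                # One image path per line, which is the format passed to
                # extract_features.lua via -input_txt below.
                f.write(os.path.join(img_dir, fname) + "\n")

for split in ("train", "val", "test"):
    write_split_paths(split)
```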

The `run_model.lua` script has several other flags; you can [find details here](doc/FLAGS.md#run_modellua).


## Training

To train a new DenseCap model, follow these steps:

1. Download the raw images and region descriptions from [the Visual Genome website](https://visualgenome.org/api/v0/api_home.html)
2. Use the script `preprocess.py` to generate a single HDF5 file containing the entire dataset
[(details here)](doc/FLAGS.md#preprocesspy)
3. Use the script `train.lua` to train the model [(details here)](doc/FLAGS.md#trainlua)
4. Use the script `evaluate_model.lua` to evaluate a trained model on the validation or test data
[(details here)](doc/FLAGS.md#evaluate_modellua)

For more instructions on training see [INSTALL.md](doc/INSTALL.md) in the `doc` folder.


## Evaluation

In the paper we propose a metric for automatically evaluating dense captioning results.
Our metric depends on [METEOR](http://www.cs.cmu.edu/~alavie/METEOR/README.html), and
our evaluation code requires both Java and Python 2.7. The following script will download
and unpack the METEOR jarfile:
After this, we use DenseCap to extract features. Set up the running environment by following [densecap](https://github.com/jcjohnson/densecap) step by step.

Run the script:
```bash
sh scripts/setup_eval.sh
$ ./download_pretrained_model.sh
$ th extract_features.lua -boxes_per_image 50 -max_images -1 -input_txt imgs_train_path.txt \
-output_h5 ./data/im2p_train_output.h5 -gpu 0 -use_cudnn 1
```
First we download the pre-trained model `densecap-pretrained-vgg16.t7`; then, following the paper, we extract **50 boxes** from each image.
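
If you want to sanity-check the extracted features, a small snippet like the one below (a sketch, not part of this PR) lists every dataset in the output HDF5 files; dataset names are printed rather than assumed, since they depend on `extract_features.lua`:

```python
import h5py

for split in ("train", "val", "test"):
    path = "./data/im2p_%s_output.h5" % split
    with h5py.File(path, "r") as f:
        for name, item in f.items():
            if isinstance(item, h5py.Dataset):
                # With -boxes_per_image 50, expect a 50-box dimension per image.
                print("%s %s %s %s" % (split, name, item.shape, item.dtype))
```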

The evaluation code is **not required** to simply run a trained model on images; you can
[find more details about the evaluation code here](eval/README.md).


## Webcam demos

If you have a powerful GPU, then the DenseCap model is fast enough to run in real-time. We provide two
demos to allow you to run DenseCap on frames from a webcam.

### Single-machine demo
If you have a single machine with both a webcam and a powerful GPU, then you can
use this demo to run DenseCap in real time at up to 10 frames per second. This demo depends on a few extra
Lua packages:

- [clementfarabet/lua---camera](https://github.com/clementfarabet/lua---camera)
- [torch/qtlua](https://github.com/torch/qtlua)

You can install / update these dependencies by running the following:

Also, don't forget to extract features for the val and test images:
```bash
luarocks install camera
luarocks install qtlua
$ th extract_features.lua -boxes_per_image 50 -max_images -1 -input_txt imgs_val_path.txt \
-output_h5 ./data/im2p_val_output.h5 -gpu 0 -use_cudnn 1

$ th extract_features.lua -boxes_per_image 50 -max_images -1 -input_txt imgs_test_path.txt \
-output_h5 ./data/im2p_test_output.h5 -gpu 0 -use_cudnn 1
```

You can start the demo by running the following:

## Step 3
Run the script:
```bash
qlua webcam/single_machine_demo.lua
$ python parse_json.py
```
In this step, we process the `paragraphs_v1.json` file for training and testing, producing the `img2paragraph` file in the **./data** directory. Its structure looks like this:
![img2paragraph](https://github.com/chenxinpeng/im2p/blob/master/img/4.png)
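
For orientation, here is a hedged sketch of this parsing step, assuming `paragraphs_v1.json` is a list of records with `image_id` and `paragraph` fields (the actual `parse_json.py` may split sentences differently):

```python
import json
import pickle

with open("data/paragraphs_v1.json") as f:
    records = json.load(f)

img2paragraph = {}
for rec in records:
    # Naive sentence split on ". "; the real preprocessing may differ.
    sentences = [s.strip() for s in rec["paragraph"].split(". ") if s.strip()]
    img2paragraph[rec["image_id"]] = sentences

# Save the mapping under ./data, analogous to the img2paragraph file shown above.
with open("data/img2paragraph", "wb") as f:
    pickle.dump(img2paragraph, f)
```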

### Client / server demo
If you have a machine with a powerful GPU and another machine with a webcam, then
this demo allows you to use the GPU machine as a server and the webcam machine as a client; frames will be
streamed from the client to the server, the model will run on the server, and predictions will be shipped
back to the client for viewing. This allows you to run DenseCap on a laptop, but with network and filesystem
overhead you will typically only achieve 1 to 2 frames per second.

The server is written in Flask; on the server machine run the following to install dependencies:

## Step 4
Finally, we can train and test the model from the terminal:
```bash
cd webcam
virtualenv .env
source .env/bin/activate
pip install -r requirements.txt
cd ..
$ CUDA_VISIBLE_DEVICES=0 ipython
>>> import HRNN_paragraph_batch
>>> HRNN_paragraph_batch.train()
```

For technical reasons, the server needs to serve content over SSL; it expects to find SSL key
files and certificate files in `webcam/ssl/server.key` and `webcam/ssl/server.crt` respectively.
You can generate a self-signed SSL certificate by running the following:

After training, we can test the model:
```bash
mkdir webcam/ssl

# Step 1: Generate a private key
openssl genrsa -des3 -out webcam/ssl/server.key 1024
# Enter a password

# Step 2: Generate a certificate signing request
openssl req -new -key webcam/ssl/server.key -out webcam/ssl/server.csr
# Enter the password from above and leave all other fields blank

# Step 3: Strip the password from the keyfile
cp webcam/ssl/server.key webcam/ssl/server.key.org
openssl rsa -in webcam/ssl/server.key.org -out webcam/ssl/server.key

# Step 4: Generate self-signed certificate
openssl x509 -req -days 365 -in webcam/ssl/server.csr -signkey webcam/ssl/server.key -out webcam/ssl/server.crt
# Enter the password from above
>>> HRNN_paragraph_batch.test()
```

You can now run the following two commands to start the server; both will run forever:

```bash
th webcam/daemon.lua
python webcam/server.py
```

On the client, point a web browser at the following page:

```
https://cs.stanford.edu/people/jcjohns/densecap/demo/web-client.html?server_url=SERVER_URL
```

where you should replace SERVER_URL with the actual URL of the server.

**Note**: If the server is using a self-signed SSL certificate, you may need to manually
tell your browser that the certificate is safe by pointing your client's web browser directly
at the server URL; you will get a message that the site is unsafe; for example on Chrome
you will see the following:

<img src='imgs/chrome_ssl_screen.png'>

Afterward you should see a message telling you that the DenseCap server is running, and
the web client should work after refreshing.

### Results
![demo](https://github.com/chenxinpeng/im2p/blob/master/img/HRNN_demo.png)
6 changes: 6 additions & 0 deletions data/README.md
@@ -0,0 +1,6 @@

This folder stores `im2p_train_output.h5`, `im2p_val_output.h5`, and `im2p_test_output.h5`.

The JSON files `train_split.json`, `val_split.json`, and `test_split.json` store the image names used by the authors.

The JSON file `paragraphs_v1.json` stores the ground-truth paragraph captions of the images.
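
A quick consistency check (a sketch, not part of the repository) that every image listed in a split file also has a ground-truth paragraph, assuming the file formats described above:

```python
import json

with open("data/paragraphs_v1.json") as f:
    paragraph_ids = set(rec["image_id"] for rec in json.load(f))

for split in ("train", "val", "test"):
    with open("data/%s_split.json" % split) as f:
        split_ids = set(json.load(f))
    missing = split_ids - paragraph_ids
    print("%s: %d images, %d without a paragraph" % (split, len(split_ids), len(missing)))
```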
Binary file added data/idx2word_batch.npy
Binary file not shown.
1 change: 1 addition & 0 deletions data/paragraphs_v1.json

Large diffs are not rendered by default.

1 change: 1 addition & 0 deletions data/test_split.json

Large diffs are not rendered by default.

1 change: 1 addition & 0 deletions data/train_split.json

Large diffs are not rendered by default.

1 change: 1 addition & 0 deletions data/val_split.json

Large diffs are not rendered by default.
