
Error in run_classifier.py for attention_type=simulated_sparse #14

Open
Amit-GH opened this issue Apr 16, 2021 · 0 comments
Amit-GH commented Apr 16, 2021

I am using the script base_size.sh to run run_classifier.py. I can train and evaluate on the IMDB data with attention_type set to original_full or block_sparse, but when I set it to simulated_sparse, training fails during initialization. All 12 layers are initialized, but training never starts. The key part of the error log is below:

File "/home/amitghattimare/.local/lib/python3.7/site-packages/tensorflow/python/framework/ops.py", line 3211, in _as_graph_def
    graph.ParseFromString(compat.as_bytes(data))
google.protobuf.message.DecodeError: Error parsing message

I used the script below to run the code, in case it helps with the investigation. If I change attention_type to either of the other two options, it works fine. I am using only 8 TPU cores because that is the maximum available in preemptible mode, and I have reduced train_batch_size so that the model fits in memory. I wonder if that is causing the issue, though the error logs don't indicate it.

python3 bigbird/classifier/run_classifier.py \
  --data_dir=tfds://imdb_reviews/plain_text \
  --output_dir=gs://bigbird-replication-bucket/classifier/imdb/sim_sparse_attention \
  --attention_type=simulated_sparse \
  --max_encoder_length=4096 \
  --num_attention_heads=12 \
  --num_hidden_layers=12 \
  --hidden_size=768 \
  --intermediate_size=3072 \
  --block_size=64 \
  --train_batch_size=1 \
  --eval_batch_size=2 \
  --do_train=True \
  --do_eval=False \
  --num_train_steps=1000 \
  --use_tpu=True \
  --tpu_name=bigbird \
  --tpu_zone=us-central1-b \
  --gcp_project=bigbird-replication \
  --num_tpu_cores=8 \
  --init_checkpoint=gs://bigbird-transformer/pretrain/bigbr_base/model.ckpt-0
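For context, here is my own rough estimate (an illustrative sketch, not taken from the BigBird code) of why simulated_sparse is so much heavier than block_sparse at these settings: as I understand it, simulated_sparse still materializes the full seq_len × seq_len score matrix and only masks it, while block_sparse computes a limited set of key blocks per query block. The `blocks_per_row` count below is an assumption for illustration; that said, since original_full runs fine for me, memory alone may not explain the DecodeError.

```python
# Back-of-envelope attention-memory estimate per layer (hypothetical numbers,
# not measured from the BigBird implementation).

def full_attention_bytes(seq_len, num_heads, batch, dtype_bytes=4):
    # Full/simulated attention: one (seq_len x seq_len) score matrix
    # per head per example.
    return batch * num_heads * seq_len * seq_len * dtype_bytes

def block_sparse_attention_bytes(seq_len, num_heads, batch, block_size,
                                 blocks_per_row=8, dtype_bytes=4):
    # Block-sparse attention: each query block attends to only a handful of
    # key blocks (sliding window + global + random). blocks_per_row=8 is an
    # assumed constant for illustration.
    num_blocks = seq_len // block_size
    return (batch * num_heads * num_blocks * blocks_per_row
            * block_size * block_size * dtype_bytes)

# Settings from the script above: seq_len=4096, 12 heads, batch 1, block 64.
full = full_attention_bytes(4096, 12, 1)
sparse = block_sparse_attention_bytes(4096, 12, 1, 64)
print(f"full/simulated: {full / 2**20:.0f} MiB per layer")  # 768 MiB
print(f"block_sparse:   {sparse / 2**20:.0f} MiB per layer")  # 96 MiB
```

So under these assumptions the simulated path needs roughly 8x the score-matrix memory of the block-sparse path per layer.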