This subpackage contains experiments on variants of the Copy Task. Unless noted otherwise, all experiments use vanilla RNNs with 256 hidden neurons.
Please run the following command to see the available options for running Copy Task experiments.
$ python3 train_copy.py --help
We consider a variant of the Copy Task where output patterns are permuted across time. We report results for input sequences whose length i equals the pattern length p (p=i=5).
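As a rough illustration of what "permuted across time" means, the sketch below builds one input/target pair in plain Python. It is not the data handler used by `train_copy.py`: the function name `make_permuted_copy_sample`, the feature width of 8, and the convention that output timestep t shows input timestep `perm[t]` are assumptions made for this example, and any stop flag or delimiter the real task may use is left out.

```python
import random


def make_permuted_copy_sample(pat_len=5, n_features=8, perm=None, seed=0):
    """Build one (pattern, target, perm) triple for a time-permuted copy task.

    Hypothetical helper for illustration only; not part of train_copy.py.
    """
    rng = random.Random(seed)
    # Random binary pattern: pat_len timesteps with n_features bits each.
    pattern = [[rng.randint(0, 1) for _ in range(n_features)]
               for _ in range(pat_len)]
    if perm is None:
        # Assumption: the permutation is drawn once and then kept fixed.
        perm = list(range(pat_len))
        rng.shuffle(perm)
    # Output timestep t shows input timestep perm[t].
    target = [pattern[perm[t]] for t in range(pat_len)]
    return pattern, target, perm


if __name__ == "__main__":
    pattern, target, perm = make_permuted_copy_sample(perm=[4, 3, 2, 1, 0])
    # With the reversing permutation, the target is the pattern read backwards.
    print(target == pattern[::-1])
```

With `p=i=5` the target has the same five timesteps as the input pattern, only in a different order.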
The following run on a multi-head RNN leads to around 100.00% accuracy:
$ python3 train_copy.py --multi_head --num_tasks=5 --batch_size=128 --n_iter=20000 --lr=0.0005 --clip_grad_norm=1 --use_vanilla_rnn --use_cuda --multitask --orthogonal_hh_reg=1.0 --permute_time --input_len_step=0 --input_len_variability=0
The following run on a single-head RNN leads to around 100.00% during accuracy:
$ python3 train_copy.py --train_from_scratch --num_tasks=5 --batch_size=128 --n_iter=20000 --lr=0.0005 --clip_grad_norm=-1 --use_vanilla_rnn --use_cuda --orthogonal_hh_reg=1.0 --permute_time --input_len_step=0 --input_len_variability=0
The following run uses the recurrent main network only and fine-tunes all weights for each task on a multi-head RNN, leading to around 99.99% during accuracy:
$ python3 train_copy.py --multi_head --num_tasks=5 --batch_size=128 --n_iter=20000 --clip_grad_norm=1 --use_vanilla_rnn --use_cuda --permute_time --input_len_step=0 --input_len_variability=0
The following run using a multi-head RNN leads to around 100.00% during and 100.00% final accuracy:
$ python3 train_copy.py --nh_chmlp_chunk_size=2500 --beta=10.0 --multi_head --num_tasks=5 --batch_size=128 --n_iter=20000 --lr=0.0005 --clip_grad_norm=1 --rnn_arch="256" --net_act=tanh --use_vanilla_rnn --nh_hnet_type=chunked_hmlp --nh_hmlp_arch="50,50" --nh_cond_emb_size=32 --nh_chunk_emb_size="32" --use_new_hnet --std_normal_temb=1.0 --std_normal_emb=0.1 --use_cuda --hnet_all --hnet_reg_batch_size=-1 --orthogonal_hh_reg=10.0 --first_task_input_len=5 --input_len_step=0 --input_len_variability=0 --permute_time
If regularizing on a single randomly picked task at each loss evaluation, the following run obtains 100.00% final accuracy:
$ python3 train_copy.py --nh_chmlp_chunk_size=2500 --beta=1 --num_tasks=5 --batch_size=128 --n_iter=20000 --lr=0.0005 --clip_grad_norm=1 --rnn_arch="256" --net_act=tanh --use_vanilla_rnn --nh_hnet_type=chunked_hmlp --nh_hmlp_arch="50,50" --nh_cond_emb_size=32 --nh_chunk_emb_size="16" --use_new_hnet --std_normal_temb=1.0 --std_normal_emb=0.1 --use_cuda --hnet_all --hnet_reg_batch_size=1 --orthogonal_hh_reg=10.0 --first_task_input_len=5 --input_len_step=0 --input_len_variability=0 --permute_time
The following run on a multi-head RNN leads to around 99.93% during and 98.66% final accuracy:
$ python3 train_copy.py --multi_head --num_tasks=5 --batch_size=128 --n_iter=20000 --lr=0.001 --clip_grad_norm=1.0 --use_vanilla_rnn --use_cuda --use_ewc --ewc_gamma=1.0 --ewc_lambda=100.0 --n_fisher=-1 --orthogonal_hh_reg=0.01 --permute_time
The following run on a multi-head RNN where the task identity is provided as additional input leads to around 97.61% during and 97.54% final accuracy:
$ python3 train_copy.py --multi_head --num_tasks=5 --batch_size=128 --n_iter=20000 --lr=0.01 --clip_grad_norm=1 --rnn_arch="256" --net_act=tanh --use_vanilla_rnn --use_cuda --input_task_identity --orthogonal_hh_init --orthogonal_hh_reg=1 --use_ewc --ewc_lambda=10000000.0 --n_fisher=200 --first_task_input_len=5 --input_len_step=0 --input_len_variability=0 --permute_time
The following run on a multi-head RNN leads to around 98.7% during and 94.5% final accuracy:
$ python3 train_copy.py --multi_head --num_tasks=5 --batch_size=128 --n_iter=20000 --clip_grad_norm=1 --use_vanilla_rnn --use_cuda --orthogonal_hh_reg=1.0 --use_si --si_lambda=0.01 --si_task_loss_only --permute_time --input_len_step=0 --input_len_variability=0
The following run on a multi-head RNN leads to around 99.93% during and 73.73% final accuracy:
$ python3 train_copy.py --no_context_mod_outputs --dont_softplus_gains --multi_head --num_tasks=5 --batch_size=128 --n_iter=20000 --lr=0.005 --clip_grad_norm=100 --use_vanilla_rnn --orthogonal_hh_init --orthogonal_hh_reg=-1 --use_cuda --use_masks --permute_time --input_len_step=0 --input_len_variability=0
The following run on a multi-head RNN leads to around 100.00% during and 100.00% final accuracy:
$ python3 train_copy.py --no_context_mod_outputs --dont_softplus_gains --multi_head --num_tasks=5 --batch_size=128 --n_iter=20000 --lr=0.005 --clip_grad_norm=100 --use_vanilla_rnn --orthogonal_hh_init --orthogonal_hh_reg=-1 --use_cuda --use_masks --use_si --si_task_loss_only --permute_time --input_len_step=0 --input_len_variability=0
The following run on a multi-head RNN leads to around 100.00% during and 100.00% final accuracy:
$ python3 train_copy.py --multi_head --num_tasks=5 --batch_size=128 --n_iter=20000 --lr=0.0001 --clip_grad_norm=100 --rnn_arch="256" --use_vanilla_rnn --use_cuda --use_replay --orthogonal_hh_init --orthogonal_hh_reg=1.0 --replay_pm_strength=1.0 --replay_rec_strength=10.0 --replay_distill_reg=1.0 --latent_dim=8 --dec_srnn_rec_layers="256" --dec_srnn_rec_type=elman --permute_time --input_len_step=0 --input_len_variability=0
The following run on a multi-head RNN with Coresets of size 100 leads to around 100% final accuracy:
$ python3 train_copy.py --multi_head --num_tasks=5 --batch_size=128 --n_iter=10000 --lr=0.0001 --clip_grad_norm=-1 --use_vanilla_rnn --use_cuda --use_replay --orthogonal_hh_init --orthogonal_hh_reg=10.0 --replay_distill_reg=10.0 --coreset_size=100 --permute_time --input_len_step=0 --input_len_variability=0
We consider a variant of the Copy Task where input patterns are padded with zeros, yielding longer input sequences. We report results for input sequences of length i=25 and pattern output sequences of length p=5.
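The relation between the input length i and the pattern length p can be sketched as follows. This is a minimal example under stated assumptions, not the code behind `train_copy.py`: the helper name `make_padded_copy_sample` and the feature width are hypothetical, and placing the zero padding after the pattern is one possible choice (the actual task may place it differently).

```python
import random


def make_padded_copy_sample(pat_len=5, input_len=25, n_features=8, seed=0):
    """Build one (inputs, target) pair for a zero-padded copy task.

    Hypothetical helper for illustration only; not part of train_copy.py.
    """
    if input_len < pat_len:
        raise ValueError("input_len must be at least pat_len")
    rng = random.Random(seed)
    # Random binary pattern of pat_len timesteps.
    pattern = [[rng.randint(0, 1) for _ in range(n_features)]
               for _ in range(pat_len)]
    # Assumption: all-zero timesteps are appended after the pattern.
    padding = [[0] * n_features for _ in range(input_len - pat_len)]
    inputs = pattern + padding
    # The network only has to reproduce the pat_len pattern timesteps.
    target = [row[:] for row in pattern]
    return inputs, target


if __name__ == "__main__":
    inputs, target = make_padded_copy_sample()
    print(len(inputs), len(target))
```

With the defaults, the input has 25 timesteps while the target keeps only the 5 pattern timesteps, so the network must retain the pattern across 20 uninformative steps.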
The following run on a multi-head RNN leads to around 100% final accuracy:
$ python3 train_copy.py --nh_chmlp_chunk_size=4000 --beta=10.0 --multi_head --num_tasks=5 --batch_size=128 --n_iter=10000 --lr=0.001 --clip_grad_norm=10 --net_act=tanh --use_vanilla_rnn --nh_hnet_type=chunked_hmlp --nh_hmlp_arch="60,60,30" --nh_cond_emb_size=16 --nh_chunk_emb_size="32" --use_new_hnet --std_normal_temb=0.1 --std_normal_emb=0.1 --use_cuda --hnet_all --orthogonal_hh_reg=10.0 --first_task_input_len=25 --input_len_step=0 --input_len_variability=0 --pat_len=5
The following run on a multi-head RNN leads to around 98.03% during and 98.07% final accuracy:
$ python3 train_copy.py --multi_head --num_tasks=5 --first_task_input_len=25 --pat_len=5 --batch_size=128 --n_iter=20000 --lr=0.005 --clip_grad_norm=1 --use_vanilla_rnn --use_cuda --orthogonal_hh_init --orthogonal_hh_reg=1.0 --use_ewc --ewc_lambda=10000.0 --n_fisher=200 --permute_time --input_len_step=0 --input_len_variability=0
We consider a variant of the Copy Task where the output is computed from the input pattern by applying a binary XOR operation iteratively with a series of r fixed permutations.
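The description above leaves some freedom in how the XOR and the permutations are combined. The sketch below shows one possible reading, in which each of the r iterations XORs the current pattern with a time-permuted copy of itself. The function name `xor_with_permutations` and this exact update rule are assumptions made for illustration; the behavior of `--permute_xor` in `train_copy.py` may differ.

```python
def xor_with_permutations(pattern, perms):
    """Apply one XOR step per permutation in perms (r = len(perms)).

    Hypothetical helper for illustration only; not part of train_copy.py.
    pattern is a list of timesteps, each a list of 0/1 values.
    """
    out = [row[:] for row in pattern]
    for perm in perms:
        # Reorder the current timesteps according to this fixed permutation.
        permuted = [out[perm[t]] for t in range(len(out))]
        # Element-wise binary XOR of the current and the permuted pattern.
        out = [[a ^ b for a, b in zip(row, prow)]
               for row, prow in zip(out, permuted)]
    return out


if __name__ == "__main__":
    pattern = [[1, 0], [0, 1]]
    # r = 1 with the permutation that swaps the two timesteps.
    print(xor_with_permutations(pattern, [[1, 0]]))  # [[1, 1], [1, 1]]
```

Under this reading, a larger r composes more XOR steps, so the mapping from input to output becomes harder to learn.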
The following run on a multi-head RNN for r=1 leads to around 100.00% during and 100.00% final accuracy:
$ python3 train_copy.py --hyper_chunks=4000 --beta=1.0 --multi_head --num_tasks=5 --batch_size=128 --n_iter=20000 --lr=0.005 --clip_grad_norm=1 --use_vanilla_rnn --hnet_arch="64,64,32" --temb_size=32 --emb_size=32 --use_cuda --hnet_all --orthogonal_hh_reg=1.0 --permute_time --input_len_step=0 --input_len_variability=0 --permute_xor --permute_xor_iter=1 --permute_xor_separate
The following run on a multi-head RNN for r=5 leads to around 97.07% during and 93.93% final accuracy:
$ python3 train_copy.py --hyper_chunks=4000 --beta=10.0 --multi_head --num_tasks=5 --batch_size=128 --n_iter=20000 --lr=0.005 --clip_grad_norm=1 --use_vanilla_rnn --hnet_arch="64,64,32" --temb_size=32 --emb_size=32 --use_cuda --hnet_all --orthogonal_hh_reg=1.0 --permute_time --input_len_step=0 --input_len_variability=0 --permute_xor --permute_xor_iter=5 --permute_xor_separate
The following run on a multi-head RNN for r=1 leads to around 99.65% during and 95.92% final accuracy:
$ python3 train_copy.py --multi_head --num_tasks=5 --batch_size=128 --n_iter=20000 --lr=0.005 --clip_grad_norm=1 --use_vanilla_rnn --use_cuda --orthogonal_hh_init --orthogonal_hh_reg=10 --use_ewc --ewc_lambda=1000.0 --n_fisher=200 --permute_time --input_len_step=0 --input_len_variability=0 --permute_xor --permute_xor_separate --permute_xor_iter=1
The following run on a multi-head RNN for r=5 leads to around 94.41% during and 86.39% final accuracy:
$ python3 train_copy.py --multi_head --num_tasks=5 --batch_size=128 --n_iter=20000 --lr=0.001 --clip_grad_norm=-1 --use_vanilla_rnn --use_cuda --orthogonal_hh_init --orthogonal_hh_reg=10 --use_ewc --ewc_lambda=1000.0 --n_fisher=200 --permute_time --input_len_step=0 --input_len_variability=0 --permute_xor --permute_xor_separate --permute_xor_iter=5