# Activation-Beacon

This folder contains the newer Activation Beacon code, with support for DeepSpeed ZeRO-3 training. The project is under active development and subject to change.

## Environment
The main dependencies are:
```
pytorch==2.1.2 transformers==4.36.1 accelerate==0.25.0 datasets==2.14.7 numpy==1.26.2 flash-attn==2.4.2
```
You can install our environment with:
```bash
conda env create -f environment.yaml --name activation-beacon
```

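If you want to sanity-check the pinned versions before training, a small sketch (not part of the repo, using only the standard library) could look like this:

```python
# Sketch: compare installed package versions against the pins above.
# Distribution names and pins are taken from the dependency list; the
# helper itself is hypothetical and not part of this repository.
from importlib.metadata import version, PackageNotFoundError

PINNED = {
    "torch": "2.1.2",
    "transformers": "4.36.1",
    "accelerate": "0.25.0",
    "datasets": "2.14.7",
    "numpy": "1.26.2",
    "flash-attn": "2.4.2",
}

def check_pins(pins, get_version=None):
    """Return (package, expected, found) triples that do not match.

    `found` is None when the package is not installed. `get_version`
    can be overridden for testing."""
    if get_version is None:
        def get_version(pkg):
            try:
                return version(pkg)
            except PackageNotFoundError:
                return None
    problems = []
    for pkg, expected in pins.items():
        found = get_version(pkg)
        if found != expected:
            problems.append((pkg, expected, found))
    return problems

if __name__ == "__main__":
    for pkg, expected, found in check_pins(PINNED):
        print(f"{pkg}: expected {expected}, found {found}")
```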
## Data
You should download the data for fine-tuning & evaluation, then untar the file anywhere you prefer, e.g. `/data`, which results in a folder `/data/activation-beacon`:
```bash
# feel free to change /data to your preferred location
wget https://huggingface.co/datasets/namespace-Pt/projects/resolve/main/activation-beacon.tar.gz?download=true -O /data/activation-beacon.tar.gz

cd /data
tar -xzvf activation-beacon.tar.gz
```

| 25 | + |
| 26 | +**IMPORTANT NOTE** |
| 27 | +- For any path specified for `train_data` and `eval_data`: if it is prefixed with `activation-beacon:`, it will be solved to the relative path against [`data_root`](../src/args.py). |
| 28 | + - e.g. `activation-beacon:lm/pg19.json` becomes `${data_root}/lm/pg19.json` |
| 29 | + - you can modify the default value of [`data_root`](../src/args.py), so that you don't need to type it for each command. |
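As an illustration, the prefix resolution presumably amounts to something like the following (a sketch only; the actual logic lives in [`src/args.py`](../src/args.py), and `resolve_data_path` is a hypothetical name):

```python
# Sketch of resolving the `activation-beacon:` prefix against data_root.
# Not the repository's actual implementation.
import os

PREFIX = "activation-beacon:"

def resolve_data_path(path: str, data_root: str) -> str:
    """Strip the prefix and join the remainder with data_root;
    paths without the prefix are returned unchanged."""
    if path.startswith(PREFIX):
        return os.path.join(data_root, path[len(PREFIX):])
    return path

print(resolve_data_path("activation-beacon:lm/pg19.json", "/data/activation-beacon"))
# → /data/activation-beacon/lm/pg19.json
```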


## Command

Training:
```bash
cd new

torchrun --nproc_per_node 8 -m main.train \
--output_dir data/outputs/activation-beacon-llama2-chat-7b \
--model_name_or_path meta-llama/Llama-2-7b-chat-hf \
--train_data activation-beacon:pretrain/redpajama-sample.json activation-beacon:finetune/longalpaca.json \
--max_length 8192 \
--min_length 1200 \
--max_train_num_per_data 200000 \
--num_train_epochs 1 \
--enable_beacon \
--beacon_window 1024 \
--beacon_stride 1024 \
--beacon_attn step-expansion \
--beacon_sink_size 1 \
--beacon_ratio 2 4 8 16 32 64 128 \
--beacon_ratio_mix step-random \
--beacon_param q k v o \
--gradient_checkpointing \
--save_strategy steps \
--max_steps 10000 \
--save_steps 10000 \
--logging_steps 50 \
--chat_template llama-2 \
--group_by_stride strict \
--deepspeed data/deepspeed/stage3.json
```

Evaluation:
```bash
for model in data/outputs/activation-beacon-llama2-chat-7b/*
do
COMMAND="--beacon_sink_size 1"

# 100K perplexity
torchrun --nproc_per_node 8 -m main.eval_lm --model_name_or_path $model --max_length 100000 --beacon_ratio 32 --min_length 400000 --enable_beacon --stride 0 $COMMAND
# 400K perplexity
torchrun --nproc_per_node 8 -m main.eval_lm --model_name_or_path $model --max_length 400000 --beacon_ratio 128 --min_length 400000 --enable_beacon --stride 0 $COMMAND
# LongBench
torchrun --nproc_per_node 8 -m main.eval_longbench --model_name_or_path $model --max_length 15500 --enable_beacon $COMMAND
# Topic Retrieval
torchrun --nproc_per_node 8 -m main.eval_longeval --model_name_or_path $model --enable_beacon $COMMAND
done
```
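For intuition about the flags above: each window of `--beacon_window` tokens is condensed into roughly `window / ratio` beacon tokens, which is why larger `--beacon_ratio` values stretch the effective context within a fixed KV budget. A rough sketch of that arithmetic (an illustration only, not the repository's actual bookkeeping):

```python
# Sketch: approximate condensed KV-cache size implied by the
# --beacon_window / --beacon_ratio settings. Helper name is hypothetical.

def condensed_kv_length(context_length: int, window: int, ratio: int) -> int:
    """Each full window of `window` tokens is condensed into
    window // ratio beacon tokens; count windows with ceiling division."""
    num_windows = -(-context_length // window)  # ceil(context / window)
    return num_windows * (window // ratio)

# With the training settings above (window=1024), a 100K-token context
# condensed at ratio 32 keeps about:
print(condensed_kv_length(100_000, 1024, 32))  # → 3136
```

This matches the evaluation commands, which pair longer contexts with larger ratios (ratio 32 for 100K tokens, ratio 128 for 400K).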