# Activation-Beacon

This folder contains the newer code for activation beacon, with support for **Mistral models**, **Deepspeed Zero3 training**, **chat templates**, and **more evaluation tasks**. The code here is under development and subject to change in the future.

## Environment
```bash
conda create -n beacon python=3.10.14

conda activate beacon

conda install pytorch pytorch-cuda=12.1 -c pytorch -c nvidia
pip install transformers==4.39.3 deepspeed accelerate datasets peft pandas seaborn
pip install flash-attn --no-build-isolation

# these packages are used in evaluation
pip install rouge fuzzywuzzy jieba
```
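After installing, a quick sanity check (not part of the repository; `missing_packages` is just an illustrative helper) can confirm that the main packages resolve in the new environment:

```python
import importlib.util

def missing_packages(names):
    """Return the package names that cannot be found in the current environment."""
    return [n for n in names if importlib.util.find_spec(n) is None]

# The package list mirrors the install commands above.
required = ["torch", "transformers", "deepspeed", "accelerate", "datasets", "peft", "flash_attn"]
print(missing_packages(required))  # an empty list means the install succeeded
```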

## Usage
```python
import json
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "namespace-Pt/activation-beacon-mistral-7b"

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True, torch_dtype=torch.bfloat16)

model = model.cuda().eval()

with torch.no_grad():
    # short context
    text = "Tell me about yourself."
    inputs = tokenizer(text, return_tensors="pt").to("cuda")
    outputs = model.generate(**inputs, max_new_tokens=20)
    print(f"Input Length: {inputs['input_ids'].shape[1]}")
    print(f"Output: {tokenizer.decode(outputs[0], skip_special_tokens=True)}")

    # reset memory before a new generation task
    model.memory.reset()

    # long context
    with open("data/toy/infbench.json", encoding="utf-8") as f:
        example = json.load(f)
    inputs = tokenizer(example["context"], return_tensors="pt").to("cuda")
    outputs = model.generate(**inputs, do_sample=False, top_p=1, temperature=1, max_new_tokens=20)[:, inputs["input_ids"].shape[1]:]
    print("*" * 20)
    print(f"Input Length: {inputs['input_ids'].shape[1]}")
    print(f"Answers: {example['answer']}")
    print(f"Prediction: {tokenizer.decode(outputs[0], skip_special_tokens=True)}")
```
**NOTE**: It's okay to see warnings like `This is a friendly reminder - the current text generation call will exceed the model's predefined maximum length (32768). Depending on the model, you may observe exceptions, performance degradation, or nothing at all.` They can be safely ignored.
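Note the slicing on the long-context call above: `generate` returns the prompt ids followed by the newly generated tokens, so the output is sliced at the input length to keep only the continuation. A minimal sketch of that pattern with dummy token ids:

```python
# generate()-style output: the prompt ids come back in front of the new tokens.
prompt_ids = [101, 102, 103, 104]      # stands in for inputs["input_ids"][0]
full_output = prompt_ids + [7, 8, 9]   # stands in for model.generate(...)[0]

# Slice at the prompt length to keep only the newly generated tokens,
# mirroring `[:, inputs["input_ids"].shape[1]:]` in the usage example.
new_tokens = full_output[len(prompt_ids):]
print(new_tokens)  # → [7, 8, 9]
```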

## Training
See the [training section](./docs/training.md). **The training script for Mistral will be released in the future.**

## Evaluation
See the [evaluation section](./docs/evaluation.md).

## Citation
If you find this repository useful, please give us a star ⭐.

To cite our work:
```
@misc{zhang2024soaring,
  title={Soaring from 4K to 400K: Extending LLM's Context with Activation Beacon},
  author={Peitian Zhang and Zheng Liu and Shitao Xiao and Ninglu Shao and Qiwei Ye and Zhicheng Dou},
  year={2024},
  eprint={2401.03462},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}
```