
Commit 2754e97: release activation beacon for mistral (parent: 502c2f2)

121 files changed: 13,071 additions & 85,891 deletions

Lines changed: 3 additions & 0 deletions

@@ -0,0 +1,3 @@
+outputs
+results
+pretrain
Lines changed: 4 additions & 90 deletions

@@ -1,96 +1,10 @@
 <div align="center">
 <h1>Soaring from 4K to 400K: Extending LLM's Context with Activation Beacon [<a href="https://arxiv.org/abs/2401.03462">paper</a>]</h1>
-
-<img src="imgs/impress.png" width="80%" class="center">
 </div>
 
-This is the codebase for Activation Beacon, an effective, efficient, compatible, and low-cost (training) method to extend the context length of LLMs by **100x**. Currently we only apply Activation Beacon to [Llama-2-chat-7b](https://huggingface.co/meta-llama/Llama-2-7b-chat-hf). More LLMs will be supported in the future.
-
-## Features
-- **Effectiveness**
-  - significantly improves the performance of Llama-2 on long-context generation (language modeling) and long-context understanding (e.g. long-document QA).
-- **Efficiency**
-  - low memory usage; low inference latency (competitive against FlashAttention2); inference latency increases linearly with the input length.
-- **Compatibility**
-  - preserves the short-context capability of Llama-2;
-  - can be combined with context window extension techniques for further context extension (e.g. 1M with NTK-Aware);
-  - can be combined with retrieval for higher memory accuracy (*ongoing*).
-- **Low-Cost Training**
-  - trains on 80,000 texts within 9 hours;
-  - most training samples are shorter than 4096 tokens.
-
-## Note
-Activation Beacon is a working project. We have released newer code in the [new folder](./new/), which supports:
-- DeepSpeed ZeRO-3 training
-- fine-tuning and evaluating with a chat template
-- needle-in-a-haystack evaluation
-
-You can use the code there if you're interested. The code in the current folder will be deprecated in the future.
-
-## Environment
-The main dependencies are:
-```
-pytorch==2.1.2 transformers==4.36.1 accelerate==0.25.0 datasets==2.14.7 numpy==1.26.2 flash-attn==2.4.2
-```
-You can install our environment with:
-```bash
-conda env create -f environment.yaml --name activation-beacon
-```
-
-## Usage
-```python
-import json
-import torch
-from transformers import AutoModelForCausalLM, AutoTokenizer
-
-model_id = "namespace-Pt/activation-beacon-llama2-7b-chat"
-
-tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
-model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True, torch_dtype=torch.bfloat16)
-
-model = model.cuda().eval()
-
-with torch.no_grad():
-    # short context
-    text = "Tell me about yourself."
-    inputs = tokenizer(text, return_tensors="pt").to("cuda")
-    outputs = model.generate(**inputs, max_new_tokens=20)
-    print(f"Input Length: {inputs['input_ids'].shape[1]}")
-    print(f"Output: {tokenizer.decode(outputs[0], skip_special_tokens=True)}")
-
-    # reset memory before a new generation task
-    model.memory.reset()
-
-    # long context
-    with open("data/toy/narrativeqa.json", encoding="utf-8") as f:
-        example = json.load(f)
-    inputs = tokenizer(example["context"], return_tensors="pt").to("cuda")
-    outputs = model.generate(**inputs, do_sample=False, top_p=1, temperature=1, max_new_tokens=20)[:, inputs["input_ids"].shape[1]:]
-    print("*" * 20)
-    print(f"Input Length: {inputs['input_ids'].shape[1]}")
-    print(f"Answer: {example['answer']}")
-    print(f"Prediction: {tokenizer.decode(outputs[0], skip_special_tokens=True)}")
-```
-**NOTE**: It's okay to see warnings like `This is a friendly reminder - the current text generation call will exceed the model's predefined maximum length (4096). Depending on the model, you may observe exceptions, performance degradation, or nothing at all.` Just ignore them.
-
-## Training
-See the [training section](./docs/training.md).
-
-## Evaluation
-See the [evaluation section](./docs/evaluation.md).
+This is the codebase for Activation Beacon, an effective, efficient, compatible, and low-cost (training) method to extend the context length of LLMs by **100x**. Currently we only apply Activation Beacon to [Llama-2-chat-7b](https://huggingface.co/namespace-Pt/activation-beacon-llama2-7b-chat) and [Mistral-7B-Instruct-v0.2](https://huggingface.co/namespace-Pt/activation-beacon-mistral-7b). More LLMs will be supported in the future.
 
-## Citation
-If you find this repository useful, please give us a star ⭐.
 
-To cite our work:
-```
-@misc{zhang2024soaring,
-  title={Soaring from 4K to 400K: Extending LLM's Context with Activation Beacon},
-  author={Peitian Zhang and Zheng Liu and Shitao Xiao and Ninglu Shao and Qiwei Ye and Zhicheng Dou},
-  year={2024},
-  eprint={2401.03462},
-  archivePrefix={arXiv},
-  primaryClass={cs.CL}
-}
-```
+## File structure
+- The `old` folder contains our initial implementation of Activation Beacon for Llama-2. You can use the code in it to reproduce the training/evaluation of the Llama-2-based model shown in our paper.
+- The `new` folder contains the **newer** implementation of Activation Beacon for both Llama-2 and Mistral. It also supports more features, including **DeepSpeed ZeRO-3 training**, adding a **chat template** in training and inference, and **evaluating on more tasks**. However, the code in this folder is under development and subject to change in the future.
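The "4K to 400K" claim in the README above follows from simple arithmetic: if every window of raw activations is condensed into fewer beacon activations at a given ratio, the span of text the cache can cover grows roughly linearly with that ratio. A back-of-the-envelope sketch (not part of the repository; `effective_context` is a hypothetical helper):

```python
def effective_context(native_context: int, beacon_ratio: int) -> int:
    """Hypothetical helper: approximate number of tokens whose activations
    fit in the cache when every `beacon_ratio` raw activations are
    condensed into one beacon activation."""
    return native_context * beacon_ratio

# Llama-2's native window is 4096 tokens; the training recipe later in this
# commit uses beacon ratios up to 128.
print(effective_context(4096, 128))  # 524288, i.e. beyond the advertised 400K
```

This is only an order-of-magnitude illustration; the actual attainable context depends on the mix of ratios used during training.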
Lines changed: 59 additions & 64 deletions

@@ -1,79 +1,74 @@
 # Activation-Beacon
 
-This folder contains the newer code for Activation Beacon with support for DeepSpeed ZeRO-3 training. This project is under development and subject to change in the future.
+This folder contains the newer code for Activation Beacon with support for **Mistral models**, **DeepSpeed ZeRO-3 training**, **chat templates**, and **more evaluation tasks**. The code here is under development and subject to change in the future.
 
 ## Environment
-The main dependencies are:
-```
-pytorch==2.1.2 transformers==4.36.1 accelerate==0.25.0 datasets==2.14.7 numpy==1.26.2 flash-attn==2.4.2
-```
-You can install our environment with:
 ```bash
-conda env create -f environment.yaml --name activation-beacon
+conda create -n beacon python=3.10.14
+conda activate beacon
+
+conda install pytorch pytorch-cuda=12.1 -c pytorch -c nvidia
+pip install transformers==4.39.3 deepspeed accelerate datasets peft pandas seaborn
+pip install flash-attn --no-build-isolation
+
+# these packages are used in evaluation
+pip install rouge fuzzywuzzy jieba
 ```
 
-## Data
-You should download the data for fine-tuning & evaluation, then untar the file anywhere you prefer, e.g. `/data`, which results in a folder `/data/activation-beacon`:
-```bash
-# feel free to change /data to your preferred location
-wget https://huggingface.co/datasets/namespace-Pt/projects/resolve/main/activation-beacon.tar.gz?download=true -O /data/activation-beacon.tar.gz
-
-cd /data
-tar -xzvf activation-beacon.tar.gz
-
-# you must download the new longalpaca dataset that was organized into single-turn conversations
-wget https://huggingface.co/datasets/namespace-Pt/projects/resolve/main/longalpaca.json?download=true -O /data/activation-beacon/finetune/longalpaca.new.json
-```
-
-**IMPORTANT NOTE**
-- For any path specified for `train_data` and `eval_data`: if it is prefixed with `activation-beacon:`, it will be resolved relative to [`data_root`](../src/args.py).
-  - e.g. `activation-beacon:lm/pg19.json` becomes `${data_root}/lm/pg19.json`
-- you can modify the default value of [`data_root`](../src/args.py) so that you don't need to type it for each command.
-
-## Command
-```bash
-cd new
-
-torchrun --nproc_per_node 8 -m main.train \
-  --output_dir data/outputs/activation-beacon-llama2-chat-7b \
-  --model_name_or_path meta-llama/Llama-2-7b-chat-hf \
-  --train_data activation-beacon:pretrain/redpajama-sample.json activation-beacon:finetune/longalpaca.new.json \
-  --max_length 8192 \
-  --min_length 1200 \
-  --max_train_num_per_data 200000 \
-  --num_train_epochs 1 \
-  --enable_beacon \
-  --beacon_window 1024 \
-  --beacon_stride 1024 \
-  --beacon_attn step-expansion \
-  --beacon_sink_size 1 \
-  --beacon_ratio 2 4 8 16 32 64 128 \
-  --beacon_ratio_mix step-random \
-  --beacon_param q k v o \
-  --gradient_checkpointing \
-  --save_strategy steps \
-  --max_steps 10000 \
-  --save_steps 10000 \
-  --logging_steps 50 \
-  --chat_template llama-2 \
-  --group_by_stride strict \
-  --deepspeed data/deepspeed/stage3.json
-
-# Evaluation
-for model in data/outputs/activation-beacon-llama2-chat-7b/*
-do
-  COMMAND="--beacon_sink_size 1"
-
-  # 100K perplexity
-  torchrun --nproc_per_node 8 -m main.eval_lm --model_name_or_path $model --max_length 100000 --beacon_ratio 32 --min_length 400000 --enable_beacon --stride 0 $COMMAND
-  # 400K perplexity
-  torchrun --nproc_per_node 8 -m main.eval_lm --model_name_or_path $model --max_length 400000 --beacon_ratio 128 --min_length 400000 --enable_beacon --stride 0 $COMMAND
-  # LongBench
-  torchrun --nproc_per_node 8 -m main.eval_longbench --model_name_or_path $model --max_length 15500 --enable_beacon $COMMAND
-  # Topic Retrieval
-  torchrun --nproc_per_node 8 -m main.eval_longeval --model_name_or_path $model --enable_beacon $COMMAND
-done
-```
+## Usage
+```python
+import json
+import torch
+from transformers import AutoModelForCausalLM, AutoTokenizer
+
+model_id = "namespace-Pt/activation-beacon-mistral-7b"
+
+tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
+model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True, torch_dtype=torch.bfloat16)
+
+model = model.cuda().eval()
+
+with torch.no_grad():
+    # short context
+    text = "Tell me about yourself."
+    inputs = tokenizer(text, return_tensors="pt").to("cuda")
+    outputs = model.generate(**inputs, max_new_tokens=20)
+    print(f"Input Length: {inputs['input_ids'].shape[1]}")
+    print(f"Output: {tokenizer.decode(outputs[0], skip_special_tokens=True)}")
+
+    # reset memory before a new generation task
+    model.memory.reset()
+
+    # long context
+    with open("data/toy/infbench.json", encoding="utf-8") as f:
+        example = json.load(f)
+    inputs = tokenizer(example["context"], return_tensors="pt").to("cuda")
+    outputs = model.generate(**inputs, do_sample=False, top_p=1, temperature=1, max_new_tokens=20)[:, inputs["input_ids"].shape[1]:]
+    print("*" * 20)
+    print(f"Input Length: {inputs['input_ids'].shape[1]}")
+    print(f"Answers: {example['answer']}")
+    print(f"Prediction: {tokenizer.decode(outputs[0], skip_special_tokens=True)}")
+```
+**NOTE**: It's okay to see warnings like `This is a friendly reminder - the current text generation call will exceed the model's predefined maximum length (32768). Depending on the model, you may observe exceptions, performance degradation, or nothing at all.` Just ignore them.
+
+## Training
+See the [training section](./docs/training.md). **The training script for Mistral will be released in the future.**
+
+## Evaluation
+See the [evaluation section](./docs/evaluation.md).
+
+## Citation
+If you find this repository useful, please give us a star ⭐.
+
+To cite our work:
+```
+@misc{zhang2024soaring,
+  title={Soaring from 4K to 400K: Extending LLM's Context with Activation Beacon},
+  author={Peitian Zhang and Zheng Liu and Shitao Xiao and Ninglu Shao and Qiwei Ye and Zhicheng Dou},
+  year={2024},
+  eprint={2401.03462},
+  archivePrefix={arXiv},
+  primaryClass={cs.CL}
+}
+```
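The removed data note in the diff above describes how `train_data`/`eval_data` paths carrying the `activation-beacon:` prefix are resolved against `data_root`. A minimal sketch of that rule (not the repository's actual code in `src/args.py`; `resolve_path` and the default `DATA_ROOT` are hypothetical):

```python
import os

# Assumed default location, matching the `/data/activation-beacon` example
# in the data-download instructions; the real default lives in src/args.py.
DATA_ROOT = "/data/activation-beacon"

def resolve_path(path: str, data_root: str = DATA_ROOT) -> str:
    """Resolve an `activation-beacon:`-prefixed path against data_root;
    leave any other path unchanged."""
    prefix = "activation-beacon:"
    if path.startswith(prefix):
        return os.path.join(data_root, path[len(prefix):])
    return path

print(resolve_path("activation-beacon:lm/pg19.json"))
# /data/activation-beacon/lm/pg19.json
print(resolve_path("data/toy/infbench.json"))  # unchanged
```

Changing the default value of `data_root` then amounts to changing `DATA_ROOT` once instead of typing it on every command.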

Long_LLM/activation_beacon/new/data/deepspeed/stage2_offload.json renamed to Long_LLM/activation_beacon/new/data/deepspeed/stage2-offload.json

File renamed without changes.

Long_LLM/activation_beacon/new/data/deepspeed/stage2_small.json

Lines changed: 0 additions & 43 deletions
This file was deleted.

Long_LLM/activation_beacon/new/data/deepspeed/stage0.json renamed to Long_LLM/activation_beacon/new/data/deepspeed/stage3-offload-optim.json

Lines changed: 27 additions & 11 deletions

@@ -1,17 +1,37 @@
 {
+    "zero_optimization": {
+        "stage": 3,
+        "overlap_comm": true,
+        "contiguous_gradients": true,
+        "sub_group_size": 1e9,
+        "reduce_bucket_size": "auto",
+        "stage3_prefetch_bucket_size": "auto",
+        "stage3_param_persistence_threshold": "auto",
+        "stage3_max_live_parameters": 1e9,
+        "stage3_max_reuse_distance": 1e9,
+        "stage3_gather_16bit_weights_on_model_save": true,
+
+        "offload_optimizer": {
+            "device": "cpu",
+            "pin_memory": true
+        }
+    },
     "fp16": {
         "enabled": "auto",
         "loss_scale": 0,
+        "initial_scale_power": 10,
         "loss_scale_window": 1000,
-        "initial_scale_power": 16,
         "hysteresis": 2,
         "min_loss_scale": 1
     },
-
     "bf16": {
-        "enabled": "auto"
+        "enabled": "auto",
+        "loss_scale": 0,
+        "initial_scale_power": 10,
+        "loss_scale_window": 1000,
+        "hysteresis": 2,
+        "min_loss_scale": 1
     },
-
     "optimizer": {
         "type": "AdamW",
         "params": {
@@ -31,15 +51,11 @@
             "total_num_steps": "auto"
         }
     },
-
-    "zero_optimization": {
-        "stage": 0
-    },
-
+
     "gradient_accumulation_steps": "auto",
     "gradient_clipping": "auto",
-    "steps_per_print": 100,
+    "steps_per_print": 1000,
     "train_batch_size": "auto",
     "train_micro_batch_size_per_gpu": "auto",
     "wall_clock_breakdown": false
-}
+}
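The diff above turns the former ZeRO stage-0 config into a stage-3 config with CPU optimizer offload (hence the `stage3-offload-optim.json` rename). An illustrative sanity check (not part of the repository) that such a JSON actually promises what its file name claims, using an inline literal that reproduces only the fields relevant to the check:

```python
import json

# Abbreviated copy of the key fields from the config in the diff above.
config_text = """
{
    "zero_optimization": {
        "stage": 3,
        "overlap_comm": true,
        "stage3_gather_16bit_weights_on_model_save": true,
        "offload_optimizer": {"device": "cpu", "pin_memory": true}
    },
    "steps_per_print": 1000,
    "train_batch_size": "auto"
}
"""
config = json.loads(config_text)

zero = config["zero_optimization"]
# The file name promises ZeRO stage 3 with optimizer-state offload.
assert zero["stage"] == 3
assert zero["offload_optimizer"]["device"] == "cpu"
print("config ok")
```

Running a check like this before launching a multi-GPU job catches a stale or mis-renamed config early, which is cheaper than a failed DeepSpeed startup.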
