# Activation-Beacon

[Activation Beacon](https://arxiv.org/abs/2401.03462) is a plug-in module for transformer-based LLMs that enables effective, efficient, and flexible compression of long contexts by condensing the KV cache.
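
As a rough mental model (conceptual pseudocode only, not this repository's actual API; `forward_and_condense` and its parameters are made-up names), the input is processed chunk by chunk, and each chunk's activations are condensed into a much smaller set of "beacon" activations that accumulate in memory:

```python
# Conceptual pseudocode -- `forward_and_condense` is an illustrative name,
# not a function in this repository.
def compress_long_context(model, input_ids, chunk_size=1024, ratio=8):
    memory = []  # condensed beacon activations accumulated so far
    for start in range(0, len(input_ids), chunk_size):
        chunk = input_ids[start:start + chunk_size]
        # encode the chunk while attending to the existing memory, then
        # distill the chunk's activations into len(chunk) // ratio beacons
        beacons = model.forward_and_condense(chunk, memory, num_beacons=len(chunk) // ratio)
        memory.extend(beacons)
    # `memory` is roughly `ratio` times smaller than the raw KV cache
    return memory
```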

This folder contains the newer implementation of Activation Beacon. It supports more LLMs, including Mistral, Llama-3, and Qwen-2, as well as more features, including **Deepspeed Zero3 training**, **Flash-Attention-2**, adding a **chat template** in training and inference, and **evaluating on more tasks**. However, the code in this folder is under development and subject to change.

## Environment
```bash
conda create -n beacon python=3.10.14

conda activate beacon

# you may need to adjust the cuda version
conda install pytorch pytorch-cuda=12.1 -c pytorch -c nvidia
pip install transformers deepspeed accelerate datasets peft pandas seaborn rouge fuzzywuzzy jieba python-Levenshtein
pip install flash-attn --no-build-isolation
```
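
After installation, a quick sanity check (an optional addition, not part of the original setup instructions) can confirm that PyTorch sees the GPU and that flash-attn imports cleanly:

```python
# optional sanity check for the environment
import torch
import transformers
import flash_attn

print(f"torch {torch.__version__}, CUDA available: {torch.cuda.is_available()}")
print(f"transformers {transformers.__version__}")
print(f"flash-attn {flash_attn.__version__}")
```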

## Usage
```python
import json
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "namespace-Pt/beacon-qwen-2-7b-instruct"

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
    attn_implementation="flash_attention_2"
)

model = model.cuda().eval()

with torch.no_grad():
    # short context
    messages = [{"role": "user", "content": "Tell me about yourself."}]
    inputs = tokenizer.apply_chat_template(messages, tokenize=True, add_generation_prompt=True, return_tensors="pt", return_dict=True).to("cuda")
    outputs = model.generate(**inputs, max_new_tokens=50)
    print(f"Input Length: {inputs['input_ids'].shape[1]}")
    print(f"Output: {repr(tokenizer.decode(outputs[0], skip_special_tokens=True))}")

    # reset the beacon memory before every new generation task
    model.memory.reset()

    # long context
    with open("data/toy/infbench.json", encoding="utf-8") as f:
        example = json.load(f)
    messages = [{"role": "user", "content": example["context"]}]
    inputs = tokenizer.apply_chat_template(messages, tokenize=True, add_generation_prompt=True, return_tensors="pt", return_dict=True).to("cuda")
    outputs = model.generate(**inputs, do_sample=False, top_p=1, temperature=1, max_new_tokens=20)[:, inputs["input_ids"].shape[1]:]
    print("*" * 20)
    print(f"Input Length: {inputs['input_ids'].shape[1]}")
    print(f"Answers: {example['answer']}")
    print(f"Prediction: {tokenizer.decode(outputs[0], skip_special_tokens=True)}")
```
**NOTE**: It is okay to see warnings like `This is a friendly reminder - the current text generation call will exceed the model's predefined maximum length (32768). Depending on the model, you may observe exceptions, performance degradation, or nothing at all.` You can safely ignore them.


## Data
Download the data for fine-tuning & evaluation, then untar the file anywhere you prefer, e.g. `/data`:
```bash
# feel free to change /data to your preferred location
wget https://huggingface.co/datasets/namespace-Pt/projects/resolve/main/long-llm.tar.gz?download=true -O /data/long-llm.tar.gz

cd /data
tar -xzvf long-llm.tar.gz
```

**IMPORTANT NOTE**

For any path specified for `train_data` or `eval_data`: if it is prefixed with `long-llm:`, it will be resolved relative to [`data_root`](./src/args.py), as sketched below.
 - e.g. `long-llm:lm/pg19.json` becomes `${data_root}/lm/pg19.json`
 - you can modify the default value of [`data_root`](./src/args.py) so that you don't need to type it for each command.
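
The resolution behaves roughly like this minimal sketch (an illustrative re-implementation; the actual logic lives in [`src/args.py`](./src/args.py), and `resolve_data_path` is a hypothetical name):

```python
import os

def resolve_data_path(path: str, data_root: str) -> str:
    # paths prefixed with `long-llm:` are resolved against `data_root`
    prefix = "long-llm:"
    if path.startswith(prefix):
        return os.path.join(data_root, path[len(prefix):])
    return path

# e.g. with data_root="/data/long-llm":
# resolve_data_path("long-llm:lm/pg19.json", "/data/long-llm")
#   -> "/data/long-llm/lm/pg19.json"
```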


## Training
See [training section](./docs/training.md).

## Evaluation
See [evaluation section](./docs/evaluation.md).


## Citation
If you find this repository useful, please give us a star ⭐.

To cite our work:
```
@misc{zhang2024soaring,
      title={Soaring from 4K to 400K: Extending LLM's Context with Activation Beacon},
      author={Peitian Zhang and Zheng Liu and Shitao Xiao and Ninglu Shao and Qiwei Ye and Zhicheng Dou},
      year={2024},
      eprint={2401.03462},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
```