Long_LLM/activation_beacon/README.md: 2 additions & 3 deletions
@@ -2,9 +2,8 @@
<h1>Soaring from 4K to 400K: Extending LLM's Context with Activation Beacon [<a href="https://arxiv.org/abs/2401.03462">paper</a>]</h1>
</div>
-This is the codebase for Activation Beacon, an effective, efficient, compatible, and low-cost (training) method to extend the context length of LLMs by **x100** times. Currently we only apply Activation Beacon to [Llama-2-chat-7b](https://huggingface.co/namespace-Pt/activation-beacon-llama2-7b-chat) and [Mistral-7B-Instruct-v0.2](https://huggingface.co/namespace-Pt/activation-beacon-mistral-7b). More LLMs will be supported in the future.
-
+This is the codebase for Activation Beacon, an effective, efficient, compatible, and low-cost (training) method to extend the context length of LLMs by compressing the KV cache.
## File structure:
- The [old](./old/) folder contains our initial implementation of Activation Beacon for Llama-2. You can use the code in it to reproduce the training/evaluation of the Llama-2 based model shown in our paper.
-- The [new](./new/) folder contains a **newer** implementation of Activation Beacon for both Llama-2 and Mistral. It also supports more features, including **Deepspeed Zero3 training**, adding a **chat template** in training and inference, and **evaluating on more tasks**. However, the code in this folder is under development and subject to change in the future.
+- The [new](./new/) folder contains a **newer** implementation of Activation Beacon. It supports more LLMs, including Mistral, Llama-3, and Qwen-2. It also supports more features, including **Deepspeed Zero3 training**, **Flash-Attention-2**, adding a **chat template** in training and inference, and **evaluating on more tasks**. However, the code in this folder is under development and subject to change in the future.
Long_LLM/activation_beacon/new/README.md: 25 additions & 9 deletions
@@ -1,15 +1,26 @@
# Activation-Beacon
-This folder contains the newer code for Activation Beacon, with support for **Mistral models**, **Deepspeed Zero3 training**, **chat templates**, and **more evaluation tasks**. The code here is under development and subject to change in the future.
+[Activation Beacon](https://arxiv.org/abs/2401.03462) compresses the original KV cache into fewer yet more compact states (a.k.a. beacons), hence enabling the LLM to perceive a longer context within its fixed context window. It is known for the following features:
+- **Effective**
+  - there is little information loss given a compression ratio of 2, 4, or 8;
+- **Efficient**
+  - it drastically reduces the GPU consumption of the KV cache;
+- **Compatible**
+  - it can work together with position extrapolation (e.g. YaRN) to further extend the context length; it can also work with grouped query attention to further reduce the KV cache size;
+- **Low-Cost**
+  - it is lightweight and can be efficiently trained with roughly 1B tokens.
+
+This folder contains the newer code for Activation Beacon. It supports more LLMs, including Mistral, Llama-3, and Qwen-2. It also supports more features, including **Deepspeed Zero3 training**, **Flash-Attention-2**, adding a **chat template** in training and inference, and **evaluating on more tasks**. However, the code in this folder is under development and subject to change in the future.
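As a back-of-the-envelope illustration of the efficiency claim above, the sketch below estimates the KV cache footprint of a 7B-class model; the layer count, head count, head dimension, and context length are assumptions (roughly matching Llama-2-7B without grouped query attention), and actual savings depend on the model and the chosen compression ratio:

```python
# Rough KV-cache size estimate for a Llama-2-7B-like model (assumed shape, fp16 cache).
num_layers, num_kv_heads, head_dim = 32, 32, 128  # 7B-class model without GQA (assumption)
bytes_per_value = 2                               # fp16 / bf16
seq_len = 100_000                                 # 100K-token context (assumption)

# Both K and V are cached for every layer and KV head.
kv_bytes = 2 * num_layers * num_kv_heads * head_dim * bytes_per_value * seq_len
print(f"full KV cache:    {kv_bytes / 2**30:.1f} GiB")  # ~48.8 GiB

for ratio in (2, 4, 8):
    # Compressing the KV cache into beacons shrinks it by roughly the compression ratio.
    print(f"compression x{ratio}: {kv_bytes / ratio / 2**30:.1f} GiB")
```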
-For any path specified for `train_data` and `eval_data`: if it is prefixed with `activation-beacon:`, it will be resolved to a path relative to [`data_root`](./src/args.py).
-- e.g. `activation-beacon:lm/pg19.json` becomes `${data_root}/lm/pg19.json`
+For any path specified for `train_data` and `eval_data`: if it is prefixed with `long-llm:`, it will be resolved to a path relative to [`data_root`](./src/args.py).
+- e.g. `long-llm:lm/pg19.json` becomes `${data_root}/lm/pg19.json`
- you can modify the default value of [`data_root`](./src/args.py) so that you don't need to type it for each command.
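A minimal sketch of how this prefix resolution behaves (the function name and the example `data_root` value are illustrative assumptions, not the repository's actual implementation; the real default lives in [`src/args.py`](./src/args.py)):

```python
import os

def resolve_data_path(path: str, data_root: str = "/data/long-llm") -> str:
    """Resolve a `long-llm:`-prefixed path against data_root; leave other paths untouched."""
    prefix = "long-llm:"
    if path.startswith(prefix):
        return os.path.join(data_root, path[len(prefix):])
    return path

print(resolve_data_path("long-llm:lm/pg19.json"))  # -> /data/long-llm/lm/pg19.json
print(resolve_data_path("/abs/path/data.json"))    # -> /abs/path/data.json (unchanged)
```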
All evaluation results will be saved at `data/results`.
-## Evaluating Full-Attention Models
-
-Full-attention models cannot run with more than a 32K context length on a single A800 GPU. Parallel strategies are required. We use [`tensor_parallel`](https://github.com/BlackSamorez/tensor_parallel). You should create another environment, downgrade to `transformers==4.35.1`, and install `tensor_parallel`:
-```bash
-conda create -n full --clone beacon
-pip install transformers==4.35.1 tensor_parallel
-```
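For reference, a minimal sketch of how `tensor_parallel` is typically wrapped around a Hugging Face model in such an environment (the model name, device list, prompt, and generation settings are illustrative assumptions, not the repository's evaluation command):

```python
import torch
import tensor_parallel as tp
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/Mistral-7B-Instruct-v0.2"  # any causal LM on Hugging Face works similarly
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16)

# Shard the model across two GPUs so that long-context inference fits in memory.
model = tp.tensor_parallel(model, ["cuda:0", "cuda:1"])

inputs = tokenizer("Hello, how are you?", return_tensors="pt").to("cuda:0")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```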
-Then, run the following commands (feel free to switch `mistralai/Mistral-7B-Instruct-v0.2` to any model on Hugging Face):