
Commit d477e6f ("update readme")
1 parent 719d971 commit d477e6f

2 files changed: 38 additions & 39 deletions

File tree

Long_LLM/activation_beacon/new/README.md

Lines changed: 38 additions & 0 deletions
@@ -58,6 +58,44 @@ See [training section](./docs/training.md). **The training script for Mistral wi
## Evaluation

See [evaluation section](./docs/evaluation.md).

The performance of [activation-beacon-mistral-7b](https://huggingface.co/namespace-Pt/activation-beacon-mistral-7b) is shown below.

- [Needle in a Haystack](https://github.com/gkamradt/LLMTest_NeedleInAHaystack): We evaluate the model on the Needle-In-A-Haystack task using the official setting.

  <img src="imgs/needle.png"></img>
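For illustration, the needle probe is easy to reproduce outside the official harness. A minimal sketch follows; the filler text, needle sentence, and containment-based scoring are illustrative assumptions, not the official implementation:

```python
# Minimal needle-in-a-haystack probe (illustrative, not the official harness).
# A "needle" sentence is buried at a chosen depth inside filler text; the model
# must answer a question whose answer is the needle.

def build_needle_prompt(filler: str, needle: str, depth: float, question: str) -> str:
    """Insert `needle` at fractional `depth` (0.0 = start, 1.0 = end) of `filler`."""
    pos = int(len(filler) * depth)
    haystack = filler[:pos] + " " + needle + " " + filler[pos:]
    return f"{haystack}\n\nQuestion: {question}\nAnswer:"

def passed(model_answer: str, needle_fact: str) -> bool:
    """Scoring is typically a simple containment check on the key fact."""
    return needle_fact.lower() in model_answer.lower()

# Example: one (depth, context-length) cell of the heatmap above.
filler = "The grass is green. The sky is blue. " * 200
needle = "The best thing to do in San Francisco is eat a sandwich in Dolores Park."
prompt = build_needle_prompt(
    filler, needle, depth=0.5,
    question="What is the best thing to do in San Francisco?",
)
```

The official test sweeps this over a grid of context lengths and insertion depths, producing the heatmap shown in the figure.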
- [LongBench](https://arxiv.org/abs/2308.14508): We evaluate the model on LongBench using a 32K context length.

  |Model|Single Doc QA|Multi Doc QA|Summarization|
  |:-:|:-:|:-:|:-:|
  |[Mistral-7B-Instruct-v0.2](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2)|32.70|25.87|27.42|
  |[Yarn-Mistral-128K](https://huggingface.co/NousResearch/Yarn-Mistral-7b-128k)|33.71|36.08|23.47|
  |Activation-Beacon-Mistral-7B|39.14|43.27|29.52|
- [InfiniteBench](https://arxiv.org/pdf/2402.13718.pdf): We evaluate the model on InfiniteBench using a 128K context length. The results of Yarn-Mistral-128K are copied from the [paper](https://arxiv.org/pdf/2402.13718.pdf).

  |Model|LongBookQA Eng|LongBookSum Eng|
  |:-:|:-:|:-:|
  |[Yarn-Mistral-128K](https://huggingface.co/NousResearch/Yarn-Mistral-7b-128k)|9.55|9.09|
  |Activation-Beacon-Mistral-7B|26.81|12.49|
- [Topic Retrieval](https://lmsys.org/blog/2023-06-29-longchat/): We evaluate the model on the Topic Retrieval task with `[5,10,15,20,25,30,40,50,60,70]` topics.

  <img src="imgs/topic.png"></img>
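The task concatenates one short conversation per topic and asks the model to recall the first topic discussed. A rough sketch of the prompt construction (the topic names, filler conversations, and question phrasing are assumptions; the official task uses real multi-turn LongChat conversations):

```python
# Rough sketch of a topic-retrieval prompt (illustrative only).

def build_topic_prompt(topics: list[str], chats: list[str]) -> str:
    """Concatenate one labelled chat per topic, then ask for the FIRST topic."""
    assert len(topics) == len(chats)
    body = "\n\n".join(f"Topic: {t}\n{c}" for t, c in zip(topics, chats))
    return body + "\n\nWhat was the first topic we discussed? Answer with the topic only."

# Example with 5 topics; the evaluation sweeps the topic count to grow context length.
topics = [f"topic-{i}" for i in range(5)]
chats = [f"(conversation about {t})" for t in topics]
prompt = build_topic_prompt(topics, chats)
```

Adding more topics lengthens the context, so accuracy versus topic count (as in the figure) measures how retrieval degrades with distance.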
- [PG19 Perplexity](https://arxiv.org/abs/2309.12307): We evaluate the sliding-window perplexity on the PG19 test set with window size 100K and stride 32K. We also report the latency and the GPU memory usage. For full-attention models, we enable [flash-attention-2](https://github.com/Dao-AILab/flash-attention) and [tensor parallel](https://github.com/BlackSamorez/tensor_parallel). The evaluation is run on an 8xA800 machine.

  |Model|Perplexity|Latency (s)|Memory (GB)|
  |:-:|:-:|:-:|:-:|
  |[Mistral-7B-Instruct-v0.2](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2)|8.83|14.02|525.6 (cannot run on a single GPU)|
  |[Yarn-Mistral-128K](https://huggingface.co/NousResearch/Yarn-Mistral-7b-128k)|7.66|14.56|525.6 (cannot run on a single GPU)|
  |Activation-Beacon-Mistral-7B|8.16|3.06|27.4|
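Sliding-window perplexity with these settings can be sketched as follows. This is a model-agnostic sketch of the aggregation only: `nll_of(window, n_new)` stands in for a model forward pass returning the summed negative log-likelihood of the last `n_new` tokens, and is an assumption rather than an API from this repository:

```python
import math

def sliding_window_ppl(tokens, nll_of, window=100_000, stride=32_000):
    """Perplexity over `tokens` using overlapping windows.

    Each step scores only the `stride` newest tokens, but conditions them on up
    to `window` tokens of context, so every token is scored exactly once.
    `nll_of(ctx, n_new)` must return the summed negative log-likelihood of the
    last `n_new` tokens of `ctx` (in practice, one model forward pass).
    """
    total_nll, n_scored, prev_end = 0.0, 0, 0
    while prev_end < len(tokens):
        end = min(prev_end + stride, len(tokens))   # advance by one stride
        start = max(0, end - window)                # keep at most `window` context
        n_new = end - prev_end                      # tokens scored this step
        total_nll += nll_of(tokens[start:end], n_new)
        n_scored += n_new
        prev_end = end
    return math.exp(total_nll / n_scored)           # PPL = exp(mean NLL)
```

The window cap is what keeps memory bounded for full-attention baselines; the beacon model instead compresses past activations, which is where the latency and memory gains in the table come from.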
- [Passkey Retrieval](https://arxiv.org/abs/2309.12307): We evaluate the model on the Passkey Retrieval task using the official setting.

  <img src="imgs/passkey.png"></img>
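Unlike the needle test, passkey retrieval hides a random digit string in repetitive filler and scores by exact match. A minimal sketch, where the filler sentences and prompt wording are illustrative assumptions rather than the official script:

```python
import random
import re

# Illustrative passkey-retrieval probe (not the official implementation).
FILLER = "The sky is blue. The sun is bright. The grass grows. The wind blows. "

def build_passkey_prompt(n_filler: int, seed: int = 0):
    """Hide a random 5-digit passkey in the middle of `n_filler` filler blocks."""
    rng = random.Random(seed)
    passkey = str(rng.randint(10_000, 99_999))
    half = n_filler // 2
    prompt = (FILLER * half
              + f"The pass key is {passkey}. Remember it. "
              + FILLER * (n_filler - half)
              + "What is the pass key? The pass key is")
    return prompt, passkey

def extract_passkey(answer: str) -> str:
    """Exact-match scoring: pull the first 5-digit run from the model's answer."""
    m = re.search(r"\d{5}", answer)
    return m.group(0) if m else ""
```

Scaling `n_filler` controls the context length, which is how the accuracy-versus-length curve in the figure is produced.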
## Citation

If you find this repository useful, please give us a star ⭐.

Long_LLM/activation_beacon/new/docs/evaluation.md

Lines changed: 0 additions & 39 deletions
@@ -81,42 +81,3 @@ torchrun --nproc_per_node 8 -m main.eval_longbench --data_root $data_root --mode
# infbench
python -m main.eval_infbench --data_root $data_root --model_name_or_path $model --attn_impl flash_attention_2 --chat_template mistral --enable_tp --max_length 128000
```
## For Reference

The performance of [activation-beacon-mistral-7b](https://huggingface.co/namespace-Pt/activation-beacon-mistral-7b) is shown below.

- [Needle in a Haystack](https://github.com/gkamradt/LLMTest_NeedleInAHaystack): We evaluate the model on the Needle-In-A-Haystack task using the official setting.

  <img src="../imgs/needle.png"></img>
- [LongBench](https://arxiv.org/abs/2308.14508): We evaluate the model on LongBench using a 32K context length.

  |Model|Single Doc QA|Multi Doc QA|Summarization|
  |:-:|:-:|:-:|:-:|
  |[Mistral-7B-Instruct-v0.2](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2)|32.70|25.87|27.42|
  |[Yarn-Mistral-128K](https://huggingface.co/NousResearch/Yarn-Mistral-7b-128k)|33.71|36.08|23.47|
  |Activation-Beacon-Mistral-7B|39.14|43.27|29.52|
- [InfiniteBench](https://arxiv.org/pdf/2402.13718.pdf): We evaluate the model on InfiniteBench using a 128K context length. The results of Yarn-Mistral-128K are copied from the [paper](https://arxiv.org/pdf/2402.13718.pdf).

  |Model|LongBookQA Eng|LongBookSum Eng|
  |:-:|:-:|:-:|
  |[Yarn-Mistral-128K](https://huggingface.co/NousResearch/Yarn-Mistral-7b-128k)|9.55|9.09|
  |Activation-Beacon-Mistral-7B|26.81|12.49|
- [Topic Retrieval](https://lmsys.org/blog/2023-06-29-longchat/): We evaluate the model on the Topic Retrieval task with `[5,10,20,30,40,50,60,70]` topics.

  <img src="../imgs/topic.png"></img>
- [PG19 Perplexity](https://arxiv.org/abs/2309.12307): We evaluate the sliding-window perplexity on the PG19 test set with window size 100K and stride 32K. We also report the latency and the GPU memory usage. For full-attention models, we enable [flash-attention-2](https://github.com/Dao-AILab/flash-attention) and [tensor parallel](https://github.com/BlackSamorez/tensor_parallel). The evaluation is run on an 8xA800 machine.

  |Model|Perplexity|Latency (s)|Memory (GB)|
  |:-:|:-:|:-:|:-:|
  |[Mistral-7B-Instruct-v0.2](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2)|8.83|14.02|525.6 (cannot run on a single GPU)|
  |[Yarn-Mistral-128K](https://huggingface.co/NousResearch/Yarn-Mistral-7b-128k)|7.66|14.56|525.6 (cannot run on a single GPU)|
  |Activation-Beacon-Mistral-7B|8.16|3.06|27.4|
- [Passkey Retrieval](https://arxiv.org/abs/2309.12307): We evaluate the model on the Passkey Retrieval task using the official setting.

  <img src="../imgs/passkey.png"></img>
