
Commit d477e6f ("update readme")
1 parent 719d971 commit d477e6f

2 files changed: 38 additions & 39 deletions

File tree

Long_LLM/activation_beacon/new/README.md

Lines changed: 38 additions & 0 deletions
@@ -58,6 +58,44 @@ See [training section](./docs/training.md). **The training script for Mistral wi
## Evaluation

See [evaluation section](./docs/evaluation.md).

The performance of [activation-beacon-mistral-7b](https://huggingface.co/namespace-Pt/activation-beacon-mistral-7b) is shown below.

- [Needle in a Haystack](https://github.com/gkamradt/LLMTest_NeedleInAHaystack): We evaluate the model on the Needle-In-A-Haystack task using the official setting.

  <img src="imgs/needle.png"></img>
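For illustration, the needle probe is easy to reproduce outside the official harness. A minimal sketch follows; the filler text, needle sentence, and containment-based scoring are illustrative assumptions, not the official implementation:

```python
# Minimal needle-in-a-haystack probe (illustrative, not the official harness).
# A "needle" sentence is buried at a chosen depth inside filler text; the model
# must answer a question whose answer is the needle.

def build_needle_prompt(filler: str, needle: str, depth: float, question: str) -> str:
    """Insert `needle` at fractional `depth` (0.0 = start, 1.0 = end) of `filler`."""
    pos = int(len(filler) * depth)
    haystack = filler[:pos] + " " + needle + " " + filler[pos:]
    return f"{haystack}\n\nQuestion: {question}\nAnswer:"

def passed(model_answer: str, needle_fact: str) -> bool:
    """Scoring is typically a simple containment check on the key fact."""
    return needle_fact.lower() in model_answer.lower()

# Example: one (depth, context-length) cell of the heatmap above.
filler = "The grass is green. The sky is blue. " * 200
needle = "The best thing to do in San Francisco is eat a sandwich in Dolores Park."
prompt = build_needle_prompt(
    filler, needle, depth=0.5,
    question="What is the best thing to do in San Francisco?",
)
```

The official test sweeps this over a grid of context lengths and insertion depths, producing the heatmap shown in the figure.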
- [LongBench](https://arxiv.org/abs/2308.14508): We evaluate the model on LongBench using a 32K context length.

  |Model|Single Doc QA|Multi Doc QA|Summarization|
  |:-:|:-:|:-:|:-:|
  |[Mistral-7B-Instruct-v0.2](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2)|32.70|25.87|27.42|
  |[Yarn-Mistral-128K](https://huggingface.co/NousResearch/Yarn-Mistral-7b-128k)|33.71|36.08|23.47|
  |Activation-Beacon-Mistral-7B|39.14|43.27|29.52|
- [InfiniteBench](https://arxiv.org/pdf/2402.13718.pdf): We evaluate the model on InfiniteBench using a 128K context length. The results of Yarn-Mistral-128K are copied from the [paper](https://arxiv.org/pdf/2402.13718.pdf).

  |Model|LongBookQA Eng|LongBookSum Eng|
  |:-:|:-:|:-:|
  |[Yarn-Mistral-128K](https://huggingface.co/NousResearch/Yarn-Mistral-7b-128k)|9.55|9.09|
  |Activation-Beacon-Mistral-7B|26.81|12.49|
- [Topic Retrieval](https://lmsys.org/blog/2023-06-29-longchat/): We evaluate the model on the Topic Retrieval task with `[5,10,15,20,25,30,40,50,60,70]` topics.

  <img src="imgs/topic.png"></img>
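The task concatenates one short conversation per topic and asks the model to recall the first topic discussed. A rough sketch of the prompt construction (the topic names, filler conversations, and question phrasing are assumptions; the official task uses real multi-turn LongChat conversations):

```python
# Rough sketch of a topic-retrieval prompt (illustrative only).

def build_topic_prompt(topics: list[str], chats: list[str]) -> str:
    """Concatenate one labelled chat per topic, then ask for the FIRST topic."""
    assert len(topics) == len(chats)
    body = "\n\n".join(f"Topic: {t}\n{c}" for t, c in zip(topics, chats))
    return body + "\n\nWhat was the first topic we discussed? Answer with the topic only."

# Example with 5 topics; the evaluation sweeps the topic count to grow context length.
topics = [f"topic-{i}" for i in range(5)]
chats = [f"(conversation about {t})" for t in topics]
prompt = build_topic_prompt(topics, chats)
```

Adding more topics lengthens the context, so accuracy versus topic count (as in the figure) measures how retrieval degrades with distance.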
- [PG19 Perplexity](https://arxiv.org/abs/2309.12307): We evaluate the sliding-window perplexity on the PG19 test set with window size 100K and stride 32K. We also report the latency and the GPU memory usage. For full-attention models, we enable [flash-attention-2](https://github.com/Dao-AILab/flash-attention) and [tensor parallel](https://github.com/BlackSamorez/tensor_parallel). The evaluation is run on an 8xA800 machine.

  |Model|Perplexity|Latency (s)|Memory (GB)|
  |:-:|:-:|:-:|:-:|
  |[Mistral-7B-Instruct-v0.2](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2)|8.83|14.02|525.6 (cannot run on a single GPU)|
  |[Yarn-Mistral-128K](https://huggingface.co/NousResearch/Yarn-Mistral-7b-128k)|7.66|14.56|525.6 (cannot run on a single GPU)|
  |Activation-Beacon-Mistral-7B|8.16|3.06|27.4|
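Sliding-window perplexity with these settings can be sketched as follows. This is a model-agnostic sketch of the aggregation only: `nll_of(window, n_new)` stands in for a model forward pass returning the summed negative log-likelihood of the last `n_new` tokens, and is an assumption rather than an API from this repository:

```python
import math

def sliding_window_ppl(tokens, nll_of, window=100_000, stride=32_000):
    """Perplexity over `tokens` using overlapping windows.

    Each step scores only the `stride` newest tokens, but conditions them on up
    to `window` tokens of context, so every token is scored exactly once.
    `nll_of(ctx, n_new)` must return the summed negative log-likelihood of the
    last `n_new` tokens of `ctx` (in practice, one model forward pass).
    """
    total_nll, n_scored, prev_end = 0.0, 0, 0
    while prev_end < len(tokens):
        end = min(prev_end + stride, len(tokens))   # advance by one stride
        start = max(0, end - window)                # keep at most `window` context
        n_new = end - prev_end                      # tokens scored this step
        total_nll += nll_of(tokens[start:end], n_new)
        n_scored += n_new
        prev_end = end
    return math.exp(total_nll / n_scored)           # PPL = exp(mean NLL)
```

The window cap is what keeps memory bounded for full-attention baselines; the beacon model instead compresses past activations, which is where the latency and memory gains in the table come from.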
- [Passkey Retrieval](https://arxiv.org/abs/2309.12307): We evaluate the model on the Passkey Retrieval task using the official setting.

  <img src="imgs/passkey.png"></img>
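Unlike the needle test, passkey retrieval hides a random digit string in repetitive filler and scores by exact match. A minimal sketch, where the filler sentences and prompt wording are illustrative assumptions rather than the official script:

```python
import random
import re

# Illustrative passkey-retrieval probe (not the official implementation).
FILLER = "The sky is blue. The sun is bright. The grass grows. The wind blows. "

def build_passkey_prompt(n_filler: int, seed: int = 0):
    """Hide a random 5-digit passkey in the middle of `n_filler` filler blocks."""
    rng = random.Random(seed)
    passkey = str(rng.randint(10_000, 99_999))
    half = n_filler // 2
    prompt = (FILLER * half
              + f"The pass key is {passkey}. Remember it. "
              + FILLER * (n_filler - half)
              + "What is the pass key? The pass key is")
    return prompt, passkey

def extract_passkey(answer: str) -> str:
    """Exact-match scoring: pull the first 5-digit run from the model's answer."""
    m = re.search(r"\d{5}", answer)
    return m.group(0) if m else ""
```

Scaling `n_filler` controls the context length, which is how the accuracy-versus-length curve in the figure is produced.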
## Citation

If you find this repository useful, please give us a star ⭐.

Long_LLM/activation_beacon/new/docs/evaluation.md

Lines changed: 0 additions & 39 deletions
@@ -81,42 +81,3 @@ torchrun --nproc_per_node 8 -m main.eval_longbench --data_root $data_root --mode
# infbench
python -m main.eval_infbench --data_root $data_root --model_name_or_path $model --attn_impl flash_attention_2 --chat_template mistral --enable_tp --max_length 128000
```
## For Reference

The performance of [activation-beacon-mistral-7b](https://huggingface.co/namespace-Pt/activation-beacon-mistral-7b) is shown below.

- [Needle in a Haystack](https://github.com/gkamradt/LLMTest_NeedleInAHaystack): We evaluate the model on the Needle-In-A-Haystack task using the official setting.

  <img src="../imgs/needle.png"></img>
- [LongBench](https://arxiv.org/abs/2308.14508): We evaluate the model on LongBench using a 32K context length.

  |Model|Single Doc QA|Multi Doc QA|Summarization|
  |:-:|:-:|:-:|:-:|
  |[Mistral-7B-Instruct-v0.2](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2)|32.70|25.87|27.42|
  |[Yarn-Mistral-128K](https://huggingface.co/NousResearch/Yarn-Mistral-7b-128k)|33.71|36.08|23.47|
  |Activation-Beacon-Mistral-7B|39.14|43.27|29.52|
- [InfiniteBench](https://arxiv.org/pdf/2402.13718.pdf): We evaluate the model on InfiniteBench using a 128K context length. The results of Yarn-Mistral-128K are copied from the [paper](https://arxiv.org/pdf/2402.13718.pdf).

  |Model|LongBookQA Eng|LongBookSum Eng|
  |:-:|:-:|:-:|
  |[Yarn-Mistral-128K](https://huggingface.co/NousResearch/Yarn-Mistral-7b-128k)|9.55|9.09|
  |Activation-Beacon-Mistral-7B|26.81|12.49|
- [Topic Retrieval](https://lmsys.org/blog/2023-06-29-longchat/): We evaluate the model on the Topic Retrieval task with `[5,10,20,30,40,50,60,70]` topics.

  <img src="../imgs/topic.png"></img>
- [PG19 Perplexity](https://arxiv.org/abs/2309.12307): We evaluate the sliding-window perplexity on the PG19 test set with window size 100K and stride 32K. We also report the latency and the GPU memory usage. For full-attention models, we enable [flash-attention-2](https://github.com/Dao-AILab/flash-attention) and [tensor parallel](https://github.com/BlackSamorez/tensor_parallel). The evaluation is run on an 8xA800 machine.

  |Model|Perplexity|Latency (s)|Memory (GB)|
  |:-:|:-:|:-:|:-:|
  |[Mistral-7B-Instruct-v0.2](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2)|8.83|14.02|525.6 (cannot run on a single GPU)|
  |[Yarn-Mistral-128K](https://huggingface.co/NousResearch/Yarn-Mistral-7b-128k)|7.66|14.56|525.6 (cannot run on a single GPU)|
  |Activation-Beacon-Mistral-7B|8.16|3.06|27.4|
- [Passkey Retrieval](https://arxiv.org/abs/2309.12307): We evaluate the model on the Passkey Retrieval task using the official setting.

  <img src="../imgs/passkey.png"></img>
