- [InfiniteBench](https://arxiv.org/pdf/2402.13718.pdf): We evaluate the model on InfiniteBench with a 128K context length. The results of Yarn-Mistral-128K are copied from the [paper](https://arxiv.org/pdf/2402.13718.pdf).
- [Topic Retrieval](https://lmsys.org/blog/2023-06-29-longchat/): We evaluate the model on the Topic Retrieval task with `[5,10,15,20,25,30,40,50,60,70]` topics.
<img src="imgs/topic.png"></img>
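The Topic Retrieval task concatenates several short conversations, each on a distinct topic, and asks the model to recall the first topic. A minimal sketch of how such a prompt can be assembled (the wording here is an assumption for illustration; the official LongChat prompt differs in detail, and `make_topic_prompt` is a hypothetical helper, not part of this repo):

```python
def make_topic_prompt(topics, conversations):
    """Assemble a topic-retrieval prompt from parallel lists of topics and
    conversation snippets; the model must recall the FIRST topic discussed."""
    body = "\n".join(
        f"USER: I would like to discuss {topic}.\nASSISTANT: {reply}"
        for topic, reply in zip(topics, conversations)
    )
    question = ("\nNow the record ends. What is the first topic we discussed? "
                "Only give me the topic name.")
    return body + question
```

With more topics, the prompt grows longer and the first topic sits deeper in the context, which is what makes larger topic counts harder for limited-context models.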
- [PG19 Perplexity](https://arxiv.org/abs/2309.12307): We evaluate the sliding-window perplexity on the PG19 test set with a window size of 100K and a stride of 32K. We also report latency and GPU memory usage. For full-attention models, we enable [flash-attention-2](https://github.com/Dao-AILab/flash-attention) and [tensor parallel](https://github.com/BlackSamorez/tensor_parallel). The evaluation is run on an 8xA800 machine.
|Model|Perplexity|Latency (s)|Memory (GB)|
|:-:|:-:|:-:|:-:|
|[Mistral-7B-Instruct-v0.2](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2)|8.83|14.02|525.6 (cannot run on a single GPU)|
|[Yarn-Mistral-128K](https://huggingface.co/NousResearch/Yarn-Mistral-7b-128k)|7.66|14.56|525.6 (cannot run on a single GPU)|
|Activation-Beacon-Mistral-7B|8.16|3.06|27.4|
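Sliding-window perplexity advances through the text in strides, scoring only the tokens not already covered by an earlier window, so each token is scored exactly once with up to `window` tokens of context. A sketch of that bookkeeping, assuming the common recipe (the repo's actual evaluation script may differ; `sliding_windows` and `perplexity` are illustrative helpers):

```python
import math

def sliding_windows(n_tokens, window=100_000, stride=32_000):
    """Return (context_start, end, n_scored) spans: each step advances by
    `stride`, and only tokens beyond the previous span's end are scored."""
    spans, prev_end = [], 0
    for begin in range(0, n_tokens, stride):
        end = min(begin + window, n_tokens)
        spans.append((begin, end, end - prev_end))
        prev_end = end
        if end == n_tokens:
            break
    return spans

def perplexity(token_nlls):
    """Perplexity is exp of the mean negative log-likelihood per scored token."""
    return math.exp(sum(token_nlls) / len(token_nlls))
```

A larger stride makes the evaluation cheaper (fewer forward passes) at the cost of scoring some tokens with less preceding context.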
- [Passkey Retrieval](https://arxiv.org/abs/2309.12307): We evaluate the model on the Passkey Retrieval task using the official setting.
<img src="imgs/passkey.png"></img>
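The passkey task hides a random key inside long stretches of filler text and asks the model to retrieve it. A minimal sketch of how such a prompt is typically constructed (the exact filler and wording of the official setting may differ; `make_passkey_prompt` is a hypothetical helper for illustration):

```python
import random

FILLER = ("The grass is green. The sky is blue. The sun is yellow. "
          "Here we go. There and back again. ")
KEY_LINE = "The pass key is {key}. Remember it. {key} is the pass key. "

def make_passkey_prompt(n_filler=200, depth=0.5, seed=0):
    """Bury a random 5-digit key at a relative `depth` inside `n_filler`
    repetitions of filler text; return the prompt and the ground-truth key."""
    rng = random.Random(seed)
    key = rng.randint(10_000, 99_999)
    chunks = [FILLER] * n_filler
    chunks.insert(int(n_filler * depth), KEY_LINE.format(key=key))
    prompt = ("There is important info hidden inside a lot of irrelevant "
              "text. Find it and memorize it.\n" + "".join(chunks) +
              "\nWhat is the pass key?")
    return prompt, key
```

Varying `n_filler` controls the context length and `depth` controls where in the context the key is buried, which together probe whether retrieval degrades with length or position.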
## Citation
If you find this repository useful, please give us a star ⭐.