Commit 29138ee: Fix typo in llm_int8.html

langauge -> language

Parent: 1c14551
1 file changed: 2 additions and 2 deletions

File: docs/neox/utils/llm_int8.html
@@ -74,7 +74,7 @@
 <h1>LLM.int() on GPT-NeoX</h1>
 <p>This implements a utility function to transform a <code class="highlight"><span></span><span class="n">nn</span><span class="o">.</span><span class="n">Linear</span></code>
 layer to LLM.int8() linear layer.</p>
-<p><a href="https://papers.labml.ai/paper/eb2bcaee1d0011edaa66a71c10a887e7">LLM.int8() paper</a> shows you can use int8 quantization while handling outliers to reduce memory footprint without performance degradation in large language models. They convert weights and inputs to scaled 8-bit integers and does matrix multiplication producing int32 results which is then converted back to float16 and rescaled. They show that in large langauge models, some features can give extreme values (outliers) that dominate the model&#x27;s output. These features get clamped in 8-bit integer space which causes the model performance to degrade. As a solution they pick these outliers (greater than a specified threshold) and compute their multiplications separately in float16 space. Since the percentage of outliers is around 0.01% this doesn&#x27;t increase memory usage, and prevents the model from degrading performance.</p>
+<p><a href="https://papers.labml.ai/paper/eb2bcaee1d0011edaa66a71c10a887e7">LLM.int8() paper</a> shows you can use int8 quantization while handling outliers to reduce memory footprint without performance degradation in large language models. They convert weights and inputs to scaled 8-bit integers and does matrix multiplication producing int32 results which is then converted back to float16 and rescaled. They show that in large language models, some features can give extreme values (outliers) that dominate the model&#x27;s output. These features get clamped in 8-bit integer space which causes the model performance to degrade. As a solution they pick these outliers (greater than a specified threshold) and compute their multiplications separately in float16 space. Since the percentage of outliers is around 0.01% this doesn&#x27;t increase memory usage, and prevents the model from degrading performance.</p>
 <p>The code to transform GPT-NoeX layers is defined in <a href="../model.html#post_load_prepare">model.py</a>.</p>
 <p>Here are example uses of GPT-NeoX with int8 quantization.</p>
 <ul><li><a href="../samples/llm_int8.html">Generate Text</a> </li>
@@ -240,4 +240,4 @@ <h2>Transform a <code class="highlight"><span></span><span class="n">nn</span><
 handleImages()
 </script>
 </body>
-</html>
+</html>
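The paragraph touched by this diff describes the paper's mixed-precision decomposition: regular features go through a scaled int8 matmul with int32 accumulation, while outlier features (above a threshold) are multiplied in float16. Below is a minimal NumPy sketch of that idea, not the actual library code from this repository; the function name, the per-row/per-column scaling scheme, and the threshold value are illustrative assumptions.

```python
import numpy as np

def int8_matmul_with_outliers(x, w, threshold=6.0):
    """Sketch of LLM.int8()-style mixed-precision decomposition.

    Columns of `x` (feature dimensions) whose absolute max exceeds
    `threshold` are handled in float16; the rest are quantized to int8,
    multiplied with int32 accumulation, then rescaled back.
    """
    # Split feature dimensions into outlier and regular sets.
    outlier_cols = np.abs(x).max(axis=0) > threshold
    regular_cols = ~outlier_cols

    # float16 path for the small fraction of outlier features.
    out_fp16 = (x[:, outlier_cols].astype(np.float16)
                @ w[outlier_cols, :].astype(np.float16))

    # int8 path: per-row scales for x, per-column scales for w.
    x_r = x[:, regular_cols]
    w_r = w[regular_cols, :]
    sx = np.abs(x_r).max(axis=1, keepdims=True) / 127.0
    sw = np.abs(w_r).max(axis=0, keepdims=True) / 127.0
    xq = np.round(x_r / sx).astype(np.int8)
    wq = np.round(w_r / sw).astype(np.int8)

    # Accumulate in int32, then dequantize back to floating point.
    acc = xq.astype(np.int32) @ wq.astype(np.int32)
    out_int8 = acc.astype(np.float32) * sx * sw

    return out_int8.astype(np.float16) + out_fp16
```

The result approximates a plain `x @ w`; the int8 path trades a small quantization error for memory savings, while the float16 path keeps the extreme-valued features from being clamped.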
