Commit 0546200

terrelln authored and rkhuangtao committed
UPSTREAM: lib: Add xxhash module
Adds xxhash kernel module with xxh32 and xxh64 hashes. xxhash is an
extremely fast non-cryptographic hash algorithm for checksumming.
The zstd compression and decompression modules added in the next patch
require xxhash. I extracted it out from zstd since it is useful on its
own. I copied the code from the upstream XXHash source repository and
translated it into kernel style. I ran benchmarks and tests in the kernel
and tests in userland.

I benchmarked xxhash as a special character device. I ran in four modes:
no-op, xxh32, xxh64, and crc32. The no-op mode simply copies the data to
kernel space and ignores it. The xxh32, xxh64, and crc32 modes compute
hashes on the copied data. I also ran it with four different buffer sizes.
The benchmark file is located in the upstream zstd source repository under
`contrib/linux-kernel/xxhash_test.c` [1].

I ran the benchmarks on an Ubuntu 14.04 VM with 2 cores and 4 GiB of RAM.
The VM is running on a MacBook Pro with a 3.1 GHz Intel Core i7 processor,
16 GB of RAM, and an SSD. I benchmarked using the file `filesystem.squashfs`
from `ubuntu-16.10-desktop-amd64.iso`, which is 1,536,217,088 B large.
Run the following commands for the benchmark:

    modprobe xxhash_test
    mknod xxhash_test c 245 0
    time cp filesystem.squashfs xxhash_test

The time is the wall-clock time reported by `time` for the userland `cp`.
The GB/s is computed with

    1,536,217,088 B / time(buffer size, hash)

which includes the time to copy from userland.
The Adjusted GB/s is computed with

    1,536,217,088 B / (time(buffer size, hash) - time(buffer size, none)).

| Buffer Size (B) | Hash  | Time (s) | GB/s | Adjusted GB/s |
|-----------------|-------|----------|------|---------------|
|            1024 | none  |    0.408 | 3.77 |             - |
|            1024 | xxh32 |    0.649 | 2.37 |          6.37 |
|            1024 | xxh64 |    0.542 | 2.83 |         11.46 |
|            1024 | crc32 |    1.290 | 1.19 |          1.74 |
|            4096 | none  |    0.380 | 4.04 |             - |
|            4096 | xxh32 |    0.645 | 2.38 |          5.79 |
|            4096 | xxh64 |    0.500 | 3.07 |         12.80 |
|            4096 | crc32 |    1.168 | 1.32 |          1.95 |
|            8192 | none  |    0.351 | 4.38 |             - |
|            8192 | xxh32 |    0.614 | 2.50 |          5.84 |
|            8192 | xxh64 |    0.464 | 3.31 |         13.60 |
|            8192 | crc32 |    1.163 | 1.32 |          1.89 |
|           16384 | none  |    0.346 | 4.43 |             - |
|           16384 | xxh32 |    0.590 | 2.60 |          6.30 |
|           16384 | xxh64 |    0.466 | 3.30 |         12.80 |
|           16384 | crc32 |    1.183 | 1.30 |          1.84 |

Tested in userland using the test-suite in the zstd repo under
`contrib/linux-kernel/test/XXHashUserlandTest.cpp` [2] by mocking the
kernel functions. A line in each branch of every function in `xxhash.c`
was commented out to ensure that the test-suite fails. Additionally
tested while testing zstd and with SMHasher [3].

[1] https://phabricator.intern.facebook.com/P57526246
[2] https://github.com/facebook/zstd/blob/dev/contrib/linux-kernel/test/XXHashUserlandTest.cpp
[3] https://github.com/aappleby/smhasher

zstd source repository: https://github.com/facebook/zstd
XXHash source repository: https://github.com/cyan4973/xxhash

Change-Id: Ibb5ffee816e2593800c07263719bd1d4b802d8de
Signed-off-by: Nick Terrell <terrelln@fb.com>
Signed-off-by: Chris Mason <clm@fb.com>
Signed-off-by: Ziyuan Xu <xzy.xu@rock-chips.com>
(cherry-picked from 5d2405227a9eaea48e8cc95756a06d407b11f141)
1 parent 6cdbb3d commit 0546200

4 files changed

Lines changed: 740 additions & 0 deletions


include/linux/xxhash.h

Lines changed: 236 additions & 0 deletions
@@ -0,0 +1,236 @@
+/*
+ * xxHash - Extremely Fast Hash algorithm
+ * Copyright (C) 2012-2016, Yann Collet.
+ *
+ * BSD 2-Clause License (http://www.opensource.org/licenses/bsd-license.php)
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions are
+ * met:
+ *
+ *   * Redistributions of source code must retain the above copyright
+ *     notice, this list of conditions and the following disclaimer.
+ *   * Redistributions in binary form must reproduce the above
+ *     copyright notice, this list of conditions and the following disclaimer
+ *     in the documentation and/or other materials provided with the
+ *     distribution.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ *
+ * This program is free software; you can redistribute it and/or modify it under
+ * the terms of the GNU General Public License version 2 as published by the
+ * Free Software Foundation. This program is dual-licensed; you may select
+ * either version 2 of the GNU General Public License ("GPL") or BSD license
+ * ("BSD").
+ *
+ * You can contact the author at:
+ * - xxHash homepage: http://cyan4973.github.io/xxHash/
+ * - xxHash source repository: https://github.com/Cyan4973/xxHash
+ */
+
+/*
+ * Notice extracted from xxHash homepage:
+ *
+ * xxHash is an extremely fast Hash algorithm, running at RAM speed limits.
+ * It also successfully passes all tests from the SMHasher suite.
+ *
+ * Comparison (single thread, Windows Seven 32 bits, using SMHasher on a Core 2
+ * Duo @3GHz)
+ *
+ * Name            Speed       Q.Score   Author
+ * xxHash          5.4 GB/s     10
+ * CrapWow         3.2 GB/s      2       Andrew
+ * MurmurHash 3a   2.7 GB/s     10       Austin Appleby
+ * SpookyHash      2.0 GB/s     10       Bob Jenkins
+ * SBox            1.4 GB/s      9       Bret Mulvey
+ * Lookup3         1.2 GB/s      9       Bob Jenkins
+ * SuperFastHash   1.2 GB/s      1       Paul Hsieh
+ * CityHash64      1.05 GB/s    10       Pike & Alakuijala
+ * FNV             0.55 GB/s     5       Fowler, Noll, Vo
+ * CRC32           0.43 GB/s     9
+ * MD5-32          0.33 GB/s    10       Ronald L. Rivest
+ * SHA1-32         0.28 GB/s    10
+ *
+ * Q.Score is a measure of quality of the hash function.
+ * It depends on successfully passing SMHasher test set.
+ * 10 is a perfect score.
+ *
+ * A 64-bits version, named xxh64 offers much better speed,
+ * but for 64-bits applications only.
+ * Name     Speed on 64 bits    Speed on 32 bits
+ * xxh64       13.8 GB/s            1.9 GB/s
+ * xxh32        6.8 GB/s            6.0 GB/s
+ */
+
+#ifndef XXHASH_H
+#define XXHASH_H
+
+#include <linux/types.h>
+
+/*-****************************
+ * Simple Hash Functions
+ *****************************/
+
+/**
+ * xxh32() - calculate the 32-bit hash of the input with a given seed.
+ *
+ * @input:  The data to hash.
+ * @length: The length of the data to hash.
+ * @seed:   The seed can be used to alter the result predictably.
+ *
+ * Speed on Core 2 Duo @ 3 GHz (single thread, SMHasher benchmark) : 5.4 GB/s
+ *
+ * Return:  The 32-bit hash of the data.
+ */
+uint32_t xxh32(const void *input, size_t length, uint32_t seed);
+
+/**
+ * xxh64() - calculate the 64-bit hash of the input with a given seed.
+ *
+ * @input:  The data to hash.
+ * @length: The length of the data to hash.
+ * @seed:   The seed can be used to alter the result predictably.
+ *
+ * This function runs 2x faster on 64-bit systems, but slower on 32-bit systems.
+ *
+ * Return:  The 64-bit hash of the data.
+ */
+uint64_t xxh64(const void *input, size_t length, uint64_t seed);
+
+/*-****************************
+ * Streaming Hash Functions
+ *****************************/
+
+/*
+ * These definitions are only meant to allow allocation of XXH state
+ * statically, on stack, or in a struct for example.
+ * Do not use members directly.
+ */
+
+/**
+ * struct xxh32_state - private xxh32 state, do not use members directly
+ */
+struct xxh32_state {
+	uint32_t total_len_32;
+	uint32_t large_len;
+	uint32_t v1;
+	uint32_t v2;
+	uint32_t v3;
+	uint32_t v4;
+	uint32_t mem32[4];
+	uint32_t memsize;
+};
+
+/**
+ * struct xxh64_state - private xxh64 state, do not use members directly
+ */
+struct xxh64_state {
+	uint64_t total_len;
+	uint64_t v1;
+	uint64_t v2;
+	uint64_t v3;
+	uint64_t v4;
+	uint64_t mem64[4];
+	uint32_t memsize;
+};
+
+/**
+ * xxh32_reset() - reset the xxh32 state to start a new hashing operation
+ *
+ * @state: The xxh32 state to reset.
+ * @seed:  Initialize the hash state with this seed.
+ *
+ * Call this function on any xxh32_state to prepare for a new hashing operation.
+ */
+void xxh32_reset(struct xxh32_state *state, uint32_t seed);
+
+/**
+ * xxh32_update() - hash the data given and update the xxh32 state
+ *
+ * @state:  The xxh32 state to update.
+ * @input:  The data to hash.
+ * @length: The length of the data to hash.
+ *
+ * After calling xxh32_reset() call xxh32_update() as many times as necessary.
+ *
+ * Return:  Zero on success, otherwise an error code.
+ */
+int xxh32_update(struct xxh32_state *state, const void *input, size_t length);
+
+/**
+ * xxh32_digest() - produce the current xxh32 hash
+ *
+ * @state: Produce the current xxh32 hash of this state.
+ *
+ * A hash value can be produced at any time. It is still possible to continue
+ * inserting input into the hash state after a call to xxh32_digest(), and
+ * generate new hashes later on, by calling xxh32_digest() again.
+ *
+ * Return: The xxh32 hash stored in the state.
+ */
+uint32_t xxh32_digest(const struct xxh32_state *state);
+
+/**
+ * xxh64_reset() - reset the xxh64 state to start a new hashing operation
+ *
+ * @state: The xxh64 state to reset.
+ * @seed:  Initialize the hash state with this seed.
+ */
+void xxh64_reset(struct xxh64_state *state, uint64_t seed);
+
+/**
+ * xxh64_update() - hash the data given and update the xxh64 state
+ * @state:  The xxh64 state to update.
+ * @input:  The data to hash.
+ * @length: The length of the data to hash.
+ *
+ * After calling xxh64_reset() call xxh64_update() as many times as necessary.
+ *
+ * Return:  Zero on success, otherwise an error code.
+ */
+int xxh64_update(struct xxh64_state *state, const void *input, size_t length);
+
+/**
+ * xxh64_digest() - produce the current xxh64 hash
+ *
+ * @state: Produce the current xxh64 hash of this state.
+ *
+ * A hash value can be produced at any time. It is still possible to continue
+ * inserting input into the hash state after a call to xxh64_digest(), and
+ * generate new hashes later on, by calling xxh64_digest() again.
+ *
+ * Return: The xxh64 hash stored in the state.
+ */
+uint64_t xxh64_digest(const struct xxh64_state *state);
+
+/*-**************************
+ * Utils
+ ***************************/
+
+/**
+ * xxh32_copy_state() - copy the source state into the destination state
+ *
+ * @src: The source xxh32 state.
+ * @dst: The destination xxh32 state.
+ */
+void xxh32_copy_state(struct xxh32_state *dst, const struct xxh32_state *src);
+
+/**
+ * xxh64_copy_state() - copy the source state into the destination state
+ *
+ * @src: The source xxh64 state.
+ * @dst: The destination xxh64 state.
+ */
+void xxh64_copy_state(struct xxh64_state *dst, const struct xxh64_state *src);
+
+#endif /* XXHASH_H */

lib/Kconfig

Lines changed: 3 additions & 0 deletions
@@ -185,6 +185,9 @@ config CRC8
 	  when they need to do cyclic redundancy check according CRC8
 	  algorithm. Module will be called crc8.
 
+config XXHASH
+	tristate
+
 config AUDIT_GENERIC
 	bool
 	depends on AUDIT && !AUDIT_ARCH

lib/Makefile

Lines changed: 1 addition & 0 deletions
@@ -93,6 +93,7 @@ obj-$(CONFIG_CRC32) += crc32.o
 obj-$(CONFIG_CRC7) += crc7.o
 obj-$(CONFIG_LIBCRC32C) += libcrc32c.o
 obj-$(CONFIG_CRC8) += crc8.o
+obj-$(CONFIG_XXHASH) += xxhash.o
 obj-$(CONFIG_GENERIC_ALLOCATOR) += genalloc.o
 
 obj-$(CONFIG_842_COMPRESS) += 842/
