|
53 | 53 | // speaking, words. They're just spans of code points that frequently |
54 | 54 | // occur together. They are ordered shortest to longest. |
55 | 55 | // |
| 56 | +// - If the translation uses a lot of code points or widely spaced code points, |
| 57 | +// then the Huffman table entries are UTF-16 code points. But if the translation |
| 58 | +// uses only ASCII 7-bit code points plus a SMALL range of higher code points that |
| 59 | +// still fit in 8 bits, translation_offset and translation_offstart are used to |
| 60 | +// renumber the code points so that they still fit within 8 bits. (It's very beneficial |
| 61 | +// for mchar_t to be 8 bits instead of 16!) |
| 62 | +// |
56 | 63 | // - dictionary entries are non-overlapping, and the _ending_ index of each |
57 | 64 | // entry is stored in an array. A count of words of each length, from |
58 | 65 | // minlen to maxlen, is given in the array called wlencount. From |
59 | 66 | // this small array, the start and end of the N'th word can be |
60 | 67 | // calculated by an efficient, small loop. (A bit of time is traded |
61 | 68 | // to reduce the size of this table indicating lengths) |
62 | 69 | // |
| 70 | +// - Value 1 ('\1') is used to indicate that a QSTR number follows. The |
| 71 | +// QSTR is encoded as a fixed number of bits (translation_qstr_bits), e.g., |
| 72 | +// 10 bits if the highest core qstr is from 512 to 1023 inclusive. |
| 73 | +// (maketranslationdata uses a simple heuristic where any qstr >= 3 |
| 74 | +// characters long is encoded in this way; this is simple but probably not |
| 75 | +// optimal. In fact, the rule of >= 2 characters is better for SOME languages |
| 76 | +// on SOME boards.) |
| 77 | +// |
63 | 78 | // The "data" / "tail" construct is so that the struct's last member is a |
64 | 79 | // "flexible array". However, the _only_ member is not permitted to be |
65 | 80 | // a flexible member, so we have to declare the first byte as a separate |
|
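The comment above says the start and end of the N'th dictionary word can be recovered from wlencount with a small loop. The following is a minimal sketch of one way that could work, not the actual decoder. It assumes words are stored back to back, shortest first, and that wlencount[i] holds the number of words of length minlen + i. The names find_word and word_span_t are hypothetical and do not come from the source.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical result type: half-open span [start, end) into the word data.
typedef struct {
    size_t start;
    size_t end;
} word_span_t;

// Locate the n'th dictionary word (0-based).
// Assumption: wlencount[i] is the number of words of length (minlen + i),
// and words are packed contiguously in order of increasing length.
static word_span_t find_word(const uint8_t *wlencount, unsigned minlen,
                             unsigned maxlen, unsigned n) {
    size_t pos = 0; // offset of the first word of the current length
    for (unsigned len = minlen; len <= maxlen; len++) {
        unsigned count = wlencount[len - minlen];
        if (n < count) {
            // The word is the n'th one within this length group.
            word_span_t span = { pos + (size_t)n * len,
                                 pos + (size_t)(n + 1) * len };
            return span;
        }
        // Skip the whole group of words of this length.
        n -= count;
        pos += (size_t)count * len;
    }
    assert(!"word index out of range");
    word_span_t none = { 0, 0 };
    return none;
}
```

The loop runs at most maxlen - minlen + 1 times, which is the time-for-space trade the comment describes: only one small count per word length is stored rather than an offset per word.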
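The QSTR bullet in the diff says a QSTR number is stored in a fixed number of bits (translation_qstr_bits), for example 10 bits when the highest core qstr lies between 512 and 1023. As a rough illustration, the sketch below computes that width and reads a fixed-width value from a bitstream. The helper names (qstr_bits_for, bit_reader_t, read_bits) are hypothetical, and the most-significant-bit-first order is an assumption rather than something the source states.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Number of bits needed to represent highest_qstr
// (512..1023 gives 10, matching the example in the comment).
static unsigned qstr_bits_for(uint32_t highest_qstr) {
    unsigned bits = 0;
    while (highest_qstr != 0) {
        bits++;
        highest_qstr >>= 1;
    }
    return bits;
}

// Hypothetical bit reader over a byte buffer.
typedef struct {
    const uint8_t *data;
    size_t bit_pos; // index of the next bit to read
} bit_reader_t;

// Read nbits bits as an unsigned value.
// Assumption: bits are packed most significant bit first.
static uint32_t read_bits(bit_reader_t *r, unsigned nbits) {
    assert(nbits <= 32);
    uint32_t value = 0;
    for (unsigned i = 0; i < nbits; i++) {
        uint8_t byte = r->data[r->bit_pos >> 3];
        unsigned bit = (byte >> (7 - (r->bit_pos & 7))) & 1u;
        value = (value << 1) | bit;
        r->bit_pos++;
    }
    return value;
}
```

In a decoder along these lines, seeing the marker value 1 would trigger a call such as read_bits(&reader, translation_qstr_bits) to obtain the QSTR number.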