diff --git a/src/main/docs/algorithm-profiles.adoc b/src/main/docs/algorithm-profiles.adoc
index f62b785..4d455e0 100644
--- a/src/main/docs/algorithm-profiles.adoc
+++ b/src/main/docs/algorithm-profiles.adoc
@@ -10,9 +10,9 @@ toc::[]
 Factories ::
 `LongHashFunction.city_1_1()`, `.city_1_1(long)`, `.city_1_1(long, long)` (`LongHashFunction.java:53-115`).
 Implementation ::
-`net.openhft.hashing.CityAndFarmHash_1_1` ports Google’s CityHash64 v1.1 (`CityAndFarmHash_1_1.java`).
+`net.openhft.hashing.CityAndFarmHash_1_1` ports Google's CityHash64 v1.1 (`CityAndFarmHash_1_1.java`).
 Key traits ::
-* Normalises inputs to little-endian and forwards short-length cases to specialised mix routines (1–3, 4–7, 8–16 byte fast paths).
+* Normalises inputs to little-endian and forwards short-length cases to specialised mix routines (1-3, 4-7, 8-16 byte fast paths).
 * Produces identical output across host endianness; big-endian incurs the expected byte swapping cost.
 * Provides seedless, single-seed, and dual-seed variants mirroring the upstream API.
 
@@ -31,17 +31,17 @@ Key traits ::
 Factories ::
 `LongHashFunction.farmUo()`, `.farmUo(long)`, `.farmUo(long, long)` (`LongHashFunction.java:181-243`).
 Implementation ::
-Also hosted in `CityAndFarmHash_1_1`, which covers the 1.1 update’s longer pipelines.
+Also hosted in `CityAndFarmHash_1_1`, which covers the 1.1 update's longer pipelines.
 Key traits ::
-* Maintains parity with Google’s C{pp} release for test vectors.
-* Endianness neutral: always routes through an `Access` view that matches the algorithm’s little-endian assumptions.
+* Maintains parity with Google's C{pp} release for test vectors.
+* Endianness neutral: always routes through an `Access` view that matches the algorithm's little-endian assumptions.
 
 === MurmurHash3
 
 Factories ::
 `LongHashFunction.murmur_3()`, `.murmur_3(long)` for 64-bit (`LongHashFunction.java:245-268`); `LongTupleHashFunction.murmur_3()`, `.murmur_3(long)` for 128-bit (`LongTupleHashFunction.java:35-69`).
 Implementation ::
-`net.openhft.hashing.MurmurHash_3` adapts Austin Appleby’s x64 variants.
+`net.openhft.hashing.MurmurHash_3` adapts Austin Appleby's x64 variants.
 It extends `DualHashFunction` so the 128-bit engine also exposes the low 64 bits through `LongHashFunction`.
 Key traits ::
 * Little-endian canonicalisation via `Access.byteOrder`.
@@ -54,7 +54,7 @@ Factories ::
 Implementation ::
 `net.openhft.hashing.XxHash` ports the official XXH64 reference and keeps the unsigned prime constants as signed Java longs.
 Key traits ::
-* Uses four-lane accumulation for ≥32 byte inputs, matching upstream behaviour bit-for-bit.
+* Uses four-lane accumulation for >=32 byte inputs, matching upstream behaviour bit-for-bit.
 * Applies the canonical avalanche round in `XxHash.finalize` for all lengths.
 * Seeded and seedless instances differ only by the stored `seed()` override; serialisation preserves both forms.
 
@@ -67,7 +67,7 @@ Implementation ::
 `net.openhft.hashing.XXH3` keeps the FARSH-derived 192 byte secret and streaming logic.
 It defines distinct entry points for 64-bit, 128-bit, and low-64-bit projections.
 Key traits ::
-* Optimises for short messages with dedicated 1–3, 4–8, 9–16, 17–128, and 129–240 byte paths.
+* Optimises for short messages with dedicated 1-3, 4-8, 9-16, 17-128, and 129-240 byte paths.
 * Uses `UnsafeAccess.INSTANCE.byteOrder(null, LITTLE_ENDIAN)` once to avoid per-call adapter allocation.
 * The 128-bit variant reuses the same mixing core; exposing the low 64 bits avoids extra copies for callers that only need a single `long`.
 
@@ -76,10 +76,10 @@ Key traits ::
 Factories ::
 `LongHashFunction.wy_3()`, `.wy_3(long)` (`LongHashFunction.java:343-369`).
 Implementation ::
-`net.openhft.hashing.WyHash` mirrors Wang Yi’s version 3 reference, including the `_wymum` 128-bit multiply-fold helper built on `Maths.unsignedLongMulXorFold`.
+`net.openhft.hashing.WyHash` mirrors Wang Yi's version 3 reference, including the `_wymum` 128-bit multiply-fold helper built on `Maths.unsignedLongMulXorFold`.
 Key traits ::
 * Supports streaming chunks up to 256 bytes per loop iteration; beyond that it accumulates in 32 byte strides.
-* Handles ≤3, ≤8, ≤16, ≤24, ≤32 byte inputs with the same branching as the C code.
+* Handles <=3, <=8, <=16, <=24, <=32 byte inputs with the same branching as the C code.
 * Maintains deterministic output across architectures while acknowledging the performance hit on big-endian systems.
 
 === MetroHash (metrohash64_2)
@@ -87,8 +87,8 @@ Key traits ::
 Factories ::
 `LongHashFunction.metro()`, `.metro(long)` (`LongHashFunction.java:371-389`).
 Implementation ::
-`net.openhft.hashing.MetroHash` implements the 64-bit metrohash variant with the `_2` initialisation vector, matching the original author’s reference.
+`net.openhft.hashing.MetroHash` implements the 64-bit metrohash variant with the `_2` initialisation vector, matching the original author's reference.
 Key traits ::
-* Performs four-lane unrolled mixing for ≥32 byte inputs and cascades down to 16, 8, 4, 2, and 1 byte tails.
+* Performs four-lane unrolled mixing for >=32 byte inputs and cascades down to 16, 8, 4, 2, and 1 byte tails.
 * Uses deterministic finalisation (`MetroHash.finalize`) shared by scalar and streaming paths.
 * Seeded instances override `seed()` and cache the pre-hashed `hashVoid()` constant to avoid re-computation.
diff --git a/src/main/docs/architecture-overview.adoc b/src/main/docs/architecture-overview.adoc
index 72775ef..c795d8d 100644
--- a/src/main/docs/architecture-overview.adoc
+++ b/src/main/docs/architecture-overview.adoc
@@ -15,7 +15,7 @@ It currently delivers 128-bit MurmurHash3 and XXH3 outputs and mirrors the singl
 
 === Memory Access Abstractions
 
-* All hashing flows rely on `net.openhft.hashing.Access` to read primitive values from arrays, direct buffers, off-heap memory, or custom structures. `Access.byteOrder(input, desiredOrder)` returns a view that matches the algorithm’s expected endianness (`Access.java:273-308`).
+* All hashing flows rely on `net.openhft.hashing.Access` to read primitive values from arrays, direct buffers, off-heap memory, or custom structures. `Access.byteOrder(input, desiredOrder)` returns a view that matches the algorithm's expected endianness (`Access.java:273-308`).
 * Concrete strategies cover heap arrays (`UnsafeAccess.INSTANCE`), `ByteBuffer` (`ByteBufferAccess`), `CharSequence` in native or explicit byte order (`CharSequenceAccess`), and compact Latin-1 backed strings (`CompactLatin1CharSequenceAccess`).
 * `UnsafeAccess` wraps `sun.misc.Unsafe` for zero-copy reads, falling back to legacy helpers when `getByte` or `getShort` are absent (e.g., pre-Nougat Android) (`UnsafeAccess.java:40-118`).
 * Reverse-order wrappers are generated automatically through `Access.newDefaultReverseAccess`, allowing algorithms to treat every source as little-endian while still accepting big-endian buffers (`Access.java:295-344`).
diff --git a/src/main/docs/invariants-and-contracts.adoc b/src/main/docs/invariants-and-contracts.adoc
index 57697e9..9b1fa32 100644
--- a/src/main/docs/invariants-and-contracts.adoc
+++ b/src/main/docs/invariants-and-contracts.adoc
@@ -6,7 +6,7 @@ toc::[]
 
 === Hash Interface Guarantees
 
-* Every `LongHashFunction` and `LongTupleHashFunction` implementation treats primitives as if they were written to memory using the platform’s native byte order; the API therefore guarantees that `hashLong(v)` equals `hashLongs(new long[] {v})` and similar array forms (`LongHashFunction.java`, `LongTupleHashFunction.java`).
+* Every `LongHashFunction` and `LongTupleHashFunction` implementation treats primitives as if they were written to memory using the platform's native byte order; the API therefore guarantees that `hashLong(v)` equals `hashLongs(new long[] {v})` and similar array forms (`LongHashFunction.java`, `LongTupleHashFunction.java`).
 * All bundled algorithms normalise multi-byte reads to little-endian before mixing, so the same input bytes produce identical hashes on big- and little-endian machines.
 Performance may differ, but results must not (`CityAndFarmHash_1_1.java`, `XxHash.java`, `XXH3.java`, `WyHash.java`, `MetroHash.java`, `MurmurHash_3.java`).
 * `hash(Object, Access, long off, long len)` assumes the addressed region is contiguous and valid for the requested byte count.
@@ -27,7 +27,7 @@ Alternative `Access` implementations should document whether they permit null ba
 
 === Result Buffer Handling
 
-* `LongTupleHashFunction.hash*(…, long[] result)` requires a pre-sized buffer created via `newResultArray()`.
+* `LongTupleHashFunction.hash*(..., long[] result)` requires a pre-sized buffer created via `newResultArray()`.
 The method throws `NullPointerException` for null buffers and `IllegalArgumentException` for undersized buffers; the helper checks are centralised in `DualHashFunction` (`DualHashFunction.java:12-74`).
 * The allocation-free path is only honoured when callers reuse buffers.
 The overloads that return `long[]` will always allocate exactly one new array per call by design (`LongTupleHashFunction.java:70-118`).
@@ -41,7 +41,7 @@ Several implementations expose singleton seedless instances via `readResolve`, e
 
 === String Handling
 
-* `hashChars` and `hash(CharSequence…)` delegate to `Util.VALID_STRING_HASH`, which inspects the running JVM to choose the correct memory layout strategy.
+* `hashChars` and `hash(CharSequence...)` delegate to `Util.VALID_STRING_HASH`, which inspects the running JVM to choose the correct memory layout strategy.
 Altering char sequence hashing must preserve this runtime detection, or mixed HotSpot/OpenJ9 estates will diverge (`Util.java:29-63`, `ModernCompactStringHash.java`, `ModernHotSpotStringHash.java`, `HotSpotPrior7u6StringHash.java`).
 * Latin-1 compact strings are read through `CompactLatin1CharSequenceAccess`, which reinterprets the backing `byte[]` without allocating.
 Any change to string support must maintain zero-allocation access for both UTF-16 and compact encodings (`CompactLatin1CharSequenceAccess.java`).
@@ -50,7 +50,7 @@ Any change to string support must maintain zero-allocation access for both UTF-1
 
 * Methods that accept `byte[]` plus `off` and `len` use `Util.checkArrayOffs` for bounds validation.
 Negative lengths or offsets, or slices that extend past the array end, raise `IndexOutOfBoundsException` immediately (`Util.java:70-77`, `LongHashFunction.java:480-547`).
-* ByteBuffer hashing honours the buffer’s position, limit, and order.
+* ByteBuffer hashing honours the buffer's position, limit, and order.
 The implementation temporarily adjusts `Buffer` state to satisfy IBM JDK 7 quirks, then restores the original markers (`LongHashFunction.java:392-470`, `LongHashFunctionTest.java:120-176`).
 
 === Thread Safety
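Several of the documents in this patch rely on one idea: each algorithm reads multi-byte words as little-endian regardless of the order the source advertises, so results do not depend on host endianness and big-endian sources simply pay for a byte swap. The following is a minimal sketch of that idea using only `java.nio`, under stated assumptions: the `LittleEndianView` class and its `readLongLE` helper are hypothetical and are not part of `net.openhft.hashing`, where this role is played by `Access.byteOrder` views and the reverse-order wrappers.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public final class LittleEndianView {
    private LittleEndianView() {}

    // Hypothetical helper, not part of net.openhft.hashing: reads the 8 bytes at
    // an absolute index as a little-endian long, independent of the buffer's
    // advertised byte order and of the host CPU's native order.
    static long readLongLE(ByteBuffer buf, int index) {
        // duplicate() shares the content but has its own position, limit and
        // byte order, so changing the order here leaves the caller's buffer untouched.
        return buf.duplicate().order(ByteOrder.LITTLE_ENDIAN).getLong(index);
    }

    public static void main(String[] args) {
        byte[] bytes = {1, 2, 3, 4, 5, 6, 7, 8};
        ByteBuffer big = ByteBuffer.wrap(bytes).order(ByteOrder.BIG_ENDIAN);
        ByteBuffer little = ByteBuffer.wrap(bytes).order(ByteOrder.LITTLE_ENDIAN);

        long fromBig = readLongLE(big, 0);
        long fromLittle = readLongLE(little, 0);

        System.out.println(fromBig == fromLittle);                        // true
        System.out.println(Long.toHexString(fromBig));                    // 807060504030201
        // A plain big-endian read of the same bytes is the byte-swapped value,
        // the kind of swap a reverse-order view has to perform.
        System.out.println(Long.reverseBytes(big.getLong(0)) == fromBig); // true
        System.out.println(big.order());                                  // BIG_ENDIAN
    }
}
```

Reading through `duplicate()` rather than calling `order(...)` on the caller's buffer is a design choice of this sketch: it avoids mutating caller state at all, whereas the ByteBuffer contract in `invariants-and-contracts.adoc` describes adjusting `Buffer` state and then restoring it.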