24 changes: 12 additions & 12 deletions src/main/docs/algorithm-profiles.adoc
@@ -10,9 +10,9 @@ toc::[]
Factories ::
`LongHashFunction.city_1_1()`, `.city_1_1(long)`, `.city_1_1(long, long)` (`LongHashFunction.java:53-115`).
Implementation ::
-`net.openhft.hashing.CityAndFarmHash_1_1` ports Google’s CityHash64 v1.1 (`CityAndFarmHash_1_1.java`).
+`net.openhft.hashing.CityAndFarmHash_1_1` ports Google's CityHash64 v1.1 (`CityAndFarmHash_1_1.java`).
Key traits ::
-* Normalises inputs to little-endian and forwards short-length cases to specialised mix routines (1–3, 4–7, 8–16 byte fast paths).
+* Normalises inputs to little-endian and forwards short-length cases to specialised mix routines (1-3, 4-7, 8-16 byte fast paths).
* Produces identical output across host endianness; big-endian incurs the expected byte swapping cost.
* Provides seedless, single-seed, and dual-seed variants mirroring the upstream API.
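
The little-endian normalisation described above can be illustrated with plain JDK calls. This is a minimal sketch, not the library's code: the class and method names are hypothetical, and unlike the library's `Access` views it allocates a `ByteBuffer` wrapper on each call.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper for illustration only; not part of net.openhft.hashing.
final class LittleEndianReadSketch {

    // Reads 8 bytes starting at `off` as a little-endian long,
    // giving the same value on big- and little-endian hosts.
    static long readLongLE(byte[] data, int off) {
        return ByteBuffer.wrap(data).order(ByteOrder.LITTLE_ENDIAN).getLong(off);
    }

    // The same read assembled by hand: the byte at the lowest address
    // ends up in the least significant position.
    static long readLongLEManual(byte[] data, int off) {
        long v = 0;
        for (int i = 7; i >= 0; i--) {
            v = (v << 8) | (data[off + i] & 0xFFL);
        }
        return v;
    }
}
```

Both methods return `0x0807060504030201L` for the bytes `1, 2, ..., 8`, independent of the host's native byte order.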

@@ -31,17 +31,17 @@ Key traits ::
Factories ::
`LongHashFunction.farmUo()`, `.farmUo(long)`, `.farmUo(long, long)` (`LongHashFunction.java:181-243`).
Implementation ::
-Also hosted in `CityAndFarmHash_1_1`, which covers the 1.1 update’s longer pipelines.
+Also hosted in `CityAndFarmHash_1_1`, which covers the 1.1 update's longer pipelines.
Key traits ::
-* Maintains parity with Google’s C{pp} release for test vectors.
-* Endianness neutral: always routes through an `Access` view that matches the algorithm’s little-endian assumptions.
+* Maintains parity with Google's C{pp} release for test vectors.
+* Endianness neutral: always routes through an `Access` view that matches the algorithm's little-endian assumptions.

=== MurmurHash3

Factories ::
`LongHashFunction.murmur_3()`, `.murmur_3(long)` for 64-bit (`LongHashFunction.java:245-268`); `LongTupleHashFunction.murmur_3()`, `.murmur_3(long)` for 128-bit (`LongTupleHashFunction.java:35-69`).
Implementation ::
-`net.openhft.hashing.MurmurHash_3` adapts Austin Appleby’s x64 variants.
+`net.openhft.hashing.MurmurHash_3` adapts Austin Appleby's x64 variants.
It extends `DualHashFunction` so the 128-bit engine also exposes the low 64 bits through `LongHashFunction`.
Key traits ::
* Little-endian canonicalisation via `Access.byteOrder`.
@@ -54,7 +54,7 @@ Factories ::
Implementation ::
`net.openhft.hashing.XxHash` ports the official XXH64 reference and keeps the unsigned prime constants as signed Java longs.
Key traits ::
-* Uses four-lane accumulation for ≥32 byte inputs, matching upstream behaviour bit-for-bit.
+* Uses four-lane accumulation for >=32 byte inputs, matching upstream behaviour bit-for-bit.
* Applies the canonical avalanche round in `XxHash.finalize` for all lengths.
* Seeded and seedless instances differ only by the stored `seed()` override; serialisation preserves both forms.
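
To make the "unsigned prime constants as signed Java longs" remark concrete, here is a small sketch of three XXH64 primes and the avalanche round from the upstream XXH64 reference. The class and method names are hypothetical; this is not the body of `XxHash.finalize`, only an illustration of the same idea.

```java
// Illustrative sketch; names are hypothetical and not the library's own.
final class Xxh64AvalancheSketch {

    // XXH64 primes from the upstream reference. Java has no unsigned long,
    // so values above Long.MAX_VALUE are stored as negative signed longs
    // with the same 64-bit pattern.
    static final long PRIME64_1 = 0x9E3779B185EBCA87L;
    static final long PRIME64_2 = 0xC2B2AE3D27D4EB4FL;
    static final long PRIME64_3 = 0x165667B19E3779F9L;

    // Final avalanche: xor-shift and multiply steps. Unsigned shifts (>>>)
    // stand in for the C code's shifts on unsigned values; wrapping signed
    // multiplication yields the same low 64 bits as unsigned multiplication.
    static long avalanche(long h) {
        h ^= h >>> 33;
        h *= PRIME64_2;
        h ^= h >>> 29;
        h *= PRIME64_3;
        h ^= h >>> 32;
        return h;
    }
}
```

`PRIME64_1` prints as a negative number with `Long.toString`, while `Long.toUnsignedString` or `Long.toHexString` shows the unsigned value; the bits fed into the mixing steps are the same either way.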

@@ -67,7 +67,7 @@ Implementation ::
`net.openhft.hashing.XXH3` keeps the FARSH-derived 192 byte secret and streaming logic.
It defines distinct entry points for 64-bit, 128-bit, and low-64-bit projections.
Key traits ::
-* Optimises for short messages with dedicated 1–3, 4–8, 9–16, 17–128, and 129–240 byte paths.
+* Optimises for short messages with dedicated 1-3, 4-8, 9-16, 17-128, and 129-240 byte paths.
* Uses `UnsafeAccess.INSTANCE.byteOrder(null, LITTLE_ENDIAN)` once to avoid per-call adapter allocation.
* The 128-bit variant reuses the same mixing core; exposing the low 64 bits avoids extra copies for callers that only need a single `long`.

@@ -76,19 +76,19 @@ Key traits ::
Factories ::
`LongHashFunction.wy_3()`, `.wy_3(long)` (`LongHashFunction.java:343-369`).
Implementation ::
-`net.openhft.hashing.WyHash` mirrors Wang Yi’s version 3 reference, including the `_wymum` 128-bit multiply-fold helper built on `Maths.unsignedLongMulXorFold`.
+`net.openhft.hashing.WyHash` mirrors Wang Yi's version 3 reference, including the `_wymum` 128-bit multiply-fold helper built on `Maths.unsignedLongMulXorFold`.
Key traits ::
* Supports streaming chunks up to 256 bytes per loop iteration; beyond that it accumulates in 32 byte strides.
-* Handles ≤3, ≤8, ≤16, ≤24, ≤32 byte inputs with the same branching as the C code.
+* Handles <=3, <=8, <=16, <=24, <=32 byte inputs with the same branching as the C code.
* Maintains deterministic output across architectures while acknowledging the performance hit on big-endian systems.
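
The 128-bit multiply-fold mentioned for `_wymum` can be sketched with `Math.multiplyHigh` (Java 9 or later). This is an assumption-laden illustration of the technique, with hypothetical names; the actual `Maths.unsignedLongMulXorFold` may be implemented differently.

```java
// Illustrative sketch of an unsigned 64x64 -> 128-bit multiply folded to 64 bits.
// Names are hypothetical; requires Java 9+ for Math.multiplyHigh.
final class MulXorFoldSketch {

    // High 64 bits of the unsigned 128-bit product of a and b.
    // Math.multiplyHigh is signed, so each negative operand needs the other
    // operand added back to correct the result.
    static long unsignedMultiplyHigh(long a, long b) {
        long hi = Math.multiplyHigh(a, b);
        hi += (a >> 63) & b;
        hi += (b >> 63) & a;
        return hi;
    }

    // Folds the 128-bit product into 64 bits by xor-ing its two halves.
    static long mulXorFold(long a, long b) {
        long lo = a * b; // low 64 bits are identical for signed and unsigned
        long hi = unsignedMultiplyHigh(a, b);
        return lo ^ hi;
    }
}
```

For small operands the high half is zero, so `mulXorFold(2, 3)` is simply `6`; the fold only starts mixing bits once the product overflows 64 bits.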

=== MetroHash (metrohash64_2)

Factories ::
`LongHashFunction.metro()`, `.metro(long)` (`LongHashFunction.java:371-389`).
Implementation ::
-`net.openhft.hashing.MetroHash` implements the 64-bit metrohash variant with the `_2` initialisation vector, matching the original author’s reference.
+`net.openhft.hashing.MetroHash` implements the 64-bit metrohash variant with the `_2` initialisation vector, matching the original author's reference.
Key traits ::
-* Performs four-lane unrolled mixing for ≥32 byte inputs and cascades down to 16, 8, 4, 2, and 1 byte tails.
+* Performs four-lane unrolled mixing for >=32 byte inputs and cascades down to 16, 8, 4, 2, and 1 byte tails.
* Uses deterministic finalisation (`MetroHash.finalize`) shared by scalar and streaming paths.
* Seeded instances override `seed()` and cache the pre-hashed `hashVoid()` constant to avoid re-computation.
2 changes: 1 addition & 1 deletion src/main/docs/architecture-overview.adoc
@@ -15,7 +15,7 @@ It currently delivers 128-bit MurmurHash3 and XXH3 outputs and mirrors the singl

=== Memory Access Abstractions

-* All hashing flows rely on `net.openhft.hashing.Access<T>` to read primitive values from arrays, direct buffers, off-heap memory, or custom structures. `Access.byteOrder(input, desiredOrder)` returns a view that matches the algorithm’s expected endianness (`Access.java:273-308`).
+* All hashing flows rely on `net.openhft.hashing.Access<T>` to read primitive values from arrays, direct buffers, off-heap memory, or custom structures. `Access.byteOrder(input, desiredOrder)` returns a view that matches the algorithm's expected endianness (`Access.java:273-308`).
* Concrete strategies cover heap arrays (`UnsafeAccess.INSTANCE`), `ByteBuffer` (`ByteBufferAccess`), `CharSequence` in native or explicit byte order (`CharSequenceAccess`), and compact Latin-1 backed strings (`CompactLatin1CharSequenceAccess`).
* `UnsafeAccess` wraps `sun.misc.Unsafe` for zero-copy reads, falling back to legacy helpers when `getByte` or `getShort` are absent (e.g., pre-Nougat Android) (`UnsafeAccess.java:40-118`).
* Reverse-order wrappers are generated automatically through `Access.newDefaultReverseAccess`, allowing algorithms to treat every source as little-endian while still accepting big-endian buffers (`Access.java:295-344`).
8 changes: 4 additions & 4 deletions src/main/docs/invariants-and-contracts.adoc
@@ -6,7 +6,7 @@ toc::[]

=== Hash Interface Guarantees

-* Every `LongHashFunction` and `LongTupleHashFunction` implementation treats primitives as if they were written to memory using the platform’s native byte order; the API therefore guarantees that `hashLong(v)` equals `hashLongs(new long[] {v})` and similar array forms (`LongHashFunction.java`, `LongTupleHashFunction.java`).
+* Every `LongHashFunction` and `LongTupleHashFunction` implementation treats primitives as if they were written to memory using the platform's native byte order; the API therefore guarantees that `hashLong(v)` equals `hashLongs(new long[] {v})` and similar array forms (`LongHashFunction.java`, `LongTupleHashFunction.java`).
* All bundled algorithms normalise multi-byte reads to little-endian before mixing, so the same input bytes produce identical hashes on big- and little-endian machines.
Performance may differ, but results must not (`CityAndFarmHash_1_1.java`, `XxHash.java`, `XXH3.java`, `WyHash.java`, `MetroHash.java`, `MurmurHash_3.java`).
* `hash(Object, Access, long off, long len)` assumes the addressed region is contiguous and valid for the requested byte count.
@@ -27,7 +27,7 @@ Alternative `Access` implementations should document whether they permit null ba

=== Result Buffer Handling

-* `LongTupleHashFunction.hash*(…, long[] result)` requires a pre-sized buffer created via `newResultArray()`.
+* `LongTupleHashFunction.hash*(..., long[] result)` requires a pre-sized buffer created via `newResultArray()`.
The method throws `NullPointerException` for null buffers and `IllegalArgumentException` for undersized buffers; the helper checks are centralised in `DualHashFunction` (`DualHashFunction.java:12-74`).
* The allocation-free path is only honoured when callers reuse buffers.
The overloads that return `long[]` will always allocate exactly one new array per call by design (`LongTupleHashFunction.java:70-118`).
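
The result-buffer contract above can be sketched in isolation. The class below is a hypothetical stand-in, not `DualHashFunction`: it assumes a 128-bit result held in two longs and uses a toy fill in place of a real hash, purely to show the null and size checks and the reuse pattern.

```java
// Hypothetical stand-in for illustration; not the library's DualHashFunction.
final class ResultBufferSketch {

    // Assumption: a 128-bit result occupies two longs.
    static final int RESULT_LONGS = 2;

    // Mirrors the idea of newResultArray(): a correctly sized buffer.
    static long[] newResultArray() {
        return new long[RESULT_LONGS];
    }

    // Rejects null and undersized buffers before any hashing work starts.
    static void checkResult(long[] result) {
        if (result == null) {
            throw new NullPointerException("result buffer is null");
        }
        if (result.length < RESULT_LONGS) {
            throw new IllegalArgumentException(
                    "result buffer needs " + RESULT_LONGS + " longs, got " + result.length);
        }
    }

    // Toy "hash" that only demonstrates writing into a caller-owned buffer.
    static void fill(long value, long[] result) {
        checkResult(result);
        result[0] = value;
        result[1] = ~value;
    }
}
```

A caller that allocates one buffer with `newResultArray()` and passes it to every call keeps the hot path allocation-free, which is the point of the buffer-taking overloads.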
@@ -41,7 +41,7 @@ Several implementations expose singleton seedless instances via `readResolve`, e

=== String Handling

-* `hashChars` and `hash(CharSequence…)` delegate to `Util.VALID_STRING_HASH`, which inspects the running JVM to choose the correct memory layout strategy.
+* `hashChars` and `hash(CharSequence...)` delegate to `Util.VALID_STRING_HASH`, which inspects the running JVM to choose the correct memory layout strategy.
Altering char sequence hashing must preserve this runtime detection, or mixed HotSpot/OpenJ9 estates will diverge (`Util.java:29-63`, `ModernCompactStringHash.java`, `ModernHotSpotStringHash.java`, `HotSpotPrior7u6StringHash.java`).
* Latin-1 compact strings are read through `CompactLatin1CharSequenceAccess`, which reinterprets the backing `byte[]` without allocating.
Any change to string support must maintain zero-allocation access for both UTF-16 and compact encodings (`CompactLatin1CharSequenceAccess.java`).
@@ -50,7 +50,7 @@ Any change to string support must maintain zero-allocation access for both UTF-1

* Methods that accept `byte[]` plus `off` and `len` use `Util.checkArrayOffs` for bounds validation.
Negative lengths or offsets, or slices that extend past the array end, raise `IndexOutOfBoundsException` immediately (`Util.java:70-77`, `LongHashFunction.java:480-547`).
-* ByteBuffer hashing honours the buffer’s position, limit, and order.
+* ByteBuffer hashing honours the buffer's position, limit, and order.
The implementation temporarily adjusts `Buffer` state to satisfy IBM JDK 7 quirks, then restores the original markers (`LongHashFunction.java:392-470`, `LongHashFunctionTest.java:120-176`).
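
The slice validation described for `byte[]` inputs can be sketched as follows. The helper name and signature are hypothetical; this is not the source of `Util.checkArrayOffs`, only an overflow-safe check with the same observable behaviour as stated above.

```java
// Hypothetical bounds check for illustration; not the library's Util.checkArrayOffs.
final class SliceCheckSketch {

    // Throws IndexOutOfBoundsException for negative offsets or lengths and for
    // slices that run past the end of an array of the given length.
    // Comparing len against (arrayLength - off) avoids the int overflow that
    // (off + len) could hit for very large arguments.
    static void checkSlice(int arrayLength, int off, int len) {
        if (off < 0 || len < 0 || len > arrayLength - off) {
            throw new IndexOutOfBoundsException(
                    "off=" + off + ", len=" + len + ", arrayLength=" + arrayLength);
        }
    }
}
```

With a 10 element array, `checkSlice(10, 0, 10)` passes, while `checkSlice(10, 5, 6)` and `checkSlice(10, -1, 1)` throw before any bytes are read.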

=== Thread Safety