Make FakeValuesService expression caches static to share across Faker instances#1819
Make FakeValuesService expression caches static to share across Faker instances#1819mferretti wants to merge 5 commits into
Conversation
PR Summary
|
There was a problem hiding this comment.
Pull request overview
Adds an opt-in way to share a single FakeValuesService across threads to avoid repeated YAML loading, plus a Faker factory that pairs the shared service with a per-thread Random. New concurrency tests exercise the singleton identity, behavior under load, and parity with a normally constructed Faker.
Changes:
- New
FakeValuesService.getShared(Locale)lazily caches per-locale instances in aConcurrentHashMap. - New
Faker.withSharedService(FakeValuesService, Locale, Random)factory wires a shared service with a per-threadRandomService. - New
SharedFakeValuesServiceTestcovering singleton identity, concurrent usage, and output parity.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| src/main/java/net/datafaker/service/FakeValuesService.java | Adds SHARED_INSTANCES map and getShared(Locale) factory. |
| src/main/java/net/datafaker/Faker.java | Adds withSharedService static factory for thread-local Fakers reusing a shared service. |
| src/test/java/net/datafaker/SharedFakeValuesServiceTest.java | New tests for concurrent identity, no-errors under load, and output parity. |
|
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #1819 +/- ##
============================================
- Coverage 92.43% 92.22% -0.21%
- Complexity 3512 3519 +7
============================================
Files 343 343
Lines 6962 7009 +47
Branches 684 701 +17
============================================
+ Hits 6435 6464 +29
- Misses 360 369 +9
- Partials 167 176 +9 ☔ View full report in Codecov by Harness. 🚀 New features to boost your workflow:
|
|
@mferretti Please don't take my words as critics, I am just researching the problem. I feel that this solution heavily depends on the internal structure of DF. Ideally, the end-user should use only The bottleneck is redundant YAML loading, right? If we want to avoid redundant YAML loading, then shouldn't we just extract the YML loading to some smaller class, and cache its results? |
|
@asolntsev , none taken :) YAML loading is not the bottleneck — it's already cached globally in You're right that exposing Would that make sense? If so, I can revise the PR along those lines :-) |
|
@mferretti Yes, it seems absolutely reasonable to make UPD I realized that it was previously static, and I made in non-static in commit 751ab16 :) That's an irony! :) The reason was memory leak. Static map Maybe we should investigate #1263 once again and find a better solution for it. Or just use some Cache-like map for |
|
@asolntsev thanks for pointing this out. I'll have a look at the bug that made you opt for non static and try to put everything together. it'll have to wait a couple of days as i am heading out and will be back tomorrow late evening/night. |
|
No rush, enjoy your weekend!! |
|
Hello @asolntsev and @kingthorin
What I propose, but i'd need some validation from you guys, is :
On point 2, Obviously, I can recycle this pr or close this one and open a new one. Looking forward for your input ! |
|
Hi @mferretti !
|
Replaces the per-instance REGEXP2SUPPLIER_MAP with a two-level static+instance design: - L1 RECIPE_MAP (static): keyed by CacheKey(String exp, SingletonLocale locale) — no per-Faker references. Stores context-free "recipe" resolvers shared across all Fakers with the same locale. Fixes the memory leak that blocked making this cache static (issue datafaker-net#1263): the old RegExpContext key held a strong reference to the BaseFaker root, keeping every Faker alive indefinitely. - L2 instanceMap (per-instance): stores resolvers pre-bound to this Faker's concrete provider instances for fast repeated calls within the same Faker. ValueResolver gains materialize(ProviderRegistration) and cacheable() to support the two-level contract. New resolver types (ProviderMethodResolver, ChainedCoercedResolver, InstanceMethodResolver, etc.) are context-free at L1 and pre-bound at L2. expression2generex (RgxGen compiled-regex cache) is also made static Adds SharedFakeValuesServiceTest covering concurrent multi-Faker usage and determinism under caching.
|
Hi. The original PR added a public This revised PR implements exactly that — but with the additional structural work required to do it correctly (naively making the map static brings back the memory leak from issue #1263 / PR #1271).
Also, I added an L2 cache: L1 — static L2 — per-instance Resolution order: L2 hit → L1 hit (materialize + store L2) → full discovery (store L1 + materialize L2). The benchmarks tests i did locally:
In all honesty, I would suggest that you run your tests too as the performance gain, on my side, is clear with complex regex and a property based test scenario, but non existent with simple regex; plus I have added the L2 cache complexity. |
…owth note - SafeFetchResolver.resolve(): guard against null root - resolveFromMethodOn(): guard null root before NamedProviderCoercedResolver branch - resolveExpression inner: guard null root before dotIndex provider lookup - RECIPE_MAP: expand Javadoc noting bounded-in-practice growth - accessor(): fix dead-code `methods == null` check → `methods.isEmpty()` - accessor(): fix two "Didn't accessor" log typos → "Didn't find accessor" - SharedFakeValuesServiceTest: call shutdownNow() on awaitTermination timeout Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- ProviderMethodResolver.resolve(): return null when root is null - ProviderMethodResolver.materialize(): return this when root is null - resolveExpression outer loop: skip L2 caching when root is null, call recipe directly without materializing Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
RootCoercedResolver, NamedProviderCoercedResolver, and ChainedCoercedResolver all called root.getProvider() / fakerAccessor.invoke(root) without guarding against null root in both resolve() and materialize(). Pattern matches the existing guards on SafeFetchResolver and ProviderMethodResolver. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Problem
Every
new Faker()starts with empty expression-resolution and regex-compilation caches. In workloads that create many short-lived Fakers — property-based testing frameworks (jqwik, QuickCheck), seeded-per-record data generation — the warm-up cost is paid repeatedly, even when Fakers share the same locale.Why "just make the map static" needed structural work
The existing
REGEXP2SUPPLIER_MAPusedRegExpContext(String exp, BaseFaker root, FakerContext context)as its cache key. Making it static would re-introduce the memory leak fixed in #1263 / PR #1271: therootfield holds a strong reference to theFakerinstance, so everynew Faker()adds a unique entry the GC can never collect.Fixing this requires separating what method to call (shareable, context-free) from which provider instance to call it on (per-Faker, bound at first use).
What changed
Two-level expression cache:
RECIPE_MAP(static) — keyed byCacheKey(String exp, SingletonLocale locale). No per-Faker references; safe to share globally. Stores context-free "recipe" resolvers.instanceMap(per-instance) — stores "materialized" resolvers pre-bound to this Faker's concrete provider instances. Subsequent calls on the same Faker skipgetProvider()entirely.ValueResolvergainsmaterialize(ProviderRegistration root)andcacheable()to support the two-level contract. New resolver types (ProviderMethodResolver,ChainedCoercedResolver,InstanceMethodResolver, etc.) are context-free at L1 and pre-bound at L2.expression2generexmade static — theRgxGencompiled-regex cache was per-instance. This is the primary performance win: every new Faker recompiled the same patterns; now compilation happens once globally.No public API changes
FakeValuesServiceremains internal.Fakerconstructors are unchanged. No new public methods.Tests
SharedFakeValuesServiceTestcovers: