Skip to content

fix(fxconfig): make dropped notifications observable with logging and counter#218

Open
MayankSharmaCSE wants to merge 1 commit into
hyperledger:mainfrom
MayankSharmaCSE:fix/notifications-silent-drop
Open

fix(fxconfig): make dropped notifications observable with logging and counter#218
MayankSharmaCSE wants to merge 1 commit into
hyperledger:mainfrom
MayankSharmaCSE:fix/notifications-silent-drop

Conversation

@MayankSharmaCSE

Copy link
Copy Markdown
Contributor

Type of change

Description

The notification dispatcher in NotificationClient.listen (tools/fxconfig/internal/client/notifications.go:223–228) used a non-blocking send with a bare // message dropped comment in the default branch. Any status event that could not be delivered to a subscriber's channel was silently discarded - no log line, no counter, no error returned to the caller.

This is particularly harmful because the txID is removed from the subscribers map before the send is attempted, making every drop unrecoverable. From the caller's perspective the drop is indistinguishable from a genuine committer timeout, corrupting any retry or SLA decision built on top.

Changes:

  • Added a droppedNotifications atomic.Uint64 field to NotificationClient and an exported DroppedNotifications() uint64 accessor for diagnostics and tests.
  • Each drop now emits a WARN-level log via the existing flogging "client" logger, including the txID and status.
  • Extracted the dispatch loop from listen into two focused methods collectNotifications and dispatchNotifications and promoted notificationCall to package scope with a txID field, making the drop path directly unit-testable.
  • Replaced the legacy fmt.Printf listener-error line with logger.Errorf (the commented-out logger.Errorf directly beneath it already showed the intent).

Copilot AI review requested due to automatic review settings May 1, 2026 09:35

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Improves observability of dropped transaction status notifications in fxconfig’s notification client, and also changes TLS configuration behavior to be secure-by-default across services.

Changes:

  • Add drop observability to NotificationClient via WARN logging and an atomic dropped-notifications counter, and refactor dispatching for unit testability.
  • Flip TLS default to enabled-by-default, update tests/config samples accordingly, and warn in the CLI when TLS is explicitly disabled.
  • Update docs to describe the new TLS default and the resulting validation behavior.

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
tools/fxconfig/internal/client/notifications.go Refactors notification dispatch and adds dropped-notification logging + counter.
tools/fxconfig/internal/client/notifications_test.go Adds unit tests for the refactored collection/dispatch paths and drop counting.
tools/fxconfig/internal/config/config.go Changes TLS default behavior to enabled-by-default via struct defaults/normalization.
tools/fxconfig/internal/config/config_test.go Updates TLS normalization test expectations for the new default.
tools/fxconfig/internal/config/load_test.go Expands TLS-related load tests, including a new “enabled by default” test.
tools/fxconfig/internal/cli/v1/root.go Adds a CLI warning when any service has TLS disabled.
tools/fxconfig/integration/helpers_test.go Updates integration test config generation to explicitly disable TLS.
tools/fxconfig/docs/README.md Documents secure-by-default TLS behavior and migration guidance.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +255 to +267
// goroutine, but every dropped event is logged and counted — the txID has
// already been removed from the subscribers map, so a drop is unrecoverable
// and must be observable.
func (n *NotificationClient) dispatchNotifications(notifications []notificationCall) {
for _, c := range notifications {
select {
case c.receiverQueue <- c.status:
default:
n.droppedNotifications.Add(1)
logger.Warnf(
"notification dropped (unrecoverable): txID=%s status=%d (subscriber buffer full or receiver gone)",
c.txID, c.status,
)

Copilot AI May 1, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dispatchNotifications logs a WARN for every dropped notification. Under sustained backpressure this can generate extremely high log volume and become a performance/operational issue. Consider rate-limiting/sampling this log (or logging periodic summaries keyed off droppedNotifications) while still incrementing the counter for full fidelity diagnostics.

Suggested change
// goroutine, but every dropped event is logged and counted — the txID has
// already been removed from the subscribers map, so a drop is unrecoverable
// and must be observable.
func (n *NotificationClient) dispatchNotifications(notifications []notificationCall) {
for _, c := range notifications {
select {
case c.receiverQueue <- c.status:
default:
n.droppedNotifications.Add(1)
logger.Warnf(
"notification dropped (unrecoverable): txID=%s status=%d (subscriber buffer full or receiver gone)",
c.txID, c.status,
)
// goroutine. Dropped events are always counted, and warning logs are sampled
// to avoid excessive log volume under sustained backpressure. The txID has
// already been removed from the subscribers map, so a drop is unrecoverable
// and must remain observable.
func (n *NotificationClient) dispatchNotifications(notifications []notificationCall) {
const droppedNotificationLogEvery = 100
for _, c := range notifications {
select {
case c.receiverQueue <- c.status:
default:
dropped := n.droppedNotifications.Add(1)
if dropped == 1 || dropped%droppedNotificationLogEvery == 0 {
logger.Warnf(
"notification dropped (unrecoverable): droppedNotifications=%d latestTxID=%s status=%d (subscriber buffer full or receiver gone)",
dropped, c.txID, c.status,
)
}

Copilot uses AI. Check for mistakes.
Comment on lines 63 to 64
Enabled *bool `mapstructure:"enabled" yaml:"enabled,omitempty" desc:"Enable/disable TLS" default:"true"`
ClientKeyPath string `mapstructure:"clientKey" yaml:"clientKey,omitempty" desc:"Path to TLS client private key"`

Copilot AI May 1, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR’s title/description focuses on notification drop observability, but this change flips the default TLS behavior to enabled-by-default (a breaking behavioral change that will affect existing configs and validation). Please either update the PR description/title to explicitly call out the TLS default change and migration implications, or split the TLS work into a separate PR.

Copilot uses AI. Check for mistakes.
Comment on lines 70 to 77
// Normalize ensures the TLS config has an explicit enabled flag.
// Sets enabled to false if not specified.
// Sets enabled to true if not specified.
func (c *TLSConfig) Normalize() {
if c.Enabled == nil {
enabled := false
enabled := true
c.Enabled = &enabled
}
}

Copilot AI May 1, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Normalize now defaults Enabled to true, but IsEnabled still returns false when Enabled is nil. That makes the effective default depend on whether callers remembered to run Normalize/ResolveTLS, which is easy to miss in programmatic uses. Consider aligning IsEnabled with the new secure-by-default behavior (or documenting clearly that callers must normalize before checking).

Copilot uses AI. Check for mistakes.
The notification dispatcher in NotificationClient.listen used a bare
non-blocking send with a  comment, silently losing
status events whenever the subscriber buffer was full or the receiver
had given up. Because the txID is removed from the subscribers map
before delivery is attempted, a dropped event is unrecoverable — and
indistinguishable from a genuine committer timeout from the caller's
perspective.

Signed-off-by: mayanksharmaCSE <mayanksharmacse1@gmail.com>
@MayankSharmaCSE MayankSharmaCSE force-pushed the fix/notifications-silent-drop branch from 07b1b8d to 3c5c794 Compare May 1, 2026 09:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Issue: Silent notification drop under backpressure in NotificationClient

2 participants