[*] migrate to time-only partitioning in pg sink#1408

Open
0xgouda wants to merge 12 commits into master from simplify-pg-partitioing-scheme
@0xgouda 0xgouda commented May 14, 2026

Collaborator
  • Simplified the partitioning schema from 3-level (metric-dbname-time) to 2-level (metric-time) partitioning.
    • This solves the performance issues caused by having thousands of partitions in setups with very large numbers of monitored databases.
  • Removed the drop_source_partitions function.
  • Implemented the migration from LIST-partitioned metric tables to RANGE time partitioning with a simple drop-on-write mechanism: if a metric table is detected to use LIST partitioning, the entire table is dropped and then recreated with the new scheme.
    • This loses all measurement data on upgrade to v6, so we should probably put a warning about it on the release page.
  • Changed the default of --partition-interval to 1 day.
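
To illustrate the detection step of the drop-on-write mechanism described above, here is a minimal sketch. It is not the code from this PR: the query text, the constant name partStrategySQL, and the helper mustRecreate are assumptions made for illustration. The only thing it relies on is that PostgreSQL records a table's partitioning strategy in pg_partitioned_table.partstrat ('l' for LIST, 'r' for RANGE, 'h' for HASH).

```go
package main

import "fmt"

// partStrategySQL is a hypothetical lookup of a metric table's partitioning
// strategy. It returns no row when the table is not partitioned at all.
const partStrategySQL = `
SELECT p.partstrat
FROM pg_partitioned_table p
JOIN pg_class c ON c.oid = p.partrelid
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE n.nspname = 'public' AND c.relname = $1`

// mustRecreate reports whether a metric table with the given partstrat value
// still uses the old LIST layout and therefore has to be dropped and
// recreated with RANGE partitioning by time.
func mustRecreate(partstrat string) bool {
	return partstrat == "l"
}

func main() {
	// No database access here: only the decision logic is exercised.
	for _, s := range []string{"l", "r", "h"} {
		fmt.Printf("partstrat=%q recreate=%v\n", s, mustRecreate(s))
	}
}
```

In a real sink the query would be run once per metric table before the first write, and the drop would only happen when the helper returns true.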

Closes: #1409
Closes: #1392


  • TODO: do extensive testing on the staging VM

@0xgouda 0xgouda self-assigned this May 14, 2026
@0xgouda 0xgouda added the refactoring and sinks labels May 14, 2026
@0xgouda 0xgouda force-pushed the simplify-pg-partitioing-scheme branch from d75322a to 5064859 Compare May 14, 2026 15:00
@coveralls

coveralls commented May 14, 2026


Coverage Report for CI Build 26195756891

Warning

Build has drifted: This PR's base is out of sync with its target branch, so coverage data may include unrelated changes.
Quick fix: rebase this PR.

Coverage decreased (-0.06%) to 85.64%

Details

  • Coverage decreased (-0.06%) from the base build.
  • Patch coverage: 20 uncovered changes across 1 file (21 of 41 lines covered, 51.22%).
  • 7 coverage regressions across 2 files.

Uncovered Changes

| File | Changed | Covered | % |
|---|---|---|---|
| internal/sinks/postgres.go | 41 | 21 | 51.22% |

Coverage Regressions

7 previously-covered lines in 2 files lost coverage.

| File | Lines Losing Coverage | Coverage |
|---|---|---|
| internal/reaper/reaper.go | 4 | 52.79% |
| internal/sinks/postgres.go | 3 | 67.18% |

Coverage Stats

Coverage Status
Relevant Lines: 5418
Covered Lines: 4640
Line Coverage: 85.64%
Coverage Strength: 0.98 hits per line

💛 - Coveralls

Comment thread internal/sinks/sql/admin_schema.sql Outdated
Comment thread internal/sinks/sql/ensure_partition_postgres.sql Outdated
Comment thread internal/sinks/postgres.go Outdated
0xgouda added 10 commits May 15, 2026 19:27
…tric_time and simplify to 2-level partitioning
Metrics tables created by older versions used LIST partitioning on the dbname
column. Since we now only use RANGE partitioning by time, we need to detect and
drop any LIST-partitioned table so the standard logic can recreate it with
the new partitioning strategy.
@0xgouda 0xgouda force-pushed the simplify-pg-partitioing-scheme branch from 7754158 to 6a7c490 Compare May 15, 2026 16:27
@0xgouda 0xgouda marked this pull request as ready for review May 15, 2026 16:27
@0xgouda 0xgouda changed the title WIP: Simplify pg partitioing scheme [*] migrate to time-only partitioning in pg sink May 15, 2026
@0xgouda 0xgouda requested a review from pashagolub May 15, 2026 16:31
@0xgouda

0xgouda commented May 15, 2026

Collaborator Author

@pashagolub

  1. What are your thoughts on the drop-on-write trick I used? Do you think it may cause any problems?

  2. Also, I am thinking of changing the defaults for --partition-interval and --retention to, say, 3 days/15 days to make time-partition pruning more effective. What do you think?

    • If we stuck with the current 7 days/14 days defaults, we would only have 2 partitions at a time, which I believe isn't that effective.

@pashagolub

Collaborator

What is drop-on-write?
I'd go for 1d partitions and 2w retention.
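
The partition counts behind the interval/retention combinations discussed above can be worked out with a small sketch. The function name livePartitions and the constant day are hypothetical, and the calculation is a rough estimate: it assumes retention is simply divided by the partition interval and rounded up, and ignores any extra partition that boundary alignment or pre-creation might add.

```go
package main

import (
	"fmt"
	"time"
)

// day is a convenience unit for the examples below.
const day = 24 * time.Hour

// livePartitions estimates how many time partitions are needed to cover the
// retention window, rounding up when the interval does not divide it evenly.
func livePartitions(retention, interval time.Duration) int {
	if interval <= 0 || retention <= 0 {
		return 0
	}
	return int((retention + interval - 1) / interval)
}

func main() {
	cases := []struct {
		name                string
		interval, retention time.Duration
	}{
		{"7 days/14 days", 7 * day, 14 * day},
		{"3 days/15 days", 3 * day, 15 * day},
		{"1 day/14 days", 1 * day, 14 * day},
	}
	for _, c := range cases {
		fmt.Printf("%s: about %d partitions per metric\n",
			c.name, livePartitions(c.retention, c.interval))
	}
}
```

Under these assumptions the three combinations give roughly 2, 5, and 14 partitions per metric, which is why a shorter interval lets retention cleanup and partition pruning work at a finer granularity.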

Comment thread internal/sinks/sql/ensure_partition_postgres.sql Outdated
@0xgouda 0xgouda force-pushed the simplify-pg-partitioing-scheme branch from 97437cf to 84b41b8 Compare May 15, 2026 17:27
instead of the old drop-on-write behaviour
EXECUTE format('COMMENT ON TABLE public.%I IS $$pgwatch-generated-metric-lvl$$', metric);
END IF;

-- 2. level

Collaborator


should we remove this comment?

&migrator.Migration{
	Name: "01409 Switch to time-only partitioning",
	Func: func(ctx context.Context, tx pgx.Tx) error {
		_, err := tx.Exec(ctx, `SELECT admin.drop_all_metric_tables()`)

Collaborator


Thinking out loud... What if we keep a copy of the current measurements with ALTER SCHEMA public RENAME TO backup, then create all the new tables with the new partitioning, and then run INSERT INTO public.<new-table> SELECT * FROM backup.<old-table>? Then we keep the data and users are happy.
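
A rough sketch of the statement sequence this suggestion implies is shown below. It is only an illustration, not a proposal for the actual migration: the helpers quoteIdent and backupMigrationSQL are hypothetical, the explicit CREATE SCHEMA public step is an assumption (the rename leaves no schema with that name), and the sketch assumes the new RANGE-partitioned tables and the partitions covering the old timestamps are created by the sink's existing logic before the copy runs.

```go
package main

import (
	"fmt"
	"strings"
)

// quoteIdent double-quotes a SQL identifier and escapes embedded quotes.
func quoteIdent(name string) string {
	return `"` + strings.ReplaceAll(name, `"`, `""`) + `"`
}

// backupMigrationSQL returns the statements for the rename-and-copy idea:
// move the old schema aside, recreate public, then copy each metric table.
// Creation of the new tables and their partitions is intentionally left out.
func backupMigrationSQL(tables []string) []string {
	stmts := []string{
		"ALTER SCHEMA public RENAME TO backup",
		"CREATE SCHEMA public",
	}
	for _, t := range tables {
		q := quoteIdent(t)
		stmts = append(stmts,
			fmt.Sprintf("INSERT INTO public.%s SELECT * FROM backup.%s", q, q))
	}
	return stmts
}

func main() {
	// "db_stats" and "wal" are example metric names used only for this demo.
	for _, s := range backupMigrationSQL([]string{"db_stats", "wal"}) {
		fmt.Println(s + ";")
	}
}
```

One thing to check before relying on this: SELECT * only works if the old and new tables have the same column order, and the copy temporarily doubles the storage used by measurements.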


Labels

  • refactoring: Something done as it should've been done from the start
  • sinks: Where and how to store monitored data

Projects

None yet

Development

Successfully merging this pull request may close these issues.

  • Switch to time-only partitioning
  • Support metric-time only partitioning for postgres sink

3 participants