| From: | Xuneng Zhou <xunengzhou(at)gmail(dot)com> |
|---|---|
| To: | Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> |
| Cc: | Michael Paquier <michael(at)paquier(dot)xyz>, Kirill Reshke <reshkekirill(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: Optimize SnapBuildPurgeOlderTxn: use in-place compaction instead of temporary array |
| Date: | 2025-11-07 05:02:10 |
| Message-ID: | CABPTF7WaoQig2bHbFwBcc2bvGciXygTGXe1RzKakDfjR_=U_WQ@mail.gmail.com |
| Lists: | pgsql-hackers |
Hi,
> I conducted several benchmark tests on this patch, and here are some
> observations:
>
> 1) Workloads & settings
> # Workload MIXED - Realistic mix of DDL and DML
> create_mixed_workload() {
> local ROOT="$1"
> cat >"$ROOT/mixed_ddl.sql" <<'SQL'
> -- DDL workload (catalog changes)
> DO $$
> DECLARE
> tbl text := format('t_%s_%s',
> current_setting('application_name', true),
> floor(random()*1e9)::int);
> BEGIN
> EXECUTE format('CREATE TEMP TABLE %I (id int, data text) ON COMMIT DROP', tbl);
> EXECUTE format('INSERT INTO %I VALUES (1, ''x'')', tbl);
> END$$;
>
> SQL
> cat >"$ROOT/mixed_dml.sql" <<'SQL'
> -- DML workload (no catalog changes)
> INSERT INTO app_data (id, data)
> VALUES (floor(random()*1e6)::int, repeat('x', 100))
> ON CONFLICT (id) DO UPDATE SET data = repeat('y', 100);
> SQL
> }
>
> # Workload CONTROL - Pure DML, no catalog changes
> create_control_workload() {
> local ROOT="$1"
> cat >"$ROOT/control.sql" <<'SQL'
>
> -- Pure DML, no catalog changes
> -- Should show no difference between baseline and patched
> INSERT INTO control_data (id, data)
> VALUES (floor(random()*1e6)::int, repeat('x', 100))
> ON CONFLICT (id) DO UPDATE SET data = repeat('y', 100);
> SQL
> }
>
> # Start workload 100 clients, duration 40s, 1 run
> pids=()
> for ((c=1; c<=CLIENTS; c++)); do
> (
> end=$(($(date +%s) + DURATION))
> while (( $(date +%s) < end )); do
> "$psql" -h 127.0.0.1 -p "$PORT" -d postgres \
> -v ON_ERROR_STOP=0 \
> -f "$SQL_FILE" >/dev/null 2>&1 || true
> done
> ) &
> pids+=($!)
> done
>
> "SELECT pg_create_logical_replication_slot('$SLOT', 'test_decoding',
> false, true);" \
>
> "SELECT COALESCE(total_txns, 0), COALESCE(total_bytes, 0) FROM
> pg_stat_replication_slots WHERE slot_name='$SLOT';")
>
> shared_buffers = '4GB'
> wal_level = logical
> max_replication_slots = 10
> max_wal_senders = 10
> log_min_messages = warning
> max_connections = 600
> autovacuum = off
> checkpoint_timeout = 15min
> max_wal_size = 4GB
>
> 2) Performance results
>
> === Workload: mixed ===
> Client commits/sec:
> Baseline: 7845.82 commits/sec
> Patched: 7747.88 commits/sec
>
> Decoder throughput (from pg_stat_replication_slots):
> Baseline: 750.10 txns/sec (646.80 MB/s)
> Patched: 2440.03 txns/sec (2052.32 MB/s)
>
> Transaction efficiency (decoded vs committed):
> Baseline: 313833 committed → 30004 decoded (9.56%)
> Patched: 309915 committed → 97601 decoded (31.49%)
>
> Total decoded (all reps):
> Baseline: 30004 txns (25872.01 MB)
> Patched: 97601 txns (82092.83 MB)
>
> Decoder improvement: +225.00% (txns/sec)
> Decoder improvement: +217.00% (MB/s)
> Efficiency improvement: +21.93% points (more transactions decoded per committed)
>
> === Workload: control ===
> Client commits/sec:
> Baseline: 6756.80 commits/sec
> Patched: 6643.95 commits/sec
>
> Decoder throughput (from pg_stat_replication_slots):
> Baseline: 3373.28 txns/sec (0.29 MB/s)
> Patched: 3316.15 txns/sec (0.28 MB/s)
>
> Transaction efficiency (decoded vs committed):
> Baseline: 270272 committed → 134931 decoded (49.92%)
> Patched: 265758 committed → 132646 decoded (49.91%)
>
> Total decoded (all reps):
> Baseline: 134931 txns (11.56 MB)
> Patched: 132646 txns (11.37 MB)
>
> 3) Potential regression
>
> A potential regression could appear before the slot reaches the
> CONSISTENT state, particularly when building_full_snapshot is set to
> true. In this phase, all transactions (including those that don’t
> modify the catalog) must be added to the committed.xip array. These
> XIDs don’t require later snapshot builds or sorting, so the
> batch-insert logic increases the per-insert cost from O(1) to O(m + n)
> without providing a direct benefit.
>
> However, the impact of this regression could be limited. The system
> remains in the pre-CONSISTENT phase only briefly during initial
> snapshot building, and the building_full_snapshot = true case is rare,
> mainly used when creating replication slots with the EXPORT_SNAPSHOT
> option.
>
> Once the slot becomes CONSISTENT, only catalog-modifying transactions
> are tracked in committed.xip, and the patch reduces overall
> snapshot-building overhead by eliminating repeated full-array sorts.
>
> We could also adopt a two-phase approach — keeping the current
> behavior before reaching the CONSISTENT state and maintaining a sorted
> array only after that point. This would preserve the performance
> benefits while avoiding potential regressions. However, it would
> introduce additional complexity and potential risks in handling the
> state transitions.
>
>
> if (builder->state < SNAPBUILD_CONSISTENT)
> {
> /* ensure that only commits after this are getting replayed */
> if (builder->start_decoding_at <= lsn)
> builder->start_decoding_at = lsn + 1;
>
> /*
> * If building an exportable snapshot, force xid to be tracked, even
> * if the transaction didn't modify the catalog.
> */
> if (builder->building_full_snapshot)
> {
> needs_timetravel = true;
> }
> }
>
> It also occurs to me that, once the array is kept sorted, the purge
> operation itself can be optimized: binary searches can locate the
> interval of XIDs to keep.
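To illustrate the binary-search purge idea quoted above, here is a minimal sketch (hypothetical code, not from the patch): it assumes plain integer XID ordering, whereas real snapbuild code must use wraparound-aware comparisons such as TransactionIdPrecedes(), and the names `xid_lower_bound` / `purge_older_xids` are mine.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef uint32_t TransactionId;

/* Index of the first element >= key in a sorted array (a lower bound).
 * Plain integer comparison is assumed here; real XID code would need
 * wraparound-aware comparisons. */
static size_t
xid_lower_bound(const TransactionId *xip, size_t n, TransactionId key)
{
    size_t lo = 0, hi = n;

    while (lo < hi)
    {
        size_t mid = lo + (hi - lo) / 2;

        if (xip[mid] < key)
            lo = mid + 1;
        else
            hi = mid;
    }
    return lo;
}

/* Purge all XIDs strictly below xmin from a sorted array: one binary
 * search finds the cut point, one memmove() compacts the survivors to
 * the front. Returns the new element count. */
static size_t
purge_older_xids(TransactionId *xip, size_t n, TransactionId xmin)
{
    size_t cut = xid_lower_bound(xip, n, xmin);

    memmove(xip, xip + cut, (n - cut) * sizeof(TransactionId));
    return n - cut;
}
```

A single lower-bound search suffices when everything below xmin is discarded; the second search would only be needed to bound the upper end of a closed interval.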
The "mixed" workload results in this email are actually those of the
pure DDL workload. Sorry for the noise.

I have started a new thread for the discussion of maintaining a sorted
committed.xip array; please see [1].
Best,
Xuneng