| From: | CharSyam <charsyam(at)gmail(dot)com> |
|---|---|
| To: | Michael Paquier <michael(at)paquier(dot)xyz> |
| Cc: | pgsql-hackers(at)lists(dot)postgresql(dot)org |
| Subject: | Re: [PATCH] Reduce pg_class scans in GRANT/REVOKE ON ALL TABLES IN SCHEMA |
| Date: | 2026-04-13 09:52:57 |
| Message-ID: | CAMrLSE7mjAgWGykBdRM1Etiw6h=YopAp_9hrf4qOumverqab-Q@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
I ran a benchmark on the patch *Reduce pg_class scans in GRANT/REVOKE
ON ALL TABLES IN SCHEMA* (collapses 5 per-relkind pg_class heap scans
into 1 scan distributed into per-relkind buckets). Summary below.(It took
much time to tests)
## Assumptions
- Two builds of PostgreSQL 19devel from the same source tree (one
patched, one at master tip), identical compile flags, separate
--prefix.
- Separate data directories, run sequentially on an otherwise idle
host.
- GUCs: shared_buffers=2GB, max_locks_per_transaction=100000,
fsync=off, synchronous_commit=off, full_page_writes=off,
autovacuum=off.
- bench_s schema contains N empty tables (CREATE TABLE t_i()).
- Measured operations:
GRANT SELECT ON ALL TABLES IN SCHEMA bench_s TO bench_role
REVOKE ALL PRIVILEGES ON ALL TABLES IN SCHEMA bench_s FROM bench_role
- Best of 3 runs reported (seconds).
- Two scenarios:
A. Clean catalog — VACUUM FULL pg_class; VACUUM FULL pg_attribute
immediately before measurement. pg_class
densely packed.
B. Bloated catalog — pre-bloat phase: GRANT+REVOKE on all N tables
repeated C cycles, no VACUUM. Both patched
and master operate on catalogs with identical
relpages and n_dead_tup.
## Results — Scenario A (Clean catalog)
macOS (Apple Silicon), best of 3, seconds:
ntables | GRANT patched | GRANT master | REVOKE patched | REVOKE master
--------+---------------+--------------+----------------+--------------
20,000 | 0.116 | 0.115 | 0.231 | 0.226
40,000 | 0.250 | 0.250 | 0.460 | 0.460
100,000 | 0.730 | 0.678 | 1.193 | 1.193
Honestly, there is no measurable performance difference in the clean
state. Patched and master are statistically indistinguishable within
run-to-run noise. This matches the design of the patch: when pg_class
is densely packed, repeating a small seq scan five times is cheap, so
collapsing it into one has nothing meaningful to save. The patch adds
no overhead either — worst case is a tie.
## Results — Scenario B (Bloated catalog)
### Linux x86_64, C=20, best of 3, seconds
ntables | dead_tup | GRANT patched | GRANT master | Δ | REVOKE
patched | REVOKE master | Δ
----------+-----------+---------------+--------------+---------+----------------+---------------+---------
10,000 | 0 | 0.0924 | 0.0935 | −1.2 % |
0.1668 | 0.1696 | −1.6 %
20,000 | 109,825 | 0.2027 | 0.2069 | −2.0 % |
0.3381 | 0.3533 | −4.3 %
50,000 | 329,468 | 0.5555 | 0.5895 | −5.8 % |
0.8901 | 0.9371 | −5.0 %
100,000 | 879,311 | 1.1732 | 1.1968 | −2.0 % |
1.8808 | 1.9555 | −3.8 %
200,000 | 1,978,925 | 2.2188 | 2.3470 | −5.5 % |
3.7290 | 3.9064 | −4.5 %
500,000 | 4,178,604 | 6.0260 | 6.6663 | −9.6 % |
9.8162 | 10.2169 | −3.9 %
1,000,000 | 9,678,399 | 12.9241 | 14.7657 | −12.5 % |
24.8893 | 28.7566 | −13.4 %
### macOS (Apple Silicon), C=20 (C=10 at 1M), best of 3, seconds
ntables | dead_tup | GRANT patched | GRANT master | Δ | REVOKE
patched | REVOKE master | Δ
----------+-----------+---------------+--------------+--------+----------------+---------------+--------
20,000 | 299,960 | 0.168 | 0.163 | +3 % |
0.260 | 0.278 | −6 %
40,000 | 519,601 | 0.307 | 0.307 | 0 % |
0.552 | 0.564 | −2 %
100,000 | 959,268 | 0.784 | 0.934 | −16 % |
1.405 | 1.419 | ~0 %
200,000 | 2,058,886 | 1.787 | 1.878 | −5 % |
2.745 | 2.849 | −4 %
500,000 | 4,258,565 | 4.727 | 5.197 | −9 % |
7.126 | 7.908 | −10 %
1,000,000 | 9,758,364 | 10.977 | 11.126 | −1 % |
19.473 | 20.759 | −6 %
Negative Δ = patched faster. Under catalog bloat the patch produces a
consistent, reproducible improvement on both operating systems.
Happy to share the bench scripts and raw logs on request.
Thanks,
charsyam
2026년 4월 13일 (월) 오전 9:43, Michael Paquier <michael(at)paquier(dot)xyz>님이 작성:
> On Sun, Apr 12, 2026 at 04:22:24PM +0900, CharSyam wrote:
> > Benchmark
> > ---------
> > This is a targeted micro-optimization, not a dramatic speedup.
> > With 10,000 tables in a single schema (pg_class ~10,452 rows),
> > running GRANT/REVOKE SELECT ON ALL TABLES IN SCHEMA in a loop
> > (6 iterations, first dropped as warmup), I measured a consistent
> > ~15% reduction in end-to-end time:
> >
> > baseline patched delta
> > GRANT (avg) 88.2 ms 75.9 ms -14%
> > REVOKE (avg) 134.9 ms 115.7 ms -14%
>
> I am pretty sure that there are users with millions of relations in a
> single schema that could benefit from that. At least that would not
> be surprising with partitioning these days, and foreign tables. What
> kind of numbers do you get if you bump up the number of digits for
> these tests. Let's say a comparison based on a few million relations
> at least?
>
> The change you are proposing looks simple enough, quickly skimming
> through the patch. There may be more optimizations doable here, I
> have not looked at that, still I tend to like such micro-optimization
> proposals as they provide a silent benefit.
> --
> Michael
>
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Florents Tselai | 2026-04-13 09:56:56 | More jsonpath methods: translate, split, join |
| Previous Message | Benoit Lobréau | 2026-04-13 09:44:17 | Re: Logging parallel worker draught |