Size of pg_rewrite (Was: Report checkpoint progress with pg_stat_progress_checkpoint)

From: Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Nitin Jadhav <nitinjadhavpostgres(at)gmail(dot)com>, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com>, Julien Rouhaud <rjuju123(at)gmail(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Magnus Hagander <magnus(at)hagander(dot)net>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Size of pg_rewrite (Was: Report checkpoint progress with pg_stat_progress_checkpoint)
Date: 2022-04-08 14:52:07
Message-ID: CAEze2WgGexDM63dOvndLdAWwA6uSmSsc97jmrCuNmrF1JEDK7w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sat, 19 Mar 2022 at 01:15, Andres Freund <andres(at)anarazel(dot)de> wrote:
> pg_rewrite without pg_stat_progress_checkpoint: 745472, with: 753664
>
> pg_rewrite is the second biggest relation in an empty database already...

Yeah, that's not great. Thanks for nerd-sniping me into looking into
how views and pg_rewrite rules work, that was very interesting and I
learned quite a lot.

# Immediately potential, limited to progress views

I noticed that the CASE-WHEN (used in translating progress stage index
to stage names) in those progress reporting views can be more
efficiently described (althoug with slightly worse behaviour around
undefined values) using text array lookups (as attached). That
resulted in somewhat smaller rewrite entries for the progress views
(toast compression was good old pglz):

template1=# SELECT sum(octet_length(ev_action)),
SUM(pg_column_size(ev_action)) FROM pg_rewrite WHERE
ev_class::regclass::text LIKE '%progress%';

master:
sum | sum
-------+-------
97277 | 19956
patched:
sum | sum
-------+-------
77069 | 18417

So this seems like a nice improvement of 20% uncompressed / 7% compressed.

I tested various cases of phase number to text translations: `CASE ..
WHEN`; `(ARRAY[]::text[])[index]` and `('{}'::text[])[index]`. See
results below:

postgres=# create or replace view arrayliteral_view as select
(ARRAY['a','b','c','d','e','f']::text[])[index] as name from tst
s(index);
CREATE INDEX
postgres=# create or replace view stringcast_view as select
('{a,b,c,d,e,f}'::text[])[index] as name from tst s(index);
CREATE INDEX
postgres=# create or replace view split_stringcast_view as select
(('{a,b,' || 'c,d,e,f}')::text[])[index] as name from tst s(index);
CREATE VIEW
postgres=# create or replace view case_view as select case index when
0 then 'a' when 1 then 'b' when 2 then 'c' when 3 then 'd' when 4 then
'e' when 5 then 'f' end as name from tst s(index);
CREATE INDEX

postgres=# postgres=# select ev_class::regclass::text,
octet_length(ev_action), pg_column_size(ev_action) from pg_rewrite
where ev_class in ('arrayliteral_view'::regclass::oid,
'case_view'::regclass::oid, 'split_stringcast_view'::regclass::oid,
'stringcast_view'::regclass::oid);
ev_class | octet_length | pg_column_size
-----------------------+--------------+----------------
arrayliteral_view | 3311 | 1322
stringcast_view | 2610 | 1257
case_view | 5170 | 1412
split_stringcast_view | 2847 | 1350

It seems to me that we could consider replacing the CASE statements
with array literals and lookups if we really value our template
database size. But, as text literal concatenations don't seem to get
constant folded before storing them in the rules table, this rewrite
of the views would result in long lines in the system_views.sql file,
or we'd have to deal with the additional overhead of the append
operator and cast nodes.

# Future work; nodeToString / readNode, all rewrite rules

Additionally, we might want to consider other changes like default (or
empty value) elision in nodeToString, if that is considered a
reasonable option and if we really want to reduce the size of the
pg_rewrite table.

I think a lot of space can be recovered from that: A manual removal of
what seemed to be fields with default values (and the removal of all
query location related fields) in the current definition of
pg_stat_progress_create_index reduces its uncompressed size from
23226B raw and 4204B compressed to 13821B raw and 2784B compressed,
for an on-disk space saving of 33% for this view's ev_action.

Do note, however, that that would add significant branching in the
nodeToString and readNode code, which might slow down that code
significantly. I'm not planning on working on that; but in my opinion
that is a viable path to reducing the size of new database catalogs.

-Matthias

PS. attached patch is not to be considered complete - it is a minimal
example of the array literal form. It fails regression tests because I
didn't bother updating or including the regression tests on system
views.

Attachment Content-Type Size
v0-Replace-system-view-CASE-WHEN-with-array-lookups.patch.txt text/plain 6.7 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Melih Mutlu 2022-04-08 14:57:21 Re: Mingw task for Cirrus CI
Previous Message Thom Brown 2022-04-08 14:44:51 Re: [COMMITTERS] pgsql: Allow time delayed standbys and recovery