Re: Add pg_walinspect function with block info columns

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
Cc: Michael Paquier <michael(at)paquier(dot)xyz>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, melanieplageman(at)gmail(dot)com, boekewurm+postgres(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Add pg_walinspect function with block info columns
Date: 2023-03-30 21:41:58
Message-ID: CAH2-Wzm9shOkEDM10_+qOZkRSQhKVxwBFiehH6EHWQQRd_rDPw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Mar 29, 2023 at 8:28 PM Bharath Rupireddy
<bharath(dot)rupireddyforpostgres(at)gmail(dot)com> wrote:
> I took a look at v9 and LGTM.

Pushed, thanks.

There is still an outstanding question around the overhead of
outputting FPIs and even block data from pg_get_wal_block_info(). At
one point Melanie suggested that we'd need to do something about that,
and I tend to agree. Attached patch provides an optional parameter
that will make pg_get_wal_block_info return NULLs for both block_data
and block_fpi_data, no matter whether or not there is something to
show. Note that this only affects those two bytea columns; we'll still
show everything else, including valid block_data_length and
block_fpi_length values (so the metadata describing the on-disk size
of block_data and block_fpi_data is unaffected).

To test this patch, I ran pgbench for about 5 minutes, using a fairly
standard configuration with added indexes and with wal_log_hints
enabled. I ended up with the following WAL records afterwards:

pg(at)regression:5432 [1402115]=# SELECT
"resource_manager/record_type" t,
pg_size_pretty(combined_size) s,
fpi_size_percentage perc_fpi
FROM
pg_get_wal_Stats ('0/10E9D80', 'FFFFFFFF/FFFFFFFF', FALSE) where
combined_size > 0;
┌─[ RECORD 1 ]──────────────────┐
│ t │ XLOG │
│ s │ 1557 MB │
│ perc_fpi │ 22.029466865781302 │
├─[ RECORD 2 ]──────────────────┤
│ t │ Transaction │
│ s │ 49 MB │
│ perc_fpi │ 0 │
├─[ RECORD 3 ]──────────────────┤
│ t │ Storage │
│ s │ 13 kB │
│ perc_fpi │ 0 │
├─[ RECORD 4 ]──────────────────┤
│ t │ CLOG │
│ s │ 1380 bytes │
│ perc_fpi │ 0 │
├─[ RECORD 5 ]──────────────────┤
│ t │ Database │
│ s │ 118 bytes │
│ perc_fpi │ 0 │
├─[ RECORD 6 ]──────────────────┤
│ t │ RelMap │
│ s │ 565 bytes │
│ perc_fpi │ 0 │
├─[ RECORD 7 ]──────────────────┤
│ t │ Standby │
│ s │ 30 kB │
│ perc_fpi │ 0 │
├─[ RECORD 8 ]──────────────────┤
│ t │ Heap2 │
│ s │ 4235 MB │
│ perc_fpi │ 0.6731388657682449 │
├─[ RECORD 9 ]──────────────────┤
│ t │ Heap │
│ s │ 4482 MB │
│ perc_fpi │ 54.46811493602934 │
├─[ RECORD 10 ]─────────────────┤
│ t │ Btree │
│ s │ 1786 MB │
│ perc_fpi │ 22.829279332421116 │
└──────────┴────────────────────┘

Time: 3618.693 ms (00:03.619)

So about 12GB of WAL -- certainly enough to be a challenge for pg_walinspect.

I then ran the following query several times over the same LSN range
as before, with my patch applied, but with behavior equivalent to
current git HEAD (this is with outputting block_data and
block_fpi_data values still turned on):

pg(at)regression:5432 [1402115]=# SELECT
count(*)
FROM
pg_get_wal_block_info ('0/10E9D80', 'FFFFFFFF/FFFFFFFF', false);
┌─[ RECORD 1 ]───────┐
│ count │ 17,031,979 │
└───────┴────────────┘

Time: 35171.463 ms (00:35.171)

The time shown here is typical of what I saw.

And now the same query, but without any overhead for outputting
block_data and block_fpi_data values:

pg(at)regression:5432 [1402115]=# SELECT
count(*)
FROM
pg_get_wal_block_info ('0/10E9D80', 'FFFFFFFF/FFFFFFFF', true);
┌─[ RECORD 1 ]───────┐
│ count │ 17,031,979 │
└───────┴────────────┘

Time: 15235.499 ms (00:15.235)

This time is also typical of what I saw. The variance was fairly low,
so I won't bother describing it.

I think that this is a compelling reason to apply the patch. It would
be possible to get about 75% of the benefit shown here by just
suppressing block_fpi_data output, without suppressing block_data, but
I think that it makes sense to either suppress both or neither. Things
like page split records can write a fairly large amount of WAL in a
way that resembles an FPI, even though technically no FPI is involved.

If there are no objections, I'll move ahead with committing something
along the lines of this patch in the next couple of days.

--
Peter Geoghegan

Attachment Content-Type Size
v1-0001-pg_get_wal_block_info-suppress-block-data-outputs.patch application/octet-stream 9.5 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Geoghegan 2023-03-30 22:25:45 Re: Add pg_walinspect function with block info columns
Previous Message Andres Freund 2023-03-30 21:33:48 Re: Thoughts on using Text::Template for our autogenerated code?