Re: Add pg_walinspect function with block info columns

From: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
To: Michael Paquier <michael(at)paquier(dot)xyz>
Cc: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, boekewurm+postgres(at)gmail(dot)com, melanieplageman(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org, pg(at)bowt(dot)ie
Subject: Re: Add pg_walinspect function with block info columns
Date: 2023-03-08 10:31:56
Message-ID: CALj2ACXR_SWazf5UVTBPa_32rwD2=M4Z1j6qt8k7QND2BEpuVA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Mar 8, 2023 at 12:58 PM Michael Paquier <michael(at)paquier(dot)xyz> wrote:
>
> On Tue, Mar 07, 2023 at 03:56:22PM +0530, Bharath Rupireddy wrote:
> > That would be a lot better. Not just the test, but also the
> > documentation can have it. Simple way to generate such a record (both
> > block data and FPI) is to just change the wal_level to logical in
> > walinspect.conf [1], see code around REGBUF_KEEP_DATA and
> > RelationIsLogicallyLogged in heapam.c
>
> I don't agree that we need to go down to wal_level=logical for this.
> The important part is to check that the non-NULL and NULL paths for
> the block data and FPI data are both taken, making 4 paths to check.
> So we need two tests at minimum, which would be either:
> - One SQL generating no FPI with no block data and a second generating
> a FPI with block data. v2 was doing that but did not cover the first
> case.
> - One SQL generating a FPI with no block data and a second generating
> no FPI with block data.
>
> So let's just geenrate a heap record on an UPDATE, for example, like
> in the version attached.

Yup, that should work too because block data gets logged [1].

> > 4. I think we need to free raw_data, raw_page and flags as we loop
> > over multiple blocks (XLR_MAX_BLOCK_ID) and will leak memory which can
> > be a problem if we have many blocks assocated with a single WAL
> > record.
> > flags = (Datum *) palloc0(sizeof(Datum) * bitcnt);
> > Also, we will leak all CStringGetTextDatum memory in the block_id for loop.
> > Another way is to use and reset temp memory context in the for loop
> > over block_ids. I prefer this approach over multiple pfree()s in
> > block_id for loop.
>
> I disagree, this was on purpose in the last version. This version
> finishes by calling AllocSetContextCreate() and MemoryContextDelete()
> once per *record*, which will not be free, and we are arguing about
> resetting the memory context after scanning up to XLR_MAX_BLOCK_ID
> blocks, or 32 blocks which would go up to 32kB per page in the worst
> case. That's not going to matter in a large scan for each record, but
> the extra AllocSet*() calls could. And we basically do the same thing
> on HEAD.

It's not just 32kB per page right? 32*8KB on HEAD (no block data,
flags and CStringGetTextDatum there). With the patch, the number of
pallocs for each block_id = 6 CStringGetTextDatum + BLCKSZ (8KB) +
flags (5*size of ptr) + block data_len. In the worst case, all
XLR_MAX_BLOCK_ID can have both FPIs and block data. Furthermore,
imagine if someone initialized their cluster with a higher BLCKSZ (>=
8KB), then the memory leak happens noticeably on a lower-end system.

I understand that performance is critical here but we need to ensure
memory is used wisely. Therefore, I'd still vote to free at least the
major contributors here, that is, pfree(raw_data);, pfree(raw_page);
and pfree(flags); right after they are done using. I'm sure pfree()s
don't hurt more than resetting memory context for every block_id.

> Any comments?

I think we need to output block data length (blk->data_len) similar to
fpilen to save users from figuring out how to get the length of a
bytea column. This will also keep block data in sync with FPI info.

[1]
needs_backup = (page_lsn <= RedoRecPtr);

(gdb) p page_lsn
$2 = 21581544
(gdb) p RedoRecPtr
$3 = 21484808
(gdb) p needs_backup
$4 = false
(gdb)
(gdb) bt
#0 XLogRecordAssemble (rmid=10 '\n', info=64 '@',
RedoRecPtr=21484808, doPageWrites=true, fpw_lsn=0x7ffde118d640,
num_fpi=0x7ffde118d634, topxid_included=0x7ffde118d633) at xloginsert.c:582
#1 0x00005598cd9c3ef7 in XLogInsert (rmid=10 '\n', info=64 '@') at
xloginsert.c:497
#2 0x00005598cd930452 in log_heap_update (reln=0x7f4a4c7cd808,
oldbuf=136, newbuf=136, oldtup=0x7ffde118d820,
newtup=0x5598d00cb098, old_key_tuple=0x0,
all_visible_cleared=false, new_all_visible_cleared=false)
at heapam.c:8473
#3 0x00005598cd92876e in heap_update (relation=0x7f4a4c7cd808,
otid=0x7ffde118dab2, newtup=0x5598d00cb098, cid=0,
crosscheck=0x0, wait=true, tmfd=0x7ffde118db60,
lockmode=0x7ffde118da74) at heapam.c:3741

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Aleksander Alekseev 2023-03-08 10:35:40 Re: HOT chain validation in verify_heapam()
Previous Message Drouvot, Bertrand 2023-03-08 10:25:10 Re: Minimal logical decoding on standbys