Re: COPY FREEZE and setting PD_ALL_VISIBLE/visibility map bits

From: Anastasia Lubennikova <a(dot)lubennikova(at)postgrespro(dot)ru>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Tatsuo Ishii <ishii(at)sraoss(dot)co(dot)jp>
Cc: lubennikovaav(at)gmail(dot)com, pgsql-hackers(at)lists(dot)postgresql(dot)org, pavan(dot)deolasee(at)gmail(dot)com, ibrar(dot)ahmad(at)gmail(dot)com
Subject: Re: COPY FREEZE and setting PD_ALL_VISIBLE/visibility map bits
Date: 2020-11-02 13:44:22
Message-ID: 1d2e0f4a-6ab8-10f1-c177-af97475df001@postgrespro.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 30.10.2020 03:42, Tomas Vondra wrote:
> Hi,
>
> I might be somewhat late to the party, but I've done a bit of
> benchmarking too ;-) I used TPC-H data from a 100GB test, and tried
> different combinations of COPY [FREEZE] and VACUUM [FREEZE], both on
> current master and with the patch.
>
> So I don't really observe any measurable slowdowns in the COPY part (in
> fact there seems to be a tiny speedup, but it might be just noise). In
> the VACUUM part, there's clear speedup when the data was already frozen
> by COPY (Yes, those are zeroes, because it took less than 1 second.)
>
> So that looks pretty awesome, I guess.
>
> For the record, these tests were run on a server with NVMe SSD, so
> hopefully reliable and representative numbers.
>
Thank you for the review.

> A couple minor comments about the code:
>
> 2) I find the "if (all_frozen_set)" block a bit strange. It's a matter
> of personal preference, but I'd just use a single level of nesting, i.e.
> something like this:
>
>     /* if everything frozen, the whole page has to be visible */
>     Assert(!(all_frozen_set && !PageIsAllVisible(page)));
>
>     /*
>      * If we've frozen everything on the page, and if we're already
>      * holding pin on the vmbuffer, record that in the visibilitymap.
>      * If we're not holding the pin, it's OK to skip this - it's just
>      * an optimization.
>      *
>      * It's fine to use InvalidTransactionId here - this is only used
>      * when HEAP_INSERT_FROZEN is specified, which intentionally
>      * violates visibility rules.
>      */
>     if (all_frozen_set &&
>         visibilitymap_pin_ok(BufferGetBlockNumber(buffer), vmbuffer))
>     visibilitymap_set(...);
>
> IMHO it's easier to read, but YMMV. I've also reworded the comment a bit
> to say what we're doing etc. And I've moved the comment from inside the
> if block into the main comment - that was making it harder to read too.
>
I agree that it's a matter of taste. I've updated comments and left
nesting unchanged to keep assertions simple.

>
> 3) I see RelationGetBufferForTuple does this:
>
>     /*
>      * This is for COPY FREEZE needs. If page is empty,
>      * pin vmbuffer to set all_frozen bit
>      */
>     if ((options & HEAP_INSERT_FROZEN) &&
>         (PageGetMaxOffsetNumber(BufferGetPage(buffer)) == 0))
>         visibilitymap_pin(relation, BufferGetBlockNumber(buffer),
> vmbuffer);
>
> so is it actually possible to get into the (all_frozen_set) without
> holding a pin on the visibilitymap? I haven't investigated this so
> maybe there are other ways to get into that code block. But the new
> additions to hio.c get the pin too.
>
I was thinking that GetVisibilityMapPins() can somehow unset the pin. I
gave it a second look. And now I don't think it's possible to get into
this code block without a pin.  So, I converted this check into an
assertion.

>
> 4) In heap_xlog_multi_insert we now have this:
>
>     if (xlrec->flags & XLH_INSERT_ALL_VISIBLE_CLEARED)
>         PageClearAllVisible(page);
>     if (xlrec->flags & XLH_INSERT_ALL_FROZEN_SET)
>         PageSetAllVisible(page);
>
> IIUC it makes no sense to have both flags at the same time, right? So
> what about adding
>
>     Assert(!(XLH_INSERT_ALL_FROZEN_SET &&
> XLH_INSERT_ALL_VISIBLE_CLEARED));
>
> to check that?
>
Agree.

I placed this assertion to the very beginning of the function. It also
helped to simplify the code a bit.
I also noticed, that we were not updating visibility map for all_frozen
from heap_xlog_multi_insert. Fixed.

>
> I wonder what to do about the heap_insert - I know there are concerns it
> would negatively impact regular insert, but is it really true? I suppose
> this optimization would be valuable even for cases where multi-insert is
> not possible.
>
Do we have something like INSERT .. FREEZE? I only see
TABLE_INSERT_FROZEN set for COPY FREEZE and for matview operations. Can
you explain, what use-case are we trying to optimize by extending this
patch to heap_insert()?

The new version is attached.
I've also fixed a typo in the comment by Tatsuo Ishii suggestion.
Also, I tested this patch with replication and found no issues.

--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachment Content-Type Size
copy-freeze-vm_freeze_v4.patch text/x-patch 11.7 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Magnus Hagander 2020-11-02 13:46:02 Re: -O switch
Previous Message Thomas Munro 2020-11-02 13:38:31 Re: Collation versioning