Re: COPY FREEZE and setting PD_ALL_VISIBLE/visibility map bits

From: Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>
To: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Simon Riggs <simon(at)2ndquadrant(dot)com>
Subject: Re: COPY FREEZE and setting PD_ALL_VISIBLE/visibility map bits
Date: 2019-02-27 04:56:25
Message-ID: CABOikdOzzLE8A9CivB+xYVZSVUUHRfkA=FOC48TNHtyWPBBhmA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Feb 27, 2019 at 7:05 AM Jeff Janes <jeff(dot)janes(at)gmail(dot)com> wrote:

>
>
> After doing a truncation and '\copy ... with (freeze)' of a table with
> long data, I find that the associated toast table has a handful of unfrozen
> blocks. I don't know if that is an actual problem, but it does seem a bit
> odd, and thus suspicious.
>
>
Hi Jeff, thanks for looking at it and the test. I can reproduce the problem
and quite curiously block number 1 and then every 32672th block is getting
skipped.

postgres=# select * from pg_visibility('pg_toast.pg_toast_16384') where
all_visible = 'f';
blkno | all_visible | all_frozen | pd_all_visible
--------+-------------+------------+----------------
1 | f | f | f
32673 | f | f | f
65345 | f | f | f
98017 | f | f | f
130689 | f | f | f
163361 | f | f | f
<snip>

Having investigated this a bit, I see that a relcache invalidation arrives
after 1st and then after every 32672th block is filled. That clears the
rel->rd_smgr field and we lose the information about the saved target
block. The code then moves to extend the relation again and thus skips the
previously less-than-half filled block, losing the free space in that block.

postgres=# SELECT * FROM
page_header(get_raw_page('pg_toast.pg_toast_16384', 0));
lsn | checksum | flags | lower | upper | special | pagesize |
version | prune_xid
------------+----------+-------+-------+-------+---------+----------+---------+-----------
1/15B37748 | 0 | 4 | 40 | 64 | 8192 | 8192 |
4 | 0
(1 row)

postgres=# SELECT * FROM
page_header(get_raw_page('pg_toast.pg_toast_16384', 1));
lsn | checksum | flags | lower | upper | special | pagesize |
version | prune_xid
------------+----------+-------+-------+-------+---------+----------+---------+-----------
1/15B39A28 | 0 | 4 | 28 | 7640 | 8192 | 8192 |
4 | 0
(1 row)

postgres=# SELECT * FROM
page_header(get_raw_page('pg_toast.pg_toast_16384', 2));
lsn | checksum | flags | lower | upper | special | pagesize |
version | prune_xid
------------+----------+-------+-------+-------+---------+----------+---------+-----------
1/15B3BE08 | 0 | 4 | 40 | 64 | 8192 | 8192 |
4 | 0
(1 row)

So the block 1 has a large amount of free space (upper - lower), which
never gets filled.

I am not yet sure what causes the relcache invalidation at regular
intervals. But if I have to guess, it could be because of a new VM (or
FSM?) page getting allocated. I am bit puzzled because this issue seems to
only occur with toast tables since I tested the patch while writing it on a
regular table and did not see any block remaining unfrozen. I tested only
upto 450 blocks, but that shouldn't matter because with your test, we see
the problem with block 1 as well. So something to look into in more detail.

While we could potentially fix this by what you'd done in the original
patch and what Kuntal also suggested, i.e. by setting the PD_ALL_VISIBLE
bit during page initialisation itself, I am a bit circumspect about that
approach for two reasons:

1. It requires us to then add extra logic to avoid clearing the bit during
insertions
2. It requires us to also update the VM bit during page init or risk having
divergent views on the page-level bit and the VM bit.

And even if we do that, this newly discovered problem of less-than-half
filled intermediate blocks remain. I wonder if we should instead track the
last used block in BulkInsertState and if the relcache invalidation flushes
smgr, start inserting again from the last saved block. In fact, we already
track the last used buffer in BulkInsertState and that's enough to know the
last used block.

Thanks,
Pavan

--
Pavan Deolasee http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Andrew Gierth 2019-02-27 05:16:34 Re: Early WIP/PoC for inlining CTEs
Previous Message Tom Lane 2019-02-27 04:45:56 Re: Early WIP/PoC for inlining CTEs