Re: GiST VACUUM

From: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
To: Andrey Borodin <x4mmm(at)yandex-team(dot)ru>
Cc: Michael Paquier <michael(at)paquier(dot)xyz>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Костя Кузнецов <chapaev28(at)ya(dot)ru>
Subject: Re: GiST VACUUM
Date: 2019-03-21 17:04:52
Message-ID: 738ecf7d-abc1-ed64-6af7-09eb6d9f3d2f@iki.fi
Views: Raw Message | Whole Thread | Download mbox
Thread:
Lists: pgsql-hackers

On 21/03/2019 18:06, Andrey Borodin wrote:
>> 21 марта 2019 г., в 2:30, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
>> написал(а): one remaining issue that needs to be fixed:
>>
>> During Hot Standby, the B-tree code writes a WAL reord, when a
>> deleted page is recycled, to prevent the deletion from being
>> replayed too early in the hot standby. See _bt_getbuf() and
>> btree_xlog_reuse_page(). I think we need to do something similar in
>> GiST.
>>
>> I'll try fixing that tomorrow, unless you beat me to it. Making
>> the changes is pretty straightforward, but it's a bit cumbersome to
>> test.
>
> I've tried to deal with it and stuck...

So, I came up with the attached. I basically copy-pasted the page-reuse
WAL-logging stuff from nbtree.

When I started testing this, I quickly noticed that empty pages were not
being deleted nearly as much as I expected. I tracked it to this check
in gistdeletepage:

> + if (GistFollowRight(leafPage)
> + || GistPageGetNSN(parentPage) > GistPageGetNSN(leafPage))
> + {
> + /* Don't mess with a concurrent page split. */
> + return false;
> + }

That NSN test was bogus. It prevented the leaf page from being reused,
if the parent page was *ever* split after the leaf page was created. I
don't see any reason to check the NSN here. The NSN is normally used to
detect if a (leaf) page has been concurrently split, when you descend
the tree. We don't need to care about that here; as long as the
FOLLOW_RIGHT flag is not set, the page has a downlink, and if we can
find the downlink and the page is empty, we can delete it.

After removing that bogus NSN check, page reuse become much more
effective. I've been testing this by running this test script repeatedly:

----------
/*
create sequence gist_test_seq;
create table gist_point_tbl(id int4, p point);
create index gist_pointidx on gist_point_tbl using gist(p);
*/

insert into gist_point_tbl (id, p)
select nextval('gist_test_seq'), point(nextval('gist_test_seq'),
1000 + g) from generate_series(1, 10000) g;

delete from gist_point_tbl where id < currval('gist_test_seq') - 20000;
vacuum gist_point_tbl;

select pg_table_size('gist_point_tbl'), pg_indexes_size('gist_point_tbl');
----------

It inserts a bunch of rows, deletes a bunch of older rows, and vacuums.
The interesting thing here is that the key values keep "moving", so that
new tuples are added to different places than where old ones are
removed. That's the case where page reuse is needed.

Before this patch, the index bloated very quickly. With the patch, it
still bloats, because we still don't delete internal pages. But it's a
small fraction of the bloat you got before.

Attached is the latest patch version, to be applied on top of the
IntegerSet patch.

> I think we should make B-tree WAL record for this shared, remove
> BlockNumber and other unused stuff, leaving just xid and db oid. And
> reuse this record for B-tree, GiST and GIN (yeah, it is not checking
> for that conflict).
Good point. For now, I didn't try to generalize this, but perhaps we
should.

> Though, I'm not sure it is important for GIN. Scariest thing that can
> happen: it will return same tid twice. But it is doing bitmap scan,
> you cannot return same bit twice...

Hmm. Could it return a completely unrelated tuple? We don't always
recheck the original index quals in a bitmap index scan, IIRC. Also, a
search might get confused if it's descending down a posting tree, and
lands on a different kind of a page, altogether.

Alexander, you added the mechanism to GIN recently, to prevent pages
from being reused too early (commit 52ac6cd2d0). Do we need something
like B-tree's REUSE_PAGE records in GIN, too, to prevent the same bug
from happening in hot standby?

PS. for Gist, we could almost use the LSN / NSN mechanism to detect the
case that a deleted page is reused: Add a new field to the GiST page
header, to store a new "deleteNSN" field. When a page is deleted, the
deleted page's deleteNSN is set to the LSN of the deletion record. When
the page is reused, the deleteNSN field is kept unchanged. When you
follow a downlink during search, if you see that the page's deleteNSN >
parent's LSN, you know that it was concurrently deleted and recycled,
and should be ignored. That would allow reusing deleted pages
immediately. Unfortunately that would require adding a new field to the
gist page header/footer, which requires upgrade work :-(. Maybe one day,
we'll bite the bullet. Something to keep in mind, if we have to change
the page format anyway, for some reason.

- Heikki

Attachment Content-Type Size
0002-Delete-empty-pages-during-GiST-VACUUM.patch text/x-patch 35.5 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message David Steele 2019-03-21 17:23:45 Re: Re: ToDo: show size of partitioned table
Previous Message Andrey Borodin 2019-03-21 16:06:41 Re: GiST VACUUM