GIN non-intrusive vacuum of posting tree

From: Andrew Borodin <borodin(at)octonica(dot)com>
To: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Evgeniy Efimkin <efimkin(at)yandex-team(dot)ru>, Vladimir Borodin <root(at)simply(dot)name>, Teodor Sigaev <teodor(at)postgrespro(dot)ru>
Subject: GIN non-intrusive vacuum of posting tree
Date: 2016-11-28 17:31:22
Message-ID: CAJEAwVEA4iLqBfdi3Qm=UwGcFX=h5x0FSbifo2+EE4G3D9e3wQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi hackers!

Here is a patch with more concurrency-friendly vacuum of GIN.

===Short problem statement===
Currently GIN vacuum during cleaning posting tree can lock this tree
for a long time without real need.

===Problem statement===
Vacuum of posting tree (function ginVacuumPostingTree() in ginvacuum.c
) is doing two passes through posting tree:

1. ginVacuumPostingTreeLeaves() takes LockBufferForCleanup,
effectively excluding all inserts. Then it traverses down trough tree
taking exclusive lock, effectively excluding all reads. On leaf level
it calls ginVacuumPostingTreeLeaf(), which deletes all dead tuples. If
there are any empty page, root lock is not relaesed, it passes to
stage two.

Interim: vacuum_delay_point(), which can hand for a while, holding
CleanupLock on root.

2. If there are any empty pages, ginScanToDelete() scans through tree,
deleting empty lead pages, then deleting empty inner pages, if they
emerged after leaf page deletion.

===Proposed algorithm===
ginVacuumPostingTreeLeaves() takes SHARED locks on inner pages and
EXCLUSIVE locks on leaf pages. On leaf pages it calls
ginVacuumPostingTreeLeaf().

If ginVacuumPostingTreeLeaves() encounters subtree with all leafs
empty, then it takes LockBufferForCleanup() on page P pointing to
empty subtree. After taking lock on P, ginScanToDelete() is called for
P. For every parent of P we can guarantee that this procedure will not
be necessary: if P was empty subtree itself, we would pass this
procedure to parent (unless P is root, than behavior is effectively
equal to before-patched).

===Testing===
This patch solved a problem encountered by Evgeniy Efimkin and
Vladimir Borodin from Yandex.Mail.
They implemented testbed with GIN index

CREATE TABLE BOX (uid bigint,
lids integer[] NOT NULL
CHECK (array_ndims(lids) = 1));

CREATE OR REPLACE FUNCTION ulids(
i_uid bigint,
i_lids integer[]
) RETURNS bigint[] AS $$
SELECT array_agg((i_uid << 32) | lid)
FROM unnest(i_lids) lid;
$$ LANGUAGE SQL IMMUTABLE STRICT;

CREATE INDEX i_box_uid_lids
ON box USING gin (ulids(uid, lids)) WITH (FASTUPDATE=OFF);

Executing by a pgbench following query

\setrandom uid 1 1500000
\setrandom lid 1 1000
\setrandom lid2 1 1000
\setrandom lid3 1 1000
BEGIN;
insert into box values(:uid,'{:lid,:lid2,:lid3}');
insert into box values(:uid,'{}');
END;

and eventually deleting some of data. This testbed showed VACUUM
holding inserts for up to tenths of seconds. They claim that proposed
patch made vacuum invisible in this test.

Evgeniy, Vladimir, if I missed something or you have something to add,
please join discussion.

===Known issues===
1. Probably, there exists better algorithms. I could not adapt B-tree
vacuum wizardy to GIN, but that does not mean it is impossible.

2. There may be more CleanUp locks taken. Under write-dense scenarios,
this can lead to longer vacuum, but it should not consume more
resources, just wait.

3. I have changed locks sequence during page deleteion. I think that
it is safe, but comments there were in fear of inserts (excluded by
CleanupLock). More details in a code of ginDeletePage().

4. ginVacuumPostingTreeLeaves() traverses children pages after release
of parent lock (event SHARED). This is safe if there is only one
vacuum at a time. Is there a reason to be afraid of concurrent
vacuums?

I will be happy if someone points me were is a best place to read
about B-tree vacuum process. Or if someone will explain me how it
works in a few words.
Dev process is here
https://github.com/x4m/postgres_g/pull/2

Thank you for reading, I'm looking forward to hear your thought on the matter.

Best regards, Andrey Borodin.

Attachment Content-Type Size
gin_nonintrusive_vacuum_v0.diff text/plain 9.4 KB

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Christian Convey 2016-11-28 17:40:19 Re: Tackling JsonPath support
Previous Message Robert Haas 2016-11-28 17:31:13 Re: UNDO and in-place update