Re: Review: GIN non-intrusive vacuum of posting tree

From: Andrew Borodin <borodin(at)octonica(dot)com>
To: Jeff Davis <pgsql(at)j-davis(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Vladimir Borodin <root(at)simply(dot)name>
Subject: Re: Review: GIN non-intrusive vacuum of posting tree
Date: 2017-01-24 11:18:46
Message-ID: CAJEAwVHkhO8_nGTPD3MMKfvkgSbByW31RuCtSVdWYYh9d6XJ7w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hello!

I see your point on alternatives. I will do my best to generate more
ideas on how vacuum can be done withing existing index structure, this
will take a day or two.
In this message I'll answer some questions.

2017-01-23 22:45 GMT+05:00 Jeff Davis <pgsql(at)j-davis(dot)com>:
> 1. I haven't seen the approach before of finding the parent of any
> empty subtree. While that sounds like a reasonable thing to do, it
> worries me to invent new approaches in an area as well-studied as
> btrees. The problem is less that it's too complex, and more that it's
> different. Future developers looking at this code will expect it to be
> either very simple or very standard, because it's just a posting list.

Technically, approach of locking a subtree is not novel. Lehman and
Yao focused on "that any process for manipulating the tree uses only a
small (constant) number of locks at any time." We are locking unknown
and possibly large amount of pages.

> 2. It relies on searches to only start from the very leftmost page
> (correct?). If someone wants to optimize the intersection of two
> posting trees later, they might break it if they don't understand the
> assumptions in this patch.

Searches start from root, to find a correct position.
But cuncurrency of the patch does not depend on that.
Patch relies on the idea, that at any given subtree, if:
1. There is no inserts within subtree (guaranteed by cleanup lock)
2. Every page was locked with Exclusive lock at least once, locks were
taken from left to right, without lock coupling.
than: there is no read searches within the subtree.

> 3. This adds a heap allocation, and copies the contents of the page,
> and then releases the pin and lock. GinStepRight is careful to avoid
> doing that (it locks the target before releasing the page containing
> the pointer). I don't see a problem in your patch, but again, we are
> breaking an assumption that future developers might make.

Yes, heap allocation worries me a lot too, but let me explain details.

GinStepRight - is the place where lock coupling is done. That is, you
lock rightlink before releasing already locked page.
Generally, B-tree tends to lock just one page at a time, but this
place is an important exception (not the only exception, but others
behave similarly).

Proposed patch is doing tree traversal at worst two times.

First: scan trough the tree, lock internal pages for read, lock leafs
for write, delete dead tuples. If empty subtrees are found, invoke for
them second stage.
Here children Block Numbers are stored in palloc`d array, to do this
scan with minimum of locks. This is unacceptable if there may be
concurrent VACUUM, which can delete a page when we are asleep for some
reason. (My point of worring number 1)

Second: this stage recives subtree containing a lot of pages which
must be just deleted. Here we take cleanup lock on subtree root,
ensuring no inserts are within (they hold pins on stack from root to
leaf). Than we take exclusive locks on all pages from left to right,
without lock coupling. When we find empty page, we lock left page too,
thus doing lock coupling form right to left. If search might be wihtin
a subtree, it could deadlock with us. (My point of worring number 2)

But both points seems to have reasonable explanation why it cannot happen.

Best regards, Andrey Borodin.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Kuntal Ghosh 2017-01-24 11:36:03 Re: Add pgstathashindex() to get hash index table statistics.
Previous Message Nikhil Sontakke 2017-01-24 11:08:07 Re: Speedup twophase transactions