Re: [HACKERS] ginInsertCleanup called from vacuum could still miss tuples to be deleted

From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: Peter Geoghegan <pg(at)bowt(dot)ie>
Cc: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] ginInsertCleanup called from vacuum could still miss tuples to be deleted
Date: 2017-11-25 02:09:42
Message-ID: CAMkU=1w+wu2XNDp2w3sAKA8jD7Rf5GireLvxwDuAv5KhBdi9OA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Nov 16, 2017 at 2:43 PM, Peter Geoghegan <pg(at)bowt(dot)ie> wrote:

> On Thu, Nov 16, 2017 at 2:16 PM, Jeff Janes <jeff(dot)janes(at)gmail(dot)com> wrote:
> > The only reference to super-exclusive lock in
> src/backend/access/gin/README,
> > that I can find, is about posting trees, not pending lists. Can you
> quote
> > or give line numbers of the section you are referring to?
>
> That's the section I was referring to. I apologize for being unclear.
>
> I would like to see a *general* explanation of why it's okay that the
> pending list can have its pages recycled early. How can we be sure
> that somebody looking through the pending list won't land on a
> just-recycled page and somehow get confused?
>
> We see this for the entry tree (no deletion is possible in the first
> place), and we also see it for the posting tree (the dance with
> inserters having a pin on the root, and so on). Not mentioning why
> pending list recycling is safe in terms of locking protocols seems
> like an omission that should be addressed. I'm no GIN expert, but I
> would expect this because the pending list seems like a kind of
> extension of the posting tree.
>

The pending list follows the same rule as the right-links on the posting
trees:

"To prevent a backend from reaching a deleted page via a right-link, when
following a right-link the lock on the previous page is not released until
the lock on next page has been acquired."

This lock-chaining rule is also followed when stepping from the meta page
to the first page in the pending list.

(Yes, neither of these are mentioned in the README).

Since pending lists don't have downlinks, the dance with the
super-exclusive locks is not needed.

I think what is true for deleted pages should also be sufficient for
deleted-and-freed-and-recycled pages. Since you are only allowed to follow
a link from a locked buffer and not from private memory, you can't follow a
stale link as long as the links are updated before the page is deleted and
then recycled.

Is this convincing?

Also missing is an argument for the deadlock-freeness. I think what is
needed for that is that you never lock a page earlier in the pending list,
nor the metapage, while holding a lock on a pending list page. (Which as
far as I can tell is true, but I don't know how to exhaustively verify it)

As I've said, it feels slightly off to me that a user backend that is
> performing opportunistic cleanup during insertion can call
> RecordFreeIndexPage() in GIN.
>

I don't know how to address that, because I don't know why it seems off.
No one else does this because no one else generates large number of free
pages during insertions. If we ever implement something like fractal
indexes for BTrees, then we would need to do this from the BTree insertion
(or clean-up-from-insertion) code as well. There are no warnings in the .h
or .c for RecordFreeIndexPage() about where one is allowed to call it
from. I don't see any hazards of this beyond those associated with
deleting the page in the first place. Any deleted page unpinned can be
recycled at any time by a vacuum that comes along while you are swapped
out, no?

Cheers,

Jeff

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Geoghegan 2017-11-25 05:14:00 Re: [HACKERS] ginInsertCleanup called from vacuum could still miss tuples to be deleted
Previous Message Amit Kapila 2017-11-25 01:38:03 Re: explain analyze output with parallel workers - question about meaning of information for explain.depesz.com