Re: GiST insert algorithm rewrite

From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Teodor Sigaev <teodor(at)sigaev(dot)ru>, Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: GiST insert algorithm rewrite
Date: 2010-12-03 21:54:44
Message-ID: 4CF96724.5050301@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 01.12.2010 11:00, Heikki Linnakangas wrote:
> On 01.12.2010 04:10, Robert Haas wrote:
>> On Tue, Nov 30, 2010 at 10:26 AM, Heikki Linnakangas
>> <heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
>>>> Does the current code cope with the corruption?
>>>
>>> It's not corruption, but "intended degradation". Yes, the current
>>> code copes
>>> with it, that's how GiST survives a crash. However, even with the
>>> current
>>> code, VACUUM will nag if it finds any invalid tuples with this message:
>>>
>>> ereport(NOTICE,
>>> (errmsg("index \"%s\" needs VACUUM FULL or REINDEX to finish crash
>>> recovery",
>>>
>>> That's harmless, in the sense that all scans and inserts work fine, but
>>> scans might need to do more work than if the invalid tuple wasn't there.
>>>
>>> I don't think we need to go out of our way to support such degraded
>>> indexes
>>> in 9.1. If you see such notices in your logs, you should REINDEX anyway,
>>> before of after pg_upgrade. Let's just make sure that you get a
>>> reasonable
>>> error message in 9.1 if a scan or insert encounters such a tuple.
>>
>> I just don't want to take a risk of giving people unexpected wrong
>> answers. It's not clear to me whether that's a risk here or not.
>
> You'll get an error if a scan encounters an invalid tuple.
>
> In the patch I posted, I just ripped out everything related to invalid
> tuples altogether. But we should add a check and ereport for that before
> commit.

Here's an updated patch.

On closer look, supporting the invalid tuples in scans was trivial, so I
kept that after all. So you can still query an index with invalid
tuples. If an insert encounters one, you get an error, and VACUUM emits
a LOG message on any such tuples.

There's one bug remaining that I found during testing. If you crash,
leaving an incomplete split behind, and then vacuum the table removing
all the aborted tuples from the pages, it's possible that you end up
with a completely empty page that has no downlink yet. The code to
complete incomplete splits doesn't cope with that at the moment - it
doesn't know how to construct a parent key for a child that has no tuples.

The nicest way to handle that would be to recycle the empty page instead
of trying to finish the page split, but I think there might be a race
condition there if the page gets quickly reused while a scan is just
about to visit it through the rightlink. GiST doesn't seem to ever reuse
pages in normal operation, which conveniently avoids that problem.
Simply abandoning the page forever is certainly one way to handle it, it
shouldn't happen that often.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

Attachment Content-Type Size
gist-insert-rewrite-3.patch text/x-diff 108.8 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Oleg Bartunov 2010-12-03 21:57:20 Re: ERROR: could not identify an equality operator for type box
Previous Message Oleg Bartunov 2010-12-03 21:53:57 ERROR: could not identify an equality operator for type box