Re: [PATCH] Microvacuum for gist.

From: Anastasia Lubennikova <a(dot)lubennikova(at)postgrespro(dot)ru>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCH] Microvacuum for gist.
Date: 2015-09-16 11:12:43
Message-ID: 55F94EAB.9070007@postgrespro.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

16.09.2015 07:30, Jeff Janes:
>
> The commit of this patch seems to have created a bug in which updated
> tuples can disappear from the index, while remaining in the table.
>
> It looks like the bug depends on going through a crash-recovery cycle,
> but I am not sure of that yet.
>
> I've looked through the commit diff and don't see anything obviously
> wrong. I notice index tuples are marked dead with only a buffer
> content share lock, and the page is defragmented with only a buffer
> exclusive lock (as opposed to a super-exclusive buffer clean up
> lock). But as far as I can tell, both of those should be safe on an
> index. Also, if that was the bug, it should happen without
> crash-recovery.
>
> The test is pretty simple. I create a 10,000 row table with a
> unique-by-construction id column with a btree_gist index on it and a
> counter column, and fire single-row updates of the counter for random
> ids in high concurrency (8 processes running flat out). I force the
> server to crash frequently with simulated torn-page writes in which
> md.c writes a partial page and then PANICs. Eventually (1 to 3 hours)
> the updates start indicating they updated 0 rows. At that point, a
> forced table scan will find the row, but the index doesn't.
>
> Any hints on how to proceed with debugging this? If I can't get it to
> reproduce the problem in the absence of crash-recovery cycles with an
> overnight run, then I think my next step will be to run it over
> hot-standby and see if WAL replay in the absence of crashes might be
> broken as well.
>

I've found the bug. It's because of mixed calls of
PageIndexMultiDelete() in gistvacuumpage() and
PageIndexTupleDelete() in gistRedoPageUpdateRecord().
These functions are conflicting.

I've fixed my patch by change MultiDelete to TupleDelete in
gistvacuumpage(). Patch is attached.
But It seems to me that it would be better to rewrite all mentions of
TupleDelete to MultiDelete in gist code.
I'm working on it.

--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company()

Attachment Content-Type Size
micro_fix.patch text/x-patch 740 bytes

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Thom Brown 2015-09-16 11:21:32 Re: Sequence Access Method WIP
Previous Message Thomas Munro 2015-09-16 11:07:03 Re: synchronous_commit = apply