Re: Write Ahead Logging for Hash Indexes

From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Write Ahead Logging for Hash Indexes
Date: 2016-09-12 05:59:23
Message-ID: CAMkU=1wiW=+k_C0uweMsD8OUdpGYVmbMprkBAsKBNjQiKzUcgg@mail.gmail.com
Lists: pgsql-hackers

On Sun, Sep 11, 2016 at 7:40 PM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
wrote:

> On Mon, Sep 12, 2016 at 7:00 AM, Jeff Janes <jeff(dot)janes(at)gmail(dot)com> wrote:
> > On Thu, Sep 8, 2016 at 12:09 PM, Jeff Janes <jeff(dot)janes(at)gmail(dot)com> wrote:
> >
> >>
> >> I plan to do testing using my own testing harness after changing it to
> >> insert a lot of dummy tuples (ones with negative values in the pseudo-pk
> >> column, which are never queried by the core part of the harness) and
> >> deleting them at random intervals. I think that none of pgbench's built in
> >> tests are likely to give the bucket splitting and squeezing code very much
> >> exercise.
> >
> >
> >
> > I've implemented this, by adding lines 197 through 202 to the count.pl
> > script. (I'm reattaching the test case)
> >
> > Within a few minutes of testing, I start getting Errors like these:
> >
> > 29236 UPDATE XX000 2016-09-11 17:21:25.893 PDT:ERROR: buffer 2762 is not
> > owned by resource owner Portal
> > 29236 UPDATE XX000 2016-09-11 17:21:25.893 PDT:STATEMENT: update foo set
> > count=count+1 where index=$1
> >
> >
> > In one test, I also got an error from my test harness itself indicating
> > tuples are transiently missing from the index, starting an hour into a test:
> >
> > child abnormal exit update did not update 1 row: key 9555 updated 0E0 at
> > count.pl line 194.\n at count.pl line 208.
> > child abnormal exit update did not update 1 row: key 8870 updated 0E0 at
> > count.pl line 194.\n at count.pl line 208.
> > child abnormal exit update did not update 1 row: key 8453 updated 0E0 at
> > count.pl line 194.\n at count.pl line 208.
> >
> > Those key values should always find exactly one row to update.
> >
> > If the tuples were permanently missing from the index, I would keep getting
> > errors on the same key values very frequently. But I don't get that, the
> > errors remain infrequent and are on different value each time, so I think
> > the tuples are in the index but the scan somehow misses them, either while
> > the bucket is being split or while it is being squeezed.
> >
> > This on a build without enable-asserts.
> >
> > Any ideas on how best to go about investigating this?
> >
>
> I think these symptoms indicate the bug in concurrent hash index
> patch, but it could be that the problem can be only revealed with WAL
> patch. Is it possible to just try this with concurrent hash index
> patch? In any case, thanks for testing it, I will look into these
> issues.
>

My test program (as posted) injects crashes and then checks the
post-crash-recovery system for consistency, so it cannot be run as-is
without the WAL patch. I also ran the test with crashing turned off (just
change the JJ* variables at the top of do.sh so that they are all set to
the empty string), and in that case I didn't see either problem, but it
could just be that I didn't run it long enough.

It should have been long enough to detect the rather common "buffer <x> is
not owned by resource owner Portal" problem, so I think that one is
specific to the WAL patch (probably the part which tries to complete a
bucket split when it detects that one was started but not completed?)
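
As a rough illustration of the reasoning quoted above (repeated failures on
the same key would suggest tuples permanently missing from the index, while
scattered one-off failures suggest a scan that transiently misses them), a
minimal Python sketch that tallies the failing keys from the harness output
might look like the following. The regular expression is modelled on the
error lines quoted earlier; the function name and the repeat threshold are
assumptions made for illustration and are not part of count.pl.

```python
import re
from collections import Counter

# Modelled on the harness errors quoted in this thread, e.g.
#   "child abnormal exit update did not update 1 row: key 9555 updated 0E0 at"
MISS_RE = re.compile(r"update did not update 1 row: key (-?\d+) updated")


def classify_misses(log_lines, repeat_threshold=3):
    """Split failing keys into 'repeated' and 'scattered' groups.

    repeat_threshold is an arbitrary illustrative cutoff (an assumption):
    keys failing at least that many times are reported as repeated.
    """
    counts = Counter()
    for line in log_lines:
        match = MISS_RE.search(line)
        if match:
            counts[int(match.group(1))] += 1

    repeated = {k: n for k, n in counts.items() if n >= repeat_threshold}
    scattered = {k: n for k, n in counts.items() if n < repeat_threshold}
    return repeated, scattered
```

Many keys in the scattered group and none in the repeated group would be
consistent with transient misses; it would not prove that explanation.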

Cheers,

Jeff
