From: | Jeff Janes <jeff(dot)janes(at)gmail(dot)com> |
---|---|
To: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
Cc: | Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Write Ahead Logging for Hash Indexes |
Date: | 2016-09-12 05:59:23 |
Message-ID: | CAMkU=1wiW=+k_C0uweMsD8OUdpGYVmbMprkBAsKBNjQiKzUcgg@mail.gmail.com |
Lists: | pgsql-hackers |
On Sun, Sep 11, 2016 at 7:40 PM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> On Mon, Sep 12, 2016 at 7:00 AM, Jeff Janes <jeff(dot)janes(at)gmail(dot)com> wrote:
> > On Thu, Sep 8, 2016 at 12:09 PM, Jeff Janes <jeff(dot)janes(at)gmail(dot)com> wrote:
> >
> >>
> >> I plan to do testing using my own testing harness after changing it to
> >> insert a lot of dummy tuples (ones with negative values in the pseudo-pk
> >> column, which are never queried by the core part of the harness) and
> >> deleting them at random intervals. I think that none of pgbench's built in
> >> tests are likely to give the bucket splitting and squeezing code very much
> >> exercise.
> >
> >
> >
> > I've implemented this by adding lines 197 through 202 to the count.pl
> > script (I'm reattaching the test case).
> >
> > Within a few minutes of testing, I start getting errors like these:
> >
> > 29236 UPDATE XX000 2016-09-11 17:21:25.893 PDT:ERROR: buffer 2762 is not
> > owned by resource owner Portal
> > 29236 UPDATE XX000 2016-09-11 17:21:25.893 PDT:STATEMENT: update foo set
> > count=count+1 where index=$1
> >
> >
> > In one test, I also got an error from my test harness itself indicating
> > tuples are transiently missing from the index, starting an hour into a
> > test:
> >
> > child abnormal exit update did not update 1 row: key 9555 updated 0E0 at
> > count.pl line 194.\n at count.pl line 208.
> > child abnormal exit update did not update 1 row: key 8870 updated 0E0 at
> > count.pl line 194.\n at count.pl line 208.
> > child abnormal exit update did not update 1 row: key 8453 updated 0E0 at
> > count.pl line 194.\n at count.pl line 208.
> >
> > Those key values should always find exactly one row to update.
> >
> > If the tuples were permanently missing from the index, I would keep getting
> > errors on the same key values very frequently. But I don't get that: the
> > errors remain infrequent and are on different values each time, so I think
> > the tuples are in the index but the scan somehow misses them, either while
> > the bucket is being split or while it is being squeezed.
> >
> > This is on a build without enable-asserts.
> >
> > Any ideas on how best to go about investigating this?
> >
>
> I think these symptoms indicate a bug in the concurrent hash index
> patch, but it could be that the problem is only revealed with the WAL
> patch. Is it possible to try this with just the concurrent hash index
> patch? In any case, thanks for testing it; I will look into these
> issues.
>
My test program (as posted) injects crashes and then checks the
post-crash-recovery system for consistency, so it cannot be run as-is
without the WAL patch. I also ran the test with crashing turned off (just
change the JJ* variables at the top of do.sh to all be set to the
empty string), and in that case I didn't see either problem, but it
could just be that I didn't run it long enough.

It should have been long enough to detect the rather common "buffer <x> is
not owned by resource owner Portal" problem, so I think that one is
specific to the WAL patch (probably the part that tries to complete bucket
splits when it detects one was started but not completed?).
Cheers,
Jeff