Re: Write Ahead Logging for Hash Indexes

From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Write Ahead Logging for Hash Indexes
Date: 2016-08-24 17:02:28
Message-ID: CAMkU=1wiPnV2d+5JauFoVMN=GkYKvRhjN9m8Huy_4+y1rw+LPQ@mail.gmail.com
Lists: pgsql-hackers

On Tue, Aug 23, 2016 at 10:05 PM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
wrote:

> On Wed, Aug 24, 2016 at 2:37 AM, Jeff Janes <jeff(dot)janes(at)gmail(dot)com> wrote:
>
> >
> > After an intentionally created crash, I get an Assert triggering:
> >
> > TRAP: FailedAssertion("!(((freep)[(bitmapbit)/32] &
> > (1<<((bitmapbit)%32))))", File: "hashovfl.c", Line: 553)
> >
> > freep[0] is zero and bitmapbit is 16.
> >
>
> Here what is happening is that when it tries to clear the bitmapbit,
> it expects it to be set. Now, I think the reason for why it didn't
> find the bit as set could be that after the new overflow page is added
> and the bit corresponding to it is set, you might have crashed the
> system and the replay would not have set the bit. Then while freeing
> the overflow page it can hit the Assert as mentioned by you. I think
> the problem here could be that I am using REGBUF_STANDARD to log the
> bitmap page updates which seems to be causing the issue. As bitmap
> page doesn't follow the standard page layout, it would have omitted
> the actual contents while taking full page image and then during
> replay, it would not have set the bit, because page doesn't need REDO.
> I think here the fix is to use REGBUF_NO_IMAGE as we use for vm
> buffers.
>
> If you can send me the detailed steps for how you have produced the
> problem, then I can verify after fixing whether you are seeing the
> same problem or something else.
>

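To make the hypothesis quoted above concrete, here is a toy model in C. It is only a sketch under stated assumptions: ToyPage, its lower/upper fields, and toy_image_with_hole_removed() are hypothetical names invented for illustration, not PostgreSQL code. It assumes that an image taken of a "standard layout" page drops the bytes between the lower and upper offsets, and that the bitmap bit happens to live in that range.

```c
#include <stdint.h>
#include <string.h>

#define TOY_PAGE_SIZE 64        /* toy size, far smaller than a real page */

/*
 * Hypothetical stand-in for a page.  lower/upper play the role that
 * pd_lower/pd_upper play in a standard page header.
 */
typedef struct ToyPage
{
    uint16_t    lower;          /* bytes [0, lower) are treated as in use */
    uint16_t    upper;          /* bytes [upper, end) are treated as in use */
    uint8_t     data[TOY_PAGE_SIZE];
} ToyPage;

/*
 * Copy a page the way a "standard layout" image would in this model:
 * keep both in-use regions and drop the presumed-empty hole in
 * [lower, upper), which comes back as zeros.
 */
static void
toy_image_with_hole_removed(const ToyPage *src, ToyPage *restored)
{
    restored->lower = src->lower;
    restored->upper = src->upper;
    memset(restored->data, 0, TOY_PAGE_SIZE);
    memcpy(restored->data, src->data, src->lower);
    memcpy(restored->data + src->upper, src->data + src->upper,
           TOY_PAGE_SIZE - src->upper);
}
```

In this model, a byte written at an offset between lower and upper on the source page reads back as zero after the copy, while bytes outside that range survive. That matches the symptom described: a bit that was set before the crash is found clear afterwards.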
The test is rather awkward, so it might be easier to just have me test it.
But I've attached it anyway.

There are three pieces: a patch that needs to be applied and compiled
(alongside your patches, of course) to inject the crashes, a perl script
which creates the schema and does the updates, and a shell script which
sets up the cluster with the appropriate parameters and then calls the
perl script in a loop.

The top of the shell script has some hard-coded paths to the binaries and
to the test data directory (which is automatically deleted).

I run it like "sh do.sh >& do.err &"

It gives two different types of assertion failures:

$ fgrep TRAP: do.err |sort|uniq -c

     21 TRAP: FailedAssertion("!(((freep)[(bitmapbit)/32] & (1<<((bitmapbit)%32))))", File: "hashovfl.c", Line: 553)
     32 TRAP: FailedAssertion("!(RefCountErrors == 0)", File: "bufmgr.c", Line: 2506)

The second one is related to the intentional crashes, and so is not
relevant to you.
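
For reference, the condition in the first assertion can be written out as a small C sketch. The expression is taken from the TRAP message; the function names (bit_is_set, set_bit, clear_bit_checked) are hypothetical and chosen only for illustration, and the words are assumed to be 32 bits wide, as the "/32" and "%32" in the message suggest.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define BITS_PER_WORD 32        /* matches the /32 and %32 in the TRAP text */

/* The same test as the expression inside the failed assertion. */
static bool
bit_is_set(const uint32_t *freep, uint32_t bitmapbit)
{
    return (freep[bitmapbit / BITS_PER_WORD] &
            ((uint32_t) 1 << (bitmapbit % BITS_PER_WORD))) != 0;
}

/* Mark a bit as in use. */
static void
set_bit(uint32_t *freep, uint32_t bitmapbit)
{
    freep[bitmapbit / BITS_PER_WORD] |=
        (uint32_t) 1 << (bitmapbit % BITS_PER_WORD);
}

/* Clear a bit, insisting (as the failing code does) that it was set. */
static void
clear_bit_checked(uint32_t *freep, uint32_t bitmapbit)
{
    assert(bit_is_set(freep, bitmapbit));
    freep[bitmapbit / BITS_PER_WORD] &=
        ~((uint32_t) 1 << (bitmapbit % BITS_PER_WORD));
}
```

With freep[0] equal to zero and bitmapbit equal to 16, the values reported earlier in the thread, bit_is_set() returns false, so clear_bit_checked() would trip its assertion. Had the bit been set first, freep[0] would be 0x10000.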

Cheers,

Jeff

Attachment         Content-Type              Size
count.pl           application/octet-stream  8.5 KB
crash_REL10.patch  application/octet-stream  12.9 KB
do.sh              application/x-sh          4.3 KB
