Re: [BUG] Error in BRIN summarization

From: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
To: Anastasia Lubennikova <a(dot)lubennikova(at)postgrespro(dot)ru>
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: [BUG] Error in BRIN summarization
Date: 2020-08-11 23:43:21
Message-ID: 20200811234321.GA17597@alvherre.pgsql
Lists: pgsql-hackers

On 2020-Jul-30, Anastasia Lubennikova wrote:

> While testing this fix, Alexander Lakhin spotted another problem. I
> simplified the test case to this:

Ah, good catch. I think a cleaner way to fix this problem is to just
consider the range as not summarized and return NULL from there, as in
the attached patch. Running your test case with a telltale WARNING
added at that point, it's clear that it's being hit.

By returning NULL, we're forcing the caller to scan the heap, which is
not great. But note that if you retry, and your VACUUM hasn't run yet
by the time we go through the loop again, the same thing would happen.
So it seems to me a good enough answer.

A much more troubling thought is what happens if the range is
desummarized, then the index item is used for the summary of a different
range. Then the index might end up returning corrupt results.
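
To make the two cases above concrete, here is a toy, single-threaded model
in C. It is only a sketch and is not the attached patch: every name in it
(ToyIndex, ToyItem, toy_fetch_item and so on) is hypothetical and nothing
here is PostgreSQL code. It assumes a reader that first looks up an item
slot in the revmap and only later fetches the item, so the slot can go
stale in between. The first check models "treat the range as not
summarized and return NULL"; the second models a guard against the slot
having been reused for a different range.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: none of these names or types exist in PostgreSQL. */
#define TOY_SLOTS   4
#define TOY_NO_ITEM (-1)

typedef struct ToyItem {
    bool used;      /* false once the range has been desummarized */
    int  range_no;  /* which range this summary describes */
    int  min;       /* summary data */
    int  max;
} ToyItem;

typedef struct ToyIndex {
    int     revmap[TOY_SLOTS];  /* range number -> item slot, or TOY_NO_ITEM */
    ToyItem items[TOY_SLOTS];
} ToyIndex;

static void toy_init(ToyIndex *idx)
{
    for (int i = 0; i < TOY_SLOTS; i++) {
        idx->revmap[i] = TOY_NO_ITEM;
        idx->items[i].used = false;
    }
}

/* Store a summary for range_no in the given slot and point the revmap at it. */
static void toy_summarize(ToyIndex *idx, int range_no, int slot, int min, int max)
{
    assert(range_no >= 0 && range_no < TOY_SLOTS);
    assert(slot >= 0 && slot < TOY_SLOTS);
    idx->items[slot] = (ToyItem){ .used = true, .range_no = range_no,
                                  .min = min, .max = max };
    idx->revmap[range_no] = slot;
}

/* Drop the summary for range_no, freeing its slot for reuse. */
static void toy_desummarize(ToyIndex *idx, int range_no)
{
    int slot = idx->revmap[range_no];
    if (slot != TOY_NO_ITEM) {
        idx->items[slot].used = false;
        idx->revmap[range_no] = TOY_NO_ITEM;
    }
}

/* Step 1 of a lookup: read the slot number from the revmap. */
static int toy_read_revmap(const ToyIndex *idx, int range_no)
{
    return idx->revmap[range_no];
}

/*
 * Step 2 of a lookup: fetch the item for a slot read earlier.
 * Returns NULL when the range must be treated as unsummarized,
 * which in the real system would force the caller to scan the heap.
 */
static const ToyItem *toy_fetch_item(const ToyIndex *idx, int slot, int range_no)
{
    if (slot == TOY_NO_ITEM)
        return NULL;                 /* never summarized */
    const ToyItem *item = &idx->items[slot];
    if (!item->used)
        return NULL;                 /* desummarized after we read the revmap */
    if (item->range_no != range_no)
        return NULL;                 /* slot now summarizes a different range */
    return item;
}
```

In this model, reading the slot for a range, then calling
toy_desummarize() on it, then calling toy_fetch_item() with the stale slot
yields NULL. The same holds if the slot is handed to another range via
toy_summarize() in between, whereas without the range_no comparison the
reader would get the other range's min/max.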

> At first, I tried to fix it by holding the lock on revmap->rm_currBuf until
> we locked the regular page, but it causes a deadlock with brinsummarize().
> It can be easily reproduced with the same test as above.
> Is there any rule about the order of locking revmap and regular pages in
> brin? I haven't found anything in README.

Umm, I thought that stuff was in the README, but it seems I didn't add
it there. I think I had a .org file with my notes on that ... must be
in an older laptop disk, because it's not in my worktree for that. I'll
see if I can fish it out.

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment Content-Type Size
brin-desumm-race.patch text/x-diff 908 bytes
