|From:||Anastasia Lubennikova <a(dot)lubennikova(at)postgrespro(dot)ru>|
|To:||Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>|
|Subject:||Re: [BUG] Error in BRIN summarization|
On 30.07.2020 16:40, Anastasia Lubennikova wrote:
> While testing this fix, Alexander Lakhin spotted another problem.
> After a few runs, it will fail with "ERROR: corrupted BRIN index:
> inconsistent range map"
> The problem is caused by a race in page locking in
> brinGetTupleForHeapBlock:
> (1) bitmapscan locks revmap->rm_currBuf and finds the address of the
> tuple on a regular page "page", then unlocks revmap->rm_currBuf
> (2) in another transaction desummarize locks both revmap->rm_currBuf
> and "page", cleans up the tuple and unlocks both buffers
> (1) bitmapscan locks the buffer containing "page", attempts to access
> the tuple and fails to find it
> At first, I tried to fix it by holding the lock on revmap->rm_currBuf
> until we locked the regular page, but it causes a deadlock with
> brinsummarize(). It can be easily reproduced with the same test as above.
> Is there any rule about the order of locking revmap and regular pages
> in brin? I haven't found anything in README.
> As an alternative, we can leave the locks as is and add a recheck
> before throwing an error.
Here are the updated patches for both problems.
1) brin_summarize_fix_REL_12_v2 fixes
"failed to find parent tuple for heap-only tuple at (50661,130) in table
This patch checks that we only access initialized entries of the
root_offsets array. If necessary, it collects the array again. One recheck
is enough here, since concurrent pruning is not possible.
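To illustrate the idea (not the patch itself), here is a minimal standalone sketch of "use only initialized entries, recollect once if needed". ToyPage, collect_root_offsets() and lookup_root() are hypothetical names for a toy model, and 0 is assumed to mark an uninitialized entry; none of this is PostgreSQL code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_OFFSETS 8      /* toy page capacity (assumption) */
#define INVALID_OFFSET 0   /* marks an entry that was never filled in */

/* Toy page: chain_root[i] is the root item of item i+1, or 0 if unused. */
typedef struct ToyPage
{
    unsigned short chain_root[MAX_OFFSETS];
} ToyPage;

/* Rebuild the cached root_offsets array from the current page contents. */
static void
collect_root_offsets(const ToyPage *page, unsigned short *root_offsets)
{
    for (size_t i = 0; i < MAX_OFFSETS; i++)
        root_offsets[i] = page->chain_root[i];
}

/*
 * Look up the root of item "offnum" (1-based) in the cached array.
 * If the cached entry is uninitialized, collect the array again once
 * and retry. Returns false if the entry is still uninitialized.
 */
static bool
lookup_root(const ToyPage *page, unsigned short *root_offsets,
            unsigned short offnum, unsigned short *root)
{
    assert(offnum >= 1 && offnum <= MAX_OFFSETS);

    if (root_offsets[offnum - 1] == INVALID_OFFSET)
        collect_root_offsets(page, root_offsets);   /* the single recheck */

    if (root_offsets[offnum - 1] == INVALID_OFFSET)
        return false;       /* caller decides whether this is an error */

    *root = root_offsets[offnum - 1];
    return true;
}
```

In this toy model, an item added to the page after the array was first collected is resolved by the second collection, while an item that does not exist at all still fails the lookup.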
2) brin_pagelock_fix_REL_12_v1.patch fixes
"ERROR: corrupted BRIN index: inconsistent range map"
This patch adds a recheck as suggested in the previous message.
I am not sure if one recheck is enough to eliminate the race completely,
but the problem cannot be reproduced anymore.
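For readers following the race, below is a rough single-threaded sketch of the recheck idea. It is a toy model under my own assumptions, not the code from the patch: ToyIndex, desummarize() and lookup_with_recheck() are hypothetical names, and the between_locks hook only simulates another session running in the window between releasing the revmap lock and locking the regular page.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NUM_SLOTS 4
#define INVALID_SLOT (-1)

/* Toy index: one revmap entry pointing at a slot on one regular page. */
typedef struct ToyIndex
{
    int  revmap_slot;            /* slot the revmap points to, or INVALID_SLOT */
    bool slot_used[NUM_SLOTS];   /* does the slot on the regular page hold a tuple? */
} ToyIndex;

typedef enum
{
    LOOKUP_FOUND,
    LOOKUP_NOT_SUMMARIZED,
    LOOKUP_CORRUPTED
} LookupResult;

/* Hypothetical hook: work done by another session between the two locks. */
typedef void (*ConcurrentHook) (ToyIndex *idx);

/* Simulated desummarization: removes the tuple and clears the revmap entry. */
static void
desummarize(ToyIndex *idx)
{
    if (idx->revmap_slot != INVALID_SLOT)
    {
        idx->slot_used[idx->revmap_slot] = false;
        idx->revmap_slot = INVALID_SLOT;
    }
}

static LookupResult
lookup_with_recheck(ToyIndex *idx, ConcurrentHook between_locks)
{
    /* Step 1: read the tuple address from the revmap, then "unlock" it. */
    int slot = idx->revmap_slot;

    if (slot == INVALID_SLOT)
        return LOOKUP_NOT_SUMMARIZED;
    assert(slot >= 0 && slot < NUM_SLOTS);

    /* Window in which a concurrent session may change both pages. */
    if (between_locks != NULL)
        between_locks(idx);

    /* Step 2: "lock" the regular page and look for the tuple. */
    if (idx->slot_used[slot])
        return LOOKUP_FOUND;

    /* Recheck: if the revmap entry changed, this is a race, not corruption. */
    if (idx->revmap_slot != slot)
        return LOOKUP_NOT_SUMMARIZED;

    return LOOKUP_CORRUPTED;
}
```

In this model, the error is reported only when the tuple is missing and the revmap entry still points at the same slot. A lookup that loses the race against desummarize() reports the range as not summarized instead.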
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company