|From:||Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>|
|To:||Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>|
|Cc:||Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>|
|Subject:||Re: Re: PANIC: invalid index offnum: 186 when processing BRIN indexes in VACUUM|
Tom Lane wrote:
> So in a few more runs this morning using Alvaro's simplified test case,
> I have seen the following behaviors not previously reported:
> 1. Crashes in PageIndexTupleOverwrite, which has the same "invalid index
> offnum: %u" error report as PageIndexTupleDeleteNoCompact. I note the
> same message appears in plain PageIndexTupleDelete as well.
> I think we've been too hasty to assume all instances of this came out of
Ah, I wasn't paying close attention to the originator routine of the
message, but you're right, I see this one too.
> 2. Crashes in the data-insertion process, not only the process running
Yeah, I saw these. I was expecting it, since the two routines
(brininsert and summarize_range) pretty much share the insertion
> I really don't understand how any of this "let's release the buffer
> lock and then take it back later" logic is supposed to work reliably.
Yeah, evidently that was way too optimistic and I'll need to figure out
a better mechanism to handle this.
The intention was to avoid deadlocks while locking the target page for
the insertion: by having both pages start unlocked we can simply lock
them in block number order. If we keep the page containing the tuple
locked, I don't see how to reliably avoid a deadlock while acquiring a
buffer to insert the new tuple.
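To illustrate the ordering rule described above, here is a minimal sketch, not the actual BRIN code. It assumes a hypothetical `DemoPage` struct standing in for a shared buffer and uses plain pthread mutexes in place of buffer content locks; `demo_lock_pair` and `demo_unlock_pair` are made-up names. The point is only that when both pages start out unlocked, every backend can take the two locks lowest block number first, so no two backends ever wait on each other in opposite order.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

/* Hypothetical stand-in for a shared buffer: a block number plus a lock. */
typedef struct DemoPage
{
    uint32_t        blkno;
    pthread_mutex_t lock;
} DemoPage;

/*
 * Lock two distinct pages, lowest block number first, regardless of the
 * order in which the caller passes them.  Because every caller acquires
 * the locks in the same global order, a lock-order deadlock between two
 * callers cannot occur.  Returns the page that was locked first.
 */
static DemoPage *
demo_lock_pair(DemoPage *a, DemoPage *b)
{
    DemoPage   *first = (a->blkno < b->blkno) ? a : b;
    DemoPage   *second = (first == a) ? b : a;

    /* The ordering rule is only defined for two different blocks. */
    assert(a->blkno != b->blkno);

    pthread_mutex_lock(&first->lock);
    pthread_mutex_lock(&second->lock);
    return first;
}

/* Release both locks; the release order does not matter for deadlocks. */
static void
demo_unlock_pair(DemoPage *a, DemoPage *b)
{
    pthread_mutex_unlock(&a->lock);
    pthread_mutex_unlock(&b->lock);
}
```

The sketch only works because neither page is locked on entry. If the page holding the old tuple were kept locked and happened to have the higher block number, the second lock would have to be taken out of order, which is exactly the deadlock risk mentioned above.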
> BTW, while I'm bitching, it seems fairly insane from a concurrency
> standpoint that brin_getinsertbuffer is calling RecordPageWithFreeSpace
> while holding at least one and possibly two buffer locks. Shouldn't
> that be done someplace else?
Hmm. I spent a lot of effort (commit ccc4c074994d) to avoid leaving
pages uninitialized / unrecorded in FSM. I left this in place on purpose, on
the rationale that trying to fix it would make the call sites more convoluted
(the retry logic doesn't help). But as I recall this was supposed to be
done only in the rare case where the buffer could not be returned to
the caller ... but that's not what the current code does, so there is
something wrong there.
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services