Parallel Bitmap scans a bit broken

From: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
To: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>, dilipbalaut(at)gmail(dot)com
Subject: Parallel Bitmap scans a bit broken
Date: 2017-03-09 15:47:37
Message-ID: CAKJS1f8OtrHE+-P+=E=4ycnL29e9idZKuaTQ6o2MbhvGN9D8ig@mail.gmail.com
Lists: pgsql-hackers

I was just doing some testing on [1] when I noticed that there's a problem
with parallel bitmap index scans.

Test case:

Apply the patch from [1], then:

=# create table r1(value int);
CREATE TABLE
=# insert into r1 select (random()*1000)::int from generate_Series(1,1000000);
INSERT 0 1000000
=# create index on r1 using brin(value);
CREATE INDEX
=# set enable_seqscan=0;
SET
=# explain select * from r1 where value=555;
                                       QUERY PLAN
-----------------------------------------------------------------------------------------
 Gather  (cost=3623.52..11267.45 rows=5000 width=4)
   Workers Planned: 2
   ->  Parallel Bitmap Heap Scan on r1  (cost=2623.52..9767.45 rows=2083 width=4)
         Recheck Cond: (value = 555)
         ->  Bitmap Index Scan on r1_value_idx  (cost=0.00..2622.27 rows=522036 width=0)
               Index Cond: (value = 555)
(6 rows)

=# explain analyze select * from r1 where value=555;
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Succeeded.

The crash occurs in tbm_shared_iterate() at:

PagetableEntry *page = &ptbase[idxpages[istate->spageptr]];

I see that in tbm_prepare_shared_iterate(), tbm->npages is zero. I'm not
sure whether bringetbitmap() ends up setting npages differently than
btgetbitmap() does.

But anyway, because npages is 0, tbm->ptpages is not allocated
in tbm_prepare_shared_iterate():

    if (tbm->npages)
    {
        tbm->ptpages = dsa_allocate(tbm->dsa, sizeof(PTIterationArray) +
                                    tbm->npages * sizeof(int));

so when tbm_shared_iterate() runs this code:

    /*
     * If both chunk and per-page data remain, must output the numerically
     * earlier page.
     */
    if (istate->schunkptr < istate->nchunks)
    {
        PagetableEntry *chunk = &ptbase[idxchunks[istate->schunkptr]];
        PagetableEntry *page = &ptbase[idxpages[istate->spageptr]];
        BlockNumber chunk_blockno;

        chunk_blockno = chunk->blockno + istate->schunkbit;

        if (istate->spageptr >= istate->npages ||
            chunk_blockno < page->blockno)
        {
            /* Return a lossy page indicator from the chunk */
            output->blockno = chunk_blockno;
            output->ntuples = -1;
            output->recheck = true;
            istate->schunkbit++;

            LWLockRelease(&istate->lock);
            return output;
        }
    }

it fails, because idxpages points to random memory.
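
To make the failure mode concrete, here is a minimal standalone sketch of the same chunk-versus-page decision, written so that the page index array is only read once we know exact pages remain. It is not the real tidbitmap.c code and not necessarily how the authors would fix it: DemoEntry, DemoIterState and demo_take_chunk_first() are hypothetical names invented for illustration, and I'm assuming idxpages can be modelled as a NULL pointer when npages is 0.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; these are NOT the real tidbitmap.c structures. */
typedef struct DemoEntry
{
    unsigned    blockno;
} DemoEntry;

typedef struct DemoIterState
{
    const DemoEntry *entries;   /* plays the role of "ptbase" in the excerpt */
    const int  *idxpages;       /* assumed NULL when npages == 0 */
    const int  *idxchunks;
    int         npages;
    int         nchunks;
    int         spageptr;
    int         schunkptr;
    int         schunkbit;
} DemoIterState;

/*
 * Return true (and set *blockno_out) if the next block should come from the
 * lossy chunk.  idxpages is only indexed after checking that exact pages
 * remain, so an unallocated page array is never touched.
 */
static bool
demo_take_chunk_first(const DemoIterState *st, unsigned *blockno_out)
{
    assert(st != NULL && blockno_out != NULL);

    if (st->schunkptr >= st->nchunks)
        return false;           /* no chunk data left */

    const DemoEntry *chunk = &st->entries[st->idxchunks[st->schunkptr]];
    unsigned    chunk_blockno = chunk->blockno + (unsigned) st->schunkbit;

    if (st->spageptr < st->npages)
    {
        /* Safe: at least one exact page remains, so idxpages is valid. */
        const DemoEntry *page = &st->entries[st->idxpages[st->spageptr]];

        if (chunk_blockno >= page->blockno)
            return false;       /* the exact page comes first */
    }

    *blockno_out = chunk_blockno;
    return true;
}
```

With npages == 0 and one chunk, this returns the chunk's block number without ever reading idxpages, which is the case that crashes in the excerpt above.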

This is probably a simple fix for the authors, so I'm passing it along. I
can't quite see how the part above is meant to work.

[1]
https://www.postgresql.org/message-id/attachment/50164/brin-correlation-v3.patch

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
