From: "Constantin S. Pan" <kvapen(at)gmail(dot)com>
Subject: Proposal: speeding up GIN build with parallel workers
The task of building a GIN index can take a lot of time and saturates
a single CPU, but we could easily make it use more than one,
especially since we now have parallel workers in postgres.
The process of building GIN looks like this:
1. Accumulate a batch of index records into an rbtree in maintenance
work memory (maintenance_work_mem).
2. Dump the batch to disk.
I have a draft implementation which divides the whole process between
N parallel workers, see the patch attached. Instead of a full scan of
the relation, I give each worker a range of blocks to read.
This speeds up the first step N times, but slows down the second one,
because when multiple workers dump item pointers for the same key, each
of them has to read and decode the results of the previous one. That is
a huge waste, but there is an idea on how to eliminate it.
When it comes to dumping the next batch, a worker does not do it
independently. Instead, it (and every other worker) sends the
accumulated index records to the parent (backend) in ascending key
order. The backend, which receives the records from the workers through
shared memory, can merge them and dump each of them once, without the
need to reread the records N-1 times.
In its current state the implementation is just a proof of concept,
with all the configuration hardcoded, but it already works as is,
though it does not speed up the build process more than 4 times on my
configuration (12 CPUs). There is also a problem with temporary tables,
for which parallel mode does not work.
Please leave your feedback.
Constantin S. Pan
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company