Re: Proposal: speeding up GIN build with parallel workers

From: Peter Geoghegan <pg(at)heroku(dot)com>
To: "Constantin S(dot) Pan" <kvapen(at)gmail(dot)com>
Cc: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Proposal: speeding up GIN build with parallel workers
Date: 2016-01-15 23:29:51
Message-ID: CAM3SWZRecitpRdsg8XmBQ6rAg_dzrpaLMDsfUD0XRnzfpXGXJQ@mail.gmail.com
Lists: pgsql-hackers

On Fri, Jan 15, 2016 at 2:38 PM, Constantin S. Pan <kvapen(at)gmail(dot)com> wrote:
> I have a draft implementation which divides the whole process between
> N parallel workers, see the patch attached. Instead of a full scan of
> the relation, I give each worker a range of blocks to read.
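
For illustration only, the block-range division described in the quote
could be sketched as below. This is not taken from the attached patch:
the function name split_blocks and the even-split policy are
assumptions, and the real patch may assign ranges differently.

```python
def split_blocks(nblocks: int, nworkers: int) -> list[tuple[int, int]]:
    """Split heap blocks 0..nblocks-1 into one half-open [start, end)
    range per worker. Hypothetical helper, not PostgreSQL code."""
    if nworkers <= 0:
        raise ValueError("nworkers must be positive")
    # Every worker gets `base` blocks; the first `extra` workers get one more.
    base, extra = divmod(nblocks, nworkers)
    ranges = []
    start = 0
    for i in range(nworkers):
        size = base + (1 if i < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges
```

Each worker would then scan only its own range, so no block is read
twice and none is skipped.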

I am currently working on a patch that allows B-Tree index builds to
be performed in parallel. I think I'm a week or two away from posting
it.

Even without parallelism, wouldn't it be better if GIN indexes were
built using tuplesort? I know far less about the GIN AM than the
nbtree AM, but I imagine that a prominent cost for GIN index builds is
constructing the main B-Tree (the one that's constructed over key
values) itself. Couldn't tuplesort.c be adapted to cover this case?
That would be much faster in general, particularly with the recent
addition of abbreviated keys, while also leaving a clear path forward
to performing the build in parallel.
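
To make the idea concrete, here is a minimal sketch of a sort-based
build, under loud assumptions: Python's built-in sort stands in for
tuplesort.c, TIDs are plain integers, and the names
build_posting_lists, sorted_run and merge_runs are hypothetical. It
only shows the shape of the approach (sort (key, TID) pairs, then group
them into posting lists), not how GIN or tuplesort actually work
internally.

```python
import heapq
from itertools import groupby
from operator import itemgetter


def sorted_run(rows):
    """Extract (key, tid) pairs from rows of (tid, keys) and sort them.
    In a parallel build, each worker could produce one such run."""
    entries = [(key, tid) for tid, keys in rows for key in set(keys)]
    entries.sort()
    return entries


def group_entries(entries):
    """Turn a sorted stream of (key, tid) pairs into (key, [tids]) lists."""
    return [(key, [tid for _, tid in group])
            for key, group in groupby(entries, key=itemgetter(0))]


def build_posting_lists(rows):
    """Serial build: one sort, then one grouping pass."""
    return group_entries(sorted_run(rows))


def merge_runs(runs):
    """Parallel-style build: merge already-sorted runs, then group."""
    return group_entries(heapq.merge(*runs))


rows = [(1, ["a", "b"]), (2, ["b", "c"]), (3, ["a"])]
serial = build_posting_lists(rows)
merged = merge_runs([sorted_run(rows[:2]), sorted_run(rows[2:])])
```

Because the keys arrive in sorted order, the tree over key values could
in principle be built bottom-up in a single pass, and merging
per-worker runs gives the same result as one big sort.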

I understand that a long term ambition for the gin am is to merge it
with nbtree, to almost automatically benefit from enhancements, and to
reduce the maintenance burden of each.

--
Peter Geoghegan
