Re: [WIP] speeding up GIN build with parallel workers

From: "Constantin S. Pan" <kvapen(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [WIP] speeding up GIN build with parallel workers
Date: 2016-03-16 09:25:17
Message-ID: 20160316122517.2a76cfd2@ppg
Lists: pgsql-hackers

On Wed, 16 Mar 2016 12:14:51 +0530
Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:

> On Wed, Mar 16, 2016 at 5:41 AM, Constantin S. Pan <kvapen(at)gmail(dot)com>
> wrote:
>
> > On Mon, 14 Mar 2016 08:42:26 -0400
> > David Steele <david(at)pgmasters(dot)net> wrote:
> >
> > > On 2/18/16 10:10 AM, Constantin S. Pan wrote:
> > > > On Wed, 17 Feb 2016 23:01:47 +0300
> > > > Oleg Bartunov <obartunov(at)gmail(dot)com> wrote:
> > > >
> > > >> My feedback is (Mac OS X 10.11.3)
> > > >>
> > > >> set gin_parallel_workers=2;
> > > >> create index message_body_idx on messages using gin(body_tsvector);
> > > >> LOG: worker process: parallel worker for PID 5689 (PID 6906) was terminated by signal 11: Segmentation fault
> > > >
> > > > Fixed this; try the new patch. The bug was caused by incorrect
> > > > handling of some GIN categories.
> > >
> > > Oleg, it looks like Constantin has updated the patch to address the
> > > issue you were seeing. Do you have time to retest and review?
> > >
> > > Thanks,
> >
> > Actually, there has been some progress since then. The patch is
> > attached.
> >
> > 1. Added another GUC parameter, 'gin_shared_mem', for changing the
> > amount of shared memory available to parallel GIN workers.
> >
> > 2. Changed the way results are merged: it now uses a shared memory
> > message queue.
> >
> > 3. Tested on some real data (a GIN index on email message body
> > tsvectors). Here are the timings for different values of
> > 'gin_shared_mem' and 'gin_parallel_workers' on a 4-CPU
> > machine. 'gin_shared_mem' seems to have no visible effect.
> >
> > wnum  mem(MB)  time(s)
> >    0       16      247
> >    1       16      256
> >
>
>
> It seems from your data that with 1 worker you are always seeing a
> slowdown. Have you investigated the reason for it?
>
> With Regards,
> Amit Kapila.
> EnterpriseDB: http://www.enterprisedb.com

That slowdown is expected: with 1 worker, the results have to be
transferred from the worker to the backend.
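
To make the arithmetic concrete, here is a back-of-the-envelope model
in Python. It is a sketch under stated assumptions, not a description
of the patch: it assumes the build work divides evenly among workers
and that moving results to the backend (and merging them) adds a fixed
overhead whenever wnum > 0. The function name `estimated_build_time`
and its parameters are hypothetical; the only inputs taken from the
thread are the quoted 247 s (0 workers) and 256 s (1 worker).

```python
# Back-of-the-envelope model of the timings quoted above (hypothetical).
# Assumptions, not taken from the patch: the index build divides evenly
# among the workers, and shipping results to the backend (plus merging
# them) costs a fixed overhead whenever wnum > 0.

def estimated_build_time(build_s: float, overhead_s: float, wnum: int) -> float:
    """Estimated wall-clock seconds for a GIN build with `wnum` workers."""
    if wnum < 0:
        raise ValueError("wnum must be non-negative")
    if wnum == 0:
        # The backend builds the index itself: nothing to transfer.
        return build_s
    # The backend only waits and merges, so the overhead is added on top
    # of the build time divided among the workers.
    return build_s / wnum + overhead_s

# The quoted measurements: 247 s with 0 workers, 256 s with 1 worker,
# which implies roughly 9 s of transfer/merge overhead.
BUILD_S = 247.0
OVERHEAD_S = 256.0 - 247.0

for wnum in (0, 1):
    print(wnum, estimated_build_time(BUILD_S, OVERHEAD_S, wnum))
# prints:
# 0 247.0
# 1 256.0
```

Under these assumptions the 1-worker time is always the 0-worker time
plus the overhead, for any positive overhead; the model says nothing
reliable about larger worker counts.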

When wnum > 0, the backend does not build anything itself; it just waits
for the results from the workers and merges them. So the 1-worker
configuration should never be used: it is just as sequential as the
0-worker one, but adds the data transfer.
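
The division of labour can be illustrated with a toy Python model.
This is a hedged sketch, not the patch's code: threads stand in for
parallel workers, `queue.Queue` stands in for the shared memory message
queue, TIDs are modelled as (block, offset) tuples, and all function
names are hypothetical. Each worker builds key-ordered (key, posting
list) entries for its own rows; the backend only waits for the streams
and merges them.

```python
import heapq
import queue
import threading
from itertools import groupby
from typing import Dict, Iterator, List, Tuple

# Hypothetical model, not the patch's code.  TIDs are (block, offset)
# tuples, and each worker indexes a disjoint subset of the rows.
Tid = Tuple[int, int]
Row = Tuple[Tid, List[str]]      # (tid, keys extracted from the row)
Entry = Tuple[str, List[Tid]]    # (key, sorted posting list)

def worker(rows: List[Row], out: "queue.Queue") -> None:
    """Build a partial index and stream its entries in key order."""
    postings: Dict[str, List[Tid]] = {}
    for tid, keys in rows:
        for key in set(keys):
            postings.setdefault(key, []).append(tid)
    for key in sorted(postings):
        out.put((key, sorted(postings[key])))
    out.put(None)  # end-of-stream marker

def drain(q: "queue.Queue") -> Iterator[Entry]:
    """Yield entries from one worker's queue until its end marker."""
    while (item := q.get()) is not None:
        yield item

def backend_merge(partitions: List[List[Row]]) -> List[Entry]:
    """Start one worker per partition, then only wait for and merge results."""
    queues = [queue.Queue() for _ in partitions]
    threads = [threading.Thread(target=worker, args=(p, q))
               for p, q in zip(partitions, queues)]
    for t in threads:
        t.start()
    # k-way merge of the key-ordered streams; equal keys coming from
    # different workers are combined by merging their sorted posting lists.
    streams = heapq.merge(*(drain(q) for q in queues), key=lambda e: e[0])
    merged = [(key, list(heapq.merge(*(tids for _, tids in group))))
              for key, group in groupby(streams, key=lambda e: e[0])]
    for t in threads:
        t.join()
    return merged
```

With a single partition, `backend_merge` still has to drain and
re-emit the one worker's entire stream, so it does the same build work
as the 0-worker case plus the copy, which is why the 1-worker
configuration cannot win.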

Regards,

Constantin S. Pan
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
