Re: [WIP] speeding up GIN build with parallel workers

From: "Constantin S(dot) Pan" <kvapen(at)gmail(dot)com>
To: David Steele <david(at)pgmasters(dot)net>, Oleg Bartunov <obartunov(at)gmail(dot)com>, Peter Geoghegan <pg(at)heroku(dot)com>
Cc: Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [WIP] speeding up GIN build with parallel workers
Date: 2016-03-16 00:11:15
Message-ID: 20160316031115.5856920c@monster
Lists: pgsql-hackers

On Mon, 14 Mar 2016 08:42:26 -0400
David Steele <david(at)pgmasters(dot)net> wrote:

> On 2/18/16 10:10 AM, Constantin S. Pan wrote:
> > On Wed, 17 Feb 2016 23:01:47 +0300
> > Oleg Bartunov <obartunov(at)gmail(dot)com> wrote:
> >
> >> My feedback is (Mac OS X 10.11.3)
> >>
> >> set gin_parallel_workers=2;
> >> create index message_body_idx on messages using gin(body_tsvector);
> >> LOG: worker process: parallel worker for PID 5689 (PID 6906) was
> >> terminated by signal 11: Segmentation fault
> >
> > Fixed this, try the new patch. The bug was in incorrect handling
> > of some GIN categories.
>
> Oleg, it looks like Constantin has updated the patch to address the
> issue you were seeing. Do you have time to retest and review?
>
> Thanks,

Actually, there has been some progress since then. The updated
patch is attached.

1. Added another GUC parameter for changing the amount of
shared memory for parallel GIN workers.

2. Changed the way results are merged. It now uses a shared
memory message queue.

3. Tested on some real data (a GIN index on email message body
tsvectors). Here are the timings for different values of
'gin_shared_mem' and 'gin_parallel_workers' on a 4-CPU
machine. It seems that 'gin_shared_mem' has no visible effect.

wnum  mem(MB)  time(s)
   0       16      247
   1       16      256
   2       16      126
   4       16       89
   0       32      247
   1       32      270
   2       32      123
   4       32       92
   0       64      254
   1       64      272
   2       64      123
   4       64       88
   0      128      250
   1      128      263
   2      128      126
   4      128       85
   0      256      247
   1      256      269
   2      256      130
   4      256       88
   0      512      257
   1      512      275
   2      512      129
   4      512       92
   0     1024      255
   1     1024      273
   2     1024      130
   4     1024       90
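
To make the table easier to read, here is a rough sketch that averages
the build time per worker count across all memory settings and divides
the 0-worker average by it. The helper names (mean_time_by_workers,
speedup_vs_serial) are made up for this illustration and are not part
of the patch. Averaging over 'mem' is my own simplification, based on
that setting showing no visible effect.

```python
from collections import defaultdict
from statistics import mean

# (workers, shared memory in MB, build time in seconds), copied from the table above.
TIMINGS = [
    (0, 16, 247), (1, 16, 256), (2, 16, 126), (4, 16, 89),
    (0, 32, 247), (1, 32, 270), (2, 32, 123), (4, 32, 92),
    (0, 64, 254), (1, 64, 272), (2, 64, 123), (4, 64, 88),
    (0, 128, 250), (1, 128, 263), (2, 128, 126), (4, 128, 85),
    (0, 256, 247), (1, 256, 269), (2, 256, 130), (4, 256, 88),
    (0, 512, 257), (1, 512, 275), (2, 512, 129), (4, 512, 92),
    (0, 1024, 255), (1, 1024, 273), (2, 1024, 130), (4, 1024, 90),
]


def mean_time_by_workers(rows):
    """Average the build time for each worker count, ignoring the memory setting."""
    times = defaultdict(list)
    for workers, _mem_mb, seconds in rows:
        times[workers].append(seconds)
    return {w: mean(t) for w, t in sorted(times.items())}


def speedup_vs_serial(mean_times):
    """Speedup of each worker count relative to the 0-worker (serial) average."""
    baseline = mean_times[0]
    return {w: baseline / t for w, t in mean_times.items()}


mean_times = mean_time_by_workers(TIMINGS)
speedups = speedup_vs_serial(mean_times)

if __name__ == "__main__":
    for workers, avg in mean_times.items():
        print(f"workers={workers} mean={avg:.1f}s speedup={speedups[workers]:.2f}x")
```

On these numbers the averages work out to roughly 0.94x with 1 worker,
1.98x with 2 workers, and 2.82x with 4 workers, relative to the
0-worker case.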

On Wed, 17 Feb 2016 12:26:05 -0800
Peter Geoghegan <pg(at)heroku(dot)com> wrote:

> On Wed, Feb 17, 2016 at 7:55 AM, Constantin S. Pan <kvapen(at)gmail(dot)com>
> wrote:
> > 4. Hit the 8x speedup limit. Made some analysis of the reasons (see
> > the attached plot or the data file).
>
> Did you actually compare this to the master branch? I wouldn't like to
> assume that the one worker case was equivalent. Obviously that's the
> really interesting baseline.

Yes, I compared it with the master branch. The 0-worker case
is indeed equivalent to the master branch.

Regards,
Constantin

Attachment: pgin-5.patch (text/x-patch, 20.5 KB)
