Re: Parallel CREATE INDEX for GIN indexes

From: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
To: Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Parallel CREATE INDEX for GIN indexes
Date: 2024-05-09 14:28:36
Message-ID: d759ee8f-1fc8-4ca2-b192-e28405e9a202@enterprisedb.com
Lists: pgsql-hackers

On 5/2/24 20:22, Tomas Vondra wrote:
>>
>>> For some of the opclasses it can regress (like the jsonb_path_ops). I
>>> don't think that's a major issue. Or more precisely, I'm not surprised
>>> by it. It'd be nice to be able to disable the parallel builds in these
>>> cases somehow, but I haven't thought about that.
>>
>> Do you know why it regresses?
>>
>
> No, but one thing that stands out is that the index is much smaller than
> the other columns/opclasses, and the compression does not save much
> (only about 5% for both phases). So I assume it's the overhead of
> writing and reading a bunch of GB of data without really gaining
> much from doing that.
>

I finally got to look into this regression, but I think I must have done
something wrong before, because I can't reproduce it. These are the
timings I get now when I rerun the benchmark:

 workers    trgm   tsvector   jsonb   jsonb (hash)
---------------------------------------------------
       0    1225        404     104             56
       1     772        180      57             60
       2     549        143      47             52
       3     426        127      43             50
       4     364        116      40             48
       5     323        111      38             46
       6     292        111      37             45

and the build time relative to the serial build (lower is better):

 workers    trgm   tsvector   jsonb   jsonb (hash)
---------------------------------------------------
       1     63%        45%     54%           108%
       2     45%        35%     45%            94%
       3     35%        31%     41%            89%
       4     30%        29%     38%            86%
       5     26%        28%     37%            83%
       6     24%        28%     35%            81%

So there's a small regression for the jsonb_path_ops opclass, but only
with one worker. After that, it gets a bit faster than the serial build.
While not a great speedup, it's far better than the earlier results,
which showed a regression of maybe 40%.
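
For reference, here is a minimal sketch of how the percentages above can be
recomputed from the posted timings: each parallel timing is divided by the
serial (0 workers) timing for the same column. The function name
relative_to_serial is hypothetical and only used for illustration. The
results land within a point or two of the posted percentages, which were
presumably computed from unrounded timings.

```python
# Timings exactly as posted in the message, keyed by number of workers.
# The message does not state the unit, so none is assumed here.
timings = {
    0: {"trgm": 1225, "tsvector": 404, "jsonb": 104, "jsonb (hash)": 56},
    1: {"trgm": 772, "tsvector": 180, "jsonb": 57, "jsonb (hash)": 60},
    2: {"trgm": 549, "tsvector": 143, "jsonb": 47, "jsonb (hash)": 52},
    3: {"trgm": 426, "tsvector": 127, "jsonb": 43, "jsonb (hash)": 50},
    4: {"trgm": 364, "tsvector": 116, "jsonb": 40, "jsonb (hash)": 48},
    5: {"trgm": 323, "tsvector": 111, "jsonb": 38, "jsonb (hash)": 46},
    6: {"trgm": 292, "tsvector": 111, "jsonb": 37, "jsonb (hash)": 45},
}


def relative_to_serial(timings):
    """Return each parallel timing as a percentage of the serial timing.

    Values below 100 mean the parallel build was faster than the serial
    build; values above 100 mean it was slower (a regression).
    """
    serial = timings[0]
    return {
        workers: {col: 100.0 * t / serial[col] for col, t in row.items()}
        for workers, row in timings.items()
        if workers != 0
    }


ratios = relative_to_serial(timings)
for workers in sorted(ratios):
    cells = "  ".join(f"{col}={pct:.0f}%" for col, pct in ratios[workers].items())
    print(f"{workers} workers: {cells}")
```

With these numbers, the only entry above 100% is jsonb (hash) with one
worker, which matches the regression described above.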

I don't know what I did wrong before - maybe I had a build with extra
debug info or something like that? No idea why that would affect only
one of the opclasses. But this time I made doubly sure the results are
correct.

Anyway, I'm fairly happy with these results. I don't think it's
surprising there are cases where parallel build does not help much.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
