| From: | Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com> | 
|---|---|
| To: | Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com> | 
| Cc: | PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> | 
| Subject: | Re: Parallel CREATE INDEX for GIN indexes | 
| Date: | 2024-05-09 14:28:36 | 
| Message-ID: | d759ee8f-1fc8-4ca2-b192-e28405e9a202@enterprisedb.com | 
| Lists: | pgsql-hackers | 
On 5/2/24 20:22, Tomas Vondra wrote:
>> 
>>> For some of the opclasses it can regress (like the jsonb_path_ops). I
>>> don't think that's a major issue. Or more precisely, I'm not surprised
>>> by it. It'd be nice to be able to disable the parallel builds in these
>>> cases somehow, but I haven't thought about that.
>>
>> Do you know why it regresses?
>>
> 
> No, but one thing that stands out is that the index is much smaller than
> the other columns/opclasses, and the compression does not save much
> (only about 5% for both phases). So I assume it's the overhead of
> writing and reading a bunch of GB of data without really gaining
> much from doing that.
> 
I finally got to look into this regression, but I think I must have done
something wrong before because I can't reproduce it. These are the timings
I get now, if I rerun the benchmark:
     workers       trgm   tsvector     jsonb  jsonb (hash)
    -------------------------------------------------------
           0       1225        404       104            56
           1        772        180        57            60
           2        549        143        47            52
           3        426        127        43            50
           4        364        116        40            48
           5        323        111        38            46
           6        292        111        37            45
and the build time relative to the serial build (lower is better):
     workers       trgm   tsvector      jsonb  jsonb (hash)
    --------------------------------------------------------
           1        63%        45%        54%          108%
           2        45%        35%        45%           94%
           3        35%        31%        41%           89%
           4        30%        29%        38%           86%
           5        26%        28%        37%           83%
           6        24%        28%        35%           81%
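For anyone wanting to re-derive the relative numbers, here is a minimal sketch that divides each parallel timing by the serial (0 workers) timing. The values are copied from the first table; the names `timings` and `relative_to_serial` are just illustrative. Because the table shows rounded timings, a recomputed percentage can differ from the posted one by a point.

```python
# Timings copied from the first table; index 0 is the serial build (0 workers).
timings = {
    "trgm": [1225, 772, 549, 426, 364, 323, 292],
    "tsvector": [404, 180, 143, 127, 116, 111, 111],
    "jsonb": [104, 57, 47, 43, 40, 38, 37],
    "jsonb (hash)": [56, 60, 52, 50, 48, 46, 45],
}


def relative_to_serial(series):
    """Return each parallel timing as a rounded percentage of the serial timing."""
    serial = series[0]
    return [round(100 * t / serial) for t in series[1:]]


relative = {name: relative_to_serial(series) for name, series in timings.items()}

# Print one row per worker count, in the same column order as the table.
print("workers  " + "  ".join(f"{name:>12}" for name in timings))
for workers in range(1, 7):
    cells = "  ".join(f"{relative[name][workers - 1]:>11d}%" for name in timings)
    print(f"{workers:>7d}  {cells}")
```

A value above 100% means the parallel build took longer than the serial one.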
So there's a small regression for the jsonb_path_ops opclass (the
"jsonb (hash)" column), but only with one worker. With more workers, it
gets a bit faster than the serial build. While not a great speedup, it's
far better than the earlier results, which showed maybe a 40% regression.
I don't know what I did wrong before - maybe I had a build with extra
debug info or something like that? No idea why that would affect only
one of the opclasses. But this time I made doubly sure the results are
correct.
Anyway, I'm fairly happy with these results. I don't think it's
surprising there are cases where parallel build does not help much.
regards
-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company