Re: Parallel CREATE INDEX for BRIN indexes

From: Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Parallel CREATE INDEX for BRIN indexes
Date: 2023-11-30 17:47:39
Message-ID: CAEze2WiMsPZg=xkvSF_jt4=69k6K7gz5B8V2wY3gCGZ+1BzCbQ@mail.gmail.com
Lists: pgsql-hackers

On Thu, 30 Nov 2023 at 01:10, Tomas Vondra
<tomas(dot)vondra(at)enterprisedb(dot)com> wrote:
>
> On 11/29/23 23:59, Matthias van de Meent wrote:
>> On Wed, 29 Nov 2023 at 21:56, Tomas Vondra
>> <tomas(dot)vondra(at)enterprisedb(dot)com> wrote:
>>>
>>> On 11/29/23 21:30, Matthias van de Meent wrote:
>>>> On Wed, 29 Nov 2023 at 18:55, Tomas Vondra
>>>> <tomas(dot)vondra(at)enterprisedb(dot)com> wrote:
>>>>> I did try to measure how much it actually saves, but none of the tests I
>>>>> did actually found measurable improvement. So I'm tempted to just not
>>>>> include this part, and accept that we may deserialize some of the tuples
>>>>> unnecessarily.
>>>>>
>>>>> Did you actually observe measurable improvements in some cases?
>>>>
>>>> The improvements would mostly stem from brin indexes with multiple
>>>> (potentially compressed) by-ref types, as they go through more complex
>>>> and expensive code to deserialize, requiring separate palloc() and
>>>> memcpy() calls each.
>>>> For single-column and by-value types the improvements are expected to
>>>> be negligible, because there is no meaningful difference between
>>>> copying a single by-ref value and copying its container; the
>>>> additional work done for each tuple is marginal for those.
>>>>
>>>> For an 8-column BRIN index ((sha256((id)::text::bytea)::text),
>>>> (sha256((id+1)::text::bytea)::text),
>>>> (sha256((id+2)::text::bytea)::text), ...) instrumented with 0003 I
>>>> measured a difference of 10x less time spent in the main loop of
>>>> _brin_end_parallel, from ~30ms to 3ms when dealing with 55k 1-block
>>>> ranges. It's not a lot, but worth at least something, I guess?
>>>>
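For reference, the benchmark index shape described above would look roughly like the sketch below (the table name t, an integer column id, and the convert_to() call for the text-to-bytea step are assumptions; abbreviated to four of the eight keys):

    CREATE INDEX t_brin_sha ON t USING brin (
        (sha256(convert_to((id)::text,     'UTF8'))::text),
        (sha256(convert_to((id + 1)::text, 'UTF8'))::text),
        (sha256(convert_to((id + 2)::text, 'UTF8'))::text),
        (sha256(convert_to((id + 3)::text, 'UTF8'))::text)
    );
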
>>>
>>> It is something, but I can't really convince myself it's worth the extra
>>> code complexity. It's a somewhat extreme example, and the parallelism
>>> certainly saves much more than this.
>>
>> True. For this, I usually keep in mind that the docs on multi-column
>> indexes still recommend one N-column BRIN index over N single-column
>> BRIN indexes (assuming the same storage parameters), so multi-column
>> BRIN indexes should not be considered uncommon:
>>
>> "The only reason to have multiple BRIN indexes instead of one
>> multicolumn BRIN index on a single table is to have a different
>> pages_per_range storage parameter."
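As a concrete illustration of that guidance (hypothetical table t with columns a and b):

    -- normally a single multicolumn BRIN index is preferred ...
    CREATE INDEX t_brin ON t USING brin (a, b);
    -- ... and separate BRIN indexes only pay off when the columns want
    -- different pages_per_range settings:
    CREATE INDEX t_brin_a ON t USING brin (a) WITH (pages_per_range = 128);
    CREATE INDEX t_brin_b ON t USING brin (b) WITH (pages_per_range = 16);
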
>>
>> Note that most of the time in my example index is spent in creating
>> the actual tuples due to the use of hashing for data generation; for
>> index or plain to-text formatting the improvement is much more
>> pronounced: If I use an 8-column index (id::text, id, ...), index
>> creation takes ~500ms with 4+ workers. Of this, deforming takes some
>> 20ms, though when skipping the deforming step (i.e., with my patch) it
>> takes ~3.5ms. That's about 3% shaved off the build time when the index
>> shape is beneficial.
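A sketch of that second index shape, again assuming a table t with an integer column id (the exact layout of the eight keys is an assumption):

    CREATE INDEX t_brin_text ON t USING brin (
        (id::text), id,
        ((id + 1)::text), (id + 1),
        ((id + 2)::text), (id + 2),
        ((id + 3)::text), (id + 3)
    );
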
>>
>
> That's all true, and while 3.5% is not something to ignore, my POV is
> that the parallelism speeds this up from ~2000ms to ~500ms. Yes, it
> would be great to shave off the extra 1% (relative to the original
> duration). But I don't have a great idea how to code that in a way
> that is readable, and I don't want to stall the patch indefinitely
> because of a comparatively small improvement.
>
> Therefore I propose we get the simpler code committed and leave this as
> a future improvement.

That's fine with me; it's one reason why I kept it as a separate patch file.

>>>> The attached patch fixes the issue that you called out.
>>>> It also further updates _brin_end_parallel: the final 'write empty
>>>> tuples' loop is never hit and is thus removed, because if there were
>>>> any tuples in the spool we'd have filled the empty ranges at the end
>>>> of the main loop, and if there were no tuples in the spool then the
>>>> memtuple would still be at its original initialized value of 0, thus
>>>> resulting in a constant false condition. I also updated some comments.
>>>>
>>>
>>> Ah, right. I'll take a look tomorrow, but I guess I didn't realize we
>>> insert the empty ranges in the main loop, because we're already looking
>>> at the *next* summary.
>>
>> Yes, merging adds some significant complexity here. I don't think we
>> can easily get around that though...
>>
>>> But I think the idea was to insert empty ranges if there's a chunk of
>>> empty ranges at the end of the table, after the last tuple the index
>>> build reads. But I'm not sure that can actually happen ...
>>
>> This would be trivial to construct with partial indexes; e.g. an index
>> with WHERE (my_pk IS NULL) would consist exclusively of empty ranges.
>> I don't see a lot of value in partial BRIN indexes, but I may be
>> overlooking something.
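A minimal sketch of such a construction (table and column names are hypothetical): because my_pk is a primary key and therefore never NULL, no row satisfies the predicate, so the index consists exclusively of empty ranges.

    CREATE TABLE t (my_pk int PRIMARY KEY, payload text);
    CREATE INDEX t_brin_empty ON t USING brin (payload) WHERE (my_pk IS NULL);
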
>>
>
> Oh, I haven't even thought about partial BRIN indexes! I'm sure for
> those it's even more important to actually fill in the empty ranges,
> otherwise we end up scanning the whole supposedly filtered-out part of
> the table. I'll do some testing with that.

I just ran some more tests in less favorable environments, and it
looks like I hit a bug:

% SET max_parallel_workers = 0;
% CREATE INDEX ... USING brin (...);
ERROR: cannot update tuples during a parallel operation
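
A fuller, hypothetical reproduction (table and column names are invented, and the pre-0002 behavior is assumed): the build is still planned as parallel, but no workers can be launched, and the error presumably results from the leader then performing the build without leaving parallel mode (cf. the name of 0002).

    CREATE TABLE t AS SELECT g AS id FROM generate_series(1, 1000000) g;
    SET max_parallel_maintenance_workers = 4;  -- a parallel build gets planned
    SET max_parallel_workers = 0;              -- ... but no workers can be launched
    CREATE INDEX t_brin ON t USING brin (id);
    -- ERROR:  cannot update tuples during a parallel operation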

Fix attached in 0002.
In 0003 I add the mentioned backfilling of empty ranges at the end of
the table. I added it for both normal and parallel index builds, as
normal builds apparently didn't have this yet either.

Kind regards,

Matthias van de Meent

Attachment Content-Type Size
v7-0002-BRIN-Exit-parallel-mode-when-not-starting-paralle.patch application/octet-stream 1.1 KB
v7-0003-BRIN-Backfill-empty-ranges.patch application/octet-stream 4.5 KB
v7-0001-Allow-BRIN-to-build-its-index-in-parallel.patch application/octet-stream 51.6 KB
