Re: Progress report of CREATE INDEX for nested partitioned tables

From: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
To: Justin Pryzby <pryzby(at)telsasoft(dot)com>
Cc: Ilya Gladyshev <ilya(dot)v(dot)gladyshev(at)gmail(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Progress report of CREATE INDEX for nested partitioned tables
Date: 2023-01-18 00:04:06
Message-ID: df9a4294-41b1-d77c-03bc-1eeff07a68c8@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 1/17/23 21:59, Justin Pryzby wrote:
> On Tue, Jan 17, 2023 at 08:44:36PM +0100, Tomas Vondra wrote:
>> On 1/9/23 09:44, Ilya Gladyshev wrote:
>>> On Sun, 2023-01-08 at 10:48 -0600, Justin Pryzby wrote:
>>>> On Sat, Dec 17, 2022 at 08:30:02AM -0600, Justin Pryzby wrote:
>>>>> ...
>>>>>
>>>>> @committers: Is it okay to add nparts_done to IndexStmt ?
>>>>
>>>> Any hint about this ?
>>>>
>>
>> AFAIK fields added at the end of a struct is seen as acceptable from the
>> ABI point of view. It's not risk-free, but we did that multiple times
>> when fixing bugs, IIRC.
>
> My question isn't whether it's okay to add a field at the end of a
> struct in general, but rather whether it's acceptable to add this field
> at the end of this struct, and not because it's unsafe to do in a minor
> release, but whether someone is going to say that it's an abuse of the
> data structure.
>

Ah, you mean whether it's the right place for the parameter?

I don't think it is, really. IndexStmt is meant to be a description of
the CREATE INDEX statement, not something that includes info about how
it's processed. But it's the only struct we pass to the DefineIndex for
child indexes, so the only alternatives I can think of is a global
variable and the (existing) param array field.

Nevertheless, ABI compatibility is still relevant for backbranches.

>> Do we actually need the new parts_done field? I mean, we already do
>> track the value - at PROGRESS_CREATEIDX_PARTITIONS_DONE index in the
>> st_progress_param array. Can't we simply read it from there? Then we
>> would not have ABI issues with the new field added to IndexStmt.
>
> Good idea to try.
>

OK

>> As for the earlier discussion about the "correct" behavior for leaf vs.
>> non-leaf partitions and whether to calculate partitions in advance:
>>
>> * I agree it's desirable to count partitions in advance, instead of
>> adding incrementally. The view is meant to provide "overview" of the
>> CREATE INDEX progress, and imagine you get
>>
>> partitions_total partitions_done
>> 10 9
>>
>> so that you believe you're ~90% done. But then it jumps to the next
>> child and now you get
>>
>> partitions_total partitions_done
>> 20 10
>>
>> which makes the view a bit useless for it's primary purpose, IMHO.
>
> To be clear, that's the current, buggy behavior, and this thread is
> about fixing it. The proposed patches all ought to avoid that.
>
> But the bug isn't caused by not "calculating partitions in advance".
> Rather, the issue is that currently, the "total" is overwritten while
> recursing.
>

You're right the issue us about overwriting the total - not sure what I
was thinking about when writing this. I guess I got distracted by the
discussion about "preliminary planning" etc. Sorry for the confusion.

> That's a separate question from whether indirect partitions are counted
> or not.
>
>> I wonder if we could improve this to track the size of partitions for
>> total/done? That'd make leaf/non-leaf distinction unnecessary, because
>> non-leaf partitions have size 0.
>
> Maybe, but it's out of scope for this patch.
>

+1, it was just an idea for future.

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2023-01-18 00:05:52 Re: constify arguments of copy_file() and copydir()
Previous Message Jacob Champion 2023-01-17 23:18:26 Re: Can we let extensions change their dumped catalog schemas?