Re: [PoC] Reducing planning time when tables have many partitions

From: Alena Rybakina <lena(dot)ribackina(at)yandex(dot)ru>
To: Yuya Watari <watari(dot)yuya(at)gmail(dot)com>
Cc: Andrei Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>, Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, David Rowley <dgrowleyml(at)gmail(dot)com>, Thom Brown <thom(at)linux(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Zhang Mingli <zmlpostgres(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: [PoC] Reducing planning time when tables have many partitions
Date: 2023-12-16 15:41:27
Message-ID: f70240f0-d2d0-4d25-b326-db798ba22acb@yandex.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi!
On 13.12.2023 09:21, Yuya Watari wrote:
> Hello Alena, Andrei, and all,
>
> I am sorry for my late response. I found that the current patches do
> not apply to the master, so I have rebased those patches. I have
> attached v22. For this later discussion, I separated the rebasing and
> bug fixing that Alena did in v21 into separate commits, v22-0003 and
> v22-0004. I will merge these commits after the discussion.
>
> 1. v22-0003 (solved_conflict_with_self_join_removal.txt)
Thank you!
> Thank you for your rebase. Looking at your rebasing patch, I thought
> we could do this more simply. Your patch deletes (more precisely, sets
> to null) non-redundant members from the root->eq_sources list and
> re-adds them to the same list. However, this approach seems a little
> waste of memory. Instead, we can update
> EquivalenceClass->ec_source_indexes directly. Then, we can reuse the
> members in root->eq_sources and don't need to extend root->eq_sources.
> I did this in v22-0003. What do you think of this approach?
I thought about this earlier and was worried that the index links of the
equivalence classes might not be referenced correctly for outer joins,
so I decided to just overwrite them and reset the previous ones.
> The main concern with this idea is that it does not fix
> RangeTblEntry->eclass_source_indexes. The current code works fine even
> if we don't fix the index because get_ec_source_indexes() always does
> bms_intersect() for eclass_source_indexes and ec_source_indexes. If we
> guaranteed this behavior of doing bms_intersect, then simply modifying
> ec_source_indexes would be fine. Fortunately, such a guarantee is not
> so difficult.
>
> And your patch removes the following assertion code from the previous
> patch. May I ask why you removed this code? I think this assertion is
> helpful for sanity checks. Of course, I know that this kind of
> assertion will slow down regression tests or assert-enabled builds.
> So, we may have to discuss which assertions to keep and which to
> discard.
>
> =====
> -#ifdef USE_ASSERT_CHECKING
> - /* verify the results look sane */
> - i = -1;
> - while ((i = bms_next_member(rel_esis, i)) >= 0)
> - {
> - RestrictInfo *rinfo = list_nth_node(RestrictInfo, root->eq_sources,
> - i);
> -
> - Assert(bms_overlap(relids, rinfo->clause_relids));
> - }
> -#endif
> =====
this is due to the fact that I explained before: we zeroed the values
indicated by the indexes,
then this check is not correct either - since the zeroed value indicated
by the index is correct.
That's why I removed this check.
> Finally, your patch changes the name of the following function. I
> understand the need for this change, but it has nothing to do with our
> patches, so we should not include it and discuss it in another thread.
>
> =====
> -update_eclasses(EquivalenceClass *ec, int from, int to)
> +update_eclass(PlannerInfo *root, EquivalenceClass *ec, int from, int to)
> =====
I agree.
> 2. v22-0004 (bug_related_to_atomic_function.txt)
>
> Thank you for fixing the bug. As I wrote in the previous mail:
>
> On Wed, Nov 22, 2023 at 2:32 PM Yuya Watari<watari(dot)yuya(at)gmail(dot)com> wrote:
>> On Mon, Nov 20, 2023 at 1:45 PM Andrei Lepikhov
>> <a(dot)lepikhov(at)postgrespro(dot)ru> wrote:
>>> During the work on committing the SJE feature [1], Alexander Korotkov
>>> pointed out the silver lining in this work [2]: he proposed that we
>>> shouldn't remove RelOptInfo from simple_rel_array at all but replace it
>>> with an 'Alias', which will refer the kept relation. It can simplify
>>> further optimizations on removing redundant parts of the query.
>> Thank you for sharing this information. I think the idea suggested by
>> Alexander Korotkov is also helpful for our patch. As mentioned above,
>> the indexes are in RangeTblEntry in the current implementation.
>> However, I think RangeTblEntry is not the best place to store them. An
>> 'alias' relids may help solve this and simplify fixing the above bug.
>> I will try this approach soon.
> I think that the best way to solve this issue is to move these indexes
> from RangeTblEntry to RelOptInfo. Since they are related to planning
> time, they should be in RelOptInfo. The reason why I put these indexes
> in RangeTblEntry is because some RelOptInfos can be null and we cannot
> store the indexes. This problem is similar to an issue regarding
> 'varno 0' Vars. I hope an alias RelOptInfo would help solve this
> issue. I have attached the current proof of concept I am considering
> as poc-alias-reloptinfo.txt. To test this patch, please follow the
> procedure below.
>
> 1. Apply all *.patch files,
> 2. Apply Alexander Korotkov's alias_relids.patch [1], and
> 3. Apply poc-alias-reloptinfo.txt, which is attached to this email.
>
> My patch creates a dummy (or an alias) RelOptInfo to store indexes if
> the corresponding RelOptInfo is null. The following is the core change
> in my patch.
>
> =====
> @@ -627,9 +627,19 @@ add_eq_source(PlannerInfo *root, EquivalenceClass
> *ec, RestrictInfo *rinfo)
> i = -1;
> while ((i = bms_next_member(rinfo->clause_relids, i)) >= 0)
> {
> - RangeTblEntry *rte = root->simple_rte_array[i];
> + RelOptInfo *rel = root->simple_rel_array[i];
>
> - rte->eclass_source_indexes = bms_add_member(rte->eclass_source_indexes,
> + /*
> + * If the corresponding RelOptInfo does not exist, we create a 'dummy'
> + * RelOptInfo for storing EquivalenceClass indexes.
> + */
> + if (rel == NULL)
> + {
> + rel = root->simple_rel_array[i] = makeNode(RelOptInfo);
> + rel->eclass_source_indexes = NULL;
> + rel->eclass_derive_indexes = NULL;
> + }
> + rel->eclass_source_indexes = bms_add_member(rel->eclass_source_indexes,
> source_idx);
> }
> =====
>
> At this point, I'm not sure if this approach is correct. It seems to
> pass the regression tests, but we should doubt its correctness. I will
> continue to experiment with this idea.
>
> [1]https://www.postgresql.org/message-id/CAPpHfdseB13zJJPZuBORevRnZ0vcFyUaaJeSGfAysX7S5er%2BEQ%40mail.gmail.com
>
Yes, I also thought in this direction before and I agree that this is
the best way to develop the patch.

--
Regards,
Alena Rybakina
Postgres Professional:http://www.postgrespro.com
The Russian Postgres Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Alvaro Herrera 2023-12-16 16:42:28 Re: useless LIMIT_OPTION_DEFAULT
Previous Message jian he 2023-12-16 13:03:00 Re: Adding OLD/NEW support to RETURNING