Re: Asymmetric partition-wise JOIN

From: Alexander Pyhalov <a(dot)pyhalov(at)postgrespro(dot)ru>
To: Andrey Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>
Cc: Anastasia Lubennikova <a(dot)lubennikova(at)postgrespro(dot)ru>, Daniel Gustafsson <daniel(at)yesql(dot)se>, Kohei KaiGai <kaigai(at)heterodb(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Asymmetric partition-wise JOIN
Date: 2021-06-18 12:02:39
Message-ID: 1fd87bb742825fe147d9a114fc250895@postgrespro.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Andrey Lepikhov писал 2021-05-27 07:27:
> Next version of the patch.
> For searching any problems I forced this patch during 'make check'
> tests. Some bugs were found and fixed.

Hi.
I've tested this patch and haven't found issues, but I have some
comments.

src/backend/optimizer/path/joinrels.c:

1554
1555 /*
1556 * Build RelOptInfo on JOIN of each partition of the outer relation
and the inner
1557 * relation. Return List of such RelOptInfo's. Return NIL, if at
least one of
1558 * these JOINs are impossible to build.
1559 */
1560 static List *
1561 extract_asymmetric_partitionwise_subjoin(PlannerInfo *root,
1562
RelOptInfo *joinrel,
1563
AppendPath *append_path,
1564
RelOptInfo *inner_rel,
1565
JoinType jointype,
1566
JoinPathExtraData *extra)
1567 {
1568 List *result = NIL;
1569 ListCell *lc;
1570
1571 foreach (lc, append_path->subpaths)
1572 {
1573 Path *child_path = lfirst(lc);
1574 RelOptInfo *child_rel =
child_path->parent;
1575 Relids child_join_relids;
1576 Relids parent_relids;
1577 RelOptInfo *child_join_rel;
1578 SpecialJoinInfo *child_sjinfo;
1579 List *child_restrictlist;

Variable names - child_join_rel and child_join_relids seem to be
inconsistent with rest of the file (I see child_joinrelids in
try_partitionwise_join() and child_joinrel in try_partitionwise_join()
and get_matching_part_pairs()).

1595 child_join_rel = build_child_join_rel(root,
1596
child_rel,
1597
inner_rel,
1598
joinrel,
1599
child_restrictlist,
1600
child_sjinfo,
1601
jointype);
1602 if (!child_join_rel)
1603 {
1604 /*
1605 * If can't build JOIN between
inner relation and one of the outer
1606 * partitions - return immediately.
1607 */
1608 return NIL;
1609 }

When build_child_join_rel() can return NULL?
If I read code correctly, joinrel is created in the begining of
build_child_join_rel() with makeNode(), makeNode() wraps newNode() and
newNode() uses MemoryContextAllocZero()/MemoryContextAllocZeroAligned(),
which would error() on alloc() failure.

1637
1638 static bool
1639 is_asymmetric_join_capable(PlannerInfo *root,
1640 RelOptInfo
*outer_rel,
1641 RelOptInfo
*inner_rel,
1642 JoinType
jointype)
1643 {

Function misses a comment.

1656 /*
1657 * Don't allow asymmetric JOIN of two append subplans.
1658 * In the case of a parameterized NL join, a
reparameterization procedure will
1659 * lead to large memory allocations and a CPU consumption:
1660 * each reparameterize will induce subpath duplication,
creating new
1661 * ParamPathInfo instance and increasing of ppilist up to
number of partitions
1662 * in the inner. Also, if we have many partitions, each
bitmapset
1663 * variable will large and many leaks of such variable
(caused by relid
1664 * replacement) will highly increase memory consumption.
1665 * So, we deny such paths for now.
1666 */

Missing word:
each bitmapset variable will large => each bitmapset variable will be
large

1694 foreach (lc, outer_rel->pathlist)
1695 {
1696 AppendPath *append_path = lfirst(lc);
1697
1698 /*
1699 * MEMO: We assume this pathlist keeps at least one
AppendPath that
1700 * represents partitioned table-scan, symmetric or
asymmetric
1701 * partition-wise join. It is not correct right
now, however, a hook
1702 * on add_path() to give additional decision for
path removal allows
1703 * to retain this kind of AppendPath, regardless of
its cost.
1704 */
1705 if (IsA(append_path, AppendPath))

What hook do you refer to?

src/backend/optimizer/plan/setrefs.c:

282 /*
283 * Adjust RT indexes of AppendRelInfos and add to final
appendrels list.
284 * We assume the AppendRelInfos were built during planning
and don't need
285 * to be copied.
286 */
287 foreach(lc, root->append_rel_list)
288 {
289 AppendRelInfo *appinfo = lfirst_node(AppendRelInfo,
lc);
290 AppendRelInfo *newappinfo;
291
292 /* flat copy is enough since all valuable fields are
scalars */
293 newappinfo = (AppendRelInfo *)
palloc(sizeof(AppendRelInfo));
294 memcpy(newappinfo, appinfo, sizeof(AppendRelInfo));

You've changed function to copy appinfo, so now comment is incorrect.

src/backend/optimizer/util/appendinfo.c:
588 /* Construct relids set for the immediate parent of the
given child. */
589 normal_relids = bms_copy(child_relids);
590 for (cnt = 0; cnt < nappinfos; cnt++)
591 {
592 AppendRelInfo *appinfo = appinfos[cnt];
593
594 parent_relids = bms_add_member(parent_relids,
appinfo->parent_relid);
595 normal_relids = bms_del_member(normal_relids,
appinfo->child_relid);
596 }
597 parent_relids = bms_union(parent_relids, normal_relids);

Do I understand correctly that now parent_relids also contains relids of
relations from 'global' inner relation, which we join to childs?

--
Best regards,
Alexander Pyhalov,
Postgres Professional

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Fabien COELHO 2021-06-18 12:54:27 Re: pgbench bug candidate: negative "initial connection time"
Previous Message Andrey Borodin 2021-06-18 12:02:15 Supply restore_command to pg_rewind via CLI argument