Re: Partition-wise join for join between (declaratively) partitioned tables

From: Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Rajkumar Raghuwanshi <rajkumar(dot)raghuwanshi(at)enterprisedb(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Partition-wise join for join between (declaratively) partitioned tables
Date: 2017-03-16 11:19:49
Message-ID: CAFjFpRcCH0Xm+Ae6jArpN2ychoX4NWseMji6o81uUM7Sbv6+_A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Mar 16, 2017 at 7:10 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> So I am looking at this part of 0008:
>
> + /*
> + * Do not copy parent_rinfo and child_rinfos because 1. they create a
> + * circular dependency between child and parent RestrictInfo 2. dropping
> + * those links just means that we loose some memory
> optimizations. 3. There
> + * is a possibility that the child and parent RestrictInfots
> themselves may
> + * have got copied and thus the old links may no longer be valid. The
> + * caller may set up those links itself, if needed.
> + */
>
> I don't think that it's very clear whether or not this is safe. I
> experimented with making _copyRestrictInfo PANIC,

I am not able to understand how to make _copyRestrictInfo PANIC. Can
you please share the patch or compiler flags or settings? I will look
at the case below once I have that.

> which,
> interestingly, does not affect the core regression tests at all, but
> does trip on this bit from the postgres_fdw tests:
>
> -- subquery using stable function (can't be sent to remote)
> PREPARE st2(int) AS SELECT * FROM ft1 t1 WHERE t1.c1 < $2 AND t1.c3 IN
> (SELECT c3 FROM ft2 t2 WHERE c1 > $1 AND date(c4) =
> '1970-01-17'::date) ORDER BY c1;
> EXPLAIN (VERBOSE, COSTS OFF) EXECUTE st2(10, 20);
>
> I'm not sure why this particular case is affected when so many others
> are not, and the comment doesn't help me very much in figuring it out.

>
> Why do we need this cache in the RestrictInfo, anyway? Aside from the
> comment above, I looked at the comment in the RestrictInfo struct, and
> I looked at the comment in build_child_restrictinfo, and I looked at
> the comment in build_child_clauses, and I looked at the place where
> build_child_clauses is called in set_append_rel_size, and none of
> those places explain why we need this cache. I would assume we'd need
> a separate translation of the RestrictInfo for every separate
> child-join, so how does the cache help?
>
> Maybe the answer is that build_child_clauses() is also called from
> try_partition_wise_join() and add_paths_to_child_joinrel(), and those
> three call sights all end up producing the same set of translated
> RestrictInfos. But if that's the case, somehow it seems like we ought
> to be producing these in one place where we can get convenient access
> to them from each child join, rather than having to search through
> this cache to find it.

I had explained this briefly in [1]. But forgot to add it as comments.

There are multiple means by which a RestrictInfo gets translated
multiple times for the same child.

1. Consider a join A J (B J C on B.b = C.c) ON (A.a = B.b) the clause
A.a = B.b is part of the restrictlist for orders (AB)C and A(BC) (and
(AC)B depending upon the type of join). So, the clause gets translated
twice once for each of those join orders.

2. In the above example, A.a = B.b is part of joininfo list (if it
happens to be an outer join) of A, B and BC. So, it should be part of
joininfo list of children of A, B and BC. But the RestrictInfo which
is part of joininfo of B and BC looks exactly same.

Similarly param_info->clauses get translated multiple times each time
with a different set of required_outer.

In order to avoid multiple translations and spend memory in each
translation it's better to cache the result and retrieve it.

Updated prologue of build_child_restrictinfo with this explanation.

> It's a pretty inefficient cache: it takes O(n)
> time to search it, I think, where n is the number of partitions.

Above explanation shows that it's worse than that.

> And
> you do O(n) searches. So it's an O(n^2) algorithm, which is a little
> unfortunate. Can't we affix the translated RestrictInfos someplace
> where they can be found more efficiently?

Would a hash similar to root->join_rel_hash help? That will reduce the
searches to O(1). I have added a separate patch (0008) for using
hashtable to store child restrictinfos. If that patch looks good to
you, I will merge it with the main patch supporting partition-wise
join.

>
> Yet another thing that the comments don't explain is why the existing
> adjust_appendrel_attrs call needs to be replaced with
> build_child_clauses.

The call to adjust_appendrel_attrs() used to translate joininfo for
child has been replaced by build_child_clauses to take advantage of
the RestrictInfo cache. As explained above a clause which is part of
joininfo of a child, is also part of joininfo of the child-join in
which it participates except the child-joins covering the clause. So,
a cached copy of that RestrictInfo helps. I have added a patch (0010)
to use build_child_clause() only for partitioned tables and use
adjust_appendrel_attrs() for non-partitioned case. If this change
looks good, I will merge it with the main patch.
>
> So I feel, overall, that the point of all of this is not explained well at all.

Sorry for that. I should have added the explanation in the comments.
Corrected this in this set of patches.

[1] https://www.postgresql.org/message-id/CAFjFpRe66z%2Bw9%2BdnAkWGiaB1CU2CUQsLGsqzHzYBoA%3DKJFf%2BPQ%40mail.gmail.com
--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

Attachment Content-Type Size
pg_dp_join_patches_v5.zip application/zip 91.6 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andreas 'ads' Scherbaum 2017-03-16 11:33:41 Re: Defaulting psql to ON_ERROR_ROLLBACK=interactive
Previous Message Amit Kapila 2017-03-16 11:08:08 Re: Patch to improve performance of replay of AccessExclusiveLock