From: | Jim Nasby <jim(dot)nasby(at)openscg(dot)com> |
---|---|
To: | David Steele <david(at)pgmasters(dot)net> |
Cc: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Improve OR conditions on joined columns (common star schema problem) |
Date: | 2017-03-19 00:29:34 |
Message-ID: | 3605fcb2-6622-6520-b678-83171fe4c9d8@openscg.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On 3/16/17 11:54 AM, David Steele wrote:
> On 2/14/17 4:03 PM, Tom Lane wrote:
>> Jim Nasby <Jim(dot)Nasby(at)BlueTreble(dot)com> writes:
>>> On 2/14/17 1:18 PM, Tom Lane wrote:
>>>> One point that could use further review is whether the de-duplication
>>>> algorithm is actually correct. I'm only about 95% convinced by the
>>>> argument I wrote in planunionor.c's header comment.
>>
>>> I'll put some thought into it and see if I can find any holes. Are you
>>> only worried about the removal of "useless" rels or is there more?
>>
>> Well, the key point is whether it's really OK to de-dup on the basis
>> of only the CTIDs that are not eliminated in any UNION arm. I was
>> feeling fairly good about that until I thought of the full-join-to-
>> left-join-to-no-join conversion issue mentioned in the comment.
>> Now I'm wondering if there are other holes; or maybe I'm wrong about
>> that one and it's not necessary to be afraid of full joins.
>
> This patch applies cleanly (with offsets) and compiles at cccbdde.
>
> Jim, have you had time to think about this? Any insights?
Having thought about it, I share Tom's concerns. And now I'm worried
about what happens if there are multiple separate OR clauses. I guess
those would be handled by separate UNIONs?
I'm also finding it a bit hard to follow the comment Tom mentioned. I'm
pretty sure I understand what's going on until this part:
> The identical proof can be expected to apply
> + * in other arms, except in an arm that references that rel in its version
> + * of the OR clause. But in such an arm, we have effectively added a
> + * restriction clause to what is known in other arms, which means that the
> + * set of rows output by that rel can't increase compared to other arms.
AIUI, this is describing a case something like this:
SELECT child.blah FROM child LEFT JOIN parent USING(parent_id)
WHERE child.foo AND (child.baz=1 or child.baz=2)
given that parent.parent_id is unique. Except for these concerns, there
would need to be a complex OR somewhere in here that sometimes
referenced parent and sometimes didn't, such as
WHERE child.foo AND (child.baz=1 OR parent.foo=3)
But I'm not following the logic here (very possibly because I'm wrong
about what I said above):
> + * Therefore the situation in such an arm must be that including the rel
> + * could result in either zero or one output row, rather than exactly one
> + * output row as in other arms. So we still don't need to consider it for
> + * de-duplication.
I'm definitely not certain about the rest of it.
Tom, could you expand the description some, especially with some examples?
--
Jim Nasby, Chief Data Architect, Austin TX
OpenSCG http://OpenSCG.com
From | Date | Subject | |
---|---|---|---|
Next Message | Jim Nasby | 2017-03-19 00:51:42 | Re: PinBuffer() no longer makes use of strategy |
Previous Message | Tom Lane | 2017-03-18 22:57:38 | Re: Composite IS NULL again, this time with plpgsql |