Re: New CORRESPONDING clause design

From: Surafel Temesgen <surafel3000(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: New CORRESPONDING clause design
Date: 2017-03-22 13:34:53
Message-ID: CALAY4q83=9HfTz42mY_zhaFTKmNxC_sdDSeev2BYXzUznXWPfg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sat, Mar 18, 2017 at 7:50 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> writes:
> > I have not any objection - I'll mark this patch as ready for commiter
>
>
> I'm a little disturbed by the fact that determineMatchingColumns()
> is called twice, and more disturbed by the fact that it looks to be
> O(N^2). This could be really slow with a lot of columns, couldn't it?
>
> I also think there should be some comments about exactly what matching
> semantics we're enforcing. The SQL standard says
>
> a) If CORRESPONDING is specified, then:
> i) Within the columns of T1, equivalent <column name>s shall
> not be specified more than once and within the columns of
> T2, equivalent <column name>s shall not be specified more
> than once.
>
> That seems unreasonably stringent to me; it ought to be sufficient to
> forbid duplicates of the names listed in CORRESPONDING, or the common
> column names if there's no BY list. But whichever restriction you prefer,
> this code seems to be failing to check it --- I certainly don't see any
> new error message about "column name "foo" appears more than once".
>
> determineMatchingColumns can be reduce to one call but in order
to enforce the above semantics I leave it as it is by thinking
the performance penalty can be compromise with sql standard
conformance.

> What I think actually is needed is some targeted testing showing
> non-interference with nearby features. As an example, CORRESPONDING
> can't safely be implemented by just replacing the tlists of the
> input sub-selects with shortened versions, because that would break
> situations such as DISTINCT in the sub-selects. I think it'd be a
> good idea to have a test along the lines of
>
> SELECT DISTINCT x, y FROM ...
> UNION ALL CORRESPONDING
> SELECT DISTINCT x, z FROM ...
>
> with values chosen to show that the DISTINCT operators did operate
> on x/y and x/z, not just x alone.
>

Even if without replacing the tlists of the top-level query
which is leftmostQuery tagretlist with shortened versioned
corresponding target list DISTINCT operators did’t operate
on x/z on left query because top-level Query has a dummy
targetlist that exists mainly to show the union'd datatype
of each output column, and it carries any sortClause, limitClause,
etc needed for the output of the entire operation

reference https://www.postgresql.org/message-id/4939.970506269@sss.pgh.pa.us

if I don’t miss something that means in current implementation we cannot
use sortClause limitClause with target list outside final out put or on
individual query and I think planning doing as a whole will be good

Regrades

Surafel

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2017-03-22 13:39:20 Re: Partitioned tables and relfilenode
Previous Message Ashutosh Sharma 2017-03-22 13:32:11 Re: Page Scan Mode in Hash Index