Re: New CORRESPONDING clause design

From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Surafel Temesgen <surafel3000(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: New CORRESPONDING clause design
Date: 2017-03-18 17:16:19
Message-ID: CAFj8pRA0A4p15YLM80LdRdg7WNrY+VNo-N1DzxR6QajYVRLH9Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

2017-03-18 17:50 GMT+01:00 Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>:

> Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> writes:
> > I have not any objection - I'll mark this patch as ready for commiter
>
> I took a quick look through this and noted that it fails to touch
> ruleutils.c, which means that dumping of views containing CORRESPONDING
> certainly doesn't work.
>

it my mistake - this bug I should to see :(

>
> Also, the changes in parser/analyze.c seem rather massive and
> correspondingly hard to review. Is it possible to rearrange the
> patch to reduce the size of that diff? If you can avoid moving
> or reindenting existing code, that'd help.
>
> The code in that area seems rather confused, too. For instance,
> I'm not sure exactly what orderCorrespondingList() is good for,
> but it certainly doesn't look to be doing anything that either its
> name or its header comment (or the comments at the call sites) would
> suggest. Its input and output tlists are always in the same order.
>
> I'm a little disturbed by the fact that determineMatchingColumns()
> is called twice, and more disturbed by the fact that it looks to be
> O(N^2). This could be really slow with a lot of columns, couldn't it?
>
> I also think there should be some comments about exactly what matching
> semantics we're enforcing. The SQL standard says
>
> a) If CORRESPONDING is specified, then:
> i) Within the columns of T1, equivalent <column name>s shall
> not be specified more than once and within the columns of
> T2, equivalent <column name>s shall not be specified more
> than once.
>
> That seems unreasonably stringent to me; it ought to be sufficient to
> forbid duplicates of the names listed in CORRESPONDING, or the common
> column names if there's no BY list. But whichever restriction you prefer,
> this code seems to be failing to check it --- I certainly don't see any
> new error message about "column name "foo" appears more than once".
>
> I don't think you want to be leaving behind debugging cruft like this:
> + elog(DEBUG4, "%s", ltle->resname);
>
> I'm not impressed by using A_Const for the members of the CORRESPONDING
> name list. That's not a clever solution, that's a confusing kluge,
> because it's a complete violation of the meaning of A_Const. Elsewhere
> we just use lists of String for name lists, and that seems sufficient
> here. Personally I'd just use the existing columnList production rather
> than rolling your own.
>

The reason was attach a location to name for more descriptive error
message.

> The comments in parsenodes.h about the new fields seem rather confused,
> in fact I think they're probably backwards.
>
> Also, I thought the test cases were excessively uninspired and repetitive.
> This, for example, seems like about two tests too many:
>
> +SELECT 1 a, 2 b, 3 c UNION CORRESPONDING BY(a) SELECT 4 a, 5 b, 6 c;
> + a
> +---
> + 1
> + 4
> +(2 rows)
> +
> +SELECT 1 a, 2 b, 3 c UNION CORRESPONDING BY(b) SELECT 4 a, 5 b, 6 c;
> + b
> +---
> + 2
> + 5
> +(2 rows)
> +
> +SELECT 1 a, 2 b, 3 c UNION CORRESPONDING BY(c) SELECT 4 a, 5 b, 6 c;
> + c
> +---
> + 3
> + 6
> +(2 rows)
>
> without even considering the fact that these are pretty duplicative of
> tests around them. And some of the added tests seem to be just testing
> basic UNION functionality, which we already have tests for. I fail to
> see the point of adding any of these:
>
> +SELECT 1 AS two UNION CORRESPONDING SELECT 2 two;
> + two
> +-----
> + 1
> + 2
> +(2 rows)
> +
> +SELECT 1 AS one UNION CORRESPONDING SELECT 1 one;
> + one
> +-----
> + 1
> +(1 row)
> +
> +SELECT 1 AS two UNION ALL CORRESPONDING SELECT 2 two;
> + two
> +-----
> + 1
> + 2
> +(2 rows)
> +
> +SELECT 1 AS two UNION ALL CORRESPONDING SELECT 1 two;
> + two
> +-----
> + 1
> + 1
> +(2 rows)
>
> What I think actually is needed is some targeted testing showing
> non-interference with nearby features. As an example, CORRESPONDING
> can't safely be implemented by just replacing the tlists of the
> input sub-selects with shortened versions, because that would break
> situations such as DISTINCT in the sub-selects. I think it'd be a
> good idea to have a test along the lines of
>
> SELECT DISTINCT x, y FROM ...
> UNION ALL CORRESPONDING
> SELECT DISTINCT x, z FROM ...
>
> with values chosen to show that the DISTINCT operators did operate
> on x/y and x/z, not just x alone.
>
> regards, tom lane
>

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Stephen Frost 2017-03-18 17:18:53 pgsql: Improve pg_dump regression tests and code coverage
Previous Message Robert Haas 2017-03-18 17:13:44 Re: createlang/droplang deprecated