Re: New CORRESPONDING clause design

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Cc: Surafel Temesgen <surafel3000(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: New CORRESPONDING clause design
Date: 2017-03-18 16:50:40
Message-ID: 3424.1489855840@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox
Thread:
Lists: pgsql-hackers

Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> writes:
> I have not any objection - I'll mark this patch as ready for commiter

I took a quick look through this and noted that it fails to touch
ruleutils.c, which means that dumping of views containing CORRESPONDING
certainly doesn't work.

Also, the changes in parser/analyze.c seem rather massive and
correspondingly hard to review. Is it possible to rearrange the
patch to reduce the size of that diff? If you can avoid moving
or reindenting existing code, that'd help.

The code in that area seems rather confused, too. For instance,
I'm not sure exactly what orderCorrespondingList() is good for,
but it certainly doesn't look to be doing anything that either its
name or its header comment (or the comments at the call sites) would
suggest. Its input and output tlists are always in the same order.

I'm a little disturbed by the fact that determineMatchingColumns()
is called twice, and more disturbed by the fact that it looks to be
O(N^2). This could be really slow with a lot of columns, couldn't it?

I also think there should be some comments about exactly what matching
semantics we're enforcing. The SQL standard says

a) If CORRESPONDING is specified, then:
i) Within the columns of T1, equivalent <column name>s shall
not be specified more than once and within the columns of
T2, equivalent <column name>s shall not be specified more
than once.

That seems unreasonably stringent to me; it ought to be sufficient to
forbid duplicates of the names listed in CORRESPONDING, or the common
column names if there's no BY list. But whichever restriction you prefer,
this code seems to be failing to check it --- I certainly don't see any
new error message about "column name "foo" appears more than once".

I don't think you want to be leaving behind debugging cruft like this:
+ elog(DEBUG4, "%s", ltle->resname);

I'm not impressed by using A_Const for the members of the CORRESPONDING
name list. That's not a clever solution, that's a confusing kluge,
because it's a complete violation of the meaning of A_Const. Elsewhere
we just use lists of String for name lists, and that seems sufficient
here. Personally I'd just use the existing columnList production rather
than rolling your own.

The comments in parsenodes.h about the new fields seem rather confused,
in fact I think they're probably backwards.

Also, I thought the test cases were excessively uninspired and repetitive.
This, for example, seems like about two tests too many:

+SELECT 1 a, 2 b, 3 c UNION CORRESPONDING BY(a) SELECT 4 a, 5 b, 6 c;
+ a
+---
+ 1
+ 4
+(2 rows)
+
+SELECT 1 a, 2 b, 3 c UNION CORRESPONDING BY(b) SELECT 4 a, 5 b, 6 c;
+ b
+---
+ 2
+ 5
+(2 rows)
+
+SELECT 1 a, 2 b, 3 c UNION CORRESPONDING BY(c) SELECT 4 a, 5 b, 6 c;
+ c
+---
+ 3
+ 6
+(2 rows)

without even considering the fact that these are pretty duplicative of
tests around them. And some of the added tests seem to be just testing
basic UNION functionality, which we already have tests for. I fail to
see the point of adding any of these:

+SELECT 1 AS two UNION CORRESPONDING SELECT 2 two;
+ two
+-----
+ 1
+ 2
+(2 rows)
+
+SELECT 1 AS one UNION CORRESPONDING SELECT 1 one;
+ one
+-----
+ 1
+(1 row)
+
+SELECT 1 AS two UNION ALL CORRESPONDING SELECT 2 two;
+ two
+-----
+ 1
+ 2
+(2 rows)
+
+SELECT 1 AS two UNION ALL CORRESPONDING SELECT 1 two;
+ two
+-----
+ 1
+ 1
+(2 rows)

What I think actually is needed is some targeted testing showing
non-interference with nearby features. As an example, CORRESPONDING
can't safely be implemented by just replacing the tlists of the
input sub-selects with shortened versions, because that would break
situations such as DISTINCT in the sub-selects. I think it'd be a
good idea to have a test along the lines of

SELECT DISTINCT x, y FROM ...
UNION ALL CORRESPONDING
SELECT DISTINCT x, z FROM ...

with values chosen to show that the DISTINCT operators did operate
on x/y and x/z, not just x alone.

regards, tom lane

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Pavel Stehule 2017-03-18 16:51:48 Re: new set of psql patches for loading (saving) data from (to) text, binary files
Previous Message Tom Lane 2017-03-18 15:29:16 Re: createlang/droplang deprecated