Re: Adding CORRESPONDING to Set Operations

From: Kerem Kat <keremkat(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Adding CORRESPONDING to Set Operations
Date: 2011-09-24 10:08:14
Message-ID: CAJZSWkWLfu_G9Vjn4N0F6eWQrrPv=BaHweAossvYYKVAFz4vWA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

I am looking into perpunion.c and analyze.c

There is a catch inserting subqueries for corresponding in the planner.
Parser expects to see equal number of columns in both sides of the
UNION query. If there is corresponding however we cannot guarantee that.
Target columns, collations and types for the SetOperationStmt are
determined in the parser. If we pass the column number equality checks,
it is not clear that how one would proceed with the targetlist generation
loop
which is a forboth for two table's columns.

One way would be filtering the columns in the parser anyway and inserting
subqueries in the planner but it leads to the previous problem of column
ordering and view definition mess-up, and it would be too much bloat
methinks.

I can guess what needs to be done in prepunion.c, but I need a waypointer
for the parser.

tom lane: Thanks for your description

regards

Kerem KAT

On Fri, Sep 23, 2011 at 07:40, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> Kerem Kat <keremkat(at)gmail(dot)com> writes:
> > While testing I noticed that ordering is incorrect in my implementation.
> At
> > first I thought that removing mismatched entries from ltargetlist and
> > rtargetlist would be enough, it didn't seem enough so I added rtargetlist
> > sorting.
>
> I don't think you can get away with changing the targetlists of the
> UNION subqueries; you could break their semantics. Consider for
> instance
>
> select distinct a, b, c from t1
> union corresponding
> select b, c from t2;
>
> If you discard the A column from t1's output list then it will deliver a
> different set of rows than it should, because the DISTINCT is
> considering the wrong set of values.
>
> One possible way to fix that is to introduce a level of sub-select,
> as if the query had been written
>
> select b, c from (select distinct a, b, c from t1) ss1
> union
> select b, c from (select b, c from t2) ss2;
>
> However, the real problem with either type of hackery is that these
> machinations will be visible in the parsed query, which means for
> example that a view defined as
>
> create view v1 as
> select distinct a, b, c from t1
> union corresponding
> select b, c from t2;
>
> would come out looking like the transformed version rather than the
> original when it's dumped, or even just examined with tools such as
> psql's \d+. I think this is bad style. It's certainly ugly to expose
> your implementation shortcuts to the user like that, and it also can
> cause problems down the road: if in the future we think of some better
> way to implement CORRESPONDING, we've lost the chance to do so for any
> stored views that got transformed this way. (There are several places
> in Postgres now that take such shortcuts, and all of them were mistakes
> that we need to clean up someday, IMO.)
>
> So I think that as far as the parser is concerned, you just want to
> store the CORRESPONDING clause more or less as-is, and not do too much
> more than verify that it's valid. The place to actually implement it is
> in the planner (see prepunion.c). Possibly the add-a-level-of-subselect
> approach will work, but you want to do that querytree transformation at
> plan time not parse time.
>
> regards, tom lane
>

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Hannu Krosing 2011-09-24 10:31:01 Re: CUDA Sorting
Previous Message Josh Berkus 2011-09-24 00:30:35 Re: unite recovery.conf and postgresql.conf