Some progress on INSERT/SELECT/GROUP BY bugs

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: pgsql-hackers(at)postgreSQL(dot)org
Subject: Some progress on INSERT/SELECT/GROUP BY bugs
Date: 1999-05-14 00:01:45
Message-ID: 13811.926640105@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

I believe I've identified the main cause of the peculiar behavior we
are seeing with INSERT ... SELECT ... GROUP/ORDER BY: it's a subtle
parser bug.

Here is the test case I'm looking at:

CREATE TABLE si_tmpverifyaccountbalances (
type int4 NOT NULL,
memberid int4 NOT NULL,
categoriesid int4 NOT NULL,
amount numeric);

CREATE TABLE invoicelinedetails (
invoiceid int4,
memberid int4,
totshippinghandling numeric,
invoicelinesid int4);

INSERT INTO si_tmpverifyaccountbalances SELECT invoiceid+3,
memberid, 1, totshippinghandling FROM invoicelinedetails
GROUP BY invoiceid+3, memberid, totshippinghandling;

ERROR: INSERT has more expressions than target columns

The reason this is coming out is that the matching of GROUP BY (also
ORDER BY) items to targetlist entries is fundamentally broken in this
context. The GROUP BY items "memberid" and "totshippinghandling" are
simply unvarnished Ident nodes when they arrive at findTargetlistEntry()
in parse_clause.c; what findTargetlistEntry() does with them is to try
to match them against the resdom names of the existing targetlist items.
I think that's correct behavior in the plain SELECT case (but note it
means "SELECT a AS b, b AS c GROUP BY b" will really group by a not b
--- is that per spec??). But it fails miserably in the INSERT/SELECT
case, because by the time control gets here, the targetlist items have
been given resdom names *corresponding to the column names of the target
table*.

So, in the example at hand, "memberid" is matched to the correct column
by pure luck (because it has the same name in the destination table),
and then "totshippinghandling" is not recognized as one of the existing
TLEs because it does not match any destination column name.

Now, call me silly, but it seems to me that SELECT ... GROUP BY ought
to mean the same thing no matter whether there is an INSERT in front of
it or not, and thus that letting target column names affect the meaning
of GROUP BY items is dead wrong. (Don't have a spec to check this with,
however.)

I believe the most reasonable fix for this is to postpone relabeling
of the targetlist entries with destination column names until after
analysis of the SELECT's subsidiary clauses is complete. In particular,
it should *not* be done instantly when each TLE is made, which is what
MakeTargetEntryIdent currently does. The TLEs should have the same
resnames as in the SELECT case until after subsidiary clause processing
is done.

(MakeTargetEntryIdent is broken anyway because it tries to associate
a destination column with every TLE, even the resjunk ones. The reason
we see the quoted error message in this situation is that after
findTargetlistEntry fails to detect that totshippinghandling is already
a TLE, it calls MakeTargetEntryIdent to make a junk TLE for
totshippinghandling, and then MakeTargetEntryIdent tries to find a
target column to go with the junk TLE. So the revised code should only
assign dest column names to non-junk TLEs.)

I'm not really familiar enough with the parser to want to tackle this
size of change by myself --- Thomas, do you want to do it? I think it's
largely a matter of moving code around, but I'm not sure where is the
right place for it...

regards, tom lane

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 1999-05-14 00:56:58 Progress on char(n) default-value problem
Previous Message Thomas Lockhart 1999-05-13 23:32:49 Re: Report on NetBSD/mac port of Postgres 6.4.2