Re: must appear in the GROUP BY clause or be used in an aggregate function problem

From: Samuel Gendler <sgendler(at)ideasculptor(dot)com>
To: "Edward W(dot) Rouse" <erouse(at)comsquared(dot)com>
Cc: pgsql-sql(at)postgresql(dot)org
Subject: Re: must appear in the GROUP BY clause or be used in an aggregate function problem
Date: 2012-01-31 20:57:15
Message-ID: CAEV0TzCdY9KsbqKJDWH0R4-CtmXQgfetg2L5DSD_ioX4qU7NOw@mail.gmail.com

On Tue, Jan 31, 2012 at 12:27 PM, Edward W. Rouse <erouse(at)comsquared(dot)com> wrote:

> And in most cases this works fine. The problem arises when invoices get
> added to the exception table due to there not being an invoice number.
> Even though we join on the tracking id, the group by on invoicenum lumps
> the different blank invoices into a single line, if the same user has
> more than 1. What we want is for each of the individual blank invoicenum
> entries to have a separate line in the result. If I remove b.invoicenum
> from the group by then I get the error in the subject line. If I try to
> use an aggregate function (like I used MAX on the names) it's even worse.
> MAX works on the names because they are all the same. MAX on the date
> doesn't seem to affect the results that I can see, other than if an
> invoice went into exception more than once, and in that case we only want
> the latest one anyway.
>
> Any hints as to how to get this to not lump all of the empty invoicenums
> for
> a user into a single line? And I have mentioned putting in a dummy value
> like the date for an invoicenum, but that works as well as I expected it
> would (not at all).
>

It seems like the use of the aggregation functions in your example isn't
absolutely necessary - though perhaps the date one is. The best solution
would be to get rid of the aggregate columns so that you don't have this
issue.

If that isn't possible, there are definitely some kludges you can use to
get it to work if you really don't want to UNION two queries together - one
with and one without valid invoice ids. Assuming invoice ids are generated
via a sequence, you can do the following to guarantee a unique 'id' for
each empty invoice - at the expense of incrementing your sequence
unnecessarily:

COALESCE(invoiceid, nextval('invoiceid_sequence'))
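
For example, a rough sketch of how that would sit in the query - I'm
inventing the table and column names since I don't have your schema:

SELECT username,
       COALESCE(invoicenum, nextval('invoiceid_sequence')) AS invoicenum,
       MAX(except_date) AS except_date
FROM exceptions
GROUP BY 1, 2;

Rows with a real invoicenum group normally; each row with a null
invoicenum draws a fresh value from the sequence, so it lands on its own
line.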

A better version of that kludge would be to create a sequence just for this
purpose and set it to a very negative number. All of your generated fake
ids will then be negative numbers (so easily identified by whatever is
consuming the query results) and you can reset the sequence back to the
most negative possible value whenever you get concerned about running out
of ids, since you won't be using the same sequence as the invoice table
itself.
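
A minimal sketch of that dedicated sequence (the names are made up):

CREATE SEQUENCE fake_invoice_seq
    MINVALUE -9223372036854775808
    MAXVALUE -1
    START WITH -9223372036854775808;

-- in the query:
COALESCE(invoicenum, nextval('fake_invoice_seq'))

-- reset it whenever you worry about running out:
ALTER SEQUENCE fake_invoice_seq RESTART WITH -9223372036854775808;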

There are probably lots of variations on that general concept. You could
write a function which remembers the ids already seen/generated for each
row and generates an arbitrary id to be used in place of null for grouping,
though you'd have to worry about generating an id that hasn't yet appeared
but later does. Assuming you can carve out a bucket of id space large
enough that it is guaranteed not to conflict with valid invoice ids, you
could make it work. I don't know if you can have "select max(x), invoiceid
... group by func(invoiceid)", so you may have to structure it as "select
max(x), func(invoiceid) ... group by 2", which would require that whatever
consumes your query results can deal with faked invoiceids - negative
numbers again providing a possible way to identify them.
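
For what it's worth, the positional "group by 2" form does work. Here's one
deterministic stand-in for func(invoiceid), assuming each exception row
carries a positive, per-invoice tracking id (names invented again):

SELECT MAX(except_date) AS except_date,
       COALESCE(invoicenum, -trackingid) AS invoicenum
FROM exceptions
GROUP BY 2;

Negating the tracking id keeps the fake ids in negative space, so they
can't collide with real invoice numbers and are easy for the consumer to
spot.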

Doesn't postgres allow operator overloading? Perhaps you could override
the equality operator for that type so that a single value (the max value,
for example) always compares as unequal to itself, and then use
COALESCE(invoiceid, 9223372036854775807). All rows without a valid value
will have that value, but the overloaded equality operator will cause them
not to group together (I'm guessing; I haven't tried this). That one makes
me nervous because of the potential for nasty side effects should something
legitimately have that value, but one could probably argue you have a
bigger problem if a column legitimately holds a value equal to max bigint.

Restructuring the query to separate valid invoice ids from invalid and/or
getting rid of the aggregation does seem like the best solution, though.
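
Roughly, with invented names once more: group the rows that have an invoice
number and pass the null ones through untouched, so each blank invoice
keeps its own line:

SELECT username, invoicenum, MAX(except_date) AS except_date
FROM exceptions
WHERE invoicenum IS NOT NULL
GROUP BY username, invoicenum
UNION ALL
SELECT username, invoicenum, except_date
FROM exceptions
WHERE invoicenum IS NULL;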
