Re: weird GROUPING SETS and ORDER BY behaviour

From: Geoff Winkless <pgsqladmin(at)geoff(dot)dj>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: weird GROUPING SETS and ORDER BY behaviour
Date: 2024-01-06 23:27:40
Message-ID: CAEzk6fcwQ9aX4Q7TsejkPhe_Df6A2oNMs=EoUkewXTysLZ=ApA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sat, 6 Jan 2024, 19:49 Tom Lane, <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> "David G. Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com> writes:
> > Something does seem off here with the interaction between grouping sets
> and
> > order by.
>
> Yeah. I think Geoff is correct to identify the use of subqueries in
> the grouping sets as the triggering factor.

[snip]

> I think this particular symptom might be new, but we've definitely
> seen related trouble reports before. I'm inclined to think that the
> right fix will require making the parser actually replace such
> expressions with Vars referencing a notional grouping output relation,
> so that there's not multiple instances of the sub-query in the parser
> output in the first place.

Well yes. I assumed that since it's required that a group expression is in
the query itself that the grouping values were taken from the result set, I
have to admit to some surprise that they're calculated twice (three times?).

That's a fairly big job and nobody's
> tackled it yet.

For what it's worth, as a user if we could reference a column alias in the
GROUP and ORDER sections, rather than having to respecify the expression
each time, that would be a far more friendly solution. Not sure it makes
the work any less difficult though.

In the meantime, what I'd suggest as a workaround is to put those
> subexpressions into a sub-select with an optimization fence (you
> could use OFFSET 0 or a materialized CTE), so that the grouping
> sets list in the outer query just has simple Vars as elements.
>

Not possible in our case, sadly - at least not without a complete redesign
of our SQL-generating code. It would be (much) easier to add a sort to the
output stage, tbh, and stop lazily relying on the output being sorted for
us; I guess that's the route we'll have to take.

Thanks all for taking the time to look at it.

Geoff

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Jelte Fennema-Nio 2024-01-06 23:48:30 Re: POC: Extension for adding distributed tracing - pg_tracing
Previous Message Tom Lane 2024-01-06 22:41:14 Re: Wrong rows estimations with joins of CTEs slows queries by more than factor 500