Re: Super PathKeys (Allowing sort order through precision loss functions)

From: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Super PathKeys (Allowing sort order through precision loss functions)
Date: 2018-10-30 22:41:35
Message-ID: CAKJS1f8bVo0X8+b-G92U7bV9aZ11C6dQbyJBeMQvTUFjGxq_aQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 31 October 2018 at 08:52, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> David Rowley <david(dot)rowley(at)2ndquadrant(dot)com> writes:
>> I've started working on something I've ended up calling "Super
>> PathKeys". The idea here is to increase the likelihood of a Path with
>> PathKeys being used for a purpose that requires a less strict sort
>> order due to ordering being required from the return value of some
>> precision loss function.
>
> I'm a little confused by the terminology here, or why this has anything
> at all to do with a new sort of pathkey. It seems like the idea you're
> driving at is to be able to mark a function as being order-preserving,
> in the sense that if one input is known sorted then the output will
> also be sorted (by the same or a related opclass). You probably need
> some additional constraints, like any other inputs being constants,
> before that really works. But given that, I don't see why you need a
> new kind of pathkey: you just have a new way to conclude that some
> path is sorted by the pathkey you already want.

Thanks for chipping in on this.

The additional pathkeys are not required to make it work, they're just
required to make it work efficiently. The fact that we could to the
trouble of making pathkeys canonical so we can perform pointer
comparison rather than using equals() says to me that I'd better not
do anything to slow this down too much. Doing it without the superkey
idea seems to require quite a bit of analysis during
pathkeys_contained_in() as we'd need to check for superkeys then,
instead of when we're building the pathkey in the first place. As
for the code that I did add to pathkeys_contained_in(), I doubt it's
measurably slower for the normal case.

The other fields being Const part I did miss. That will also be a
requirement. I just failed to consider that date_trunc() could be used
with a variable 1st arg.

> Maybe if I read the patch it'd be clearer why you want to describe it
> that way, but I'm too lazy to do that right now. One thing I would
> say though is that a pg_proc column identifying the interesting input
> parameter is insufficient; you'd need to *explicitly* say which opclass(es)
> the sorting behavior guarantee applies for.
>
>> -- Test a more complex case where the superkey can be matched to
>> multiple pathkeys
>> explain (costs off) select date_trunc('year', ts), date_trunc('month',
>> ts), a, count(*) from tstbl group by 1,2,3 order by 1,2,3;
>> QUERY PLAN
>> -----------------------------------------------------------------------------
>> GroupAggregate
>> Group Key: date_trunc('year'::text, ts), date_trunc('month'::text, ts), a
>> -> Index Only Scan using tstbl_ts_a_idx on tstbl
>> (3 rows)
>
> [ squint... ] Does that really work? If so, how? It would need a whole
> lot more knowledge about the behavior of date_trunc than I think could
> possibly be reasonable to encode in a general mechanism.

I'm not entirely certain we can always consume multiple matching keys
or if it has to be one for one. However, I get an empty diff from each
of the following two query pairs.

select date_trunc('month'::text,date),date_Trunc('year', date) from dt
order by dt;
select date_trunc('month'::text,date),date_Trunc('year', date) from dt
order by 1,2;

and

select date_trunc('year'::text,date),date_Trunc('month', date) from dt
order by dt;
select date_trunc('year'::text,date),date_Trunc('month', date) from dt
order by 1,2;

with setup:

create table dt (date date);
insert into dt select d from generate_series('2018-01-01',
'2018-12-31', '1 day'::interval) d;

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Julien Rouhaud 2018-10-30 23:24:59 Re: Ordered Partitioned Table Scans
Previous Message Andres Freund 2018-10-30 22:20:06 Re: Lambda expressions (was Re: BUG #15471)