Re: Super PathKeys (Allowing sort order through precision loss functions)

From: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Super PathKeys (Allowing sort order through precision loss functions)
Date: 2018-10-31 03:32:37
Message-ID: CAKJS1f8BLAWNUevJEMX+sbj58QBpyyKJLwTkOM9A3fs1Ltf7SA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 31 October 2018 at 14:23, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
> The other thing likely affecting this is locale / collation. Probably
> not for date_trunc, but certainly for things like substr()/trim(),
> mentioned by Simon upthread.
>
> In some languages the rules are pretty complex, and there's no chance
> it'll survive arbitrary substr() applied to the string. For example, in
> Czech we mostly sort character-by-character, but "ch" is an exception
> sorted in between "h" and "i".
>
> So essentially "hhhh <= hchh <= hihh". Obviously, substr($1,0,3) cuts
> the "ch" in half, changing the ordering:
>
> create table x (y text collate "cs_CZ");
> insert into x values ('hhhh'), ('hchh'), ('hihh');
>
> test=# select y from x order by 1;
> y
> ------
> hhhh
> hchh
> hihh
> (3 rows)
>
> test=# select substr(y,0,3) from x order by 1;
> substr
> --------
> hc
> hh
> hi
> (3 rows)
>
> I'm preeeeeeetty sure other languages have even funnier rules ...

That's pretty interesting, but as mentioned in my initial email...
More careful thought would be needed for other numerical types and
text types, I imagine, though I've not thought much about that.

I don't really think trim() or substr() would ever work for the reason
that they don't always operate on a prefix of the string. What you've
mentioned seems to rule out LEFT().

> I'm mildly suspicious of this data set, because it's essentially
> perfectly sorted/deterministic, and the Sort node won't have to do
> anything. So perhaps it just works by chance.
>
> Consider this instead:
>
> create table dt (ts timestamp, x text);
>
> insert into dt select * from (select d, (case when random() < 0.5 then
> 'month' else 'hour' end) from generate_series('2018-01-01'::timestamp,
> '2018-12-31'::timestamp, '1 hour'::interval) d) as foo order by random();
>

[...]

> 2018-01-01 00:00:00
> 2018-01-01 00:00:00
> 2018-01-01 00:00:00
> 2018-01-02 13:00:00
> ... kaboooooom! ...

Yeah. This is an issue. Thanks for the test case. However, I
acknowledged that in my reply to Tom. I did overlook it, which was
completely down to lack of imagination on my part. I'd never
considered using date_trunc() without a const 1st argument before. It
seems simple enough to disable the optimisation in that case. I've
attached a patch which does that. Maybe that'll help us look beyond
this and focus on any other reasons why this is not possible.

It's also true that this diminishes the usefulness of the idea, but
part of the reason I've posting the idea so early after having thought
about it is precisely to see if this is going to float or sink.
Maybe we'll decide the scope of usefulness is so small that it's not
worth it, or that each function has such different requirements that
we can't reasonably make it work by adding a few columns to pg_proc.
I'm personally more interested in the cases that can work. I
understand there is no shortage of cases where it can't.

Giving that we require const arguments away from the orderkey, perhaps
it could be made to work for simple arithmetic OpExprs. I'm not sure
if the use cases are actually there for that sort of thing and I've
seen WHERE indexcol+0 = <n> used many times to disable the use of
indexes, so making pathkeys see through those might be more annoying
than useful... But it's a thought...

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachment Content-Type Size
v2-0001-Allow-Pathkeys-to-derive-their-order-from-a-paren.patch application/octet-stream 23.5 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Langote 2018-10-31 04:38:29 Re: FDW Parallel Append
Previous Message Amit Langote 2018-10-31 02:26:59 Re: ToDo: show size of partitioned table