Re: optimisation? collation "C" sorting for GroupAggregate for all deterministic collations

From: Maxim Ivanov <hi(at)yamlcoder(dot)me>
To: James Coleman <jtc331(at)gmail(dot)com>
Cc: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: optimisation? collation "C" sorting for GroupAggregate for all deterministic collations
Date: 2020-03-23 15:41:15
Message-ID: Lri2JvSIB54_Xpapqi2i1B_hC4FB6QCOnYv_kQvqSN8EVhoMbI_LMSoM5zDgUIcMxxK6HQBJ7pYjPw9Xzahwd8ZJCRGk8jqqv_O9Zt-yiOc=@yamlcoder.me
Lists: pgsql-hackers


> Perhaps this is what you mean by "deterministic", but isn't it
> possible for some collations to treat multiple byte sequences as equal
> values? And those multiple byte sequences wouldn't necessarily occur
> sequentially in C collation, so it wouldn't be possible to work around
> that by having the grouping node use one collation but the sorting
> node use the C one.
>
> If my memory is incorrect, then this sounds like an intriguing idea.

Yes, as per the docs (https://www.postgresql.org/docs/12/collation.html#COLLATION-NONDETERMINISTIC), some collations can treat symbols (chars? codes? runes?) as equal even though their byte representations differ. This optimisation should check the collation of the columns being sorted and leave the sorting collation unchanged if any of them uses a nondeterministic collation.

Luckily, in practice this is probably very rare: all built-in collations are deterministic.
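
To make the concern concrete, here is a small Python sketch (not PostgreSQL code) of grouping adjacent rows after a byte-order sort, which is roughly what a GroupAggregate over "C"-sorted input would do. The helper name group_after_byte_sort and the use of str.casefold as a stand-in for a nondeterministic (case-insensitive) collation are assumptions made purely for illustration.

```python
from itertools import groupby


def group_after_byte_sort(values, key_eq):
    """Sort by raw UTF-8 bytes (the ordering collation "C" gives), then
    merge adjacent rows whose key_eq() results match, the way a grouping
    node consumes pre-sorted input."""
    ordered = sorted(values, key=lambda s: s.encode("utf-8"))
    return [list(run) for _, run in groupby(ordered, key=key_eq)]


rows = ["b", "a", "B", "a", "A"]

# Deterministic collation: values are equal only if their bytes are equal,
# so byte-order sorting always places equal values next to each other.
deterministic = group_after_byte_sort(rows, key_eq=lambda s: s)

# Stand-in for a nondeterministic collation (assumption: case-insensitive
# equality). "A" and "a" compare equal but are not adjacent in byte order,
# so the same logical group is emitted more than once.
case_insensitive = group_after_byte_sort(rows, key_eq=str.casefold)

print(deterministic)
print(case_insensitive)
```

With a deterministic notion of equality the number of groups matches the number of distinct values. With the case-insensitive stand-in, "A" and "a" land in separate runs, which is why the optimisation would have to be skipped for such columns.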
