Re: Degression (PG10 > 11, 12 or 13)

From: Johannes Graën <johannes(at)selfnet(dot)de>
To: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Degression (PG10 > 11, 12 or 13)
Date: 2021-05-28 23:19:52
Message-ID: 0ff0701d-48fb-39f5-0760-443cc452daad@selfnet.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 28/05/2021 18.24, Tom Lane wrote:
> Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> writes:
>> pá 28. 5. 2021 v 16:12 odesílatel Johannes Graën <johannes(at)selfnet(dot)de>
>> napsal:
>>> When trying to upgrade an existing database from version 10 to 13 I came
>>> across a degression in some existing code used by clients. Further
>>> investigations showed that performance measures are similar in versions
>>> 11 to 13, while in the original database on version 10 it's around 100
>>> times faster. I could boil it down to perl functions used for sorting.
>
>> Are you sure, so all databases use the same encoding and same locale?
>
> Yeah ... I don't know too much about the performance of Perl regexps,
> but it'd be plausible that it varies depending on locale setting.

It probably wasn't Perl at all. Thanks to the hint I checked the initial
database again and, while encoding and ctype are set to UTF8, the
collation is C, which makes a huge difference:

... order by tab(attr) => Execution Time: 51429.875 ms
... order by tab(attr collate "C") => Execution Time: 537.757 ms

in the original database. Any other version yields similar times.

On 28/05/2021 17.47, Tomas Vondra wrote:
> That function is pretty much just a sequence of ~120 regular
> expressions, doing something similar to unaccent(). I wonder if we're
> calling the function much more often, perhaps due to some changes in the
> sort code (the function is immutable, but that does not guarantee it's
> called just once).

> Also, maybe try materializing the function results before doing the
> sort, perhaps like this:
>
> SELECT * FROM (select attr, func(attr) as fattr from tab offset 0) foo
> ORDER BY fattr;

I was expecting it to be called once in the process of sorting, and it
seems that this is actually true for all version and different
collations, but sorting for a collation that is not C requires
considerable more resources (that still needs to be shown for other
collations, but I see the overhead of having more or less complex
definitions vs. just comparing numbers).

That being said, I would have used unaccent or, if that wasn't an
option, maybe have those values calculated by a trigger function when
the corresponding rows are changed. But I don't control the code.

Now what keeps me wondering is how the sorting works internally and if
we could conclude that using the C collation in order expressions and
indexes is a general way to speed up queries - if the actual order is of
less importance.

Best
Johannes

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Geoghegan 2021-05-29 00:27:08 ANALYZE's dead tuple accounting can get confused
Previous Message Dean Gibson (DB Administrator) 2021-05-28 22:13:58 Re: AWS forcing PG upgrade from v9.6 a disaster