Re: BUG #16241: Degraded hash join performance

From: Thomas Butz <tbutz(at)optitool(dot)de>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-bugs <pgsql-bugs(at)lists(dot)postgresql(dot)org>
Subject: Re: BUG #16241: Degraded hash join performance
Date: 2020-02-05 11:06:41
Message-ID: 1965637501.286993.1580900801394.JavaMail.zimbra@optitool.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

I've opened an issue: https://github.com/giggls/mapnik-german-l10n/issues/40

I suspect that the number of executed regexp_replace calls is the culprit here.
The cache of regexp.c seems to be limited to 32 entries which might be to low to keep all involved regexes cached.

> Hi,
>
> On 2020-02-04 11:00:29 -0500, Tom Lane wrote:
>> Andres Freund <andres(at)anarazel(dot)de> writes:
>> > Interesting! The no-children one clearly shows that a lot of the the
>> > time is spent evaluating regular expressions (there's other regex
>> > functions in the profile too):
>> > 23.36% postgres postgres [.] subcolor
>>
>> Huh ...
>>
>> > I'm not aware of any relevant regular expression evaluation changes
>> > between 11 and 12. Tom, does this trigger anything?
>>
>> (1) Nope, I'm not either; the last non-back-patched change in that
>> code was c54159d44 in v10.
>>
>> (2) subcolor() is part of regex compilation, not execution, which makes
>> one wonder why it's showing up at all. Maybe the regex cache in
>> adt/regexp.c is overflowing and preventing useful caching? But
>> that didn't change in v12 either. Are these test cases really
>> 100% equivalent? I'm wondering if there are a few more "hot"
>> regex patterns in the v12 data ...
>
> They are not 100% equivalent, but the part of the plan we see is very
> similar rowcount wise. It's possible that the functions differ more
> however, there are different postgis versions involved, and apparently
> also an "osml10n" extension.
>
>
>> (3) Where the heck is the regex use coming from at all? I don't
>> see any regex operators in the plan. Maybe it's inside the
>> plpgsql function?
>
> It definitely is. The stack shows at least two levels of plpgsql
> functions. And Thomas has since confirmed that removing the functioncall
> fixes the issue.
>
> Based on the name I think this is somewhere around this:
> https://github.com/giggls/mapnik-german-l10n/blob/master/plpgsql/get_localized_name_from_tags.sql#L120
> The callgraph indicates that most of the cost comes from within
> textregexreplace_noopt.
>
> Not clear why the cache isn't fixing this - there are no variables in
> the regexp_replace calls as far as I can see.
>
> Greetings,
>
> Andres Freund

--

Thomas Butz

In response to

Browse pgsql-bugs by date

  From Date Subject
Next Message Emil Iggland 2020-02-05 11:46:33 Re: BUG #15858: could not stat file - over 4GB
Previous Message Arseny Sher 2020-02-05 09:04:06 Re: ERROR: subtransaction logged without previous top-level txn record