Re: More work on SortSupport for text - strcoll() and strxfrm() caching

From: Peter Geoghegan <pg(at)heroku(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: More work on SortSupport for text - strcoll() and strxfrm() caching
Date: 2015-08-04 20:30:58
Message-ID: CAM3SWZRWrsGx80kO1f5utWiPhB=rbzE_9ONqCHfgSpUXhy8nJA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Aug 4, 2015 at 12:41 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> Interesting work.

Thanks.

> 1. My biggest gripe with this patch is that the comments are not easy
> to understand.

> Of course everybody may prefer something different here; I'm just
> telling you what I think.

I have struggled with trying to put just the right amount of
exposition on the theory behind a particular optimization in source
code comments, and things like that. Since no one is telling me that I
need to write more, clearly I don't have the balance right yet. To a
certain extent, it is a matter of personal style, but I'll try and be
more terse.

> 2. I believe the change to bttextcmp_abbrev() should be pulled out
> into a separate patch and committed separately. That part seems like
> a slam dunk.

Makes sense.

> 3. What is the worst case for the strxfrm()-reuse stuff? I suppose
> it's the case where we have many strings that are long, all
> equal-length, and all different, but only in the last few characters.
> Then the memcmp() is as expensive as possible but never works out.
> How does the patch hold up in that case?

I haven't tested it. I'll get around to it at some point in the next
couple of weeks. I imagine that it's exactly the same as the memcmp()
equality thing because of factors like speculative execution, and the
fact that we need both strings in cache anyway. It's almost exactly
the same story, although unlike the memcmp() opportunistic equality
pre-check thing, this check happens only n times, not n log n times.

I'm quite sure that the cost needs to be virtually zero to go ahead
with the idea. I think it probably is. Note that like the memcmp()
thing, we check string length first, before a memcmp().

--
Peter Geoghegan

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2015-08-04 20:37:04 Re: Dependency between bgw_notify_pid and bgw_flags
Previous Message Jeff Janes 2015-08-04 20:04:29 Re: FSM versus GIN pending list bloat