Re: texteq/byteaeq: avoid detoast [REVIEW]

From: Jim Nasby <jim(at)nasby(dot)net>
To: Noah Misch <noah(at)leadboat(dot)com>
Cc: Magnus Hagander <magnus(at)hagander(dot)net>, Itagaki Takahiro <itagaki(dot)takahiro(at)gmail(dot)com>, Andy Colson <andy(at)squeakycode(dot)net>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: texteq/byteaeq: avoid detoast [REVIEW]
Date: 2011-01-17 20:36:56
Message-ID: A5370FA2-AB83-48E9-83CB-4F3683CB26A2@nasby.net
Lists: pgsql-hackers

On Jan 17, 2011, at 9:22 AM, Noah Misch wrote:
> On Mon, Jan 17, 2011 at 07:35:52AM +0100, Magnus Hagander wrote:
>> On Mon, Jan 17, 2011 at 06:51, Itagaki Takahiro
>> <itagaki(dot)takahiro(at)gmail(dot)com> wrote:
>>> On Mon, Jan 17, 2011 at 04:05, Andy Colson <andy(at)squeakycode(dot)net> wrote:
>>>> This is a review of:
>>>> https://commitfest.postgresql.org/action/patch_view?id=468
>>>>
>>>> Purpose:
>>>> ========
>>>> Equal and not-equal _may_ be quickly determined if their lengths are
>>>> different. This _may_ be a huge speed up if we don't have to detoast.
>>>
>>> We can skip detoast to compare lengths of two text/bytea values
>>> with the patch, but we still need detoast to compare the contents
>>> of the values.
>>>
>>> If we always generate same toasted byte sequences from the same raw
>>> values, we don't need to detoast at all to compare the contents.
>>> Is it possible or not?
>>
>> For bytea, it seems it would be possible.
>>
>> For text, I think locales may make that impossible. Aren't there
>> locale rules where two different characters can "behave the same" when
>> comparing them? I know in Swedish at least w and v behave the same
>> when sorting (but not when comparing) in some variants of the locale.
>>
>> In fact, aren't there cases where the *length test* also fails? I
>> don't know this for sure, but unless we know for certain that two
>> different length strings can never be the same *independent of
>> locale*, this whole patch has a big problem...
>
> Just to be clear, the code already has these length tests today. This patch
> just moves them before the detoast.

Any reason we can't do this for other varlena types? I'm wondering whether it makes more sense to have some generic toast comparison functions that don't care what the data in toast actually is...
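
To make the idea concrete, here is a rough sketch of a length-first comparison in plain C. It is a toy model, not PostgreSQL code: `ToyDatum`, `toy_detoast` and `toy_bytes_eq` are hypothetical names invented for illustration, and the "detoast" step is simulated by a counter. It also assumes equality for the type means byte-for-byte identity of the raw payload, which would not hold for every varlena type.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a possibly-toasted datum. The raw size is
 * assumed to be readable without fetching the payload. */
typedef struct ToyDatum
{
    size_t      raw_size;     /* length of the raw payload in bytes */
    const char *stored;       /* payload; imagine it lives out of line */
    int         fetch_count;  /* number of simulated detoast calls */
} ToyDatum;

/* Simulated expensive fetch: counts each call and returns the payload. */
static const char *
toy_detoast(ToyDatum *d)
{
    d->fetch_count++;
    return d->stored;
}

/* Byte-wise equality that compares lengths before fetching anything.
 * Only valid under the assumption that equal values have identical bytes. */
static bool
toy_bytes_eq(ToyDatum *a, ToyDatum *b)
{
    assert(a != NULL && b != NULL);

    /* Different lengths: unequal, and no payload was fetched. */
    if (a->raw_size != b->raw_size)
        return false;

    /* Same length: the contents still have to be fetched and compared. */
    return memcmp(toy_detoast(a), toy_detoast(b), a->raw_size) == 0;
}
```

In this sketch, comparing a 3-byte value with a 4-byte value returns false without touching either payload, while two values of the same length still cost one fetch each.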
--
Jim C. Nasby, Database Architect jim(at)nasby(dot)net
512.569.9461 (cell) http://jim.nasby.net
