Making the C collation less inclined to abort abbreviation

From: Peter Geoghegan <pg(at)heroku(dot)com>
To: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Making the C collation less inclined to abort abbreviation
Date: 2015-11-29 21:02:57
Message-ID: CAM3SWZTXyTCChND8B3AsXCKPsn4vYhapKf3fqHFyy+5eRPK6zA@mail.gmail.com
Lists: pgsql-hackers

The C collation is treated exactly the same as other collations when
considering whether the generation of abbreviated keys for text should
continue. This doesn't make much sense. With text, the main cost at
risk of being wasted, should abbreviated keys fail to capture
sufficient entropy, is that of n strxfrm() calls. However, the C
collation doesn't use strxfrm(); it uses memcmp(), which is far
cheaper.
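
To make the cost difference concrete, here is a minimal standalone sketch
of the two ways an abbreviated key might be built. It is not the code in
the PostgreSQL tree: the names abbrev_key_c and abbrev_key_locale and the
ABBREV_LEN width are hypothetical, chosen only for illustration. The C
collation path just copies leading bytes, while the other path has to pay
for a full strxfrm() transformation first.

```c
#include <stdlib.h>
#include <string.h>

#define ABBREV_LEN 8	/* hypothetical key width, e.g. one 64-bit word */

/* C collation: the key is just the leading bytes, zero-padded. */
static void
abbrev_key_c(const char *s, unsigned char key[ABBREV_LEN])
{
	size_t		len = strlen(s);

	memset(key, 0, ABBREV_LEN);
	memcpy(key, s, len < ABBREV_LEN ? len : ABBREV_LEN);
}

/*
 * Other collations: transform the whole string with strxfrm() first,
 * then keep the leading bytes of the result.  Returns 0 on success,
 * -1 if memory for the transformed string cannot be allocated.
 */
static int
abbrev_key_locale(const char *s, unsigned char key[ABBREV_LEN])
{
	size_t		need = strxfrm(NULL, s, 0);	/* length, excluding NUL */
	char	   *buf = malloc(need + 1);

	if (buf == NULL)
		return -1;
	strxfrm(buf, s, need + 1);
	memset(key, 0, ABBREV_LEN);
	memcpy(key, buf, need < ABBREV_LEN ? need : ABBREV_LEN);
	free(buf);
	return 0;
}
```

If abbreviation is later aborted, the work done by abbrev_key_locale()
for every tuple is thrown away, whereas the work done by abbrev_key_c()
was close to free in the first place.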

With other types, like numeric and now UUID, generating an abbreviated
key costs significantly less than it does for text under collations
other than C. Their cost models reflect this, and abort abbreviation
far less aggressively than text's does, even though the trade-off is
very similar when text uses the C collation.

The attached patch fixes this inconsistency by making it significantly
less likely that abbreviation will be aborted when the C collation is
in use. The behavior with other collations is unchanged. This should
be backpatched to 9.5 as a bugfix, IMV.
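
The general shape of such a change can be sketched as follows. This is
only an illustration under stated assumptions, not the contents of the
attached patch: the function name should_abort_abbreviation, its
parameters, and every threshold in it are made up for the example. The
idea is simply that when key generation is cheap, a much lower proportion
of distinct abbreviated keys is tolerated before giving up.

```c
#include <stdbool.h>

/*
 * Hypothetical abort check.  "distinct_abbrev" is an estimate of how
 * many distinct abbreviated keys were seen among the first "ntuples"
 * inputs.  All constants are illustrative, not taken from the patch.
 */
static bool
should_abort_abbreviation(double ntuples, double distinct_abbrev,
						  bool cheap_conversion)
{
	/* Cheap conversion (e.g. C collation) tolerates poorer keys. */
	double		min_ratio = cheap_conversion ? 0.01 : 0.05;

	/* Too few tuples seen to judge either way. */
	if (ntuples < 1000.0)
		return false;

	/* Abort when too few distinct abbreviated keys were observed. */
	return distinct_abbrev < ntuples * min_ratio;
}
```

With these made-up numbers, 300 distinct keys among 10000 tuples would
abort abbreviation for an expensive collation but not for the C
collation.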

--
Peter Geoghegan

Attachment Content-Type Size
0001-Abort-C-collation-text-abbreviation-less-frequently.patch text/x-patch 1.8 KB
