From: | Matthew Kelly <mkelly(at)tripadvisor(dot)com> |
---|---|
To: | Martijn van Oosterhout <kleptog(at)svana(dot)org> |
Cc: | Peter Geoghegan <pg(at)heroku(dot)com>, Peter Eisentraut <peter_e(at)gmx(dot)net>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, "Matthew Spilich" <mspilich(at)tripadvisor(dot)com> |
Subject: | Re: Collations and Replication; Next Steps |
Date: | 2014-09-17 13:07:56 |
Message-ID: | 76A634FB-0BEC-4FCF-AC9C-B6EA2C50C290@tripadvisor.com |
Lists: | pgsql-hackers |
Here is where I think the timezone and PostGIS cases are fundamentally different:
I can pretty easily make sure that all my servers run in the same timezone. That's just good practice. I'm also going to install the same version of PostGIS everywhere in a cluster. I'll build PostGIS and its dependencies from the exact same source files, regardless of when I build the machine.
Timezone is a user-level setting; PostGIS is a user-level library used by a subset of users.
glibc, however, is a system-level library, and text is a core data type. Changing glibc to a version that doesn't match the kernel can lead to system-level instability, broken linkers, etc. (I know because I tried.) Here are some other subtle problems that fall out:
* Upgrading glibc, the kernel, and the linker through the package manager in order to get security updates can cause the index corruption discussed in this thread.
* A basebackup that is taken in production and placed on a backup server might not be valid on that server, or your desktop machine, or on the spare you keep to do PITR when someone screws up.
* Unless you keep _all_ of your clusters on the same OS, machines from your database spare pool probably won't be running the right OS when you add them to a cluster to replace a failed member.
Keep in mind that by OS I mean CentOS versions here. (We're running a mix of late 5.x and 6.x because of our numerous issues with the 6.x kernel.)
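To make the failure mode behind the list above concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: collate_v1 and collate_v2 are hypothetical stand-ins for two library versions that order strings differently (they are not real glibc rules), and a sorted list stands in for a B-Tree index. The "index" is built under one ordering and then searched under the other, as would happen when a basebackup moves to a machine with a different glibc.

```python
import bisect

def collate_v1(s):
    # Hypothetical "old" rule: plain code point order.
    return s

def collate_v2(s):
    # Hypothetical "new" rule: hyphens are ignored at the first level,
    # with the raw string as a tie-breaker.
    return (s.replace("-", ""), s)

def build_index(values, collate):
    # Stand-in for a B-Tree: entries are stored in collation order.
    return sorted(values, key=collate)

def index_lookup(index, value, collate):
    # Binary search that trusts the stored order to match `collate`.
    keys = [collate(v) for v in index]
    pos = bisect.bisect_left(keys, collate(value))
    return pos < len(index) and index[pos] == value

values = ["ab", "a-b", "ac", "a-c", "aa"]
index = build_index(values, collate_v1)

found_same_rules = index_lookup(index, "aa", collate_v1)
found_after_change = index_lookup(index, "aa", collate_v2)
found_after_reindex = index_lookup(build_index(values, collate_v2), "aa", collate_v2)
```

In this sketch the row "aa" is still physically in the list, but the search under the changed rules walks to the wrong position and reports it missing. Rebuilding the list under the new rules makes the lookup work again.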
The problem with LC_IDENTIFICATION is that every machine I have seen reports revision "1.0", date "2000-06-24". It doesn't seem like the versioning is being actively maintained.
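For anyone who wants to compare this across their own machines, a rough Python sketch follows. It assumes a glibc system with the `locale` utility on PATH and assumes that `locale -k LC_IDENTIFICATION` prints one keyword="value" pair per line; the helper names are made up for this example.

```python
import subprocess

def parse_locale_keywords(text):
    """Parse keyword="value" lines (the assumed `locale -k` output format)."""
    result = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep:
            result[key.strip()] = value.strip().strip('"')
    return result

def local_collation_identity():
    # Assumption: glibc's `locale` tool is installed and supports -k.
    out = subprocess.run(
        ["locale", "-k", "LC_IDENTIFICATION"],
        capture_output=True, text=True, check=True,
    ).stdout
    info = parse_locale_keywords(out)
    return info.get("revision"), info.get("date")

# Sample input using the values reported above.
sample = 'revision="1.0"\ndate="2000-06-24"\n'
parsed = parse_locale_keywords(sample)
```

If every host returns the same pair from local_collation_identity() even though their glibc versions differ, the field cannot be relied on to detect a sort order change.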
I'm with Martijn here: let's go with ICU, if only because it moves sorting to a user-level library instead of a system-level one. Martijn, do you have a link to the out-of-tree patch? If not, I'll find it. I'd like to apply it to a branch and start playing with it.
- Matt K
On Sep 17, 2014, at 7:39 AM, Martijn van Oosterhout <kleptog(at)svana(dot)org>
wrote:
> On Tue, Sep 16, 2014 at 02:57:00PM -0700, Peter Geoghegan wrote:
>> On Tue, Sep 16, 2014 at 2:07 PM, Peter Eisentraut <peter_e(at)gmx(dot)net> wrote:
>>> Clearly, this is worth documenting, but I don't think we can completely
>>> prevent the problem. There has been talk of a built-in index integrity
>>> checking tool. That would be quite useful.
>>
>> We could at least use the GNU facility for versioning collations where
>> available, LC_IDENTIFICATION [1]. By not versioning collations, we are
>> going against the express advice of the Unicode consortium (they also
>> advise to do a strcmp() tie-breaker, something that I think we
>> independently discovered in 2005, because of a bug report - this is
>> what I like to call "the Hungarian issue". They know what our
>> constraints are.). I recognize it's a tricky problem, because of our
>> historic dependence on OS collations, but I think we should definitely
>> do something. That said, I'm not volunteering for the task, because I
>> don't have time. While I'm not sure of what the long term solution
>> should be, it *is not* okay that we don't version collations. I think
>> that even the best possible B-Tree check tool is not a solution.
>
> Personally I think we should just support ICU as an option. FreeBSD has
> been maintaining an out of tree patch for 10 years now so we know it
> works.
>
> The FreeBSD patch is not optimal though, these days ICU supports UTF-8
> directly so many of the push-ups FreeBSD does are no longer necessary.
> It is often faster than glibc and the key sizes for strxfrm are more
> compact [1] which is relevant for the recent optimisation patch.
>
> Let's solve this problem once and for all.
>
> [1] http://site.icu-project.org/charts/collation-icu4c48-glibc
>
> --
> Martijn van Oosterhout <kleptog(at)svana(dot)org> http://svana.org/kleptog/
>> He who writes carelessly confesses thereby at the very outset that he does
>> not attach much importance to his own thoughts.
> -- Arthur Schopenhauer