Re: Hash Functions

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Jeff Davis <pgsql(at)j-davis(dot)com>, Yugo Nagata <nagata(at)sraoss(dot)co(dot)jp>, amul sul <sulamul(at)gmail(dot)com>
Subject: Re: Hash Functions
Date: 2017-05-14 21:07:34
Message-ID: CAH2-WzkcZAmJqxyY=dAL8myC4kyTGxPphJFn00gM176uDPYXdw@mail.gmail.com
Lists: pgsql-hackers

On Sat, May 13, 2017 at 9:11 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> The latter is
> generally false already. Maybe LATIN1 -> UTF8 is no-fail, but what
> about UTF8 -> LATIN1 or SJIS -> anything? Based on previous mailing
> list discussions, I'm under the impression that it is sometimes
> debatable how a character in one encoding should be converted to some
> other encoding, either because it's not clear whether there is a
> mapping at all or it's unclear what mapping should be used.

The express goal of the Unicode Consortium is to replace all existing
encodings with Unicode. My personal opinion is that a Unicode
monoculture would be a good thing, provided reasonable differences can
be accommodated. So, there might be ambiguity about how a character in
one encoding should be converted to another encoding, but that's
because encodings like SJIS and BIG5 are needlessly ambiguous. It has
nothing to do with cultural preferences leaving the question
undecidable (at least by a panel of natural language experts), and
everything to do with these regional character encoding systems being
objectively bad. They richly deserve to die, and are in fact dying.

Encoding actually *is* a property of the machine, even though regional
encodings obfuscate things. There is a reason why MacOS and Java use
UTF-16 rather than UTF-8, and there is a reason why the de facto
standard on the web is UTF-8, and those reasons are completely
technical. AFAICT, whatever non-technical reasons remain are actually
technical debt in disguise.

Where this leaves hash partitioning, I cannot say.
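
To make the conversion point concrete, here is a minimal Python sketch,
not PostgreSQL code. It assumes Python's "latin-1" and "utf-8" codecs as
stand-ins for the server's LATIN1 and UTF8 conversion routines, and
zlib.crc32 as a stand-in for whatever byte-wise hash a partitioning
scheme might use. The helper try_convert is hypothetical. It shows that
LATIN1 -> UTF8 cannot fail, that UTF8 -> LATIN1 can, and that the same
text has different bytes (and so, in general, a different byte-wise
hash) under each encoding.

```python
import zlib


def try_convert(data: bytes, src: str, dst: str):
    """Re-encode data from src to dst; return None if a character has no mapping.

    Hypothetical helper for illustration only.
    """
    try:
        return data.decode(src).encode(dst)
    except UnicodeError:
        return None


# Every LATIN1 byte corresponds to a Unicode code point (U+0000..U+00FF),
# so this direction cannot fail.
all_latin1 = bytes(range(256))
latin1_to_utf8 = try_convert(all_latin1, "latin-1", "utf-8")

# The reverse direction can fail: U+65E5 has no LATIN1 representation.
utf8_to_latin1 = try_convert("\u65e5".encode("utf-8"), "utf-8", "latin-1")

# The same text is stored as different bytes in each encoding, so a hash
# computed over the raw bytes depends on the encoding.
word = "caf\u00e9"
bytes_utf8 = word.encode("utf-8")      # 5 bytes
bytes_latin1 = word.encode("latin-1")  # 4 bytes
crc_utf8 = zlib.crc32(bytes_utf8)      # stand-in hash, not PostgreSQL's
crc_latin1 = zlib.crc32(bytes_latin1)

print("LATIN1 -> UTF8 ok:", latin1_to_utf8 is not None)
print("UTF8 -> LATIN1 ok:", utf8_to_latin1 is not None)
print("same bytes in both encodings:", bytes_utf8 == bytes_latin1)
```

Hashing a canonical form of the text (say, always the UTF-8 bytes)
would sidestep the last problem, but only for values that can be
converted to that form in the first place.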

--
Peter Geoghegan

VMware vCenter Server
https://www.vmware.com/
