Re: Pre-proposal: unicode normalized text

From: Isaac Morland <isaac(dot)morland(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Jeff Davis <pgsql(at)j-davis(dot)com>, Chapman Flack <chap(at)anastigmatix(dot)net>, Nico Williams <nico(at)cryptonector(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Pre-proposal: unicode normalized text
Date: 2023-10-05 13:10:23
Message-ID: CAMsGm5fFwAa1=kUxwODk0dNhj7b57Q54BgdxL+Cep+tCHTWi-g@mail.gmail.com
Lists: pgsql-hackers

On Thu, 5 Oct 2023 at 07:32, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:

> But I do think that sometimes users are reluctant to perform encoding
> conversions on the data that they have. Sometimes they're not
> completely certain what encoding their data is in, and sometimes
> they're worried that the encoding conversion might fail or produce
> wrong answers. In theory, if your existing data is validly encoded and
> you know what encoding it's in and it's easily mapped onto UTF-8,
> there's no problem. You can just transcode it and be done. But a lot
> of times the reality is a lot messier than that.
>

In the case you describe, the users don’t have text at all; they have
bytes, and a vague belief about what encoding the bytes might be in and
therefore what characters they are intended to represent. The correct way
to store that in the database is using bytea. Text types should be for when
you know what characters you want to store. In this scenario, the
implementation detail of which encoding the database uses internally to
write the data to disk doesn't matter, any more than it matters to a
casual user how a table is stored on disk.
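A minimal Python sketch of the bytes-versus-text distinction drawn above. It is illustrative only: it is not from the original message and is not PostgreSQL code. A `bytes` value stands in for a bytea column, and decoding with an explicitly named encoding stands in for turning those bytes into text (in PostgreSQL, `convert_from(bytea, name)` plays a roughly analogous role). The helper name `to_text` is hypothetical.

```python
# Illustrative sketch: bytes ~ bytea, str ~ text.

def to_text(raw: bytes, encoding: str) -> str:
    """Decode raw bytes only once the encoding is known; fail loudly otherwise."""
    # errors="strict" raises UnicodeDecodeError rather than guessing.
    return raw.decode(encoding, errors="strict")


raw = b"caf\xc3\xa9"  # bytes whose intended encoding is uncertain

# The same bytes represent different characters under different assumptions.
as_utf8 = to_text(raw, "utf-8")      # 'café'
as_latin1 = to_text(raw, "latin-1")  # 'cafÃ©'

# Bytes that are valid Latin-1 need not be valid UTF-8 at all.
try:
    to_text(b"caf\xe9", "utf-8")
    latin1_bytes_are_utf8 = True
except UnicodeDecodeError:
    latin1_bytes_are_utf8 = False
```

Until the encoding is settled, only the raw bytes are a faithful record of the data. Either decoded string reflects a guess.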

Similarly, I don't believe we have a "YMD" data type which stores year,
month, and day, without being specific as to whether it's Gregorian or
Julian; if you have that situation, make a 3-tuple type or do something
else. "Date" is for when you actually know what day you want to record.
