Re: SQL/JSON path: collation for comparisons, minor typos in docs

From:	Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>
To:	Markus Winand <markus(dot)winand(at)winand(dot)at>
Cc:	PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: SQL/JSON path: collation for comparisons, minor typos in docs
Date:	2019-08-08 00:27:38
Message-ID:	CAPpHfdtqpf5rd_E7tT2PPc3ixkr-GZY3AXSwGED5QEjkLpLH-w@mail.gmail.com
Lists:	pgsql-hackers

On Thu, Aug 8, 2019 at 3:05 AM Alexander Korotkov
<a(dot)korotkov(at)postgrespro(dot)ru> wrote:
> On Thu, Aug 8, 2019 at 12:55 AM Alexander Korotkov
> <a(dot)korotkov(at)postgrespro(dot)ru> wrote:
> > On Wed, Aug 7, 2019 at 4:11 PM Alexander Korotkov
> > <a(dot)korotkov(at)postgrespro(dot)ru> wrote:
> > > On Wed, Aug 7, 2019 at 2:25 PM Markus Winand <markus(dot)winand(at)winand(dot)at> wrote:
> > > > I was playing around with JSON path quite a bit and might have found one case where the current implementation doesn’t follow the standard.
> > > >
> > > > The functionality in question is the comparison operators other than ==. They use the database default collation rather than the standard-mandated "Unicode codepoint collation" (SQL-2:2016 9.39 General Rule 12 c iii 2 D, last sentence in first paragraph).
> > >
> > > Thank you for pointing this out! Nikita is about to write a patch fixing that.
> >
> > Please see the attached patch.
> >
> > Our idea is not to sacrifice the performance of the "==" operator for
> > standard conformance. So "==" remains a per-byte comparison. For
> > consistency, the other operators compare code points first and then
> > fall back to a per-byte comparison. In some edge cases, when the same
> > Unicode code points have different binary representations in the
> > database encoding, this behavior diverges from the standard. In the
> > future, we can implement strict standard conformance by normalizing
> > input JSON strings.
>
> The previous version of the patch has a buggy implementation of
> compareStrings(). A revised version is attached.

Nikita pointed out to me that, for UTF-8 strings, the result of a
per-byte comparison matches the result of a code point comparison.
That allows us to simplify the patch a lot.
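
The property mentioned above can be checked with a small script. This is only an illustrative sketch and is not taken from the attached patch: the helper names (cmp, compare_codepoints, compare_utf8_bytes) and the sample strings are hypothetical, chosen to cover the boundaries between 1-, 2-, 3- and 4-byte UTF-8 sequences.

```python
from itertools import product


def cmp(x, y):
    """Three-way comparison returning -1, 0, or 1."""
    return (x > y) - (x < y)


def compare_codepoints(a: str, b: str) -> int:
    # Compare the strings as sequences of Unicode code points.
    return cmp([ord(c) for c in a], [ord(c) for c in b])


def compare_utf8_bytes(a: str, b: str) -> int:
    # Compare the UTF-8 encodings byte by byte; Python compares
    # bytes objects lexicographically as unsigned values.
    return cmp(a.encode("utf-8"), b.encode("utf-8"))


# Hypothetical samples around the UTF-8 length boundaries.
SAMPLES = [
    "", "a", "ab", "z",
    "\u007f", "\u0080",      # last 1-byte / first 2-byte code point
    "\u07ff", "\u0800",      # last 2-byte / first 3-byte code point
    "\uffff", "\U00010000",  # last 3-byte / first 4-byte code point
    "\U0010ffff", "a\uffff", "a\U00010000",
]

# Collect every pair on which the two orderings disagree.
mismatches = [
    (a, b)
    for a, b in product(SAMPLES, repeat=2)
    if compare_codepoints(a, b) != compare_utf8_bytes(a, b)
]
print("pairs where the orderings disagree:", len(mismatches))
```

The same does not hold for every encoding: in UTF-16, for example, U+10000 is stored as a surrogate pair whose bytes sort before those of U+FFFF, so the shortcut is specific to UTF-8.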

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachment	Content-Type	Size
0001-Use-Unicode-codepoint-collation-in-jsonpath-4.patch	application/octet-stream	16.0 KB

In response to

Re: SQL/JSON path: collation for comparisons, minor typos in docs at 2019-08-08 00:05:08 from Alexander Korotkov

Responses

Re: SQL/JSON path: collation for comparisons, minor typos in docs at 2019-08-08 08:53:20 from Markus Winand