Re: [PATCH] hstore: Fix parsing on Mac OS X: isspace() is locale specific

From: Evan Jones <evan(dot)jones(at)datadoghq(dot)com>
To: Michael Paquier <michael(at)paquier(dot)xyz>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCH] hstore: Fix parsing on Mac OS X: isspace() is locale specific
Date: 2023-06-17 14:57:05
Message-ID: CA+HWA9bTRDf52DHyU+JOoqEALgRGRo5uHUYTFuduoj3cBfer+Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Unfortunately I just noticed a possible "bug" with this change. The
scanner_isspace() function only recognizes *five* ASCII space characters: '
' \t \n \r \f. It *excludes* VTAB \v, which the C standard function
isspace() includes. This means this patch changed the behavior of hstore
parsing for some "unusual" cases where the \v character was previously
ignored, and now is not, such as: "select 'k=>\vv'::hstore" . It seems
unlikely to me that anyone would be depending on this. The
application/programming language library would need to be explicitly
depending on VTAB being ignored as leading/trailing characters for hstore
key/values. I am hopeful that most implementations encode hstore values the
same way Postgres does: always using quoted strings, which avoids this
problem.

However, if we think this change could be a problem, one fix would be to
switch scanner_isspace() to array_isspace(), which returns true for these
*six* ASCII characters. I am happy to submit a patch to do this.

However, I am now wondering if the fact that scanner_isspace() and
array_isspace() disagree with each other could be problematic somewhere,
but so far I haven found anything.

Problematic example before my hstore change:

$ printf "select 'k=>\vv'::hstore" | psql
hstore
----------
"k"=>"v"
(1 row)

Same example after my hstore change on postgres master commit a14e75eb0b
from 2023-06-16:

$ printf "select 'k=>\vv'::hstore" | psql
hstore
--------------
"k"=>"\x0Bv"
(1 row)

On Sun, Jun 11, 2023 at 8:18 PM Michael Paquier <michael(at)paquier(dot)xyz> wrote:

> On Tue, Jun 06, 2023 at 10:16:09AM -0400, Evan Jones wrote:
> > I did a quick look at the places found with "git grep isspace"
> yesterday. I
> > agree with the comment from commit 9ae2661: "I've left alone isspace()
> > calls in places that aren't really expecting any non-ASCII input
> > characters, such as float8in()." There are a number of other calls where
> I
> > think it would likely be safe, and possibly even a good idea, to replace
> > isspace() with scanner_isspace(). However, I couldn't find any where I
> > could cause a bug like the one I hit in hstore parsing.
>
> Yes, I agree with this feeling. Like 9ae2661, I can't get really
> excited about plastering more of that, especially if it were for
> timezone value input or dictionary options. One area with a new
> isspace() since 2017 is multirangetypes.c, but it is just a copy of
> rangetypes.c.
>
> > Original mailing list post for commit 9ae2661 in case it is helpful for
> > others:
> https://www.postgresql.org/message-id/10129.1495302480@sss.pgh.pa.us
>
> I have reproduced the original problem reported on macOS 13.4, which
> is close to the top of what's available.
>
> Passing to pg_regress some options to use something else than UTF-8
> leads to a failure in the tests, so we need a split like
> fussyztrmatch to test that:
> REGRESS_OPTS='--encoding=SQL_ASCII --no-locale' make check
>
> An other error pattern without a split could be found on Windows, as
> of:
> select E'key\u0105=>value'::hstore;
> - hstore
> ------------------
> - "keyÄ…"=>"value"
> -(1 row)
> -
> +ERROR: character with byte sequence 0xc4 0x85 in encoding "UTF8" has
> no equivalent in encoding "WIN1252"
> +LINE 1: select E'key\u0105=>value'::hstore;
>
> We don't do that for unaccent, actually, leading to similar failures..
> I'll launch a separate thread about that shortly.
>
> With that fixed, the fix has been applied and backpatched. Thanks for
> the report, Evan!
> --
> Michael
>

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tommy Pavlicek 2023-06-17 15:45:10 [PATCH] ltree hash functions
Previous Message Gurjeet Singh 2023-06-17 14:40:18 Re: test_extensions: fix inconsistency between meson.build and Makefile