Re: [GENERAL] psql weird behaviour with charset encodings

From: Noah Misch <noah(at)leadboat(dot)com>
To: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Cc: PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, hgonzalez(at)gmail(dot)com
Subject: Re: [GENERAL] psql weird behaviour with charset encodings
Date: 2015-06-18 00:47:56
Message-ID: 20150618004756.GB415598@tornado.leadboat.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general pgsql-hackers

On Wed, Jun 03, 2015 at 05:25:45PM +0900, Michael Paquier wrote:
> On Tue, Jun 2, 2015 at 4:19 PM, Michael Paquier <michael(dot)paquier(at)gmail(dot)com> wrote:
> > On Sun, May 24, 2015 at 2:43 AM, Noah Misch <noah(at)leadboat(dot)com> wrote:
> > > It would be good to purge the code of precisions on "s" conversion specifiers,
> > > then Assert(!pointflag) in fmtstr() to catch new introductions. I won't plan
> > > to do it myself, but it would be a nice little defensive change.
> >
> > This sounds like a good protection idea, but as it impacts existing
> > backend code relying in sprintf's port version we should only do the
> > assertion in HEAD in my opinion, and mention it in the release notes of the
> > next major version at the time a patch in this area is applied. I guess

Adding the assertion would be master-only. We don't necessarily release-note
C API changes.

> > that we had better backpatch the places using .*s though on back-branches.

I would tend to back-patch only the ones that cause interesting bugs. For
example, we should not reach the read.c elog() calls anyway, so it's not a big
deal if the GNU libc bug makes them a bit less helpful in back branches.
(Thanks for the list of code sites; it was more complete than anything I had.)
So far, only tar.c looks harmed enough to back-patch.

> Attached is a patch purging a bunch of places from using %.*s, this will
> make the code more solid when facing non-ASCII strings. I let pg_upgrade
> and pg_basebackup code paths alone as it reduces the code lisibility by
> moving out of this separator. We may want to fix them though if file path
> names have non-ASCII characters, but it seems less critical.

To add the assertion, we must of course fix all uses.

Having seen the patch I requested, I don't like the result as much as I
expected to like it. The patched code is definitely harder to read and write:

> @@ -1534,7 +1541,10 @@ parseNodeString(void)
> return_value = _readDeclareCursorStmt();
> else
> {
> - elog(ERROR, "badly formatted node string \"%.32s\"...", token);
> + char buf[33];
> + memcpy(buf, token, 32);
> + buf[33] = '\0';
> + elog(ERROR, "badly formatted node string \"%s\"...", buf);
> return_value = NULL; /* keep compiler quiet */
> }

(Apropos, that terminator belongs in buf[32], not buf[33].)

Perhaps we're better off setting aside the whole idea, or forcing use of
snprintf.c on configurations having the bug?

Thanks,
nm

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Bill Moran 2015-06-18 01:55:08 Re: Momentary Delay
Previous Message Миша Тюрин 2015-06-17 21:14:20 writable cte triggers reverse order

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2015-06-18 03:37:48 Re: [GENERAL] psql weird behaviour with charset encodings
Previous Message Brendan Jurd 2015-06-17 23:15:00 Re: [PATCH] Function to get size of asynchronous notification queue