Re: Bug: Reading from single byte character column type may cause out of bounds memory reads.

From: Spyridon Dimitrios Agathos <spyridon(dot)dimitrios(dot)agathos(at)gmail(dot)com>
To: Nikolay Shaplov <dhyan(at)nataraj(dot)su>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Aleksander Alekseev <aleksander(at)timescale(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Isaac Morland <isaac(dot)morland(at)gmail(dot)com>
Subject: Re: Bug: Reading from single byte character column type may cause out of bounds memory reads.
Date: 2022-07-16 18:38:14
Message-ID: CAFM5RaqekstfZ9OTOw2Kn5LWs_T9WnC2FfEnB55tzDdF7EGPEA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi all,

this is to verify that the .patch proposed here:

https://www.postgresql.org/message-id/flat/2318797.1638558730%40sss.pgh.pa.us

fixes the issue. I applied the patch and:
1) The build type doesn't affect the result of the query result
2) The valgrind doesn't complain about out of bound reads
3) The output of the "faulty" insertion is shown in "\ooo format".

Looking forward to the next steps.

--
Spiros
(ServiceNow)

Στις Πέμ 14 Ιουλ 2022 στις 2:08 μ.μ., ο/η Nikolay Shaplov <dhyan(at)nataraj(dot)su>
έγραψε:

> В письме от среда, 13 июля 2022 г. 16:14:39 MSK пользователь Aleksander
> Alekseev написал:
>
> Hi! Let me join the review process. Postgres data types is field of
> expertise I
> am interested in.
>
> 0. Though it looks like a steady bug, I can't reproduce it. Not using
> valgrind, not using ASan (address sanitizer should catch reading out of
> bounds
> too). I am running Debian Bullseye, and tried to build both postgresl 14.4
> and
> current master.
>
> Never the less I would dig into this issue. And start with the parts that
> is
> not covered by the patch, but seems to be important for me.
>
> 1. typename "char" (with quotes) is very-very-very confusing. it is
> described
> in documentation, but you need to be postgres expert or careful
> documentation
> reader, to notice important difference between "char" and char.
> What is the history if "char" type? Is it specified by some standard? May
> be it
> is good point to create more understandable alias, like byte_char,
> ascii_char
> or something for usage in practice, and keep "char" for backward
> compatibility
> only.
>
> 2. I would totally agree with Tom Lane and Isaac Morland, that problem
> should
> be also fixed on the side of type conversion. There is whole big thread
> about
> it. Guess we should come to some conclusion there
>
> 3.Fixing out of bound reading for broken unicode is also important.
> Though
> for now I am not quite sure it is possible.
>
>
> > - p += pg_mblen(p);
> > + {
> > + int t = pg_mblen(p);
> > + p += t;
> > + max_copy_bytes -= t;
> > + }
>
> Minor issue: Here I would change variable name from "t" to "char_len" or
> something, to make code more easy to understand.
>
> Major issue: is pg_mblen function safe to call with broken encoding at the
> end
> of buffer? What if last byte of the buffer is 0xF0 and you call pg_mblen
> for it?
>
>
> >+ copy_bytes = p - s;
> >+ if(copy_bytes > max_copy_bytes)
> >+ copy_bytes = max_copy_bytes;
>
> Here I would suggest to add comment about broken utf encoding case. That
> would
> explain why we might come to situation when we can try to copy more than
> we
> have.
>
> I would also suggest to issue a warning here. I guess person that uses
> postgres would prefer to know that he managed to stuff into postgres a
> string
> with broken utf encoding, before it comes to some terrible consequences.
>
> > Hi Spyridon,
> >
> > > The column "single_byte_col" is supposed to store only 1 byte.
> > > Nevertheless, the INSERT command implicitly casts the '🀆' text into
> > > "char". This means that only the first byte of '🀆' ends up stored in
> the
> > > column. gdb reports that "pg_mblen(p) = 4" (line 1046), which is
> expected
> > > since the pg_mblen('🀆') is indeed 4. Later at line 1050, the memcpy
> will
> > > copy 4 bytes instead of 1, hence an out of bounds memory read happens
> for
> > > pointer 's', which effectively copies random bytes.
> > Many thanks for reporting this!
> >
> > > - OS: Ubuntu 20.04
> > > - PSQL version 14.4
> >
> > I can confirm the bug exists in the `master` branch as well and
> > doesn't depend on the platform.
> >
> > Although the bug is easy to fix for this particular case (see the
> > patch) I'm not sure if this solution is general enough. E.g. is there
> > something that generally prevents pg_mblen() from doing out of bound
> > reading in cases similar to this one? Should we prevent such an INSERT
> > from happening instead?
>
>
> --
> Nikolay Shaplov aka Nataraj
> Fuzzing Engineer at Postgres Professional
> Matrix IM: @dhyan:nataraj.su
>

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Ranier Vilela 2022-07-16 19:54:56 Re: Avoid unecessary MemSet call (src/backend/utils/cache/relcache.c)
Previous Message Tom Lane 2022-07-16 17:09:12 Re: The "char" type versus non-ASCII characters