Re: Bug: Reading from single byte character column type may cause out of bounds memory reads.

From: Nikolay Shaplov <dhyan(at)nataraj(dot)su>
To: Spyridon Dimitrios Agathos <spyridon(dot)dimitrios(dot)agathos(at)gmail(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Aleksander Alekseev <aleksander(at)timescale(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Isaac Morland <isaac(dot)morland(at)gmail(dot)com>
Subject: Re: Bug: Reading from single byte character column type may cause out of bounds memory reads.
Date: 2022-07-14 11:07:56
Message-ID: 2710588.JKtg82POfv@thinkpad-pgpro
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

В письме от среда, 13 июля 2022 г. 16:14:39 MSK пользователь Aleksander
Alekseev написал:

Hi! Let me join the review process. Postgres data types is field of expertise I
am interested in.

0. Though it looks like a steady bug, I can't reproduce it. Not using
valgrind, not using ASan (address sanitizer should catch reading out of bounds
too). I am running Debian Bullseye, and tried to build both postgresl 14.4 and
current master.

Never the less I would dig into this issue. And start with the parts that is
not covered by the patch, but seems to be important for me.

1. typename "char" (with quotes) is very-very-very confusing. it is described
in documentation, but you need to be postgres expert or careful documentation
reader, to notice important difference between "char" and char.
What is the history if "char" type? Is it specified by some standard? May be it
is good point to create more understandable alias, like byte_char, ascii_char
or something for usage in practice, and keep "char" for backward compatibility
only.

2. I would totally agree with Tom Lane and Isaac Morland, that problem should
be also fixed on the side of type conversion. There is whole big thread about
it. Guess we should come to some conclusion there

3.Fixing out of bound reading for broken unicode is also important. Though
for now I am not quite sure it is possible.

> - p += pg_mblen(p);
> + {
> + int t = pg_mblen(p);
> + p += t;
> + max_copy_bytes -= t;
> + }

Minor issue: Here I would change variable name from "t" to "char_len" or
something, to make code more easy to understand.

Major issue: is pg_mblen function safe to call with broken encoding at the end
of buffer? What if last byte of the buffer is 0xF0 and you call pg_mblen for it?

>+ copy_bytes = p - s;
>+ if(copy_bytes > max_copy_bytes)
>+ copy_bytes = max_copy_bytes;

Here I would suggest to add comment about broken utf encoding case. That would
explain why we might come to situation when we can try to copy more than we
have.

I would also suggest to issue a warning here. I guess person that uses
postgres would prefer to know that he managed to stuff into postgres a string
with broken utf encoding, before it comes to some terrible consequences.

> Hi Spyridon,
>
> > The column "single_byte_col" is supposed to store only 1 byte.
> > Nevertheless, the INSERT command implicitly casts the '🀆' text into
> > "char". This means that only the first byte of '🀆' ends up stored in the
> > column. gdb reports that "pg_mblen(p) = 4" (line 1046), which is expected
> > since the pg_mblen('🀆') is indeed 4. Later at line 1050, the memcpy will
> > copy 4 bytes instead of 1, hence an out of bounds memory read happens for
> > pointer 's', which effectively copies random bytes.
> Many thanks for reporting this!
>
> > - OS: Ubuntu 20.04
> > - PSQL version 14.4
>
> I can confirm the bug exists in the `master` branch as well and
> doesn't depend on the platform.
>
> Although the bug is easy to fix for this particular case (see the
> patch) I'm not sure if this solution is general enough. E.g. is there
> something that generally prevents pg_mblen() from doing out of bound
> reading in cases similar to this one? Should we prevent such an INSERT
> from happening instead?

--
Nikolay Shaplov aka Nataraj
Fuzzing Engineer at Postgres Professional
Matrix IM: @dhyan:nataraj.su

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Dmitry Koval 2022-07-14 11:20:16 Re: enable/disable broken for statement triggers on partitioned tables
Previous Message Gareth Palmer 2022-07-14 10:45:33 Re: [PATCH] Implement INSERT SET syntax