| From: | Julien Rouhaud <rjuju123(at)gmail(dot)com> | 
|---|---|
| To: | Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com> | 
| Cc: | ranier(dot)vf(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org | 
| Subject: | Re: [PATCH] Fix out-of-bouds access (src/common/wchar.c) | 
| Date: | 2022-02-17 07:50:09 | 
| Message-ID: | 20220217075009.73wxg4cetspkgexz@jrouhaud | 
| Views: | Whole Thread | Raw Message | Download mbox | Resend email | 
| Thread: | |
| Lists: | pgsql-hackers | 
On Thu, Feb 17, 2022 at 03:51:26PM +0900, Kyotaro Horiguchi wrote:
> At Thu, 17 Feb 2022 14:58:38 +0900 (JST), Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com> wrote in 
> > (Sorry for the broken mail...)
> > > >
> > > Ok, if -1 is wrong, what should the value of return if
> > > somebody calling this function like:
> > > pg_encoding_max_length(63);
> > 
> > Should result in assertion failure, I think.  If that fails, the
> > caller side is anyhow broken.  On the other hand we haven't had a
> > complain about that, maybe.
> > 
> > > Of course, with patch applied, because with original code
> > > has memory corruption, if built and run without DEBUG.
> > 
> > So we don't assume corruption in production build.  It should be
> > logically guaranteed.
> > 
> > I'll dig into that further.
> 
> The number comes from pg_char_to_encoding, which is the internal
> function of the SQL function PG_char_encoding(). It looks fine but I
> can confirm that for all possible encoding names in pg_encname_tbl[].
> 
> =# select * from (
> select e as name, PG_char_to_encoding(e) enc
> from unnest(array['abc', 'alt', 'big5', 'euccn', 'eucjis2004', 'eucjp',
>     'euckr', 'euctw', 'gb18030', 'gbk', 'iso88591', 'iso885910', 'iso885913',
> 	'iso885914', 'iso885915', 'iso885916', 'iso88592', 'iso88593', 'iso88594',
> 	'iso88595', 'iso88596', 'iso88597', 'iso88598', 'iso88599', 'johab',
> 	'koi8', 'koi8r', 'koi8u', 'latin1', 'latin10', 'latin2', 'latin3',
> 	'latin4', 'latin5', 'latin6', 'latin7', 'latin8', 'latin9', 'mskanji',
> 	'muleinternal', 'shiftjis', 'shiftjis2004', 'sjis', 'sqlascii', 'tcvn',
> 	'tcvn5712', 'uhc', 'unicode', 'utf8', 'vscii', 'win', 'win1250', 'win1251',
> 	'win1252', 'win1253', 'win1254', 'win1255', 'win1256', 'win1257',
> 	'win1258', 'win866', 'win874', 'win932', 'win936', 'win949', 'win950',
> 	'windows1250', 'windows1251', 'windows1252', 'windows1253', 'windows1254',
> 	'windows1255', 'windows1256', 'windows1257', 'windows1258', 'windows866',
> 	'windows874', 'windows932', 'windows936', 'windows949', 'windows950',
> 	'hoge']) as e) as t where enc < 0 or enc > 41;
>  name | enc 
> ------+-----
>  hoge |  -1
> (1 row)
> 
> So, the function doesn't return 63 for all registered names and wrong
> names.
> 
> So other possibilities I can think of are..
> - Someone had broken pg_encname_tbl[]
> - Cosmic ray hit, or ill memory cell.
> - Coverity worked wrong way.
> 
> Could you show the workload for the Coverity warning here?
The 63 upthread was hypothetical right?  pg_encoding_max_length() shouldn't be
called with user-dependent data (unlike pg_encoding_max_length_sql()), so I
also don't see any value spending cycles in release builds.  The error should
only happen with bogus code, and assert builds are there to avoid that, or
corrupted memory and in that case we can't make any promise.
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Michael Paquier | 2022-02-17 07:52:49 | Re: Time to drop plpython2? | 
| Previous Message | Kyotaro Horiguchi | 2022-02-17 07:50:01 | Re: Make mesage at end-of-recovery less scary. |