Re: A bug when use get_bit() function for a long bytea string

From: Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>
To: "movead(dot)li(at)highgo(dot)ca" <movead(dot)li(at)highgo(dot)ca>
Cc: pgsql-hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, ashutosh(dot)bapat(at)2ndquadrant(dot)com
Subject: Re: A bug when use get_bit() function for a long bytea string
Date: 2020-03-12 16:19:25
Message-ID: CAExHW5vUSi4-oPvgMdFWvzuyaTmMtbY=JgdZ+u3T7ZdzQmubig@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi

On Thu, Mar 12, 2020 at 9:21 AM movead(dot)li(at)highgo(dot)ca <movead(dot)li(at)highgo(dot)ca> wrote:
>
> Hello hackers,
>
> I found an issue about get_bit() and set_bit() function,here it is:
> ################################
> postgres=# select get_bit(pg_read_binary_file('/home/movead/temp/file_seek/f_512M'), 0);
> 2020-03-12 10:05:23.296 CST [10549] ERROR: index 0 out of valid range, 0..-1
> 2020-03-12 10:05:23.296 CST [10549] STATEMENT: select get_bit(pg_read_binary_file('/home/movead/temp/file_seek/f_512M'), 0);
> ERROR: index 0 out of valid range, 0..-1
> postgres=# select set_bit(pg_read_binary_file('/home/movead/temp/file_seek/f_512M'), 0,1);
> 2020-03-12 10:05:27.959 CST [10549] ERROR: index 0 out of valid range, 0..-1
> 2020-03-12 10:05:27.959 CST [10549] STATEMENT: select set_bit(pg_read_binary_file('/home/movead/temp/file_seek/f_512M'), 0,1);
> ERROR: index 0 out of valid range, 0..-1
> postgres=#
> ################################
> PostgreSQL can handle bytea size nearby 1G, but now it reports an
> error when 512M. And I research it and found it is byteaSetBit() and
> byteaGetBit(), it uses an 'int32 len' to hold bit numbers for the long
> bytea data, and obvious 512M * 8bit is an overflow for an int32.
> So I fix it and test ok, as below.
> ################################

Thanks for the bug report and the analysis. The analysis looks correct.

> postgres=# select get_bit(set_bit(pg_read_binary_file('/home/movead/temp/file_seek/f_512M'), 0,1),0); get_bit --------- 1 (1 row) postgres=# select get_bit(set_bit(pg_read_binary_file('/home/movead/temp/file_seek/f_512M'), 0,0),0); get_bit --------- 0 (1 row) postgres=#
> ################################
>
>
> And I do a check about if anything else related bytea has this issue, several codes have the same issue:
> 1. byteaout() When formatting bytea as an escape, the 'len' variable should be int64, or
> it may use an overflowing number. 2. esc_enc_len() Same as above, the 'len' variable should be int64, and the return type
> should change as int64. Due to esc_enc_len() has same call struct with pg_base64_enc_len() and hex_enc_len(), so I want to change the return value of the two function. And the opposite function esc_dec_len() seem nothing wrong. 3. binary_encode() and binary_decode() Here use an 'int32 resultlen' to accept an 'unsigned int' function return, which seem unfortable.
> I fix all mentioned above, and patch attachments.
> How do you think about that?

Why have you used size? Shouldn't we use int64?

Also in the change
@@ -3458,15 +3458,15 @@ byteaGetBit(PG_FUNCTION_ARGS)
int32 n = PG_GETARG_INT32(1);
int byteNo,
bitNo;
- int len;
+ Size len;

If get_bit()/set_bit() accept the second argument as int32, it can not
be used to set bits whose number does not fit 32 bits. I think we need
to change the type of the second argument as well.

Also, I think declaring len to be int is fine since 1G would fit an
int, but what does not fit is len * 8, when performing that
calculation, we have to widen the result. So, instead of changing the
datatype of len, it might be better to perform the calculation as
(int64)len * 8. If we use int64, we could also use INT64_FORMAT
instead of using %ld.

Since this is a bug it shouldn't wait another commitfest, but still
add this patch to the commitfest so that it's not forgotten.

--
Best Wishes,
Ashutosh Bapat

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2020-03-12 16:39:41 Re: [PATCH] Use PKG_CHECK_MODULES to detect the libxml2 library
Previous Message Tom Lane 2020-03-12 16:01:24 Re: control max length of parameter values logged