Re: A bug when use get_bit() function for a long bytea string

From: "Daniel Verite" <daniel(at)manitou-mail(dot)org>
To: "movead(dot)li(at)highgo(dot)ca" <movead(dot)li(at)highgo(dot)ca>
Cc: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>,"ashutosh(dot)bapat" <ashutosh(dot)bapat(at)2ndquadrant(dot)com>,"pgsql-hackers" <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: A bug when use get_bit() function for a long bytea string
Date: 2020-04-02 16:58:59
Message-ID: 84525289-3891-4368-9157-5452f53a6908@manitou-mail.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

movead(dot)li(at)highgo(dot)ca wrote:

> A little update for the patch, and patches for all stable avilable.

Some comments about the set_bit/get_bit parts.
I'm reading long_bytea_string_bug_fix_ver6_pg10.patch, but that
applies probably to the other files meant for the existing releases
(I think you could get away with only one patch for backpatching
and one patch for v13, and committers will sort out how
to deal with release branches).

byteaSetBit(PG_FUNCTION_ARGS)
{
bytea *res = PG_GETARG_BYTEA_P_COPY(0);
- int32 n = PG_GETARG_INT32(1);
+ int64 n = PG_GETARG_INT64(1);
int32 newBit = PG_GETARG_INT32(2);

The 2nd argument is 32-bit, not 64. PG_GETARG_INT32(1) must be used.

+ errmsg("index "INT64_FORMAT" out of valid range, 0.."INT64_FORMAT,
+ n, (int64)len * 8 - 1)));

The cast to int64 avoids the overflow, but it may also produce a
result that does not reflect the actual range, which is limited to
2^31-1, again because the bit number is a signed 32-bit value.

I believe the formula for the upper limit in the 32-bit case is:
(len <= PG_INT32_MAX / 8) ? (len*8 - 1) : PG_INT32_MAX;

These functions could benefit from a comment mentioning that
they cannot reach the full extent of a bytea, because of the size limit
on the bit number.

--- a/src/test/regress/expected/bit.out
+++ b/src/test/regress/expected/bit.out
@@ -656,6 +656,40 @@ SELECT set_bit(B'0101011000100100', 15, 1);

SELECT set_bit(B'0101011000100100', 16, 1); -- fail
ERROR: bit index 16 out of valid range (0..15)+SELECT get_bit(
+ set_bit((repeat('Postgres', 512 * 1024 * 1024 / 8))::bytea, 0, 0)
+ ,0);
+ get_bit
+---------
+ 0
+(1 row)
+
+SELECT get_bit(
+ set_bit((repeat('Postgres', 512 * 1024 * 1024 / 8))::bytea, 0, 1)
+ ,0);
+ get_bit
+---------
+ 1
+(1 row)
+

These 2 tests need to allocate big chunks of contiguous memory, so they
might fail for lack of memory on tiny machines, and even when not failing,
they're pretty slow to run. Are they worth the trouble?

+select get_bit(
+ set_bit((repeat('Postgres', 512 * 1024 * 1024 / 8))::bytea,
+ 512::bigint * 1024 * 1024 * 8 - 1, 0)
+ ,512::bigint * 1024 * 1024 * 8 - 1);
+ get_bit
+---------
+ 0
+(1 row)
+
+select get_bit(
+ set_bit((repeat('Postgres', 512 * 1024 * 1024 / 8))::bytea,
+ 512::bigint * 1024 * 1024 * 8 - 1, 1)
+ ,512::bigint * 1024 * 1024 * 8 - 1);
+ get_bit
+---------
+ 1
+(1 row)

These 2 tests are supposed to fail in existing releases because set_bit()
and get_bit() don't take a bigint as the 2nd argument.
Also, the same comment as above on how much they allocate.

Best regards,
--
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Alvaro Herrera 2020-04-02 17:03:16 Re: control max length of parameter values logged
Previous Message Andres Freund 2020-04-02 16:40:15 Re: snapshot too old issues, first around wraparound and then more.