
From: Ashutosh Bapat <ashutosh(dot)bapat(at)2ndquadrant(dot)com>
To: "movead(dot)li(at)highgo(dot)ca" <movead(dot)li(at)highgo(dot)ca>
Cc: pgsql-hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Re: A bug when use get_bit() function for a long bytea string
Date: 2020-03-16 16:28:24
Message-ID: CAG-ACPXeQEuvdVBzz+NWgFYoBNwcJfv1W=pJExp4LM_3fgar6w@mail.gmail.com
Lists: pgsql-hackers

On Fri, 13 Mar 2020 at 08:48, movead(dot)li(at)highgo(dot)ca <movead(dot)li(at)highgo(dot)ca> wrote:

> Thanks for the reply.
>
> >Why have you used size? Shouldn't we use int64?
> Yes, thanks for the point, I have changed the patch.
>
>

Thanks for the patch.

> >If get_bit()/set_bit() accept the second argument as int32, it can not
> >be used to set bits whose number does not fit 32 bits. I think we need
> >to change the type of the second argument as well.
> Because int32 can cover the length of bytea that PostgreSQL support,
>

I think the second argument indicates the bit position, which can be as
large as the maximum bytea length * 8. Since the maximum bytea length is
1GB (2^30 bytes), bit positions can go up to 2^33 - 1, which does not fit
in int32, so the second argument needs to be wider, i.e. int64.

Some more comments on the patch:
struct pg_encoding
{
- unsigned (*encode_len) (const char *data, unsigned dlen);
+ int64 (*encode_len) (const char *data, unsigned dlen);
unsigned (*decode_len) (const char *data, unsigned dlen);
unsigned (*encode) (const char *data, unsigned dlen, char *res);
unsigned (*decode) (const char *data, unsigned dlen, char *res);

Why not use a return type of int64 for the rest of the functions here as well?

res = enc->encode(VARDATA_ANY(data), datalen, VARDATA(result));

/* Make this FATAL 'cause we've trodden on memory ... */
- if (res > resultlen)
+ if ((int64)res > resultlen)

If we change the return type of all those functions to int64, we won't need
this cast.
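
Roughly what I have in mind, as a sketch (whether the dlen arguments should
also be widened is a separate point below):

struct pg_encoding
{
 int64 (*encode_len) (const char *data, unsigned dlen);
 int64 (*decode_len) (const char *data, unsigned dlen);
 int64 (*encode) (const char *data, unsigned dlen, char *res);
 int64 (*decode) (const char *data, unsigned dlen, char *res);
};

with the local variable res declared as int64 as well, so that the
comparison against resultlen needs no cast.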

Right now we are using int64 because bytea can be 1GB, but what if we
increase that limit tomorrow? Will int64 still be sufficient? That may be
unlikely in the near future, but it is nevertheless a possibility. Should we
then define a new datatype which resolves to int64 right now but really
depends upon the bytea length? I am not suggesting that we have to do it
right now, but it may be something to think about.
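
As a rough sketch of that idea (the type name here is just a placeholder):

/*
 * Wide enough for any bytea length, and for any bit position within a
 * bytea (length * 8).  If the 1GB varlena limit is ever raised, the width
 * can be adjusted in one place.
 */
typedef int64 ByteaSize;

The encode/decode length functions and the get_bit()/set_bit() internals
could then use ByteaSize instead of spelling int64 everywhere.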

hex_enc_len(const char *src, unsigned srclen)
{
- return srclen << 1;
+ return (int64)(srclen << 1);

Why not convert srclen to int64 as well? That will also change the
pg_encoding member definitions. But if the encoded length requires int64 to
fit the possible values, the same would be true for the source lengths. Why
can't the source length also be int64?

If we still want the cast, I think it should be ((int64)srclen << 1) rather
than casting the result.
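
I.e. something along these lines (just a sketch, assuming the length
arguments are widened as suggested above):

static int64
hex_enc_len(const char *src, int64 srclen)
{
 return srclen << 1; /* no cast needed once srclen is int64 */
}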

/* 3 bytes will be converted to 4, linefeed after 76 chars */
- return (srclen + 2) * 4 / 3 + srclen / (76 * 3 / 4);
+ return (int64)((srclen + 2) * 4 / 3 + srclen / (76 * 3 / 4));
Similar comments as above apply here.
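
Note that casting only the result does not really help here: with srclen
close to 1GB, (srclen + 2) * 4 already overflows 32-bit arithmetic before
the cast is applied. As a sketch, if srclen stays unsigned, the cast should
happen first:

/* 3 bytes will be converted to 4, linefeed after 76 chars */
return ((int64) srclen + 2) * 4 / 3 + srclen / (76 * 3 / 4);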

SELECT set_bit(B'0101011000100100', 16, 1); -- fail
ERROR: bit index 16 out of valid range (0..15)
+SELECT get_bit(
+ set_bit((repeat('Postgres', 512 * 1024 * 1024 / 8))::bytea, 0, 0)
+ ,0);
+ get_bit
+---------
+ 0
+(1 row)

It might help to add a test where we could pass the second argument
something greater than 1G. But it may be difficult to write such a test case.

--
Best Wishes,
Ashutosh
