Re: A bug when use get_bit() function for a long bytea string

From: "movead(dot)li(at)highgo(dot)ca" <movead(dot)li(at)highgo(dot)ca>
To: ashutosh(dot)bapat <ashutosh(dot)bapat(at)2ndquadrant(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: A bug when use get_bit() function for a long bytea string
Date: 2020-03-18 02:48:09
Lists: pgsql-hackers

Hello, and thanks for the detailed review.

>I think the second argument indicates the bit position, which would be max bytea length * 8. If max bytea length covers whole int32, the second argument >needs to be wider i.e. int64.
Yes, that makes sense; I have made the change.

> Some more comments on the patch
> struct pg_encoding
> {
>- unsigned (*encode_len) (const char *data, unsigned dlen);
>+ int64 (*encode_len) (const char *data, unsigned dlen);
> unsigned (*decode_len) (const char *data, unsigned dlen);
> unsigned (*encode) (const char *data, unsigned dlen, char *res);
> unsigned (*decode) (const char *data, unsigned dlen, char *res);
> Why not use return type of int64 for rest of the functions here as well?
> res = enc->encode(VARDATA_ANY(data), datalen, VARDATA(result));
> /* Make this FATAL 'cause we've trodden on memory ... */
>- if (res > resultlen)
>+ if ((int64)res > resultlen)
>if we change return type of all those functions to int64, we won't need this cast.
I have changed the 'encode' function, which does need an int64 return type, but kept
the other functions in 'pg_encoding' as they are, because I see no compelling reason
to widen them.

>Right now we are using int64 because bytea can be 1GB, but what if we increase
>that limit tomorrow, will int64 be sufficient? That may be unlikely in the near
>future, but nevertheless a possibility. Should we then define a new datatype
>which resolves to int64 right now but really depends upon the bytea length. I
>am not suggesting that we have to do it right now, but may be something to
>think about.
I decided to use int64 because, if we ever raise the limit, the same change would be
needed for 'text' and 'varchar', which share the same 1GB limit. That would call for
a more general datatype defined in one place, so I have set the idea aside for now.

> hex_enc_len(const char *src, unsigned srclen)
> {
>- return srclen << 1;
>+ return (int64)(srclen << 1);
>why not to convert srclen also to int64. That will also change the pg_encoding
>member definitions. But if encoded length requires int64 to fit the possibly
>values, same would be true for the source lengths. Why can't the source length
>also be int64?
>If still we want the cast, I think it should be ((int64)srclen << 1) rather
>than casting the result.
I prefer the '((int64)srclen << 1)' way.

> /* 3 bytes will be converted to 4, linefeed after 76 chars */
>- return (srclen + 2) * 4 / 3 + srclen / (76 * 3 / 4);
>+ return (int64)((srclen + 2) * 4 / 3 + srclen / (76 * 3 / 4));
>similar comments as above.

>It might help to add a test where we could pass the second argument something
>greater than 1G. But it may be difficult to write such a test case.
I have added two test cases.

Highgo Software (Canada/China/Pakistan)
EMAIL: mailto:movead(dot)li(at)highgo(dot)ca

Attachment Content-Type Size
long_bytea_string_bug_fix_ver3.patch application/octet-stream 9.4 KB
