Re: A bug when use get_bit() function for a long bytea string

From: Ashutosh Bapat <ashutosh(dot)bapat(at)2ndquadrant(dot)com>
To: "movead(dot)li(at)highgo(dot)ca" <movead(dot)li(at)highgo(dot)ca>
Cc: pgsql-hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: A bug when use get_bit() function for a long bytea string
Date: 2020-03-31 14:04:27
Message-ID: CAG-ACPWtjb2y5RaoWP8f76OmvSvNP3xwT82spa9VDLkNJEOdDg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Thanks for the changes,
+ int64 res,resultlen;

It's better to have them on separate lines.

-unsigned
+int64
hex_decode(const char *src, unsigned len, char *dst)

Do we want to explicitly cast the return value to int64? Will build on some
platform crib if not done so? I don't know of such a platform but my
knowledge in this area is not great.

+ byteNo = (int)(n / 8);
+ bitNo = (int)(n % 8);
some comment explaining why this downcasting is safe here?

- proname => 'get_bit', prorettype => 'int4', proargtypes => 'bytea int4',
+ proname => 'get_bit', prorettype => 'int4', proargtypes => 'bytea int8',
prosrc => 'byteaGetBit' },
{ oid => '724', descr => 'set bit',
- proname => 'set_bit', prorettype => 'bytea', proargtypes => 'bytea int4
int4',
+ proname => 'set_bit', prorettype => 'bytea', proargtypes => 'bytea int8
int4',
prosrc => 'byteaSetBit' },

Shouldn't we have similar changes for following entries as well?
{ oid => '3032', descr => 'get bit',
proname => 'get_bit', prorettype => 'int4', proargtypes => 'bit int4',
prosrc => 'bitgetbit' },
{ oid => '3033', descr => 'set bit',
proname => 'set_bit', prorettype => 'bit', proargtypes => 'bit int4 int4',
prosrc => 'bitsetbit' },

The tests you have added are for bytea variant which ultimately calles
byteaGet/SetBit(). But I think we also need tests for bit variants which
will ultimately call bitgetbit and bitsetbit functions.

Once you address these comments, I think the patch is good for a committer.
So please mark the commitfest entry as such when you post the next version
of patch.

On Sat, 28 Mar 2020 at 14:40, movead(dot)li(at)highgo(dot)ca <movead(dot)li(at)highgo(dot)ca>
wrote:

> I want to resent the mail, because last one is in wrong format and hardly
>> to read.
>>
>> In addition, I think 'Size' or 'size_t' is rely on platform, they may
>> can't work on 32bit
>> system. So I choose 'int64' after ashutosh's review.
>>
>> >>I think the second argument indicates the bit position, which would be
>> max bytea length * 8. If max bytea length covers whole int32, the second
>> argument >needs to be wider i.e. int64.
>> >Yes, it makes sence and followed.
>>
>>
> >>I think we need a similar change in byteaGetBit() and byteaSetBit() as
> well.
> Sorry, I think it's my mistake, it is the two functions above should be
> changed.
>
>
>> >>Some more comments on the patch
>> >> struct pg_encoding
>> >> {
>> >>- unsigned (*encode_len) (const char *data, unsigned dlen);
>> >>+ int64 (*encode_len) (const char *data, unsigned dlen);
>> >> unsigned (*decode_len) (const char *data, unsigned dlen);
>> >> unsigned (*encode) (const char *data, unsigned dlen, char *res);
>> >> unsigned (*decode) (const char *data, unsigned dlen, char *res);
>> >> Why not use return type of int64 for rest of the functions here as
>> well?
>> >> res = enc->encode(VARDATA_ANY(data), datalen, VARDATA(result));
>> >> /* Make this FATAL 'cause we've trodden on memory ... */
>> >>- if (res > resultlen)
>> >>+ if ((int64)res > resultlen)
>> >>
>> >>if we change return type of all those functions to int64, we won't
>> need this cast.
>> >I change the 'encode' function, it needs an int64 return type, but keep
>> other
>>
>> >functions in 'pg_encoding', because I think it of no necessary reason.
>>
>>
> >>Ok, let's leave it for a committer to decide.
> Well, I change all of them this time, because Tom Lane supports on next
> mail.
>
>
> >Some more review comments.
> >+ int64 res,resultlen;
> >We need those on separate lines, possibly.
> Done
>
> >+ byteNo = (int32)(n / BITS_PER_BYTE);
> >Does it hurt to have byteNo as int64 so as to avoid a cast. Otherwise,
> please
> >add a comment explaining the reason for the cast. The comment applies at
> other
> >places where this change appears.
> >- int len;
> >+ int64 len;
> >Why do we need this change?
> > int i;
> It is my mistake as describe above, it should not be 'bitgetbit()/
> bitsetbit()' to be changed.
>
>
>>
>> >>It might help to add a test where we could pass the second argument
>> something
>> >>greater than 1G. But it may be difficult to write such a test case.
>> >Add two test cases.
>>
>>
> >+
> >+select get_bit(
> >+ set_bit((repeat('Postgres', 512 * 1024 * 1024 / 8))::bytea, 1024
> * 1024 * 1024 + 1, 0)
> >+ ,1024 * 1024 * 1024 + 1);
>
> >This bit position is still within int4.
> >postgres=# select pg_column_size(1024 * 1024 * 1024 + 1);
> > pg_column_size
> >----------------
> > 4
> >(1 row)
>
> >You want something like
> >postgres=# select pg_column_size(512::bigint * 1024 * 1024 * 8);
> > pg_column_size
> >----------------
> > 8
> >(1 row)
> I intend to test size large then 1G, and now I think you give a better
> idea and followed.
>
>
>
> ------------------------------
> Highgo Software (Canada/China/Pakistan)
> URL : www.highgo.ca
> EMAIL: mailto:movead(dot)li(at)highgo(dot)ca
>
>

--
Best Wishes,
Ashutosh

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Julien Rouhaud 2020-03-31 14:08:55 Re: WAL usage calculation patch
Previous Message Dilip Kumar 2020-03-31 14:02:35 Re: pg_stat_statements issue with parallel maintenance (Was Re: WAL usage calculation patch)