Re: A bug when use get_bit() function for a long bytea string

From: Ashutosh Bapat <ashutosh(dot)bapat(at)2ndquadrant(dot)com>
To: "movead(dot)li(at)highgo(dot)ca" <movead(dot)li(at)highgo(dot)ca>
Cc: pgsql-hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: A bug when use get_bit() function for a long bytea string
Date: 2020-03-26 13:31:59
Message-ID: CAG-ACPUbQwpCHhn_KkLAGEc08A91yZhARazj2BGjXvK+eFCVog@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, 18 Mar 2020 at 08:18, movead(dot)li(at)highgo(dot)ca <movead(dot)li(at)highgo(dot)ca>
wrote:

>
> Hello thanks for the detailed review,
>
> >I think the second argument indicates the bit position, which would be
> max bytea length * 8. If max bytea length covers whole int32, the second
> argument >needs to be wider i.e. int64.
> Yes, it makes sence and followed.
>
>
I think we need a similar change in byteaGetBit() and byteaSetBit() as well.

>
> > Some more comments on the patch
> > struct pg_encoding
> > {
> >- unsigned (*encode_len) (const char *data, unsigned dlen);
> >+ int64 (*encode_len) (const char *data, unsigned dlen);
> > unsigned (*decode_len) (const char *data, unsigned dlen);
> > unsigned (*encode) (const char *data, unsigned dlen, char *res);
> > unsigned (*decode) (const char *data, unsigned dlen, char *res);
> > Why not use return type of int64 for rest of the functions here as well?
> > res = enc->encode(VARDATA_ANY(data), datalen, VARDATA(result));
> > /* Make this FATAL 'cause we've trodden on memory ... */
> >- if (res > resultlen)
> >+ if ((int64)res > resultlen)
> >
> >if we change return type of all those functions to int64, we won't need
> this cast.
> I change the 'encode' function, it needs an int64 return type, but keep
> other
>
> functions in 'pg_encoding', because I think it of no necessary reason.
>
>
Ok, let's leave it for a committer to decide.

> >Right now we are using int64 because bytea can be 1GB, but what if we
> increase
> >that limit tomorrow, will int64 be sufficient? That may be unlikely in
> the near
> >future, but nevertheless a possibility. Should we then define a new
> datatype
> >which resolves to int64 right now but really depends upon the bytea
> length. I
> >am not suggesting that we have to do it right now, but may be something to
> >think about.
> I decide to use int64 because if we really want to increase the limit, it
> should be
> the same change with 'text', 'varchar' which have the same limit. So it may
> need
> a more general struct. Hence I give up the idea.
>
>
Hmm, Let's see what a committer says.

Some more review comments.
+ int64 res,resultlen;

We need those on separate lines, possibly.

+ byteNo = (int32)(n / BITS_PER_BYTE);
Does it hurt to have byteNo as int64 so as to avoid a cast. Otherwise,
please
add a comment explaining the reason for the cast. The comment applies at
other
places where this change appears.

- int len;
+ int64 len;
Why do we need this change?
int i;

>
>
> >It might help to add a test where we could pass the second argument
> something
> >greater than 1G. But it may be difficult to write such a test case.
> Add two test cases.
>
>
+
+select get_bit(
+ set_bit((repeat('Postgres', 512 * 1024 * 1024 / 8))::bytea, 1024 *
1024 * 1024 + 1, 0)
+ ,1024 * 1024 * 1024 + 1);

This bit position is still within int4.
postgres=# select pg_column_size(1024 * 1024 * 1024 + 1);
pg_column_size
----------------
4
(1 row)

You want something like
postgres=# select pg_column_size(512::bigint * 1024 * 1024 * 8);
pg_column_size
----------------
8
(1 row)

--
Best Wishes,
Ashutosh

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2020-03-26 13:50:11 Re: pgsql: Provide a TLS init hook
Previous Message Julien Rouhaud 2020-03-26 13:22:42 Re: Planning counters in pg_stat_statements (using pgss_store)