Re: A bug when use get_bit() function for a long bytea string

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "movead(dot)li(at)highgo(dot)ca" <movead(dot)li(at)highgo(dot)ca>
Cc: "ashutosh(dot)bapat" <ashutosh(dot)bapat(at)2ndquadrant(dot)com>, pgsql-hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: A bug when use get_bit() function for a long bytea string
Date: 2020-04-01 16:12:24
Message-ID: 20077.1585757544@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

"movead(dot)li(at)highgo(dot)ca" <movead(dot)li(at)highgo(dot)ca> writes:
> [ long_bytea_string_bug_fix_ver5.patch ]

I don't think this has really solved the overflow hazards. For example,
in binary_encode we've got

resultlen = enc->encode_len(VARDATA_ANY(data), datalen);
result = palloc(VARHDRSZ + resultlen);

and all you've done about that is changed resultlen from int to int64.
On a 64-bit machine, sure, palloc will be able to detect if the
result exceeds what can be allocated --- but on a 32-bit machine
it'd be possible for the size_t argument to overflow altogether.
(Or if you want to argue that it can't overflow because no encoder
expands the data more than 4X, then we don't need to be making this
change at all.)

I don't think there's really any way to do that safely without an
explicit check before we call palloc.

I also still find the proposed signatures for the encoding-specific
functions to be just plain weird:

- unsigned (*encode_len) (const char *data, unsigned dlen);
- unsigned (*decode_len) (const char *data, unsigned dlen);
- unsigned (*encode) (const char *data, unsigned dlen, char *res);
- unsigned (*decode) (const char *data, unsigned dlen, char *res);
+ int64 (*encode_len) (const char *data, unsigned dlen);
+ int64 (*decode_len) (const char *data, unsigned dlen);
+ int64 (*encode) (const char *data, unsigned dlen, char *res);
+ int64 (*decode) (const char *data, unsigned dlen, char *res);

Why did you change the outputs from unsigned to signed? Why didn't
you change the dlen inputs? I grant that we know that the input
can't exceed 1GB in Postgres' usage, but these signatures are just
randomly inconsistent, and you didn't even add a comment explaining
why. Personally I think I'd make them like

uint64 (*encode_len) (const char *data, size_t dlen);

which makes it explicit that the dlen argument describes the length
of a chunk of allocated memory, while the result might exceed that.

Lastly, there is a component of this that can be back-patched and
a component that can't --- we do not change system catalog contents
in released branches. Cramming both parts into the same patch
is forcing the committer to pull them apart, which is kind of
unfriendly.

regards, tom lane

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2020-04-01 16:13:54 Re: wraparound dangers in snapshot too old
Previous Message David Steele 2020-04-01 16:03:35 Re: pgbench - add pseudo-random permutation function