Re: custom function for converting human readable sizes to bytes

From: Vitaly Burovoy <vitaly(dot)burovoy(at)gmail(dot)com>
To: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Cc: "Shulgin, Oleksandr" <oleksandr(dot)shulgin(at)zalando(dot)de>, Robert Haas <robertmhaas(at)gmail(dot)com>, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Jim Nasby <Jim(dot)Nasby(at)bluetreble(dot)com>, Corey Huinker <corey(dot)huinker(at)gmail(dot)com>, Guillaume Lelarge <guillaume(at)lelarge(dot)info>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: custom function for converting human readable sizes to bytes
Date: 2016-01-19 03:45:10
Message-ID: CAKOSWN=32VE2BQ9D2YA=VwDKv7Y8pCgD5Ordnj+0rcit7v8coQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 1/4/16, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> wrote:
> 2016-01-04 18:13 GMT+01:00 Shulgin, Oleksandr <oleksandr(dot)shulgin(at)zalando(dot)de> :
>> On Mon, Jan 4, 2016 at 6:03 PM, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> wrote:
>> > 2016-01-04 17:48 GMT+01:00 Shulgin, Oleksandr <oleksandr(dot)shulgin(at)zalando(dot)de>:
>> >> On Mon, Jan 4, 2016 at 4:51 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>> >>
>> >> I'm also inclined on dropping that explicit check for empty string
>> >> below and let numeric_in() error out on that. Does this look OK, or can
>> >> it confuse someone:
>> >> postgres=# select pg_size_bytes('');
>> >> ERROR: invalid input syntax for type numeric: ""
>> >
>> > both fixed
>>
>> Hm...
>>
>> > + switch (*strptr)
>> > + {
>> > + /* ignore plus symbol */
>> > + case '+':
>> > + case '-':
>> > + *bufptr++ = *strptr++;
>> > + break;
>> > + }
>>
>> Well, to that amount you don't need any special checks, I'm just not sure
>> if reported error message is not misleading if we let numeric_in() handle
>> all the errors. At least it can cope with the leading spaces, +/- and
>> empty input quite well.
>>
>
> I don't would to catch a exception from numeric_in - so I try to solve some
> simple situations, where I can raise possible better error message.

There are several cases where your behavior gives strange errors (see below).

Next batch of notes:

src/include/catalog/pg_proc.h:
---
+ DATA(insert OID = 3317 ( pg_size_bytes...
now oid 3317 is used (by pg_stat_get_wal_receiver), 3318 is free

---
+ DESCR("convert a human readable text with size units to big int bytes");
May be the best way is to copy the first sentence from the doc?
("convert a size in human-readable format with size units into bytes")

====
src/backend/utils/adt/dbsize.c:
+ text *arg = PG_GETARG_TEXT_PP(0);
+ char *str = text_to_cstring(arg);
...
+ /* working buffer cannot be longer than original string */
+ buffer = (char *) palloc(VARSIZE_ANY_EXHDR(arg) + 1);
Is there any reason to get TEXT for only converting it to cstring and
call VARSIZE_ANY_EXHDR instead of strlen?

---
+ text *arg = PG_GETARG_TEXT_PP(0);
+ char *str = text_to_cstring(arg);
+ char *strptr = str;
+ char *buffer;
There are wrong offsets of variable names after their types (among all
body of the "pg_size_bytes" function).
See variable declarations in nearby functions (
>> "make the new code look like the existing code around it"
http://www.postgresql.org/docs/devel/static/source-format.html
)

---
+ errmsg("\"%s\" is not number", str)));
s/is not number/is not a number/
(the second version can be found in a couple places besides translations)

---
+ if (*strptr != '\0')
...
+ while (*strptr && !isspace(*strptr))
Sometimes it explicitly compares to '\0', sometimes implicitly.
Common use is explicit comparison and it is preferred due to different
compilers (their conversions to boolean).

---
+ /* Skip leading spaces */
...
+ /* ignore plus symbol */
...
+ /* copy digits to working buffer */
...
+ /* allow whitespace between integer and unit */
I'm also inclined on dropping that explicit skipping spaces, checking
for +/- symbols, but copying all digits, spaces, dots and '+-' symbols
and let numeric_in() error out on that.

It allows to get correct error messages for something like:
postgres=# select pg_size_bytes('.+912');
ERROR: invalid input syntax for type numeric: ".+912"
postgres=# select pg_size_bytes('+912+ kB');
ERROR: invalid input syntax for type numeric: "+912+ "
postgres=# select pg_size_bytes('++123 kB');
ERROR: invalid input syntax for type numeric: "++123 "

instead of current:
postgres=# select pg_size_bytes('.+912');
ERROR: invalid input syntax for type numeric: "."
postgres=# select pg_size_bytes('+912+ kB');
ERROR: invalid unit: "+ kB"
postgres=# select pg_size_bytes('++123 kB');
ERROR: invalid input syntax for type numeric: "+"

---
+ while (isspace((unsigned char) *strptr))
...
+ while (isspace(*strptr))
...
+ while (*strptr && !isspace(*strptr))
...
+ while (isspace(*strptr))
The first occurece of isspace's parameter is casting to "unsigned
char" whereas the others are not.
Note:
"The behavior is undefined if the value of ch is not representable as
unsigned char and is not equal to EOF"
Proof:
http://en.cppreference.com/w/c/string/byte/isspace

---
+ pfree(buffer);
+ pfree(str);
pfree-s here are not necessary. See:
http://www.neilconway.org/talks/hacking/hack_slides.pdf (page 17)

--
Best regards,
Vitaly Burovoy

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tomas Vondra 2016-01-19 03:57:33 Re: PATCH: postpone building buckets to the end of Hash (in HashJoin)
Previous Message Amit Kapila 2016-01-19 03:41:30 Re: RFC: replace pg_stat_activity.waiting with something more descriptive