Re: custom function for converting human readable sizes to bytes

From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Vitaly Burovoy <vitaly(dot)burovoy(at)gmail(dot)com>
Cc: "Shulgin, Oleksandr" <oleksandr(dot)shulgin(at)zalando(dot)de>, Robert Haas <robertmhaas(at)gmail(dot)com>, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Jim Nasby <Jim(dot)Nasby(at)bluetreble(dot)com>, Corey Huinker <corey(dot)huinker(at)gmail(dot)com>, Guillaume Lelarge <guillaume(at)lelarge(dot)info>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: custom function for converting human readable sizes to bytes
Date: 2016-01-20 18:44:20
Message-ID: CAFj8pRA5_JQ+2ytwkFefeenWG8v=1+jo_MmYc86BUr_t3vwtmA@mail.gmail.com
Lists: pgsql-hackers

2016-01-19 4:45 GMT+01:00 Vitaly Burovoy <vitaly(dot)burovoy(at)gmail(dot)com>:

> On 1/4/16, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> wrote:
> > 2016-01-04 18:13 GMT+01:00 Shulgin, Oleksandr <
> oleksandr(dot)shulgin(at)zalando(dot)de> :
> >> On Mon, Jan 4, 2016 at 6:03 PM, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
> wrote:
> >> > 2016-01-04 17:48 GMT+01:00 Shulgin, Oleksandr <
> oleksandr(dot)shulgin(at)zalando(dot)de>:
> >> >> On Mon, Jan 4, 2016 at 4:51 PM, Robert Haas <robertmhaas(at)gmail(dot)com>
> wrote:
> >> >>
> >> >> I'm also inclined on dropping that explicit check for empty string
> >> >> below and let numeric_in() error out on that. Does this look OK, or
> can
> >> >> it confuse someone:
> >> >> postgres=# select pg_size_bytes('');
> >> >> ERROR: invalid input syntax for type numeric: ""
> >> >
> >> > both fixed
> >>
> >> Hm...
> >>
> >> > + switch (*strptr)
> >> > + {
> >> > + /* ignore plus symbol */
> >> > + case '+':
> >> > + case '-':
> >> > + *bufptr++ = *strptr++;
> >> > + break;
> >> > + }
> >>
> >> Well, to that amount you don't need any special checks, I'm just not
> sure
> >> if reported error message is not misleading if we let numeric_in()
> handle
> >> all the errors. At least it can cope with the leading spaces, +/- and
> >> empty input quite well.
> >>
> >
> > I don't want to catch an exception from numeric_in, so I try to handle
> > some simple situations where I can raise a possibly better error message.
>
> There are several cases where your behavior gives strange errors (see
> below).
>
> Next batch of notes:
>
> src/include/catalog/pg_proc.h:
> ---
> + DATA(insert OID = 3317 ( pg_size_bytes...
> now oid 3317 is used (by pg_stat_get_wal_receiver), 3318 is free
>

fixed

>
> ---
> + DESCR("convert a human readable text with size units to big int bytes");
> Maybe the best way is to copy the first sentence from the doc?
> ("convert a size in human-readable format with size units into bytes")
>

fixed

> ====
> src/backend/utils/adt/dbsize.c:
> + text *arg = PG_GETARG_TEXT_PP(0);
> + char *str = text_to_cstring(arg);
> ...
> + /* working buffer cannot be longer than original string */
> + buffer = (char *) palloc(VARSIZE_ANY_EXHDR(arg) + 1);
> Is there any reason to get TEXT for only converting it to cstring and
> call VARSIZE_ANY_EXHDR instead of strlen?
>

performance - but these strings should be short, so I can use strlen

fixed

>
> ---
> + text *arg = PG_GETARG_TEXT_PP(0);
> + char *str = text_to_cstring(arg);
> + char *strptr = str;
> + char *buffer;
> There are wrong offsets of variable names after their types (among all
> body of the "pg_size_bytes" function).
> See variable declarations in nearby functions (
> >> "make the new code look like the existing code around it"
> http://www.postgresql.org/docs/devel/static/source-format.html
> )
>
>
fixed

> ---
> + errmsg("\"%s\" is not number",
> str)));
> s/is not number/is not a number/
> (the second version can be found in a couple places besides translations)
>

fixed

but this message can be a little unintuitive - a better text is "is not a
valid number"

>
> ---
> + if (*strptr != '\0')
> ...
> + while (*strptr && !isspace(*strptr))
> Sometimes it explicitly compares to '\0', sometimes implicitly.
> Common use is explicit comparison and it is preferred due to different
> compilers (their conversions to boolean).
>

fixed

>
> ---
> + /* Skip leading spaces */
> ...
> + /* ignore plus symbol */
> ...
> + /* copy digits to working buffer */
> ...
> + /* allow whitespace between integer and unit */
> I'm also inclined on dropping that explicit skipping spaces, checking
> for +/- symbols, but copying all digits, spaces, dots and '+-' symbols
> and let numeric_in() error out on that.
>

This is difficult - you have to divide the string into two parts: the first
word is the number, the second word is the unit.

For example, "+912+ kB" is a correct number "+912" and a broken unit "+ kB".
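
The split described above can be sketched roughly as follows. This is not the
code from the attached patch; split_size_string is a hypothetical helper that
only illustrates the idea: take an optional sign, digits and at most one
decimal point as the number, and treat whatever follows (after optional
whitespace) as the unit. It assumes the caller passes a buffer at least as
long as the input plus one byte; longer numbers are silently truncated.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, not taken from the patch: copy the leading number
 * (optional sign, digits, at most one decimal point) into numbuf and return
 * a pointer to the first non-space character after it, i.e. the unit part.
 */
static const char *
split_size_string(const char *input, char *numbuf, size_t numbuf_len)
{
	const char *p = input;
	size_t		n = 0;
	int			seen_dot = 0;

	assert(input != NULL && numbuf != NULL && numbuf_len > 0);

	/* skip leading whitespace */
	while (*p != '\0' && isspace((unsigned char) *p))
		p++;

	/* accept a single optional sign */
	if (*p == '+' || *p == '-')
	{
		if (n + 1 < numbuf_len)
			numbuf[n++] = *p;
		p++;
	}

	/* copy digits and at most one decimal point */
	while (*p != '\0')
	{
		if (*p == '.' && !seen_dot)
			seen_dot = 1;
		else if (!isdigit((unsigned char) *p))
			break;

		if (n + 1 < numbuf_len)
			numbuf[n++] = *p;
		p++;
	}
	numbuf[n] = '\0';

	/* allow whitespace between the number and the unit */
	while (*p != '\0' && isspace((unsigned char) *p))
		p++;

	return p;
}
```

With the input "+912+ kB" this yields the number "+912" and the unit "+ kB",
so the second '+' ends up in the unit, where it can be reported as an invalid
unit.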

> It allows to get correct error messages for something like:
> postgres=# select pg_size_bytes('.+912');
> ERROR: invalid input syntax for type numeric: ".+912"
> postgres=# select pg_size_bytes('+912+ kB');
> ERROR: invalid input syntax for type numeric: "+912+ "
> postgres=# select pg_size_bytes('++123 kB');
> ERROR: invalid input syntax for type numeric: "++123 "
>
> instead of current:
> postgres=# select pg_size_bytes('.+912');
> ERROR: invalid input syntax for type numeric: "."
> postgres=# select pg_size_bytes('+912+ kB');
> ERROR: invalid unit: "+ kB"
> postgres=# select pg_size_bytes('++123 kB');
> ERROR: invalid input syntax for type numeric: "+"
>
>
I redesigned this check. Now it is:

postgres=# select pg_size_bytes('.+912');
ERROR: 22023: ".+912" is not a valid number
postgres=# select pg_size_bytes('++123 kB');
ERROR: 22023: "++123 kB" is not a valid number

> ---
> + while (isspace((unsigned char) *strptr))
> ...
> + while (isspace(*strptr))
> ...
> + while (*strptr && !isspace(*strptr))
> ...
> + while (isspace(*strptr))
> The first occurrence of isspace's parameter is cast to "unsigned
> char" whereas the others are not.
> Note:
> "The behavior is undefined if the value of ch is not representable as
> unsigned char and is not equal to EOF"
>

> Proof:
> http://en.cppreference.com/w/c/string/byte/isspace

fixed
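
For reference, a minimal sketch of the pattern, assuming a standalone helper
(skip_spaces is a hypothetical name, not a function from the patch). Plain
char may be signed, so a byte >= 0x80 (for example in UTF-8 input) would be
passed to isspace() as a negative value without the cast. The loop also
compares against '\0' explicitly, as suggested earlier in the review.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/*
 * Hypothetical helper: return a pointer to the first non-whitespace
 * character of s. The cast to unsigned char keeps the argument of
 * isspace() within the range the C standard requires.
 */
static const char *
skip_spaces(const char *s)
{
	assert(s != NULL);

	while (*s != '\0' && isspace((unsigned char) *s))
		s++;

	return s;
}
```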

>
>
> ---
> + pfree(buffer);
> + pfree(str);
> pfree-s here are not necessary. See:
> http://www.neilconway.org/talks/hacking/hack_slides.pdf (page 17)
>

Automatic memory deallocation doesn't cover all possible situations where
the function can be used - for example DirectFunctionCall - so explicit
deallocation can decrease memory requirements when you call these
functions from C.

New version is attached

Regards

Pavel

>
> --
> Best regards,
> Vitaly Burovoy
>

Attachment Content-Type Size
pg-size-bytes-09.patch text/x-patch 12.3 KB
