Re: custom function for converting human readable sizes to bytes

From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Vitaly Burovoy <vitaly(dot)burovoy(at)gmail(dot)com>
Cc: "Shulgin, Oleksandr" <oleksandr(dot)shulgin(at)zalando(dot)de>, Robert Haas <robertmhaas(at)gmail(dot)com>, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Jim Nasby <Jim(dot)Nasby(at)bluetreble(dot)com>, Corey Huinker <corey(dot)huinker(at)gmail(dot)com>, Guillaume Lelarge <guillaume(at)lelarge(dot)info>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: custom function for converting human readable sizes to bytes
Date: 2016-01-22 06:48:13
Message-ID: CAFj8pRBNnaPi7Yy7poxo7v5ps_uDrR9134Yj0xWO5Sh0ywztBQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

2016-01-22 7:03 GMT+01:00 Vitaly Burovoy <vitaly(dot)burovoy(at)gmail(dot)com>:

> On 1/20/16, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> wrote:
> > ...
> > New version is attached
> >
> > Regards
> > Pavel
>
> Hello!
>
> 1. Why the function is marked as VOLATILE? It depends only on input
> value, it does change nothing in the DB.
> I guess it should be IMMUTABLE for efficient caching its result.
>
> 2.
> > text *arg = PG_GETARG_TEXT_PP(0);
> ...
> > str = text_to_cstring(arg);
> >
> Hmm... My question was _why_ you get TEXT and convert it instead of
> getting an argument of type cstring (not about usage of
> VARSIZE_ANY_EXHDR).
> It was enough to give me a link[1] and leave the usage of
> VARSIZE_ANY_EXHDR as is.
>
>
>
> I think all the other claims below are mostly cosmetic. Main behavior
> is OK (considering its usage will not be in heavy queries).
>
> ===
> Documentation:
> Besides currently added row in a table I think it is worth to add
> detailed comment after a block "<function>pg_size_pretty</> can be
> used" similar to (but the best way is to get advice for a correct
> phrase):
> "pg_size_bytes can be used to get a size in bytes as bigint from a
> human-readable format for a comparison purposes (it is the opposite to
> pg_size_pretty function). The format is numeric with an optional size
> unit: kB, MB, GB or TB."
> (and delete unit sizes from the table row).
>
> ===
> Tests:
> Since there was a letter with an explanation why units bigger than
> "TB" can't be used[2], it is a good reason to add that size units
> ("PB", "EB", "ZB") in tests with a note to update the documentation
> part when that unit are supported.
>
> ===
> Code style:
> 1. When you declare pointers, you must align _names_ by the left
> border, i.e. asterisks must be at one position to the left from the
> aligned names, i.e. one to three spaces instead of TAB before them.
>
> 2.
> > int multiplier;
> One more TAB to align it with the name at the next line.
>
> ===
> Code:
>
> > /* allow whitespace between integer and unit */
> May be "numeric" instead of "integer"?
>
> > "\"%s\" is not in expected format"
> It is not clear what format is expected.
>
> > /* ignore plus symbol */
> It seems it is not ignored, but copied... ;-)
>
> > to ger hintstr
> s/ger/get/
> > Elsewhere is used space trimmed buffer.
> I'd write it as "Otherwise trimmed value is used."
> I'm not good at English, but that full block looks little odd for me.
> I tried to reword it in the attachment single_buffer.c, but I don't
> think it is enough.
>
> > We allow ending spaces.
> "trailing"?
>
> > but this message can be little bit no intuitive - better text is "is not
> a valid number"
> It was a block with a comment "don't allow empty string", i.e. absence
> of a number...
>
> > Automatic memory deallocation doesn't cover all possible situations where
> > the function can be used - for example DirectFunctionCall - so explicit
> > deallocation can descrease a memory requirements when you call these
> > functions from C.
> DirectFunctionCall uses a memory context of the caller. For example,
> you don't free "num" allocated by numeric_in and numeric_mul, also
> mul_num allocated by int8_numeric...
> I'd ask experienced hackers for importance of freeing (or leaving)
> "str" and "buffer".
>

Sometimes, the memory can be released by caller only - or the memory usage
is pretty minimal. But when you can release, memory then you should to do
it to decrease push to memory allocator.

>
> > postgres=# select pg_size_bytes('+912+ kB');
> > ERROR: invalid unit: "+ kB"
> > This is difficult - you have to divide string to two parts and first
> world is number, second world is unit.
> Your code looks like a duplicated (not by lines, but by behavior).
> numeric_in has similar checks, let it do them for you. The goal of
> your function is just split mantissa and size unit, and if the last
> one is found, turn it into an exponent.
> See my example in the attachment check_mantissa_by_numeric_in.c. There
> is just searching non-space char that can't be part of numeric. Then
> all before it passes to numeric_in and let it check anything it
> should. I even left first spaces to show full numeric part to a user
> if an error occurs (for any case).
> The second attachment single_buffer.c is an attempt to avoid
> allocating the second buffer (the first is "str" allocated by
> text_to_cstring) and copying into it. I don't think it gives a big
> improvement, but nevertheless.
>

This code is little bit more complex than it is necessary (but few lines
more) to produce readable error messages. I am thinking so it is correct.

I'll look on this patch next week.

Thank you for review

Regards

Pavel

>
> ===
> [1] http://www.postgresql.org/message-id/29618.1451882238@sss.pgh.pa.us
> [2]
> http://www.postgresql.org/message-id/CAB7nPqS6Wob4WnZb=DHB3O0Pc-nX1v3xJSzKN3z9KBeXgcQTRg@mail.gmail.com
>
> --
> Best regards,
> Vitaly Burovoy
>

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Craig Ringer 2016-01-22 07:46:44 Re: WIP: Failover Slots
Previous Message Vitaly Burovoy 2016-01-22 06:31:03 Re: Extracting fields from 'infinity'::TIMESTAMP[TZ]