Re: custom function for converting human readable sizes to bytes

From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Vitaly Burovoy <vitaly(dot)burovoy(at)gmail(dot)com>
Cc: "Shulgin, Oleksandr" <oleksandr(dot)shulgin(at)zalando(dot)de>, Robert Haas <robertmhaas(at)gmail(dot)com>, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Jim Nasby <Jim(dot)Nasby(at)bluetreble(dot)com>, Corey Huinker <corey(dot)huinker(at)gmail(dot)com>, Guillaume Lelarge <guillaume(at)lelarge(dot)info>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: custom function for converting human readable sizes to bytes
Date: 2016-01-26 07:05:00
Message-ID: CAFj8pRAvQMcHi-vm0jaF8_DGXG7h83PJqt4932WqCLY+mpCq-Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi

2016-01-26 6:25 GMT+01:00 Vitaly Burovoy <vitaly(dot)burovoy(at)gmail(dot)com>:

> Hello!
>
>
> Regression tests are present (see p.II).
>
> pg_proc.h has changed, so the CATALOG_VERSION_NO must be also changed.
>

this is not a part of patch - only a commiter knows CATALOG_VERSION_NO

>
> Code comments, error message string, DESCR in pg_proc.h and doc
> changes need proof-reading by a native English speaker, which I am
> not.
>
>
> ===
> The last patch fixes only part of my last notes, so I can only return
> it to the next round with the same notes.
>
> The list of my notes has not been done (all points listed below was
> mentioned in my last review[1]):
>
> 1. The documentation has duplicate parts of size units (in table and
> in the text). I think that one of them should be deleted (from the
> table).
>

fixed

>
> 2. Test has a note about sharing(using) size units via(from)
> memory_unit_conversion_table and a requirement to update the
> documentation when new units are supported, but the note will not be
> seen because there is no tests for that fail for units ("PB", "EB",
> "ZB") which will change in the future by the
> "memory_unit_conversion_table".
>

fixed

> 3. Code design is looked a little odd for me.
>
> 3a. There is an attempt to check a correctness of a numeric value
> instead of using numeric_in which does the same thing. In the last
> case the code looks more compact and nice (see the full explanation in
> the previous review[1] and both PoC there).
>
> Possibly that way is chosen to emit its own error messages ("invalid
> size:" instead of "invalid input syntax for type numeric:"), but it
> still leads errors like 'invalid unit: "+ kB"' for input value "+912+
> kB" (see "dbsize.out").
>

In tried to enhanced this point (I don't agree, so this is a bug)

>
> 3b. There is an extra copying from the buffer "str" to the "buffer"
> (it can be avoided and shown in an PoC "single_buffer.c").
>
> 3c. There is freeing buffers "str" and "buffer" that looks as a
> counter-design since MemoryContext frees them at once with other
> expired buffers (like "num" from numeric_in and numeric_mul; also
> "mul_num" from int8_numeric)
>

Numeric operations are on critical path for OLAP applications - so the code
is rewritten to minimize palloc/free operations. I don't expect usage
pg_size_bytes in OLAP critical path.

>
>
> ===
> While I was writing that text I found five additional notes. Both
> parts (above and below) are important (see concusion).
>
> I. The new version of the patch (pg-size-bytes-10.patch) has the
> simplest typo in an updated (from v9) row "inavalid size".
>

fixed

>
> II. There is no test for an empty input value, but it works as
> expected (an error 'invalid size: ""').
>

fixed

>
> III. There is no support of 'bytes' unit, it seems such behavior got
> majority approval[2].
>

No, because I have to use the supported units by configuration. The
configuration supports only three chars long units. Support for "bytes" was
removed, when I removed proprietary unit table.

>
> IV. Code design of the function "parse_memory_unit" is returning
> "bool" value and set "multiplier" via pointer. But while Tom Lane was
> reviewed my patch[3] in the similar case he changed code that now
> returns (see NonFiniteTimestampTzPart) not a flag, but a value (where
> zero still means an error). Be honest it looks better, but the current
> code is written like nearby functions.
>

It is a commiter decision. This is a implementation detail, and I don't see
any variant as significant better - using 0 as error flag isn't the best
too. But this is really detail.

>
> V. The documentation lacks a note that the base of the "pg_size_bytes"
> is 1024 whereas the base of the "pg_size_pretty" is 1000.
>

It isn't true, base for both are 1024. It was designed when special table
was used for pg_size_bytes. But when we share same control table, then is
wrong to use different table. The result can be optically different, but
semantically same.

postgres=# select pg_size_pretty(pg_size_bytes('1MB'));
┌────────────────┐
│ pg_size_pretty │
╞════════════════╡
│ 1024 kB │
└────────────────┘
(1 row)

Time: 0.787 ms
postgres=# select pg_size_pretty(pg_size_bytes('1024MB'));
┌────────────────┐
│ pg_size_pretty │
╞════════════════╡
│ 1024 MB │
└────────────────┘
(1 row)

>
> ===
> Concusion:
>
> I *can't* mark this patch as "Ready for Committer" with all previous
> notes, and I ask experienced hackers for a help.
> I'm not a strong Postgres hacker yet and sending to the next round
> with the same notes seems a nitpicking for me.
>
> So questions to the hackers:
>
> a. Are the rest of the notes from the previous review[1] (and listed
> in pp.1-3c) a nitpicking or should be done?
>
> b. Does the function in PoC "single_buffer.c" look more complex or easer?
>
> c. Is it worth for p.II to change error message like 'Empty input
> value is not a valid size for pg_size_bytes: "%s"'?
>

It is simply to implement, but It is looking like overengineering

>
> d. Should the function "pg_size_bytes" handle "bytes" as a size unit
> (see p.III)?
> There was no disagreement, but if I saw that thread before I became a
> reviewer and saw silent approval I would have argued. Moreover
> supporting of negative sizes was explained[4] by a symmetry for the
> "pg_size_pretty", which should include all size units produced by the
> "pg_size_pretty". On the other hand those functions can't be symmetric
> due to difference in bases (see p.V and [5])
>

negative values is fully supported.

support of "bytes" depends on support "bytes" unit by GUC. When "bytes"
unit will be supported, then it can be used in pg_size_bytes immediately.

updated patch attached

Regards

Pavel

>
> e. What way is preferred for p.IV?
>
> f. If p.V is actual, how to write it in the documentation (be honest
> I've no idea)? Should a description of the "pg_size_pretty" be changed
> too?
>
>
> [1]
> http://www.postgresql.org/message-id/CAKOSWNn=p8GX-P5Y-B4t4dg-aAHaTup+CrG41hQ9BeobZwX3KQ@mail.gmail.com
> [2]
> http://www.postgresql.org/message-id/CACACo5QW7fFsFfhKsTjtYcP4QF3Oh9zA14SC6Z3DXx2yssJjSw@mail.gmail.com
> [3]
> http://git.postgresql.org/pg/commitdiff/647d87c56ab6da70adb753c08d7cdf7ee905ea8a
> [4]
> http://www.postgresql.org/message-id/CA+TgmoZFomG4eYorZZGf7pzotG9PxpUhtQvxLfsKiM4iZH8KRQ@mail.gmail.com
> [5]
> http://www.postgresql.org/message-id/CA+TgmoYgS_xixUQy+xyw9futtMrTbC_S0c1C0_wOBwVbNf8aZA@mail.gmail.com
>

Attachment Content-Type Size
pg-size-bytes-11.patch text/x-patch 14.2 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Pavel Stehule 2016-01-26 07:22:24 Re: proposal: PL/Pythonu - function ereport
Previous Message Simon Riggs 2016-01-26 06:52:06 Re: Add generate_series(date, date) and generate_series(date, date, integer)