Re: pgbench - allow to specify scale as a size

From: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
To: Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>
Cc: Mark Wong <mark(at)2ndQuadrant(dot)com>, Alvaro Hernandez <aht(at)ongres(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: pgbench - allow to specify scale as a size
Date: 2018-03-04 09:09:11
Message-ID: alpine.DEB.2.20.1803040947500.12500@lancre


>>> Now the overhead is really 60-65%. Although the specification is
>>> unambiguous, we still need some maths to know whether it fits in buffers
>>> or memory... The point of Karel's regression is to take this into account.
>>>
>>> Also, whether this option would be more admissible to Tom is still an open
>>> question. Tom?
>>
>> Here is a version with this approach: the documentation talks about
>> "actual data size, without overheads", and points out that storage
>> overheads are typically an additional 65%.
>
> I think when deciding on a size for a test database for benchmarking,
> you want to size it relative to RAM or other storage layers. So a
> feature that allows you to create a database of size N that actually
> ends up nowhere near N seems pretty useless for that.

Hmmm.

At least the option says the size of the useful data, which should be what
the user is really interested in:-) You have a developer's point of view
on the issue. From a performance point of view, ISTM that useful data
storage size is an interesting measure, which allows comparing (future)
storage engines and showing the impact of smaller overheads, for
instance.
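
For instance, a minimal psql sketch (assuming the current database was
freshly initialized by "pgbench -i"; pg_column_size and pg_relation_size
are the standard size functions) makes the gap between useful data and
on-disk footprint visible:

  -- contrast raw column payload with the on-disk size of pgbench_accounts
  SELECT pg_size_pretty(pg_relation_size('pgbench_accounts')) AS on_disk,
         pg_size_pretty(sum(pg_column_size(aid) + pg_column_size(bid)
                          + pg_column_size(abalance)
                          + pg_column_size(filler))) AS column_payload
  FROM pgbench_accounts;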

The other option can only be some kind of approximation, and would require
some maintenance (e.g. a future zheap overhead would differ from the heap
overhead, the overhead depends on the size itself, and it could also
depend on other options). This has been rejected, and I agree with the
rejection (incredible:-).

So ISTM that the patch is dead because it is necessarily somewhat
imprecise. People will continue to guess wildly at how to translate scale
into anything related to size.
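
That said, the guessing can at least be replaced by measuring: initialize
at a known scale and then ask the server. A sketch, assuming a database
named "bench" (a placeholder name) built with "pgbench -i -s 10":

  -- total footprint, and the empirical cost of one scale unit
  SELECT pg_size_pretty(pg_database_size('bench'))      AS total,
         pg_size_pretty(pg_database_size('bench') / 10) AS per_scale_unit;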

> (Also, we have, for better or worse, settled on a convention for byte
> unit prefixes in guc.c. Let's not introduce another one.)

Hmmm. Indeed for worse, as it is soooo much better to invent our own units
than to reuse existing ones, which were not confusing enough:-)

- SI units: 1kB = 1000 bytes (*small* k)
- IEC units: 1KiB = 1024 bytes
- JEDEC units: 1KB = 1024 bytes (*capital* K)

But the postgres documentation uses "kB" for 1024 bytes, too bad:-(

The GUCs are about memory, which is measured in powers of 1024, but
storage is usually measured in powers of 1000; this option was about
storage, hence I felt it better to avoid the confusion.
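
To see the stakes from psql (nothing assumed beyond a running server): the
"kB" printed by pg_size_pretty is the 1024-based unit, and the 1000-vs-1024
gap grows with each prefix:

  -- pg_size_pretty's "kB" means 1024 bytes: 10240 bytes print as "10 kB"
  SELECT pg_size_pretty(10240::bigint);
  -- at the GB/GiB level the two conventions already differ by ~7.4%
  SELECT round(((2^30 - 10^9) / 10^9 * 100)::numeric, 1) AS gib_vs_gb_pct;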

Conclusion: mark the patch as rejected?

--
Fabien.
