Re: pgbench - allow to specify scale as a size

From: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
To: Alvaro Hernandez <aht(at)ongres(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: pgbench - allow to specify scale as a size
Date: 2018-02-19 07:43:52
Message-ID: alpine.DEB.2.20.1802190832140.10483@lancre
Lists: pgsql-hackers


Hello Alvaro & Tom,

>>> Why not then insert a "few" rows, measure size, truncate the table,
>>> compute the formula and then insert to the desired user requested
>>> size? (or insert what should be the minimum, scale 1, measure, and
>>> extrapolate what's missing). It doesn't sound too complicated to me,
>>> and targeting a size is something that I believe is quite good for
>>> users.
>>
>> The formula I used approximates the whole database, not just one table.
>> There was one for the table, but this is only part of the issue. In
>> particular, ISTM that index sizes should be included when caching is
>> considered.
>>
>> Also, index sizes are probably in n ln(n), so some level of
>> approximation is inevitable.
>>
>> Moreover, the intrinsic granularity of TPC-B, in multiples of 100,000
>> rows, makes it not very precise wrt size anyway.
>
> Sure, makes sense, so my second suggestion seems more reasonable: insert
> with scale 1, measure there (ok, you might need to create indexes only to
> later drop them), and if computed scale > 1 then insert whatever is left
> to insert. Shouldn't be a big deal to me.

I could implement that, even if it would still lead to some approximation:
ISTM that the very-large-scale regression performed by Kaarel is
significantly more precise than testing with scale 1 (typically a few MiB)
and extrapolating that to hundreds of GiB.
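
For illustration only, the scale-1 extrapolation being discussed could be
sketched as follows in Python. This is not pgbench code: the function name
`scale_for_target` and the 15 MiB measurement are hypothetical, and the
sketch assumes the database size grows linearly with scale, which is
precisely the approximation in question.

```python
MIB = 1024 * 1024


def scale_for_target(target_bytes: int, size_at_scale_1: int) -> int:
    """Linearly extrapolate the scale needed to reach target_bytes.

    size_at_scale_1 is the database size measured after loading scale 1
    (hypothetical input; a real tool would have to measure it).
    The result is a whole scale, so the target is only hit approximately.
    """
    if size_at_scale_1 <= 0:
        raise ValueError("size_at_scale_1 must be positive")
    # Scale is an integer number of 100,000-row units, never below 1.
    return max(1, round(target_bytes / size_at_scale_1))


# Example with an assumed 15 MiB measurement at scale 1 and a 1 GiB target.
print(scale_for_target(1024 * MIB, 15 * MIB))
```

Any non-linear growth (for instance in the indexes) is ignored here, so the
error grows with the distance between scale 1 and the requested size.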

Maybe it could be done with a kind of open-ended dichotomy (bisection), but
creating and recreating indexes looks like an ugly solution, and what should
be significant is the whole database size, including the tellers & branches
tables and all indexes, so I'm not convinced. Now, as the tellers & branches
tables have basically the same structure as accounts, their size could
simply be scaled by assuming that they incur the same storage per row.
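
As a rough sketch of what an open-ended dichotomy could look like, here is
a small Python example. Everything in it is an assumption made for
illustration: `find_scale` is a hypothetical helper, `size_of` stands for
"load this scale, build the indexes, measure the whole database" (the
expensive step objected to above), and `toy_size` is a made-up model with
a linear table term and an n ln(n) index term, not measured data.

```python
import math
from typing import Callable


def find_scale(target: float, size_of: Callable[[int], float]) -> int:
    """Return the smallest integer scale whose size is >= target.

    Assumes size_of is increasing in scale.
    """
    lo, hi = 1, 1
    # Open-ended phase: double the scale until the target is reached.
    while size_of(hi) < target:
        lo, hi = hi, hi * 2
    # Bisection phase: size_of(lo) < target <= size_of(hi) holds throughout
    # (or lo == hi == 1 when scale 1 is already large enough).
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if size_of(mid) < target:
            lo = mid
        else:
            hi = mid
    return hi


def toy_size(scale: int) -> float:
    """Made-up size model in bytes: linear tables plus n ln(n) indexes."""
    n = 100_000 * scale  # rows in the accounts table at this scale
    return 130.0 * n + 2.0 * n * math.log(n)


print(find_scale(1e9, toy_size))
```

With a real database each call to `size_of` would mean a reload and an index
rebuild, so the number of probes (logarithmic in the final scale) is what
makes this approach costly in practice.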

Anyway, even if I do not like it, it could be better than nothing. The key
point for me is that if Tom is dead set against the feature, the patch is
dead anyway.

Tom, would Alvaro's approach be more admissible to you than a fixed formula
that would need updating, keeping in mind that such a feature implies
some level of approximation?

--
Fabien.
