Re: Extend pgbench partitioning to pgbench_history

From: Melanie Plageman <melanieplageman(at)gmail(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
Cc: Gabriele Bartolini <gabriele(dot)bartolini(at)enterprisedb(dot)com>, Abhijit Menon-Sen <ams(at)toroid(dot)org>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Extend pgbench partitioning to pgbench_history
Date: 2024-02-16 20:14:37
Message-ID: CAAKRu_Zo8ST-Qk8VQ4KFkbMQcqJsQQz5r+YRRbecS3avgkoZhw@mail.gmail.com
Lists: pgsql-hackers

On Fri, Feb 16, 2024 at 12:50 PM Tomas Vondra
<tomas(dot)vondra(at)enterprisedb(dot)com> wrote:
>
> Hello Gabriele,
>
> I think the improvement makes sense (it's indeed a bit strange to not
> partition the history table), and the patch looks good.
>
> I did think about whether this should be optional in some way - that is,
> separate from partitioning the accounts table, and users would have to
> explicitly enable (or disable) it. But I don't think we need to do that.
>
> The vast majority of users simply want to partition everything. And this
> is just one way to partition the database anyway, it's our opinion on
> how to do that, but there's many other options how we might partition
> the tables, and we don't (and don't want to) have options for that.

I wonder how common it would be to partition a history table by
account ID? I sort of imagined the most common kind of partitioning
for an audit table is by time (range). Anyway, I'm not objecting to
doing it by account ID, just asking if there is a reason to do so.

Speaking of which, Tomas said users might want to "partition
everything" -- so any reason not to also partition tellers and
branches?

This change to the docs seems a bit misleading:

<listitem>
<para>
- Create a partitioned <literal>pgbench_accounts</literal> table with
- <replaceable>NUM</replaceable> partitions of nearly equal size for
- the scaled number of accounts.
+ Create partitioned <literal>pgbench_accounts</literal> and <literal>pgbench_history</literal>
+ tables with <replaceable>NUM</replaceable> partitions of nearly equal size for
+ the scaled number of accounts and future history records.
Default is <literal>0</literal>, meaning no partitioning.
</para>
</listitem>

It says that partitions of "future history records" will be equal in
size. While it's true that at the end of a pgbench run, if you use a
random distribution for aid, the pgbench_history partitions should be
roughly equally sized, it is confusing to say it will "create
pgbench_history with partitions of equal size". Maybe it would be
better to write a new sentence about partitioning pgbench_history
without worrying about mirroring the sentence structure of the
existing sentence.
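
To illustrate the distinction, here is a minimal sketch (a simulation, not
pgbench code). It assumes range partitioning on aid with a ceiling-divided
partition width and a uniformly random aid per transaction; the function
names and the seed are hypothetical. The partitions start out empty, and
their row counts only become roughly equal as history rows accumulate.

```python
import random
from collections import Counter

ACCOUNTS_PER_SCALE = 100_000  # pgbench creates 100,000 accounts per scale unit


def partition_of(aid, naccounts, num_partitions):
    """Map a 1-based account ID to a 0-based range-partition index.

    Assumption: each partition covers a contiguous aid range whose width is
    naccounts / num_partitions, rounded up.
    """
    part_size = (naccounts + num_partitions - 1) // num_partitions
    return (aid - 1) // part_size


def simulate_history(scale, num_partitions, transactions, seed=0):
    """Count history rows per partition after `transactions` inserts.

    Each simulated transaction picks a uniformly random aid, which stands in
    for the random distribution mentioned above.
    """
    rng = random.Random(seed)
    naccounts = scale * ACCOUNTS_PER_SCALE
    counts = Counter()
    for _ in range(transactions):
        aid = rng.randint(1, naccounts)  # inclusive on both ends
        counts[partition_of(aid, naccounts, num_partitions)] += 1
    return [counts[p] for p in range(num_partitions)]


if __name__ == "__main__":
    # Right after initialization: every history partition is empty.
    print(simulate_history(scale=1, num_partitions=4, transactions=0))
    # After a run: sizes are only approximately equal.
    print(simulate_history(scale=1, num_partitions=4, transactions=100_000))
```

With a skewed aid distribution the counts would diverge, which is another
reason the docs should not promise equally sized pgbench_history partitions.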

> The only case that I can think of where this might matter is when
> running benchmarks that will be compared to some earlier results
> (executed using an older pgbench version). That will be affected by
> this, but I don't think we make many promises about compatibility in
> this regard ... it's probably better to always compare results only from
> the same pgbench version, I guess.

As a frequent pgbench user, I always use the same pgbench version even
when comparing different versions of Postgres. Other changes have
already made it difficult to compare results across pgbench versions,
and they did not offer the old behavior as an option (see
06ba4a63b85e). So I don't think this is a problem as long as it is
noted in the release notes.

- Melanie
