Re: Use COPY for populating all pgbench tables

From: Gregory Smith <gregsmithpgsql(at)gmail(dot)com>
To: Tristan Partin <tristan(at)neon(dot)tech>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Use COPY for populating all pgbench tables
Date: 2023-06-09 13:24:31
Message-ID: CAHLJuCW4gapAYZMc2vuy9mL2jdhNj3SLL9ef9U+stD=8EXfV=g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, May 23, 2023 at 1:33 PM Tristan Partin <tristan(at)neon(dot)tech> wrote:

> We (Neon) have noticed that pgbench can be quite slow to populate data
> in regard to higher latency connections. Higher scale factors exacerbate
> this problem. Some employees work on a different continent than the
> databases they might be benchmarking. By moving pgbench to use COPY for
> populating all tables, we can reduce some of the time pgbench takes for
> this particular step.
>

When latency is continent size high, pgbench should be run with server-side
table generation instead of using COPY at all, for any table. The default
COPY based pgbench generation is only intended for testing where the client
and server are very close on the network.

Unfortunately there's no simple command line option to change just that one
thing about how pgbench runs. You have to construct a command line that
documents each and every step you want instead. You probably just want
this form:

$ pgbench -i -I dtGvp -s 500

That's server-side table generation with all the usual steps. I use this
instead of COPY in pgbench-tools so much now, basically whenever I'm
talking to a cloud system, that I have a simple 0/1 config option to switch
between the modes, and this long weird one is the default now.

Try that out, and once you see the numbers my bet is you'll see extending
which tables get COPY isn't needed by your use case anymore. Basically, if
you are close enough to use COPY instead of server-side generation, you are
close enough that every table besides accounts will not add up to enough
time to worry about optimizing the little ones.

--
Greg Smith greg(dot)smith(at)crunchydata(dot)com
Director of Open Source Strategy

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Alexander Pyhalov 2023-06-09 14:09:42 Re: Partial aggregates pushdown
Previous Message Gregory Smith 2023-06-09 12:52:33 Re: Major pgbench synthetic SELECT workload regression, Ubuntu 23.04+PG15