Re: Use COPY for populating all pgbench tables

From: "Tristan Partin" <tristan(at)neon(dot)tech>
To: "Gregory Smith" <gregsmithpgsql(at)gmail(dot)com>
Cc: <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Use COPY for populating all pgbench tables
Date: 2023-06-09 14:55:25
Message-ID: CT87G1JPVKU7.KGLR2XD5Z5OU@gonk
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri Jun 9, 2023 at 8:24 AM CDT, Gregory Smith wrote:
> On Tue, May 23, 2023 at 1:33 PM Tristan Partin <tristan(at)neon(dot)tech> wrote:
>
> > We (Neon) have noticed that pgbench can be quite slow to populate data
> > in regard to higher latency connections. Higher scale factors exacerbate
> > this problem. Some employees work on a different continent than the
> > databases they might be benchmarking. By moving pgbench to use COPY for
> > populating all tables, we can reduce some of the time pgbench takes for
> > this particular step.
> >
>
> When latency is continent size high, pgbench should be run with server-side
> table generation instead of using COPY at all, for any table. The default
> COPY based pgbench generation is only intended for testing where the client
> and server are very close on the network.
>
> Unfortunately there's no simple command line option to change just that one
> thing about how pgbench runs. You have to construct a command line that
> documents each and every step you want instead. You probably just want
> this form:
>
> $ pgbench -i -I dtGvp -s 500
>
> That's server-side table generation with all the usual steps. I use this
> instead of COPY in pgbench-tools so much now, basically whenever I'm
> talking to a cloud system, that I have a simple 0/1 config option to switch
> between the modes, and this long weird one is the default now.
>
> Try that out, and once you see the numbers my bet is you'll see extending
> which tables get COPY isn't needed by your use case anymore. Basically, if
> you are close enough to use COPY instead of server-side generation, you are
> close enough that every table besides accounts will not add up to enough
> time to worry about optimizing the little ones.

Thanks for your input Greg. I'm sure you're correct that server-side data
generation would probably fix the problem. I guess I am trying to
understand if there are any downsides to just committing this anyway. I
haven't done any more testing, but David's email only showed a 4%
performance drop in the workload he tested. Combine this with his hex
patch and we would see an overall performance improvement when
everything is said and done.

It seems like this patch would still be good for client-side high scale
factor data generation (when server and client are close), but I would
need to do testing to confirm.

--
Tristan Partin
Neon (https://neon.tech)

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Tristan Partin 2023-06-09 14:59:47 Re: Use COPY for populating all pgbench tables
Previous Message David Steele 2023-06-09 14:46:48 Re: Views no longer in rangeTabls?