Re: pgbench \for or similar loop

From: Greg Smith <greg(at)2ndquadrant(dot)com>
To: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Christopher Browne <cbbrowne(at)gmail(dot)com>, Merlin Moncure <mmoncure(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: pgbench \for or similar loop
Date: 2011-04-22 06:23:00
Message-ID: 4DB11EC4.6010408@2ndquadrant.com
Lists: pgsql-hackers

Alvaro Herrera wrote:
> Why do we have pgbench at all in the first place? Surely we could
> rewrite it in plpgsql with proper stored procedures.
>

pgbench gives you a driver program with the following useful properties:

1) Multiple processes are spawned and each gets its own connection
2) A time/transaction limit is enforced across all of the connections at
once
3) Timing information is written to a client-side log file
4) The work of running the clients can happen on a remote system, so
that it's possible to just test the server-side performance
5) The program is similar enough to any other regular client, using the
standard libpq interface, that connection-related overhead should be
similar to a real workload.

All of those have some challenges before you could duplicate them in a
stored procedure context.

My opinion of this feature is similar to the one Aiden already
expressed: there are already so many ways to do this sort of thing using
shell-oriented approaches (as well as generate_series) that it's hard to
get too excited about implementing it directly in pgbench. Part of the
reason for adding the \shell and \setshell commands was to make tricky
things like this possible without having to touch the pgbench code
further. I for example would solve the problem you're facing like this:

1) Write a shell script that generates the file I need
2) Call it from pgbench using \shell, passing the size it needs. Have
that write a delimited file with the data required.
3) Import the whole thing with COPY.

And next thing you know you've even got the CREATE/COPY optimization as
a possibility to avoid WAL, as well as the ability to avoid creating the
data file more than once if the script is smart enough.
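
A minimal sketch of the generator half of that approach, written in
Python rather than shell. Everything here is an assumption made for
illustration: the script name gen_data.py, the two-column layout (an id
plus a random integer), and the tab delimiter are all hypothetical. The
only point it demonstrates is the "smart enough" part, skipping
regeneration when a file of the requested size already exists.

```python
#!/usr/bin/env python3
# Hypothetical generator script intended to be run from a pgbench
# custom script via \shell. File layout and names are illustrative only.
import csv
import os
import random
import sys


def generate(path, rows, seed=0):
    """Write `rows` tab-delimited lines (id, random value) to `path`.

    Returns True if the file was written, False if an existing file
    with the requested number of rows was reused.
    """
    if os.path.exists(path):
        with open(path, newline="") as f:
            if sum(1 for _ in f) == rows:
                return False  # already generated at this size; skip the work
    rng = random.Random(seed)
    tmp = path + ".tmp"
    with open(tmp, "w", newline="") as f:
        writer = csv.writer(f, delimiter="\t", lineterminator="\n")
        for i in range(1, rows + 1):
            writer.writerow([i, rng.randint(0, 999999)])
    os.replace(tmp, path)  # rename into place so a partial file is never read
    return True


# usage: gen_data.py OUTPUT_PATH ROW_COUNT
if __name__ == "__main__" and len(sys.argv) == 3:
    generate(sys.argv[1], int(sys.argv[2]))
```

From the pgbench side this would be invoked with something like
"\shell python3 gen_data.py /tmp/data.tsv 100000" (paths hypothetical),
followed by a COPY of that file. A server-side COPY reads the file on
the database server, so this assumes pgbench and the server can see the
same path.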

Sample data file generation can be difficult; most of the time I'd
rather solve it in a general-purpose programming language. It's true
that simple generation cases could be done with the mechanism you propose.
However, this only really helps cases that are too complicated to
express with generate_series, yet not so complicated that you really
want a full programming language to generate the data. I don't think
there's that much middle ground in that use case.

But if this is what you think makes your life easier, I'm not going to
tell you you're wrong. And I don't feel that your desire for this
feature means you must tackle a more complicated thing instead--even
though what I personally would much prefer is something making this sort
of thing easier to do in regression tests, too. That's a harder
problem, though, and you're only volunteering to solve an easier one
than that.

Stepping aside from debate over usefulness, my main code concern is that
each time I look at the pgbench code for yet another tacked-on bit, it's
getting increasingly creaky and harder to maintain. It's never going
to be a good benchmark driver program capable of really complicated
tasks. And making it try keeps piling on the risk of breaking it for
its intended purpose of doing simple tests. If you can figure out how
to keep the code contortions to implement the feature under control,
there's some benefit there. I can't think of a unique reason for it;
again, lots of ways to solve this already. But I'd probably use it if
it were there.

--
Greg Smith 2ndQuadrant US greg(at)2ndQuadrant(dot)com Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us
