On Thu, Jan 19, 2012 at 10:18 AM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
> On Thu, Jan 19, 2012 at 2:36 PM, Heikki Linnakangas
> <heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
>> On 12.01.2012 14:31, Simon Riggs wrote:
>>> In order to simulate real-world clog contention, we need to use
>>> benchmarks that deal with real world situations.
>>> Currently, pgbench pre-loads data using COPY and executes a VACUUM so
>>> that all hint bits are set on every row of every page of every table.
>>> Thus, as pgbench runs it sees zero clog accesses from historical data.
>>> As a result, clog access is minimised and the effects of clog
>>> contention in the real world go unnoticed.
>>> The following patch adds a pgbench option -I to load data using
>>> INSERTs, so that we can begin benchmark testing with rows that have
>>> large numbers of distinct un-hinted transaction ids. With a database
>>> pre-created using this we will be better able to simulate and thus
>>> more easily measure clog contention. Note that current clog has space
>>> for 1 million xids, so a scale factor of greater than 10 is required
>>> to really stress the clog.
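For concreteness, the two initialization modes being contrasted would look roughly like this (the -I flag is the one proposed in the patch, not committed syntax; the database name is arbitrary):

```shell
# Stock initialization: bulk load via COPY, then VACUUM sets every
# hint bit, so the benchmark run itself makes almost no clog lookups.
pgbench -i -s 100 pgbench_db

# With the proposed -I option: load rows with individual INSERTs,
# leaving many distinct un-hinted xids behind.  Per the note above, a
# scale factor above 10 is needed to exceed clog's ~1M-xid capacity.
pgbench -i -I -s 100 pgbench_db
```

(Illustrative command lines only; both require a running server, and the second assumes the patch is applied.)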
>> No doubt this is handy for testing this particular area, but overall I feel
>> this is too much of a one-trick pony to include in pgbench.
>> Alternatively, you could do something like this:
> I think the one-trick pony is pgbench. It has exactly one starting
> condition for its tests and that isn't even a real world condition.
> The main point of including the option in pgbench is to have a
> utility that produces an initial test condition that works the same
> for everyone, so we can accept each other's benchmark results. We both
> know that if someone posts that they have done $RANDOMSQL on a table
> before running a test, it will just be ignored and people will say
> user error. Some people will get it wrong when reproducing things and
> we'll have chaos.
> The patch exists as a way of testing the clog contention improvement
> patches and provides a route to long term regression testing that the
> solution(s) still work.
I agree: I think this is useful.
However, I think we should follow the precedent of some of the other
somewhat-obscure options we've added recently and have only a long
form option for this: --inserts.
Also, I don't think the behavior described here should be joined at
the hip to --inserts:
+ * We do this after a load by COPY, but before a load via INSERT
+ * This is done deliberately to ensure that no heap or index hints are
+ * set before we start running the benchmark. This emulates the case
+ * where data has arrived row at a time by INSERT, rather than being
+ * bulkloaded prior to update.
I think that's also a useful behavior, but if we're going to have it,
we should have a separate option for it, like --create-indexes-early.
Otherwise, someone who wants to (for example) test only the impact of
doing inserts vs. COPY will get misleading answers.
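With separate flags, the two behaviors could be toggled independently, along these lines (the option names are suggestions in this thread, not committed syntax):

```shell
# Row-at-a-time load only; indexes still built after the load.
pgbench -i --inserts pgbench_db

# Additionally build the indexes before loading, so that no index
# hint bits are set when the benchmark starts.
pgbench -i --inserts --create-indexes-early pgbench_db
```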
Finally, it's occurred to me that it would be useful to make pgbench
respect -n even in initialization mode. The SGML doc changes imply
that this patch does that somewhere or other, but maybe only when
you're doing INSERTs and not when you're doing COPY, which would be
odd; and there's no SGML doc update for -n, and no command-line help
change.
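If -n were honored uniformly in initialization mode, the vacuum (and with it the hint-bit setting) could be skipped for either load path. A sketch of the intended symmetry (hypothetical behavior, not what the posted patch documents):

```shell
pgbench -i -n pgbench_db            # COPY load, skip the vacuum
pgbench -i -n --inserts pgbench_db  # INSERT load, skip the vacuum
```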
In short, I think the reason this patch seems like it's implementing
something fairly arbitrary is that it's really three pretty good ideas
that are somewhat jumbled together.