Re: pgbench more operators & functions

From: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, Jeevan Ladhe <jeevan(dot)ladhe(at)enterprisedb(dot)com>, PostgreSQL Developers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: pgbench more operators & functions
Date: 2017-01-25 08:31:05
Message-ID: alpine.DEB.2.20.1701250825110.29470@lancre


>> As it stands right now you haven't provided enough context for this patch
>> and only the social difficulty of actually marking a patch rejected has
>> prevented its demise in its current form - because while it has interesting
>> ideas it's added maintenance burden for -core without any in-core usage.
>> But if you make it the first patch in a 3-patch series that implements the
>> per-spec tpc-b the discussion moves away from these support functions and
>> into the broader framework in which they are made useful.
>
> I think Fabien already did post something of the sort, or at least
> discussion towards it,

Yep.

> and there was immediately objection as to whether his idea of TPC-B
> compliance was actually right. I remember complaining that he had a
> totally artificial idea of what "fetching a data value" requires.

Yep.

I think that the key misunderstanding is that you are honest and assume
that other people are honest too. This is naïve: there is a long history
of vendors creatively "cheating" to get better benchmark results than they
deserve. Benchmark specifications try to prevent such behavior by laying
out careful requirements and procedures.

In this instance, you "know" that when pg has returned the result of the
query the data is actually on the client side, so you consider it
fetched. That is fine for you, but from a benchmarking perspective with
external auditors your belief is not good enough.

For instance, the vendor could implement a new version of the protocol
where the data are only transferred on demand, and the result just tells
that the data is indeed somewhere on the server (e.g. on "SELECT abalance"
it could just check that the key exists, no need to actually fetch the
data from the table, so no need to read the table, the index is
enough...). That would be pretty stupid for real application performance,
but the benchmark could get better tps by doing so.

Even without intentional cheating, this could be part of a useful
"streaming mode" protocol option which makes sense for very large results
but would be activated for a small result.

Another point is that decoding the message may be a little expensive, so
that by not actually extracting the data into the client but just keeping
it in the connection/OS one gets better performance.

Thus, the TPC-B 2.0.0 benchmark specification says:

"1.3.2 Each transaction shall return to the driver the Account_Balance
resulting from successful commit of the transaction.

Comment: It is the intent of this clause that the account balance in the
database be returned to the driver, i.e., that the application retrieve
the account balance."

For me the correct interpretation of "the APPLICATION retrieve the account
balance" is that the client application code, pgbench in this context, did
indeed get the value from the vendor code, here "libpq" which is handling
the connection.

Having the value discarded from libpq by calling PQclear instead of
PQntuples/PQgetvalue/... skips a key part of the client code that no real
application would skip. This looks strange and is not representative of
real client code: as a potential auditor, because of this I would not
check the corresponding item in the audit check list:

"11.3.1.2 Verify that transaction inputs and outputs satisfy Clause 1.3."

So the benchmark implementation would not be validated.
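
To make the skipped step concrete, here is a minimal sketch of what
"the application retrieves the account balance" could look like on the
client side. It is only an illustration, not pgbench code:
parse_balance is a hypothetical helper, and it assumes a text-format
result, where libpq's PQgetvalue returns the column as a
null-terminated string that the client must still decode before calling
PQclear.

```c
#include <errno.h>
#include <stdlib.h>

/*
 * Hypothetical helper (an assumption for illustration, not part of
 * pgbench): decode the text form of the balance column into a long.
 * With libpq the text would typically come from PQgetvalue(res, 0, 0),
 * read before the result is released with PQclear(res).
 * Returns 0 on success, -1 if the text is not a complete valid integer.
 */
static int
parse_balance(const char *text, long *balance)
{
    char *end;
    long  value;

    /* A missing or empty value cannot be a balance. */
    if (text == NULL || *text == '\0')
        return -1;

    errno = 0;
    value = strtol(text, &end, 10);

    /* Reject out-of-range values and trailing garbage. */
    if (errno == ERANGE || *end != '\0')
        return -1;

    *balance = value;
    return 0;
}
```

A driver that only calls PQclear never runs anything like this, which
is precisely the work an auditor would expect to see.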

Another trivial reason to be able to actually retrieve data is that for
benchmarking purposes one can easily want to test a scenario that does
different things based on the data received, which implies that the data
can be manipulated somehow on the benchmarking client side, which is
currently not possible.

--
Fabien.
