Re: Commitfest remaining "Needs Review" items

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Commitfest remaining "Needs Review" items
Date: 2015-08-25 13:40:01
Message-ID: 55DC7031.60107@2ndquadrant.com
Lists: pgsql-hackers

Hi,

On 08/25/2015 02:44 PM, Michael Paquier wrote:
> On Tue, Aug 25, 2015 at 6:05 PM, Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr> wrote:
>>
>>> -- merging pgbench logs: returned with feedback or bump? Fabien has
>>> concerns about performance regarding fprintf when merging the logs.
>>> Fabien, Tomas, thoughts?
>>> -- pgbench - per-transaction and aggregated logs: returned with
>>> feedback or bump to next CF? Fabien, Tomas, thoughts?
>>
>>
>> I think that both features are worthwhile, so next CF would be
>> better, but it really depends on Tomas.
>
> OK, so let's wait for input from Tomas for now.

Let's move them to the next CF.

>
>> The key issue was the implementation complexity and maintenance burden which
>> was essentially driven by fork-based thread emulation compatibility, but it
>> has gone away as the emulation has been taken out of pgbench and it is now
>> possible to do a much simpler implementation of these features.

To some extent, yes. It makes logging into a single file simpler, but
the overhead it introduces is still an open question and it does not
really simplify the other patch (writing both raw and aggregated logs).

>>
>> The performance issue is that if you have many threads which perform
>> monstrous tps and try to log them, then logging becomes a bottleneck, both
>> the "printf" time and the eventual file locking... Well, that is life; it is
>> well known that experimenters influence the experiments they are looking at
>> (Schrödinger), and moreover the --sampling-rate option is already there
>> to alleviate this problem if needed, so I do not think that it is an issue
>> to address by keeping the code complex.
>
> Honestly, I don't like the idea of having a bottleneck at logging
> level even if we can leverage it with a logging sample rate, that's a
> recipe for making pgbench become a benchmark to measure its own
> contention, while it should put the backend into pressure,
> particularly when short transactions are used.

I'd like to point out this overhead would not be a new thing - the
locking is already there (at least with glibc) to a large degree. See:

http://www.gnu.org/software/libc/manual/html_node/Streams-and-Threads.html

So fprintf does lock the stream, and that has overhead even when the
lock is uncontended (e.g. when using one file per thread). And it has
nothing to do with the thread emulation - that was mostly about code
complexity, not about locking overhead.
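One way to amortize that per-call locking is to format a batch of
entries locally and issue a single write for the whole batch. A minimal
sketch (in Python rather than pgbench's C, with made-up entry data)
of turning N locked stream operations into one:

```python
import io

def log_unbatched(f, entries):
    # one locked stream operation per transaction logged
    for client_id, latency_us in entries:
        f.write("%d %d\n" % (client_id, latency_us))

def log_batched(f, entries):
    # format into a local buffer first, then a single locked
    # write for the whole batch
    buf = io.StringIO()
    for client_id, latency_us in entries:
        buf.write("%d %d\n" % (client_id, latency_us))
    f.write(buf.getvalue())

entries = [(0, 1500), (1, 1200), (0, 900)]
a, b = io.StringIO(), io.StringIO()
log_unbatched(a, entries)
log_batched(b, entries)
assert a.getvalue() == b.getvalue()  # same output, fewer locked calls
```

The output is byte-for-byte identical; only the number of
lock-protected stream operations changes.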

The logging system was designed with a single log in mind, so it's not
quite compatible with features like this. I think we may need to
redesign it, and it maps nicely onto the producer/consumer pattern,
roughly like this:

1) each thread (-j) is a producer

- producing transaction details (un-formatted)
- possibly batches the data to minimize overhead

2) each log type is a separate consumer

- may be a dedicated thread or just a function
- gets the raw transaction details (in batches)
- either just writes the data to a file (raw), aggregates them, or
does something else with them (e.g. prints progress)

Data is passed through queues (hopefully with low overhead thanks to the
batching).
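The design above can be sketched as follows. This is a minimal Python
illustration, not pgbench code (pgbench is C, and all names and latency
values here are made up): producer threads batch raw transaction
records onto per-consumer queues, a "raw log" consumer writes each
record, and an "aggregated log" consumer keeps only running statistics.

```python
import queue
import threading

BATCH_SIZE = 2
SENTINEL = None  # marks end-of-stream for consumers

def producer(thread_id, latencies, queues):
    # a -j worker thread: batch raw transaction details, then hand
    # the batch to every consumer's queue
    batch = []
    for lat in latencies:
        batch.append((thread_id, lat))
        if len(batch) >= BATCH_SIZE:
            for q in queues:
                q.put(batch)
            batch = []
    if batch:
        for q in queues:
            q.put(batch)

def raw_consumer(q, out):
    # "raw log" consumer: formats and stores each record as-is
    while True:
        batch = q.get()
        if batch is SENTINEL:
            break
        for thread_id, lat in batch:
            out.append("%d %d" % (thread_id, lat))

def agg_consumer(q, stats):
    # "aggregated log" consumer: only keeps running statistics
    while True:
        batch = q.get()
        if batch is SENTINEL:
            break
        for _, lat in batch:
            stats["count"] += 1
            stats["sum"] += lat

raw_q, agg_q = queue.Queue(), queue.Queue()
out, stats = [], {"count": 0, "sum": 0}

consumers = [threading.Thread(target=raw_consumer, args=(raw_q, out)),
             threading.Thread(target=agg_consumer, args=(agg_q, stats))]
for c in consumers:
    c.start()

# two producer threads with made-up per-transaction latencies (us)
workloads = {0: [1500, 900, 1100], 1: [1200, 1300]}
producers = [threading.Thread(target=producer,
                              args=(tid, lats, [raw_q, agg_q]))
             for tid, lats in workloads.items()]
for p in producers:
    p.start()
for p in producers:
    p.join()

for q in (raw_q, agg_q):
    q.put(SENTINEL)  # no more batches coming
for c in consumers:
    c.join()

assert stats == {"count": 5, "sum": 6000}
assert len(out) == 5
```

Each log type gets its own queue so it sees every record independently;
the batching means queue operations (and hence locking) happen once per
batch rather than once per transaction.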

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
