Re: pgbench - exclude pthread_create() from connection start timing

From: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
To: Noah Misch <noah(at)leadboat(dot)com>
Cc: PostgreSQL Developers <pgsql-hackers(at)postgresql(dot)org>, pavel(dot)stehule(at)gmail(dot)com
Subject: Re: pgbench - exclude pthread_create() from connection start timing
Date: 2013-09-26 11:41:01
Message-ID: alpine.DEB.2.02.1309260852540.29589@sto
Lists: pgsql-hackers


>> pgbench changes, when adding the throttling stuff. Having the start time
>> taken when the thread really starts is just sanity, and I needed that
>> just to rule out that it was the source of the "strange" measures.
>
> I don't get it; why is taking the time just after pthread_create() more sane
> than taking it just before pthread_create()?

Thread creation seems to be expensive as well, maybe up to 0.1 seconds
under some conditions (?). Under --rate, this creation delay means that
throttling is lagging behind schedule by about that time, so all the first
transactions are trying to catch up.
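
To illustrate the catch-up effect, here is a rough sketch of a simplified
model, not pgbench's actual throttling code. It assumes a fixed interval
between scheduled transactions (pgbench draws the delays at random) and a
constant transaction duration; late_transactions() and its parameters are
hypothetical names made up for this example.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Simplified model (an assumption, not pgbench code): the schedule origin is
 * taken at time 0, before the thread is created, but the thread only starts
 * running start_delay_us later.  Transactions are scheduled every
 * interval_us and each one takes txn_us.  Returns how many transactions
 * start behind schedule before the thread has caught up.
 */
static int
late_transactions(int64_t start_delay_us, int64_t interval_us, int64_t txn_us)
{
    int64_t now = start_delay_us;   /* when the thread really starts */
    int64_t trigger = 0;            /* scheduled start of next transaction */
    int     late = 0;

    /* Catching up is only possible if transactions are faster than the rate. */
    assert(txn_us > 0 && interval_us > txn_us);

    while (now > trigger)
    {
        /* Behind schedule: run immediately, without sleeping. */
        late++;
        now += txn_us;
        trigger += interval_us;
    }
    return late;
}
```

With a 0.1 s creation delay, a 10 ms interval (100 tps) and 2 ms
transactions, late_transactions(100000, 10000, 2000) counts 13 transactions
started late in this model; with no delay it counts none.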

> typically far more expensive than pthread_create(). The patch for threaded
> pgbench made the decision to account for pthread_create() as though it were
> part of establishing the connection. You're proposing to not account for it
> all. Both of those designs are reasonable to me, but I do not comprehend the
> benefit you anticipate from switching from one to the other.
>
>> -j 800 vs -j 100 : ISTM that if you create more threads, the time delay
>> incurred is cumulative, so the strangeness of the result should worsen.
>
> Not in general; we do one INSTR_TIME_SET_CURRENT() per thread, just before
> calling pthread_create(). However, thread 0 is a special case; we set its
> start time first and actually start it last. Your observation of cumulative
> delay fits those facts.

Yep, that must be thread 0 which has a very large delay. I think it is
simpler that each thread records its start time once it has started,
without exception.

> Initializing the thread-0 start time later, just before calling its
> threadRun(), should clear this anomaly without changing other aspects of
> the measurement.

Always taking the thread start time when the thread is started does solve
the issue as well, and it is homogeneous for all cases, so the solution I
suggest seems reasonable and simple.
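
A minimal sketch of the idea, not the actual patch: every thread, thread 0
included, takes its start time as the first thing it does in its run
function. ThreadState, thread_run() and run_threads() are hypothetical names
for this example, and clock_gettime() stands in for pgbench's
INSTR_TIME_SET_CURRENT().

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <time.h>

/* Hypothetical per-thread state; not pgbench's actual structure. */
typedef struct
{
    pthread_t       handle;
    struct timespec start_time;     /* set by the thread itself */
    int             started;
} ThreadState;

static void *
thread_run(void *arg)
{
    ThreadState *ts = (ThreadState *) arg;

    /* Take the start time here, once the thread is really running. */
    clock_gettime(CLOCK_MONOTONIC, &ts->start_time);
    ts->started = 1;

    /* ... open connections and run transactions ... */
    return NULL;
}

/*
 * Create threads 1..nthreads-1, then run thread 0 in the calling thread.
 * Returns 0 on success, or the pthread_create() error code.
 */
static int
run_threads(ThreadState *threads, int nthreads)
{
    int i;
    int created = 0;
    int rc = 0;

    assert(nthreads > 0);

    for (i = 1; i < nthreads; i++)
    {
        rc = pthread_create(&threads[i].handle, NULL, thread_run, &threads[i]);
        if (rc != 0)
            break;
        created++;
    }

    /* Thread 0 needs no special case: it also stamps itself on entry. */
    if (rc == 0)
        thread_run(&threads[0]);

    for (i = 1; i <= created; i++)
        pthread_join(threads[i].handle, NULL);

    return rc;
}
```

In this sketch the time spent in pthread_create() for the other threads no
longer shows up in thread 0's start time, since nothing is recorded before a
thread actually begins running.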

> While pondering this area of the code, it occurs to me -- shouldn't we
> initialize the throttle rate trigger later, after establishing
> connections and sending startup queries? As it stands, we build up a
> schedule deficit during those tasks. Was that deliberate?

In principle, I agree with you.

The connection creation time is another matter, but it depends on the
options set. Under some options the connection is opened and closed for
every transaction, so there is no point in excluding it from the measure or
from the scheduling, and I want to avoid having to distinguish those cases.
Moreover, ISTM that one of the threads reuses the existing connection while
the others recreate it. So I left it "as is".

--
Fabien.
