Re: TAP tests take a long time

From: Craig Ringer <craig(dot)ringer(at)2ndquadrant(dot)com>
To: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Cc: Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: TAP tests take a long time
Date: 2017-04-12 05:27:01
Message-ID: CAMsr+YG-JzqEpAF1ZpJwD7P8Ma8bzY4_FCdH2eVc8qdM53=+mQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 12 April 2017 at 08:22, Michael Paquier <michael(dot)paquier(at)gmail(dot)com> wrote:
> On Wed, Apr 12, 2017 at 9:12 AM, Craig Ringer
> <craig(dot)ringer(at)2ndquadrant(dot)com> wrote:
>> Well, you can get a lot of information on timings in crake's latest
>> builds for example, or nightjar's.
>>
>> Here's an interesting fact: almost all the time taken up in the scripts
>> test set seems to be consumed in running initdb over and over.
>>
>> Is it worth adding an init(cache => 1) option to PostgresNode, where we
>> stash an initdb'd db in tmp_check/ and just do a simple fs copy of it ?
>
> This looks like a good idea to me, but I don't like much the idea of
> an option in the init() routine as that's hard to maintain.

How so?

> It would
> make sense to me to be able to override the initialization with an
> environment variable instead

Yeah, that's reasonable.

Patch attached. It supports setting CACHE_INITDB=0 to disable the cache.

With cache: 3m55.063s (user 7.7s, sys 0m11.833s)

Without cache: 4m11.229s (user 0m16.963s, sys 0m12.384s)

so in a serial run it doesn't make a ton of difference.

Adding some timing on initdb runs shows that initdb takes about 1s, a
cached copy about 0.1s with our Perl based recursive copy code. So
it's hardly an overwhelming benefit, but might be more beneficial with
prove -j .

BTW, I suggest adding --timer to our default PROVE_FLAGS, so we can
collect more data from the buildfarm on what takes a while. Sample
output:

[13:20:47] t/001_stream_rep.pl .................. ok 50445 ms (
0.01 usr 0.00 sys + 2.16 cusr 3.57 csys = 5.74 CPU)
[13:21:38] t/002_archiving.pl ................... ok 5883 ms (
0.01 usr 0.00 sys + 0.19 cusr 0.45 csys = 0.65 CPU)
[13:21:44] t/003_recovery_targets.pl ............ ok 26197 ms (
0.00 usr 0.00 sys + 0.70 cusr 1.75 csys = 2.45 CPU)
[13:22:10] t/004_timeline_switch.pl ............. ok 10049 ms (
0.01 usr 0.00 sys + 0.24 cusr 0.53 csys = 0.78 CPU)
[13:22:20] t/005_replay_delay.pl ................ ok 10224 ms (
0.00 usr 0.01 sys + 0.18 cusr 0.43 csys = 0.62 CPU)
[13:22:30] t/006_logical_decoding.pl ............ ok 7570 ms (
0.00 usr 0.00 sys + 0.25 cusr 0.32 csys = 0.57 CPU)
[13:22:38] t/007_sync_rep.pl .................... ok 20693 ms (
0.00 usr 0.00 sys + 0.51 cusr 1.24 csys = 1.75 CPU)
[13:22:58] t/008_fsm_truncation.pl .............. ok 25399 ms (
0.01 usr 0.00 sys + 0.26 cusr 0.52 csys = 0.79 CPU)
[13:23:24] t/009_twophase.pl .................... ok 58531 ms (
0.01 usr 0.00 sys + 0.49 cusr 0.92 csys = 1.42 CPU)
[13:24:22] t/010_logical_decoding_timelines.pl .. ok 8135 ms (
0.00 usr 0.01 sys + 0.34 cusr 0.76 csys = 1.11 CPU)
[13:24:30] t/011_crash_recovery.pl .............. ok 4334 ms (
0.00 usr 0.00 sys + 0.12 cusr 0.16 csys = 0.28 CPU)

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment Content-Type Size
cache-initdb-tap-v1.patch text/x-patch 3.5 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Kuntal Ghosh 2017-04-12 05:34:04 Re: Why does logical replication launcher set application_name?
Previous Message Masahiko Sawada 2017-04-12 04:48:19 Re: Interval for launching the table sync worker