Some other CLOBBER_CACHE_ALWAYS culprits

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Some other CLOBBER_CACHE_ALWAYS culprits
Date: 2021-05-11 16:03:33
Message-ID: 852314.1620749013@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

In some recent threads I complained about how CLOBBER_CACHE_ALWAYS
test runs have gotten markedly slower over the past couple of release
cycles [1][2][3]. It'd be impossibly time-consuming to investigate the
causes by repeating the whole test corpus, but I've had some success in
bisecting while measuring the runtime of just a single test script.
In this report I'm looking at src/pl/plpgsql/src/sql/plpgsql_control.sql,
which is a useful candidate because it hasn't changed at all since v11.
Despite that, hyrax's latest runs show these runtimes:

HEAD:
test plpgsql_control ... ok 56105 ms
REL_13_STABLE:
test plpgsql_control ... ok 46879 ms
REL_12_STABLE:
test plpgsql_control ... ok 30809 ms

so we have clearly made CCA runs a whole lot worse since v12.
(Non-CCA buildfarm members show runtimes that are about the same
across all three branches.)

I've reproduced (some of) these results on my shiny new M1 mini,
which is a tad faster than hyrax's host: it can do the test on HEAD
(049e1e2ed) in 15.413s. (Note: this, and the numbers following, are
median-of-3-runs; the run variance is enough that I wouldn't trust
them to less than a tenth of a second.) The run time at 615cebc94
(v12 branchoff point) is 11.861s. Bisecting, I found that there were
three commits that accounted for almost all of the slowdown since v12:

0d861bbb7 Add deduplication to nbtree
11.836s -> 12.339s
(that's runtime on the preceding commit -> runtime on this commit)

8f59f6b9c(+fbc7a7160) Improve performance of "simple expressions" in PL/pgSQL
12.334s -> 14.158s

911e70207 Implement operator class parameters
14.263s -> 15.415s

One thing that confuses me, though, is that all of these are v13-era
commits (they all went into the tree during March 2020). I don't
see any real difference in the runtime from the v13 branchoff point
to now, which doesn't square with hyrax's results. Could there be
that much inter-platform variation in the overhead of CCA? It might
be useful for somebody with patience and a fast Intel machine to try
to replicate these results.

Anyway, it seems like these three features deserve some study as to
why they caused so much slowdown under CCA. It's not so surprising
that 8f59f6b9c would have an effect on a test of plpgsql control
logic, but I find it surprising and rather disturbing that either
of the others would.

BTW, I was also tracking initdb runtime under CCA while I did this,
and while I didn't formally bisect on that basis, I did notice that
911e70207 had quite a negative impact on that too: 180s -> 195s.

regards, tom lane

[1] https://www.postgresql.org/message-id/flat/242172.1620494497%40sss.pgh.pa.us#eab25bb83bdcdd0f58b2d712b4971fcd
[2] https://www.postgresql.org/message-id/flat/292305.1620503097%40sss.pgh.pa.us
[3] https://www.postgresql.org/message-id/flat/575884.1620626638%40sss.pgh.pa.us

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Bharath Rupireddy 2021-05-11 16:06:02 Re: Enhanced error message to include hint messages for redundant options error
Previous Message Masahiko Sawada 2021-05-11 15:56:33 Re: Performance degradation of REFRESH MATERIALIZED VIEW