Re: Perplexing, regular decline in performance

From: Hugh Ranalli <hugh(at)whtc(dot)ca>
To: Justin Pryzby <pryzby(at)telsasoft(dot)com>
Cc: pgsql-performance(at)lists(dot)postgresql(dot)org
Subject: Re: Perplexing, regular decline in performance
Date: 2019-06-26 19:00:43
Message-ID: CAAhbUMPVgOyiTBDU-b9MaoixDEU3jZB4s63QrpefzGtLVbtKOw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On Tue, 25 Jun 2019 at 12:23, Justin Pryzby <pryzby(at)telsasoft(dot)com> wrote:

> What kernel? Version? OS?
>
Ubuntu 18.04; current kernel is 4.15.0-51-generic4

If Linux, I wonder if transparent hugepages or KSM are enabled ? It seems
> possible that truncating the table is clearing enough RAM to mitigate the
> issue, similar to restarting the DB.
> tail /sys/kernel/mm/ksm/run
> /sys/kernel/mm/transparent_hugepage/khugepaged/defrag
> /sys/kernel/mm/transparent_hugepage/enabled
> /sys/kernel/mm/transparent_hugepage/defrag
>
> https://www.postgresql.org/message-id/20170718180152.GE17566%40telsasoft.com

==> /sys/kernel/mm/ksm/run <==
0
==> /sys/kernel/mm/transparent_hugepage/khugepaged/defrag <==
1
==> /sys/kernel/mm/transparent_hugepage/enabled <==
always [madvise] never
==> /sys/kernel/mm/transparent_hugepage/defrag <==
always defer defer+madvise [madvise] never

From my research in preparing for the upgrade, I understood transparent
huge pages were a good thing, and should be enabled. Is this not correct?

11.2 would have parallel query, and enabled by default. Are there other
> settings you've changed (or not changed)?
> https://wiki.postgresql.org/wiki/Server_Configuration

I've just posted the parameters we are changing from the default in a
previous reply, so I won't repeat them here unless you want me to.

> It's possible that the "administrative" queries are using up lots of your
> shared_buffers, which are (also/more) needed by the customer-facing
> queries. I
> would install pg_buffercache to investigate. Or, just pause the admin
> queries
> and see if that the issue goes away during that interval ?
>

Pausing the admin queries isn't an option in our environment, especially as
the issue reveals itself over the course of days, not minutes or hours.
?column? | count | count | datname | coalesce
| toast | dirtyfrac | avg
------------------------+---------+-------+-----------+-------------------------+----------------+----------------------------+--------------------
0.24904101286779650995 | 1044545 | 0 | mydb | position
| | 0.000000000000000000000000 | 4.8035517857057379
0.16701241622795295199 | 700495 | 0 | mydb | stat_position_click
| | 0.000000000000000000000000 | 1.9870234619804567
0.09935032779251879171 | 416702 | 6964 | mydb | pg_toast_19788
| harvested_job | 0.01671218280689797505 | 1.9346079452462431
0.06979762146872315533 | 292750 | 0 | mydb | url
| | 0.000000000000000000000000 | 4.9627873612297182
0.03795774662998486745 | 159205 | 0 | mydb |
stat_sponsored_position | | 0.000000000000000000000000 |
1.8412361420809648
0.02923155381784048663 | 122605 | 0 | mydb | pg_toast_20174
| page | 0.000000000000000000000000 | 3.0259532645487541
0.02755283459406156353 | 115564 | 0 | mydb | location
| | 0.000000000000000000000000 | 4.9953532241874632
0.02015273698468076320 | 84526 | 1122 | mydb | harvested_job
| | 0.01327402219435439983 | 4.9922154130090150
0.01913348905375406298 | 80251 | 0 | mydb | pg_toast_20257
| position_index | 0.000000000000000000000000 | 4.9880001495308470

harvested_job is the rapidly growing "problematic" table I am talking
about. page is the 355 GB table that gets referenced on the public
searches. I'll google, but is there a place I should look to understand
what I am seeing here? Also, Should pg_buffercache perhaps be run at the
beginning and end of the week, to see if there is a significant difference?

> Could you send query plan for the slow (customer-facing) queries?
>
> https://wiki.postgresql.org/wiki/Slow_Query_Questions#EXPLAIN_.28ANALYZE.2C_BUFFERS.29.2C_not_just_EXPLAIN
>

I can, but can I ask why this would matter? I'm not looking to optimise the
query (although I'm sure it could be; this is a legacy system with lots of
barnacles). The problem is that the same query performs increasingly slowly
over the course of a week, seemingly in sync with the rows with a large
toast column added to one particular table (which, as I mentioned, isn't
referenced by the query in question). Wouldn't the plan be the same at both
the start of the week (when the problematic table is essentially empty) and
at the end (when it is much larger)?

Thanks!
Hugh

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Hugh Ranalli 2019-06-26 19:02:15 Re: Perplexing, regular decline in performance
Previous Message Peter Geoghegan 2019-06-26 18:52:46 Re: Perplexing, regular decline in performance