BUG #16280: dead tuples (probably) affect plan and query performance

From: PG Bug reporting form <noreply(at)postgresql(dot)org>
To: pgsql-bugs(at)lists(dot)postgresql(dot)org
Cc: iserbin(at)bostonsd(dot)ru
Subject: BUG #16280: dead tuples (probably) affect plan and query performance
Date: 2020-02-27 11:28:11
Message-ID: 16280-e393fbb744eae7aa@postgresql.org

The following bug has been logged on the website:

Bug reference: 16280
Logged by: Ilya Serbin
Email address: iserbin(at)bostonsd(dot)ru
PostgreSQL version: 11.6
Operating system: CentOS 7.4
Description:

Hello all,
I've run into some strange plan changes with a query (the query itself is
quite bad, but I can't understand why its plan changes).
Context:

db1=> \d+ table1
                                                              Table "db1.table1"
   Column   |            Type             | Collation | Nullable |               Default               | Storage  | Stats target | Description
------------+-----------------------------+-----------+----------+-------------------------------------+----------+--------------+-------------
 id         | bigint                      |           | not null | nextval('table1_id_seq'::regclass)  | plain    |              |
 table1_id  | character varying(255)      |           | not null |                                     | extended |              |
 content    | jsonb                       |           |          |                                     | extended |              |
 created_at | timestamp without time zone |           | not null | now()                               | plain    |              |
 updated_at | timestamp without time zone |           | not null | now()                               | plain    |              |
 queued_at  | timestamp with time zone    |           |          |                                     | plain    |              |
Indexes:
    "table1_pkey" PRIMARY KEY, btree (id)
    "uk_table1_id" UNIQUE CONSTRAINT, btree (table1_id)
    "content_idx" gin (content jsonb_path_ops)
Referenced by:
    TABLE "collection_table1s" CONSTRAINT "fk_collection_table1s_table1_id" FOREIGN KEY (table1_id) REFERENCES table1(id)
    TABLE "db1_table1s" CONSTRAINT "fk_db1_table1s_table1_id" FOREIGN KEY (table1s_id) REFERENCES table1(id)
    TABLE "table1_sort" CONSTRAINT "fk_table1_sort_table1_id" FOREIGN KEY (table1_id) REFERENCES table1(table1_id)
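For anyone trying to reproduce this, here is a trimmed-down version of the
schema (my own sketch, derived from the \d+ output above; the referencing
tables and their data are omitted):

CREATE SEQUENCE table1_id_seq;

CREATE TABLE table1 (
    id         bigint NOT NULL DEFAULT nextval('table1_id_seq'::regclass),
    table1_id  character varying(255) NOT NULL,
    content    jsonb,
    created_at timestamp without time zone NOT NULL DEFAULT now(),
    updated_at timestamp without time zone NOT NULL DEFAULT now(),
    queued_at  timestamp with time zone,
    CONSTRAINT table1_pkey PRIMARY KEY (id),
    CONSTRAINT uk_table1_id UNIQUE (table1_id)
);

-- The GIN index whose recreation temporarily restores the good plan:
CREATE INDEX content_idx ON table1 USING gin (content jsonb_path_ops);

Planner-related settings on this instance:
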
name | setting
--------------------------------+-----------
constraint_exclusion | partition
cpu_index_tuple_cost | 0.005
cpu_operator_cost | 0.0025
cpu_tuple_cost | 0.01
default_statistics_target | 200
effective_cache_size | 1572864
enable_bitmapscan | on
enable_hashagg | on
enable_hashjoin | on
enable_indexonlyscan | on
enable_indexscan | on
enable_material | on
enable_mergejoin | on
enable_nestloop | on
enable_parallel_append | on
enable_parallel_hash | on
enable_partition_pruning | on
enable_partitionwise_aggregate | off
enable_partitionwise_join | off
enable_seqscan | on
enable_sort | on
enable_tidscan | on
random_page_cost | 1.1
seq_page_cost | 1
shared_buffers | 524288
(25 rows)
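
The list above can be reproduced with a query along these lines (my
reconstruction; this may not be the exact command that was used):

SELECT name, setting
FROM pg_settings
WHERE name LIKE 'enable\_%'
   OR name IN ('constraint_exclusion', 'cpu_index_tuple_cost',
               'cpu_operator_cost', 'cpu_tuple_cost',
               'default_statistics_target', 'effective_cache_size',
               'random_page_cost', 'seq_page_cost', 'shared_buffers')
ORDER BY name;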

First plan (the good plan, plus the problematic query):
https://explain.tensor.ru/archive/explain/1cf3c0181a9574bf2fd06d6fd07dc201:0:2020-02-27#context
Second plan (same query, bad plan):
https://explain.tensor.ru/archive/explain/3770d39f786135e38bebeb1a8a4b1da9:0:2020-02-27#context

Issue description:
After some minimal activity in the database, the first plan changes to the
second one. Running ANALYZE on table1 does not help (I tried various
default_statistics_target values).
Recreating content_idx helps for a while, but several minutes later the
plan degrades back to the second one.
Surprisingly, the only thing that does help is VACUUM. Even that helps only
for a while: once the number of dead tuples reaches something like 300-500,
the plan flips back to the second one.
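
This is what I run while watching it happen (a sketch; the 300-500
threshold is only my observation, not a documented limit):

-- Watch the dead-tuple and analyze counters on the table:
SELECT relname, n_live_tup, n_dead_tup, n_mod_since_analyze,
       last_vacuum, last_autovacuum
FROM pg_stat_user_tables
WHERE relname = 'table1';

-- Temporary workaround: a manual vacuum restores the good plan
-- until n_dead_tup climbs back into the hundreds.
VACUUM (VERBOSE, ANALYZE) table1;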

Table details with bad plan:

db1=> select * from pg_stat_user_tables where relname='table1';
-[ RECORD 1 ]-------+------------------------------
relid | 74813598
schemaname | db1
relname | table1
seq_scan | 1167
seq_tup_read | 315158718
idx_scan | 23116536
idx_tup_fetch | 42353778
n_tup_ins | 409288
n_tup_upd | 313114963
n_tup_del | 0
n_tup_hot_upd | 4683
n_live_tup | 409288
n_dead_tup | 379
n_mod_since_analyze | 1169
last_vacuum | 2020-02-12 08:58:06.147247+01
last_autovacuum | 2020-02-20 12:29:31.427082+01
last_analyze |
last_autoanalyze | 2020-02-20 10:12:11.494305+01
vacuum_count | 2
autovacuum_count | 702
analyze_count | 0
autoanalyze_count | 20

Table details with good plan after vacuum:
-[ RECORD 1 ]-------+------------------------------
relid | 74813598
schemaname | db1
relname | table1
seq_scan | 1167
seq_tup_read | 315158718
idx_scan | 23116662
idx_tup_fetch | 42360694
n_tup_ins | 409288
n_tup_upd | 313114963
n_tup_del | 0
n_tup_hot_upd | 4683
n_live_tup | 409288
n_dead_tup | 0
n_mod_since_analyze | 0
last_vacuum | 2020-02-26 08:10:49.884625+01
last_autovacuum | 2020-02-20 12:29:31.427082+01
last_analyze | 2020-02-26 08:13:07.253307+01
last_autoanalyze | 2020-02-20 10:12:11.494305+01
vacuum_count | 3
autovacuum_count | 702
analyze_count | 1
autoanalyze_count | 20

I can confirm that the issue exists on 11.4 and 11.6 (I updated to 11.6
because I thought it might be a bug fixed in 11.5, "Fix possible failure
of planner's index endpoint probes (Tom Lane)"). It is always reproducible,
both on production and on all lower environments.
I can't understand why it happens. As far as I can tell it has something to
do with the visibility map, but per the documentation that should affect
index-only scans, which is not my case.
My two main questions are:
1) Is this a bug?
2) If it is expected behaviour, can someone please explain why it happens,
and whether there is any way to keep the good plan (without installing
extensions to force/pin plans)?
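
For what it's worth, the only extension-free way I know to compare the two
plans on demand is toggling planner switches in a single session. This is a
diagnostic sketch, not a fix, and the right enable_* switch depends on
which node the bad plan actually picks:

BEGIN;
-- Illustrative only: turn off the access path the bad plan chooses,
-- then check whether the planner falls back to the good GIN plan.
SET LOCAL enable_seqscan = off;
EXPLAIN (ANALYZE, BUFFERS)
SELECT ...;  -- the problematic query behind the links above (elided here)
ROLLBACK;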

Best regards,
Ilya
