Re: BUG #14948: cost overflow

From: Jan Schulz <jasc(at)gmx(dot)net>
To: pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #14948: cost overflow
Date: 2017-12-05 17:48:42
Message-ID: CAAc324hPZuLJiwQK_zVxEp3PEcpxpgnee-hxD55jUVp=x2hhCA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

Hello!

On 5 December 2017 at 16:10, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> I'm not sure what to make of that, but it seems to be broken in more ways
> than just the wacko cost estimates.

We run our server with basically no data security in mind as we do
regenerate all tables from the source systems each time. So the OOM kill
might have broken some tables/data? m_dim wasn't regenerated since then
because the regeneration of that schema triggers the OOM.

relevant postgresql.conf entries:

wal_level = minimal
# doesn't start if wal_level=minimal and this is >0
max_wal_senders = 0
fsync = off
synchronous_commit = off
full_page_writes = off
wal_buffers = -1
autovacuum_max_workers = 1

> Can you get a similarly broken plan if you try something involving just
> this table, say "select * from converting_touchpoints_attribution where
> touchpoint_fk = 42 and performance_attribution_model_fk = 2" ?

EXPLAIN ( ANALYSE, BUFFERS )
select * from m_dim.converting_touchpoints_attribution where
touchpoint_fk = 42 and performance_attribution_model_fk = 2

Bitmap Heap Scan on converting_touchpoints_attribution
(cost=-25769803761.69..-25769803757.67 rows=1 width=16) (actual
time=0.005..0.005 rows=0 loops=1)
Recheck Cond: ((performance_attribution_model_fk = 2) AND (touchpoint_fk
= 42))
-> BitmapAnd (cost=-25769803761.69..-25769803761.69 rows=1 width=0)
(actual time=0.004..0.004 rows=0 loops=1)
-> Bitmap Index Scan on
converting_touchpoints_attribution__performance_attribution_mod
(cost=0.00..-12884901880.97 rows=1 width=0) (actual
time=0.003..0.003 rows=0 loops=1)
Index Cond: (performance_attribution_model_fk = 2)
-> Bitmap Index Scan on
converting_touchpoints_attribution__touchpoint_fk
(cost=0.00..-12884901880.97 rows=1 width=0) (never executed)
Index Cond: (touchpoint_fk = 42)
Planning time: 0.122 ms
Execution time: 0.036 ms

This is how it looks on a working system (generated with 2MB work_mem...)

Index Scan using
converting_touchpoints_attribution__touchpoint_fk_performance_a on
converting_touchpoints_attribution
(cost=0.56..2.58 rows=1 width=16) (actual time=2.314..2.314 rows=0
loops=1)
Index Cond: ((touchpoint_fk = 42) AND (performance_attribution_model_fk =
2))
Buffers: shared hit=6 read=4
Planning time: 5.371 ms
Execution time: 2.342 ms

> Could we see the pg_stats rows for those two columns? What is in
> pg_class.reltuples and relpages for this table and these two indexes?

SELECT
nspname || '.' ||relname,
relpages,
reltuples
FROM pg_class
JOIN pg_catalog.pg_namespace n
ON n.oid = pg_class.relnamespace
WHERE relname ILIKE 'converting_touchpoints_attributio%';

+-----------------------------------------------------------------------+----------+-----------+
| nspname+relname |
relpages | reltuples |
+-----------------------------------------------------------------------+----------+-----------+
| m_dim.converting_touchpoints_attribution__touchpoint_fk_performance_a |
84153 | 24282896 |
| m_dim.converting_touchpoints_attribution___day_id_performance_attribu |
84153 | 24282896 |
| m_dim.converting_touchpoints_attribution__touchpoint_fk |
6 | 1026 |
| m_dim.converting_touchpoints_attribution__performance_attribution_mod |
5 | 1026 |
| m_dim.converting_touchpoints_attribution |
131259 | 24282892 |
+-----------------------------------------------------------------------+----------+-----------+
5 rows in set.

On a good system it looks like this:

+-----------------------------------------------------------------------+----------+-----------+
| nspname+relname |
relpages | reltuples |
+-----------------------------------------------------------------------+----------+-----------+
| m_dim.converting_touchpoints_attribution___day_id_performance_attribu |
84656 | 24428338 |
| m_dim.converting_touchpoints_attribution |
132046 | 24428338 |
| m_dim.converting_touchpoints_attribution__touchpoint_fk_performance_a |
84656 | 24428338 |
| m_dim.converting_touchpoints_attribution__touchpoint_fk |
6 | 1032 |
| m_dim.converting_touchpoints_attribution__performance_attribution_mod |
5 | 1032 |
| m_tmp.converting_touchpoints_attribution |
132046 | 24428338 |
+-----------------------------------------------------------------------+----------+-----------+
6 rows in set.

-> Seems like no real difference (the data has a few days differences
now + m_tmp.converting_touchpoints_attribution
is an UNLOGGED table so won't survive the crash)

Best regards,

Jan
--
Jan Schulz
mail: jasc(at)gmx(dot)net

>
> regards, tom lane

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Jan Schulz 2017-12-05 17:50:38 Fwd: BUG #14948: cost overflow
Previous Message Bossart, Nathan 2017-12-05 16:52:40 Re: BUG #14941: Vacuum crashes