Unnecessarily imprecise stats for very small tables leading to bad plans

From: Andreas Seltenreich <andreas(dot)seltenreich(at)credativ(dot)de>
To: pgsql-bugs(at)postgresql(dot)org
Subject: Unnecessarily imprecise stats for very small tables leading to bad plans
Date: 2018-06-06 08:38:01
Message-ID: 87o9gos42e.fsf@credativ.de

Hi,

one of our customers suffers from excessively bad plans when joining a
big table (100GB) against a temp table with only a single-digit number
of rows. I found a way to reproduce it, and it seems to me the
optimizer cannot do better: the statistics suggest a merge join can
terminate early, because the small table's actual maximum value is
missing from its stats.
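To illustrate the effect (my own simplification, not PostgreSQL's actual mergejoinscansel code): the planner expects the sorted scan of the outer side to stop once it passes the inner side's maximum join key, so an understated maximum makes the merge join look nearly free.

```python
# Rough sketch (hypothetical simplification, not the real planner code)
# of why a wrong column maximum makes a merge join look cheap: the
# outer scan is expected to stop after passing the inner side's
# maximum join key.

def estimated_scan_fraction(outer_min, outer_max, inner_max):
    """Fraction of the sorted outer input the planner expects to read."""
    if inner_max >= outer_max:
        return 1.0
    return (inner_max - outer_min) / (outer_max - outer_min)

# With 800000 missing from small's stats, the apparent maximum is 10,
# so big looks like it needs only a vanishing fraction of a scan:
frac_bad = estimated_scan_fraction(1, 1_000_000, 10)        # ~1e-5
# With correct stats, 80% of big must be read:
frac_good = estimated_scan_fraction(1, 1_000_000, 800_000)  # ~0.8
```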

Below are the data, statistics, and plans for the bad merge join as
well as the expected nested loop.

Intuitively, I'd say pg_statistic can represent such a small dataset
perfectly. But instead, most_common_vals contains only one of the two
distinct values. This seems like a bug to me. I looked at the code for
why this is the case, but couldn't find an answer right away. Any
input?
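One guess: if ANALYZE requires a value to occur at least twice in the sample before it becomes an MCV candidate, that would reproduce exactly what I'm seeing. A toy sketch of that hypothetical rule (not the actual compute_scalar_stats logic):

```python
from collections import Counter

def mcv_candidates(sample):
    """Hypothetical MCV rule: only values appearing more than once
    in the sample may enter most_common_vals, with their sample
    frequency as most_common_freqs."""
    counts = Counter(sample)
    n = len(sample)
    return {v: c / n for v, c in counts.items() if c > 1}

# small = (10), (10), (800000): 800000 occurs once, so only 10 qualifies
print(mcv_candidates([10, 10, 800000]))
# after inserting a second 800000, both values qualify at 0.5 each
print(mcv_candidates([10, 10, 800000, 800000]))
```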

The customer runs the latest 9.6; the testcase above was run against 10.0.

regards
Andreas

=# create table big(c int primary key);
=# insert into big select generate_series(1,1000000);
=# analyze big;
=# create table small(c int);
=# insert into small values (10), (10), (800000);
=# analyze verbose small;
=# explain analyze select * from big b join small s on (b.c=s.c);
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------
 Merge Join  (cost=1.49..1.82 rows=3 width=8) (actual time=0.110..151.939 rows=3 loops=1)
   Merge Cond: (b.c = s.c)
   ->  Index Only Scan using big_pkey on big b  (cost=0.42..30408.42 rows=1000000 width=4) (actual time=0.067..122.192 rows=800001 loops=1)
         Heap Fetches: 800001
   ->  Sort  (cost=1.05..1.06 rows=3 width=4) (actual time=0.027..0.028 rows=3 loops=1)
         Sort Key: s.c
         Sort Method: quicksort  Memory: 25kB
         ->  Seq Scan on small s  (cost=0.00..1.03 rows=3 width=4) (actual time=0.009..0.011 rows=3 loops=1)
Planning time: 0.514 ms
Execution time: 151.989 ms
(10 rows)

=# select * from pg_stats where tablename='small' \gx
-[ RECORD 1 ]----------+-----------
schemaname | public
tablename | small
attname | c
inherited | f
null_frac | 0
avg_width | 4
n_distinct | -0.666667
most_common_vals | {10} -- here, I expected to see 800000 as well
most_common_freqs | {0.666667}
histogram_bounds | ⊥
correlation | 1
most_common_elems | ⊥
most_common_elem_freqs | ⊥
elem_count_histogram | ⊥

=# insert into small values (800000);
=# analyze verbose small;
=# explain analyze select * from big b join small s on (b.c=s.c);
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=0.42..34.81 rows=4 width=8) (actual time=0.037..0.068 rows=4 loops=1)
   ->  Seq Scan on small s  (cost=0.00..1.04 rows=4 width=4) (actual time=0.012..0.015 rows=4 loops=1)
   ->  Index Only Scan using big_pkey on big b  (cost=0.42..8.44 rows=1 width=4) (actual time=0.010..0.010 rows=1 loops=4)
         Index Cond: (c = s.c)
         Heap Fetches: 4
Planning time: 0.478 ms
Execution time: 0.133 ms
(7 rows)

=# select * from pg_stats where tablename='small' \gx
-[ RECORD 1 ]----------+------------
schemaname | public
tablename | small
attname | c
inherited | f
null_frac | 0
avg_width | 4
n_distinct | -0.5
most_common_vals | {10,800000}
most_common_freqs | {0.5,0.5}
histogram_bounds | ⊥
correlation | 1
most_common_elems | ⊥
most_common_elem_freqs | ⊥
elem_count_histogram | ⊥
