From: | Abadie Lana <Lana(dot)Abadie(at)iter(dot)org> |
---|---|
To: | Abadie Lana <Lana(dot)Abadie(at)iter(dot)org>, Justin Pryzby <pryzby(at)telsasoft(dot)com> |
Cc: | David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>, "pgsql-performance(at)lists(dot)postgresql(dot)org" <pgsql-performance(at)lists(dot)postgresql(dot)org> |
Subject: | RE: select query does not pick up the right index |
Date: | 2019-01-04 08:58:57 |
Message-ID: | 399cd3f79a704b21a47e4305cebfa727@iter.org |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-performance |
-----Original Message-----
From: pgsql-performance-owner+M22888-112441(at)lists(dot)postgresql(dot)org <pgsql-performance-owner+M22888-112441(at)lists(dot)postgresql(dot)org> On Behalf Of Abadie Lana
Sent: 04 January 2019 09:18
To: Justin Pryzby <pryzby(at)telsasoft(dot)com>
Cc: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>; pgsql-performance(at)lists(dot)postgresql(dot)org
Subject: [Possible Spoof] RE: select query does not pick up the right index
Warning: This message was sent by pgsql-performance-owner+M22888-112441(at)lists(dot)postgresql(dot)org supposedly on behalf of Abadie Lana <Lana(dot)Abadie(at)iter(dot)org>. Please contact
-----Original Message-----
From: Justin Pryzby <pryzby(at)telsasoft(dot)com>
Sent: 04 January 2019 00:48
To: Abadie Lana <Lana(dot)Abadie(at)iter(dot)org>
Cc: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>; pgsql-performance(at)lists(dot)postgresql(dot)org
Subject: Re: select query does not pick up the right index
On Thu, Jan 03, 2019 at 12:57:27PM +0000, Abadie Lana wrote:
> Main parameters : effective_cache_size : 4GB, shared_buffers 4GB,
> work_mem 4MB
I doubt it will help much, but you should consider increasing work_mem, unless you have many expensive queries running at once.
Could you also send the rest of the pg_statistic for that table ?
https://wiki.postgresql.org/wiki/Slow_Query_Questions#Statistics:_n_distinct.2C_MCV.2C_histogram
SELECT (SELECT sum(x) FROM unnest(most_common_freqs) x) frac_MCV, tablename, attname, null_frac, n_distinct, array_length(most_common_vals,1) n_mcv, array_length(histogram_bounds,1) n_hist FROM pg_stats WHERE attname='...' AND tablename='...' ORDER BY 1 DESC;
Hmm. Is it normal that the couple (tablename,attname ) is not unique? I'm surprised to see sample_{ctrl,util,buil} quoted twice
css_archive_3_0_0=# SELECT (SELECT sum(x) FROM unnest(most_common_freqs) x) frac_MCV, tablename, attname, null_frac, n_distinct, array_length(most_common_vals,1) n_mcv, array_length(histogram_bounds,1) n_hist FROM pg_stats WHERE attname='channel_id' AND tablename like 'sample%' ORDER BY 1 DESC;
frac_mcv | tablename | attname | null_frac | n_distinct | n_mcv | n_hist
----------+-------------------+------------+-----------+------------+-------+--------
1 | sample_buil_year | channel_id | 0 | 16 | 16 |
0.98249 | sample_ctrl | channel_id | 0 | 26 | 17 | 9
0.982333 | sample_ctrl_month | channel_id | 0 | 34 | 17 | 17
0.981533 | sample_ctrl | channel_id | 0 | 28 | 18 | 10
0.9371 | sample_ctrl_year | channel_id | 0 | 38 | 16 | 22
0.928767 | sample_buil_month | channel_id | 0 | 940 | 54 | 101
0.92535 | sample | channel_id | 0 | 2144 | 167 | 1001
0.907501 | sample_buil | channel_id | 0 | 565 | 43 | 101
0.8876 | sample_util_year | channel_id | 0 | 501 | 45 | 101
0.815 | sample_util | channel_id | 0 | 557 | 82 | 101
0.807667 | sample_buil | channel_id | 0 | 164 | 31 | 101
0.806267 | sample_util | channel_id | 0 | 732 | 100 | 101
0.803766 | sample_util_month | channel_id | 0 | 731 | 100 | 101
(13 rows)
Ah...sample_ctrl_year and sample_buil_year have n_distinct -1? Unlike sample_util_year. Could that explain the wrong choice?
SELECT (SELECT sum(x) FROM unnest(most_common_freqs) x) frac_MCV, tablename, attname, null_frac, n_distinct, array_length(most_common_vals,1) n_mcv, array_length(histogram_bounds,1) n_hist FROM pg_stats WHERE attname='smpl_time' AND tablename like 'sample%' ORDER BY 1 DESC;
frac_mcv | tablename | attname | null_frac | n_distinct | n_mcv | n_hist
------------+-------------------+-----------+-----------+-------------+-------+--------
| sample_ctrl_month | smpl_time | 0 | -1 | | 101
| sample_ctrl_year | smpl_time | 0 | -1 | | 101
| sample_ctrl | smpl_time | 0 | -1 | | 101
| sample_ctrl | smpl_time | 0 | -1 | | 101
| sample_buil_year | smpl_time | 0 | -1 | | 101
0.0154667 | sample_buil_month | smpl_time | 0 | 1.03857e+06 | 100 | 101
0.0154523 | sample_buil | smpl_time | 0 | 854250 | 100 | 101
0.0115 | sample_util | smpl_time | 0 | 405269 | 100 | 101
0.0112333 | sample_util | smpl_time | 0 | 537030 | 100 | 101
0.0106667 | sample_util_month | smpl_time | 0 | 539001 | 100 | 101
0.00946667 | sample_buil | smpl_time | 0 | -0.328554 | 100 | 101
0.00852342 | sample | smpl_time | 0 | 1.5125e+07 | 1000 | 1001
0.00780001 | sample_util_year | smpl_time | 0 | 1.73199e+06 | 100 | 101
(13 rows)
Based on your feedback...i rerun analyse directly on the two table sample_ctrl_year and sample_buil_year
The new values are
SELECT (SELECT sum(x) FROM unnest(most_common_freqs) x) frac_MCV, tablename, attname, null_frac, n_distinct, array_length(most_common_vals,1) n_mcv, array_length(histogram_bounds,1) n_hist FROM pg_stats WHERE attname='channel_id' AND tablename like 'sample%' ORDER BY 1 DESC;
frac_mcv | tablename | attname | null_frac | n_distinct | n_mcv | n_hist
----------+-------------------+------------+-----------+------------+-------+--------
0.99987 | sample_buil_year | channel_id | 0 | 76 | 16 | 60
0.999632 | sample_ctrl_year | channel_id | 0 | 132 | 31 | 101
0.999628 | sample_ctrl_month | channel_id | 0 | 84 | 23 | 61
0.999627 | sample_ctrl | channel_id | 0 | 132 | 31 | 101
0.999599 | sample_ctrl | channel_id | 0 | 42 | 22 | 20
0.998074 | sample_buil | channel_id | 0 | 493 | 122 | 371
0.997693 | sample_util | channel_id | 0 | 1379 | 509 | 870
0.991841 | sample_buil | channel_id | 0 | 9867 | 107 | 9740
0.991567 | sample_util_month | channel_id | 0 | 5716 | 504 | 5209
0.990369 | sample_util_year | channel_id | 0 | 4946 | 255 | 4689
0.990062 | sample_util | channel_id | 0 | 5804 | 641 | 5160
0.972386 | sample_buil_month | channel_id | 0 | 19946 | 148 | 10001
0.967391 | sample | channel_id | 0 | 7597 | 409 | 7178
(13 rows)
Now when running the query again, only for sample_buil_year table the wrong index is picked up...
explain (analyze, buffers) select 'BUIL-B36-VA-RT-RT1:CL0001-2-ABW',c.smpl_time,c.nanosecs,c.float_val,c.num_val,c.str_val,c.datatype,c.array_val from sample c WHERE c.channel_id = (SELECT channel_id FROM channel WHERE name='BUIL-B36-VA-RT-RT1:CL0001-2-ABW') order by c.smpl_time desc limit 5;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------
-----------------------------------------------------------
Limit (cost=13.40..30.01 rows=5 width=112) (actual time=13554.536..13554.570 rows=3 loops=1)
Buffers: shared hit=26626389 read=17
InitPlan 1 (returns $0)
-> Index Scan using unique_chname on channel (cost=0.41..8.43 rows=1 width=8) (actual time=26.858..26.860 rows=1 loop
s=1)
Index Cond: ((name)::text = 'BUIL-B36-VA-RT-RT1:CL0001-2-ABW'::text)
Buffers: shared hit=2 read=2
-> Result (cost=4.96..5131208.65 rows=1544048 width=112) (actual time=13554.534..13554.567 rows=3 loops=1)
Buffers: shared hit=26626389 read=17
-> Merge Append (cost=4.96..5115768.17 rows=1544048 width=80) (actual time=13554.531..13554.562 rows=3 loops=1)
Sort Key: c.smpl_time DESC
Buffers: shared hit=26626389 read=17
-> Index Scan Backward using smpl_time_qa_idx on sample c (cost=0.12..8.14 rows=1 width=326) (actual time=0
.005..0.005 rows=0 loops=1)
Filter: (channel_id = $0)
Buffers: shared hit=1
-> Index Scan Backward using sample_time_b_idx on sample_buil c_1 (cost=0.42..7775.26 rows=2096 width=320)
(actual time=38.931..38.932 rows=0 loops=1)
Index Cond: (channel_id = $0)
Buffers: shared hit=3 read=4
-> Index Scan Backward using sample_time_c_idx on sample_ctrl c_2 (cost=0.42..77785.57 rows=22441 width=320
) (actual time=0.010..0.010 rows=0 loops=1)
Index Cond: (channel_id = $0)
Buffers: shared hit=3
-> Index Scan Backward using sample_time_u_idx on sample_util c_3 (cost=0.43..14922.72 rows=3830 width=320)
(actual time=8.939..8.939 rows=0 loops=1)
Index Cond: (channel_id = $0)
Buffers: shared hit=1 read=2
-> Index Scan Backward using sample_time_bm_idx on sample_buil_month c_4 (cost=0.56..2967.10 rows=740 width
=74) (actual time=260.282..260.311 rows=3 loops=1)
Index Cond: (channel_id = $0)
Buffers: shared hit=3 read=5
-> Index Scan Backward using smpl_time_bx2_idx on sample_buil_year c_5 (cost=0.56..2023054.76 rows=665761 w
idth=75) (actual time=13216.589..13216.589 rows=0 loops=1)
Filter: (channel_id = $0)
Rows Removed by Filter: 50597834
Buffers: shared hit=26626368
-> Index Scan Backward using sample_time_cm_idx on sample_ctrl_month c_6 (cost=0.56..759241.36 rows=217585
width=75) (actual time=0.019..0.019 rows=0 loops=1)
Index Cond: (channel_id = $0)
Buffers: shared hit=4
-> Index Scan Backward using sample_time_cy_idx on sample_ctrl_year c_7 (cost=0.57..2097812.02 rows=602872
width=76) (actual time=0.007..0.007 rows=0 loops=1)
Index Cond: (channel_id = $0)
Buffers: shared hit=4
-> Index Scan Backward using sample_time_um_idx on sample_util_month c_8 (cost=0.57..48401.65 rows=12418 wi
dth=75) (actual time=18.999..19.000 rows=0 loops=1)
Index Cond: (channel_id = $0)
Buffers: shared hit=1 read=3
-> Index Scan Backward using sample_time_uy_idx on sample_util_year c_9 (cost=0.57..54293.22 rows=16304 wid
th=74) (actual time=10.739..10.739 rows=0 loops=1)
Index Cond: (channel_id = $0)
Buffers: shared hit=1 read=3
Planning time: 0.741 ms
Execution time: 13554.666 ms
(44 rows)
Looking more closely to the sample_buil_year table
select count(distinct channel_id),count(*) from sample_buil_year;
count | count
-------+----------
100 | 50597834
(1 row)
Now, the channel name I gave has no entries in sample_buil_year...(and when I run the query directly against sample_buil_year the right index is picked up).... So maybe something related with the partitioning?
select 'BUIL-B36-VA-RT-RT1:CL0001-2-ABW',c.smpl_time,c.nanosecs,c.float_val,c.num_val,c.str_val,c.datatype,c.array_val from sample_buil_year c WHERE c.channel_id = (SELECT channel_id FROM channel WHERE name='BUIL-B36-VA-RT-RT1:CL0001-2-ABW') order by c.smpl_time desc limit 5;
?column? | smpl_time | nanosecs | float_val | num_val | str_val | datatype | array_val
----------+-----------+----------+-----------+---------+---------+----------+-----------
(0 rows)
css_archive_3_0_0=# explain analyze select 'BUIL-B36-VA-RT-RT1:CL0001-2-ABW',c.smpl_time,c.nanosecs,c.float_val,c.num_val,c.str_val,c.datatype,c.array_val from sample_buil_year c WHERE c.channel_id = (SELECT channel_id FROM channel WHERE name='BUIL-B36-VA-RT-RT1:CL0001-2-ABW') order by c.smpl_time desc limit 5;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------
---------------------------------------
Limit (cost=9.00..21.31 rows=5 width=107) (actual time=0.055..0.055 rows=0 loops=1)
InitPlan 1 (returns $0)
-> Index Scan using unique_chname on channel (cost=0.41..8.43 rows=1 width=8) (actual time=0.038..0.040 rows=1 loops=
1)
Index Cond: ((name)::text = 'BUIL-B36-VA-RT-RT1:CL0001-2-ABW'::text)
-> Index Scan Backward using sample_time_by_idx on sample_buil_year c (cost=0.56..1639944.37 rows=665761 width=107) (ac
tual time=0.054..0.054 rows=0 loops=1)
Index Cond: (channel_id = $0)
Planning time: 0.178 ms
Execution time: 0.088 ms
(8 rows)
From | Date | Subject | |
---|---|---|---|
Next Message | Justin Pryzby | 2019-01-05 04:23:57 | Re: select query does not pick up the right index |
Previous Message | Abadie Lana | 2019-01-04 08:17:46 | RE: select query does not pick up the right index |