Re: Improving Performance of Query ~ Filter by A, Sort by B

From: Lincoln Swaine-Moore <lswainemoore(at)gmail(dot)com>
To: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Cc: pgsql-performance(at)lists(dot)postgresql(dot)org
Subject: Re: Improving Performance of Query ~ Filter by A, Sort by B
Date: 2018-07-16 21:29:00
Message-ID: CABcidkJoH-Uv+nZ-mYLpfvAzuyOFbeKwYC47XUvsFZt+hFgQcg@mail.gmail.com
Lists: pgsql-performance

Tom and Jeff,

Thanks very much for the suggestions!

Here's what I've found so far after playing around for a few more days:

What is your default_statistics_target? What can you tell us about the
> distribution of parent_id? (exponential, power law, etc?). Can you show
> the results for select * from pg_stats where tablename='a' and
> attname='parent_id' \x\g\x ?

The default_statistics_target is 500, which I agree seems quite
insufficient for these purposes. I bumped it up to 2000 and saw some
improvement in the row count estimates, but essentially the same query
plans. Unfortunately, the distribution of counts is not intended to be
correlated to parent_id, which is one reason I imagine the histograms might
not be particularly effective unless there's one bucket for every value.
Here is the output you requested:

select * from pg_stats where tablename='a' and attname='parent_id';

schemaname | public
tablename | a
attname | parent_id
inherited | t
null_frac | 0
avg_width | 4
n_distinct | 18871
most_common_vals | {15503,49787,49786,24595,49784,17549, ...} (2000
values)
most_common_freqs |
{0.0252983,0.02435,0.0241317,0.02329,0.019095,0.0103967,0.00758833,0.004245,
...} (2000 values)
histogram_bounds |
{2,12,17,24,28,36,47,59,74,80,86,98,108,121,135,141,147,160,169,177,190,204,
...} (2001 values)
correlation | -0.161576
most_common_elems |
most_common_elem_freqs |
elem_count_histogram |

Interestingly, the number of elements in most_common_vals is as expected
(2000) for the parent table, but it's lower for the partition tables,
despite the statistics target being the same.

SELECT attstattarget
FROM pg_attribute
WHERE attrelid in ('a_partition1'::regclass, 'a'::regclass)
AND attname = 'parent_id';
-[ RECORD 1 ]-+-----
attstattarget | 2000
-[ RECORD 2 ]-+-----
attstattarget | 2000

select * from pg_stats where tablename='a_partition1' and
attname='parent_id';

schemaname | public
tablename | a_partition1
attname | parent_id
inherited | f
null_frac | 0
avg_width | 4
n_distinct | 3969
most_common_vals |
{15503,49787,49786,24595,49784,10451,20136,17604,9683, ...} (400-ish values)
most_common_freqs |
{0.0807067,0.0769483,0.0749433,0.073565,0.0606433,0.0127917,0.011265,0.0112367,
...} (400-ish values)
histogram_bounds | {5,24,27,27,33,38,41,69,74,74, ...} (1500-ish
values)
correlation | 0.402414
most_common_elems |
most_common_elem_freqs |
elem_count_histogram |

A few questions re: statistics:
1) Would it be helpful to bump the column statistics target to, say, 20k
(the number of distinct values of parent_id)?
2) Is the discrepancy between the statistics on the parent and child tables
expected? Certainly I would expect the statistics themselves to differ, but
I would've imagined the histograms would be the same size given identical
settings.
3) Is there a way to manually specify the distribution of rows as even?
That is, set the frequency of each value to ~ n_rows/n_distinct. This isn't
quite accurate, but it's a reasonable assumption about the distribution,
and might generate better query plans.
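For anyone following along, here's roughly what I've been toggling
(illustrative values; note that per-column statistics targets are capped at
10000, so the 20k in question 1 wouldn't actually be accepted, and the
n_distinct override is the closest thing to question 3 that I'm aware of):

```sql
-- Per-column statistics target (question 1); PostgreSQL caps this at 10000.
ALTER TABLE a ALTER COLUMN parent_id SET STATISTICS 10000;

-- Closest thing to question 3: override the planner's n_distinct estimate
-- for the column. This fixes the distinct count but not the per-value
-- frequencies.
ALTER TABLE a ALTER COLUMN parent_id SET (n_distinct = 18871);

-- Either change takes effect only after the next ANALYZE.
ANALYZE a;
```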

The same thing could be obtained by doing a dummy operation, such as ORDER
> BY tmstmp + '0 seconds' DESC. I prefer that method, as it is more
> obviously a tuning trick. Adding in "id" looks more like a legitimate
> desire to break any ties that might occasionally occur in tmstmp.

I 100% agree that that is more clear. Thanks for the suggestion!
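So the tuning trick would look something like this (a sketch; the IN-list
is elided, table and column names as elsewhere in the thread):

```sql
-- Because "tmstmp + '0 seconds'" is an expression rather than the bare
-- column, it no longer matches the sort key of the tmstmp index, so the
-- planner can't satisfy the ORDER BY with a backward index scan -- same
-- effect as the "id" tiebreaker, but obviously a deliberate tuning hack.
SELECT id
FROM a
WHERE parent_id IN (/* parent ids here */)
ORDER BY tmstmp + '0 seconds' DESC
LIMIT 20;
```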

Finally, could you rewrite it as a join to a VALUES list, rather than as an
> in-list?

I should've mentioned this in my initial post, but I tried doing this
without much success.
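Specifically, the rewrite I tried was shaped roughly like this (sketch with
a shortened id list; the real list had hundreds of values):

```sql
-- Join against an inline VALUES list instead of an IN-list, which gives
-- the planner per-row statistics to work with rather than one big ANY().
SELECT a.id
FROM a
JOIN (VALUES (14467), (30724), (3976)) AS v(parent_id)  -- truncated
  ON a.parent_id = v.parent_id
ORDER BY a.tmstmp DESC, a.id DESC
LIMIT 20;
```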

You could try reversing the order and adding a column to be (tmstmp,
> parent_id, id) and keeping the table well vacuumed. This would allow the
> slow plan to still walk the indexes in tmstmp order but do it as an
> index-only scan, so it could omit the extra trip to the table. That trip to
> the table must be awfully slow to explain the numbers you show later in the
> thread.

Just to clarify, do you mean building indexes like:
CREATE INDEX "a_tmstmp_parent_id_id_idx_[PART_KEY]" on
"a_partition[PART_KEY]" USING btree("tmstmp", "parent_id", "id")
That seems promising! Is the intuition here that we want the first key of
the index to be the one we're ultimately ordering by? Sounds like I may
have had that flipped initially. My understanding of this whole situation
(and please do correct me if this doesn't make sense) is that the big
bottleneck here is reading pages from disk (when looking at stuck queries,
the wait_event is DataFileRead), so anything that can be done to minimize
the pages read will be valuable. Which is why I would love to get the query
plan to use the tmstmp index without having to filter afterward by
parent_id.

What happens when you remove that extra order by phrase that you added?
> The original slow plan should become much faster when the number of
> parent_ids is large (it has to dig through fewer index entries before
> accumulating 20 good ones), so you should try going back to that.

Unfortunately, I've found that even when the number of parent_ids is large
(2000), it's still prohibitively slow (I haven't gotten it to finish), and
it keeps a query plan that involves an Index Scan Backward across the
a_tmstmp_idx indexes (with a filter for parent_id).

And if not, could you just get rid of the partitioning altogether? 1e7 row
> is not all that many and doesn't generally need partitioning. Unless it is
> serving a specific purpose, it is probably costing you more than you are
> getting.

Unfortunately, the partitioning is serving a specific purpose (to avoid
some writing overhead, it's useful to be able to drop GIN indexes on one
partition at a time during heavy writes). But given that the queries are
slow on a single partition anyway, I suspect the partitioning isn't the
main problem here.

But if they don't, is there any chance you could redesign your partitioning
> so that all parent_id queries together will always be in the same partition?

In effect, the parent_ids are distributed by part_key, so I've found that
at least a stopgap solution is to manually split apart the parent_ids by
partition when generating the query. This produces the following query
plan, which still doesn't use the tmstmp index (which would be ideal and,
to my understanding, would allow proper use of the LIMIT to minimize
reads), but it does make things an order of magnitude faster by forcing
bitmap scans. Apologies for the giant output; this approach balloons the
size of the query itself.

SELECT "a"."id"
FROM "a" WHERE (
("a"."parent_id" IN (

14467,30724,3976,13323,23971,44688,1938,275,19931,26527,42529,19491,11428,39110,9260,22446,44253,6705,15033,6461,34878,26687,45248,12739,24518,12359,12745,33738,32590,11856,14802,10451,16697,18904,41273,3547,12637,20190,36575,16096,19937,39652,2278,5616,5235,8570,15100
)
AND
"a"."part_key" = 1
)
OR (
"a"."parent_id" IN (

50309,37126,36876,41878,9112,43673,10782,1696,11042,11168,34632,31157,7156,49213,29504,7200,38594,38598,5064,8783,27862,38201,473,19687,8296,49641,28394,29803,31597,19313,33395,32244,4348
)
AND
"a"."part_key" = 2
)
OR (
"a"."parent_id" IN (

33024,9092,28678,29449,15757,36366,39963,46737,17156,39226,25628,14237,13125,17569,10914,39075,49734,40999,41756,8751,45490,29365,9143,5050,2463,35267,1220,12869,20294,24776,16329,5578,46605,30545,3544,37341,37086,28383,42527,45027,44292,35849,46314,2540,19696,21876,49156
)
AND
"a"."part_key" = 3
)
OR (
"a"."parent_id" IN (

42242,50388,21916,13987,28708,4136,24617,31789,36533,28854,24247,15455,20805,38728,8908,20563,13908,20438,21087,4329,1131,46837,6505,16724,39675,35071
)
AND
"a"."part_key" = 4
)
OR (
"a"."parent_id" IN (

11522,8964,47879,10380,46970,11278,6543,9489,27283,30958,40214,35076,21023,19746,11044,45605,41259,35245,36911,7344,20276,35126,3257,49134,1091,24388,5447,35017,4042,7627,42061,5582,30419,7508,19278,34778,38394,34153,20714,10860,7917,21614,10223,50801,28629,6782,7765
)
AND
"a"."part_key" = 5
)
)
ORDER BY
"a"."tmstmp" DESC, "a"."id" DESC
LIMIT 20;

Limit (cost=449977.86..449977.91 rows=20 width=1412) (actual
time=8967.465..8967.477 rows=20 loops=1)
Output: a_partition1.id, a_partition1.tmstmp
Buffers: shared hit=1641 read=125625 written=13428
-> Sort (cost=449977.86..450397.07 rows=167683 width=1412) (actual
time=8967.464..8967.468 rows=20 loops=1)
Output: a_partition1.id, a_partition1.tmstmp
Sort Key: a_partition1.tmstmp DESC, a_partition1.id DESC
Sort Method: top-N heapsort Memory: 85kB
Buffers: shared hit=1641 read=125625 written=13428
-> Append (cost=2534.33..445515.88 rows=167683 width=1412)
(actual time=1231.579..8756.610 rows=145077 loops=1)
Buffers: shared hit=1641 read=125625 written=13428
-> Bitmap Heap Scan on public.a_partition1
(cost=2534.33..246197.54 rows=126041 width=1393) (actual
time=1231.578..4414.027 rows=115364 loops=1)
Output: a_partition1.id, a_partition1.tmstmp
Recheck Cond: ((a_partition1.parent_id = ANY
('{14467,30724,3976,13323,23971,44688,1938,275,19931,26527,42529,19491,11428,39110,9260,22446,44253,6705,15033,6461,34878,26687,45248,12739,24518,12359,12745,33738,32590,11856,14802,10451,16697,18904,41273,3547,12637,20190,36575,16096,19937,39652,2278,5616,5235,8570,15100}'::integer[]))
OR (a_partition1.parent_id = ANY
('{50309,37126,36876,41878,9112,43673,10782,1696,11042,11168,34632,31157,7156,49213,29504,7200,38594,38598,5064,8783,27862,38201,473,19687,8296,49641,28394,29803,31597,19313,33395,32244,4348}'::integer[]))
OR (a_partition1.parent_id = ANY
('{33024,9092,28678,29449,15757,36366,39963,46737,17156,39226,25628,14237,13125,17569,10914,39075,49734,40999,41756,8751,45490,29365,9143,5050,2463,35267,1220,12869,20294,24776,16329,5578,46605,30545,3544,37341,37086,28383,42527,45027,44292,35849,46314,2540,19696,21876,49156}'::integer[]))
OR (a_partition1.parent_id = ANY
('{42242,50388,21916,13987,28708,4136,24617,31789,36533,28854,24247,15455,20805,38728,8908,20563,13908,20438,21087,4329,1131,46837,6505,16724,39675,35071}'::integer[]))
OR (a_partition1.parent_id = ANY
('{11522,8964,47879,10380,46970,11278,6543,9489,27283,30958,40214,35076,21023,19746,11044,45605,41259,35245,36911,7344,20276,35126,3257,49134,1091,24388,5447,35017,4042,7627,42061,5582,30419,7508,19278,34778,38394,34153,20714,10860,7917,21614,10223,50801,28629,6782,7765}'::integer[])))
Filter: (((a_partition1.parent_id = ANY
('{14467,30724,3976,13323,23971,44688,1938,275,19931,26527,42529,19491,11428,39110,9260,22446,44253,6705,15033,6461,34878,26687,45248,12739,24518,12359,12745,33738,32590,11856,14802,10451,16697,18904,41273,3547,12637,20190,36575,16096,19937,39652,2278,5616,5235,8570,15100}'::integer[]))
AND (a_partition1.part_key = 1)) OR ((a_partition1.parent_id = ANY
('{50309,37126,36876,41878,9112,43673,10782,1696,11042,11168,34632,31157,7156,49213,29504,7200,38594,38598,5064,8783,27862,38201,473,19687,8296,49641,28394,29803,31597,19313,33395,32244,4348}'::integer[]))
AND (a_partition1.part_key = 2)) OR ((a_partition1.parent_id = ANY
('{33024,9092,28678,29449,15757,36366,39963,46737,17156,39226,25628,14237,13125,17569,10914,39075,49734,40999,41756,8751,45490,29365,9143,5050,2463,35267,1220,12869,20294,24776,16329,5578,46605,30545,3544,37341,37086,28383,42527,45027,44292,35849,46314,2540,19696,21876,49156}'::integer[]))
AND (a_partition1.part_key = 3)) OR ((a_partition1.parent_id = ANY
('{42242,50388,21916,13987,28708,4136,24617,31789,36533,28854,24247,15455,20805,38728,8908,20563,13908,20438,21087,4329,1131,46837,6505,16724,39675,35071}'::integer[]))
AND (a_partition1.part_key = 4)) OR ((a_partition1.parent_id = ANY
('{11522,8964,47879,10380,46970,11278,6543,9489,27283,30958,40214,35076,21023,19746,11044,45605,41259,35245,36911,7344,20276,35126,3257,49134,1091,24388,5447,35017,4042,7627,42061,5582,30419,7508,19278,34778,38394,34153,20714,10860,7917,21614,10223,50801,28629,6782,7765}'::integer[]))
AND (a_partition1.part_key = 5)))
Heap Blocks: exact=93942
Buffers: shared hit=397 read=94547 written=6930
-> BitmapOr (cost=2534.33..2534.33 rows=192032
width=0) (actual time=1214.569..1214.569 rows=0 loops=1)
Buffers: shared hit=397 read=605 written=10
-> Bitmap Index Scan on a_parent_id_idx1
(cost=0.00..1479.43 rows=126041 width=0) (actual time=1091.952..1091.952
rows=115364 loops=1)
Index Cond: (a_partition1.parent_id = ANY
('{14467,30724,3976,13323,23971,44688,1938,275,19931,26527,42529,19491,11428,39110,9260,22446,44253,6705,15033,6461,34878,26687,45248,12739,24518,12359,12745,33738,32590,11856,14802,10451,16697,18904,41273,3547,12637,20190,36575,16096,19937,39652,2278,5616,5235,8570,15100}'::integer[]))
Buffers: shared hit=82 read=460 written=8
-> Bitmap Index Scan on a_parent_id_idx1
(cost=0.00..193.55 rows=14233 width=0) (actual time=26.911..26.911 rows=0
loops=1)
Index Cond: (a_partition1.parent_id = ANY
('{50309,37126,36876,41878,9112,43673,10782,1696,11042,11168,34632,31157,7156,49213,29504,7200,38594,38598,5064,8783,27862,38201,473,19687,8296,49641,28394,29803,31597,19313,33395,32244,4348}'::integer[]))
Buffers: shared hit=66 read=33
-> Bitmap Index Scan on a_parent_id_idx1
(cost=0.00..275.65 rows=20271 width=0) (actual time=41.874..41.874 rows=0
loops=1)
Index Cond: (a_partition1.parent_id = ANY
('{33024,9092,28678,29449,15757,36366,39963,46737,17156,39226,25628,14237,13125,17569,10914,39075,49734,40999,41756,8751,45490,29365,9143,5050,2463,35267,1220,12869,20294,24776,16329,5578,46605,30545,3544,37341,37086,28383,42527,45027,44292,35849,46314,2540,19696,21876,49156}'::integer[]))
Buffers: shared hit=97 read=45 written=1
-> Bitmap Index Scan on a_parent_id_idx1
(cost=0.00..152.49 rows=11214 width=0) (actual time=23.542..23.542 rows=0
loops=1)
Index Cond: (a_partition1.parent_id = ANY
('{42242,50388,21916,13987,28708,4136,24617,31789,36533,28854,24247,15455,20805,38728,8908,20563,13908,20438,21087,4329,1131,46837,6505,16724,39675,35071}'::integer[]))
Buffers: shared hit=52 read=26 written=1
-> Bitmap Index Scan on a_parent_id_idx1
(cost=0.00..275.65 rows=20271 width=0) (actual time=30.271..30.271 rows=0
loops=1)
Index Cond: (a_partition1.parent_id = ANY
('{11522,8964,47879,10380,46970,11278,6543,9489,27283,30958,40214,35076,21023,19746,11044,45605,41259,35245,36911,7344,20276,35126,3257,49134,1091,24388,5447,35017,4042,7627,42061,5582,30419,7508,19278,34778,38394,34153,20714,10860,7917,21614,10223,50801,28629,6782,7765}'::integer[]))
Buffers: shared hit=100 read=41
-> Bitmap Heap Scan on public.a_partition2
(cost=850.51..78634.20 rows=19852 width=1485) (actual
time=316.458..2166.105 rows=16908 loops=1)
Output: a_partition2.id, a_partition1.tmstmp
Recheck Cond: ((a_partition2.parent_id = ANY
('{14467,30724,3976,13323,23971,44688,1938,275,19931,26527,42529,19491,11428,39110,9260,22446,44253,6705,15033,6461,34878,26687,45248,12739,24518,12359,12745,33738,32590,11856,14802,10451,16697,18904,41273,3547,12637,20190,36575,16096,19937,39652,2278,5616,5235,8570,15100}'::integer[]))
OR (a_partition2.parent_id = ANY
('{50309,37126,36876,41878,9112,43673,10782,1696,11042,11168,34632,31157,7156,49213,29504,7200,38594,38598,5064,8783,27862,38201,473,19687,8296,49641,28394,29803,31597,19313,33395,32244,4348}'::integer[]))
OR (a_partition2.parent_id = ANY
('{33024,9092,28678,29449,15757,36366,39963,46737,17156,39226,25628,14237,13125,17569,10914,39075,49734,40999,41756,8751,45490,29365,9143,5050,2463,35267,1220,12869,20294,24776,16329,5578,46605,30545,3544,37341,37086,28383,42527,45027,44292,35849,46314,2540,19696,21876,49156}'::integer[]))
OR (a_partition2.parent_id = ANY
('{42242,50388,21916,13987,28708,4136,24617,31789,36533,28854,24247,15455,20805,38728,8908,20563,13908,20438,21087,4329,1131,46837,6505,16724,39675,35071}'::integer[]))
OR (a_partition2.parent_id = ANY
('{11522,8964,47879,10380,46970,11278,6543,9489,27283,30958,40214,35076,21023,19746,11044,45605,41259,35245,36911,7344,20276,35126,3257,49134,1091,24388,5447,35017,4042,7627,42061,5582,30419,7508,19278,34778,38394,34153,20714,10860,7917,21614,10223,50801,28629,6782,7765}'::integer[])))
Filter: (((a_partition2.parent_id = ANY
('{14467,30724,3976,13323,23971,44688,1938,275,19931,26527,42529,19491,11428,39110,9260,22446,44253,6705,15033,6461,34878,26687,45248,12739,24518,12359,12745,33738,32590,11856,14802,10451,16697,18904,41273,3547,12637,20190,36575,16096,19937,39652,2278,5616,5235,8570,15100}'::integer[]))
AND (a_partition2.part_key = 1)) OR ((a_partition2.parent_id = ANY
('{50309,37126,36876,41878,9112,43673,10782,1696,11042,11168,34632,31157,7156,49213,29504,7200,38594,38598,5064,8783,27862,38201,473,19687,8296,49641,28394,29803,31597,19313,33395,32244,4348}'::integer[]))
AND (a_partition2.part_key = 2)) OR ((a_partition2.parent_id = ANY
('{33024,9092,28678,29449,15757,36366,39963,46737,17156,39226,25628,14237,13125,17569,10914,39075,49734,40999,41756,8751,45490,29365,9143,5050,2463,35267,1220,12869,20294,24776,16329,5578,46605,30545,3544,37341,37086,28383,42527,45027,44292,35849,46314,2540,19696,21876,49156}'::integer[]))
AND (a_partition2.part_key = 3)) OR ((a_partition2.parent_id = ANY
('{42242,50388,21916,13987,28708,4136,24617,31789,36533,28854,24247,15455,20805,38728,8908,20563,13908,20438,21087,4329,1131,46837,6505,16724,39675,35071}'::integer[]))
AND (a_partition2.part_key = 4)) OR ((a_partition2.parent_id = ANY
('{11522,8964,47879,10380,46970,11278,6543,9489,27283,30958,40214,35076,21023,19746,11044,45605,41259,35245,36911,7344,20276,35126,3257,49134,1091,24388,5447,35017,4042,7627,42061,5582,30419,7508,19278,34778,38394,34153,20714,10860,7917,21614,10223,50801,28629,6782,7765}'::integer[]))
AND (a_partition2.part_key = 5)))
Heap Blocks: exact=17769
Buffers: shared hit=402 read=18034 written=3804
-> BitmapOr (cost=850.51..850.51 rows=59567 width=0)
(actual time=313.191..313.191 rows=0 loops=1)
Buffers: shared hit=402 read=265 written=40
-> Bitmap Index Scan on a_parent_id_idx2
(cost=0.00..155.81 rows=11177 width=0) (actual time=65.671..65.671 rows=0
loops=1)
Index Cond: (a_partition2.parent_id = ANY
('{14467,30724,3976,13323,23971,44688,1938,275,19931,26527,42529,19491,11428,39110,9260,22446,44253,6705,15033,6461,34878,26687,45248,12739,24518,12359,12745,33738,32590,11856,14802,10451,16697,18904,41273,3547,12637,20190,36575,16096,19937,39652,2278,5616,5235,8570,15100}'::integer[]))
Buffers: shared hit=84 read=57 written=11
-> Bitmap Index Scan on a_parent_id_idx2
(cost=0.00..272.08 rows=19852 width=0) (actual time=116.974..116.974
rows=18267 loops=1)
Index Cond: (a_partition2.parent_id = ANY
('{50309,37126,36876,41878,9112,43673,10782,1696,11042,11168,34632,31157,7156,49213,29504,7200,38594,38598,5064,8783,27862,38201,473,19687,8296,49641,28394,29803,31597,19313,33395,32244,4348}'::integer[]))
Buffers: shared hit=68 read=98 written=18
-> Bitmap Index Scan on a_parent_id_idx2
(cost=0.00..155.81 rows=11177 width=0) (actual time=58.915..58.915 rows=0
loops=1)
Index Cond: (a_partition2.parent_id = ANY
('{33024,9092,28678,29449,15757,36366,39963,46737,17156,39226,25628,14237,13125,17569,10914,39075,49734,40999,41756,8751,45490,29365,9143,5050,2463,35267,1220,12869,20294,24776,16329,5578,46605,30545,3544,37341,37086,28383,42527,45027,44292,35849,46314,2540,19696,21876,49156}'::integer[]))
Buffers: shared hit=100 read=41 written=5
-> Bitmap Index Scan on a_parent_id_idx2
(cost=0.00..86.19 rows=6183 width=0) (actual time=25.370..25.370 rows=0
loops=1)
Index Cond: (a_partition2.parent_id = ANY
('{42242,50388,21916,13987,28708,4136,24617,31789,36533,28854,24247,15455,20805,38728,8908,20563,13908,20438,21087,4329,1131,46837,6505,16724,39675,35071}'::integer[]))
Buffers: shared hit=53 read=25 written=3
-> Bitmap Index Scan on a_parent_id_idx2
(cost=0.00..155.81 rows=11177 width=0) (actual time=46.254..46.254 rows=0
loops=1)
Index Cond: (a_partition2.parent_id = ANY
('{11522,8964,47879,10380,46970,11278,6543,9489,27283,30958,40214,35076,21023,19746,11044,45605,41259,35245,36911,7344,20276,35126,3257,49134,1091,24388,5447,35017,4042,7627,42061,5582,30419,7508,19278,34778,38394,34153,20714,10860,7917,21614,10223,50801,28629,6782,7765}'::integer[]))
Buffers: shared hit=97 read=44 written=3
-> Bitmap Heap Scan on public.a_partition3
(cost=692.99..56079.33 rows=13517 width=1467) (actual
time=766.172..1555.761 rows=7594 loops=1)
Output: a_partition3.id, a_partition1.tmstmp
Recheck Cond: ((a_partition3.parent_id = ANY
('{14467,30724,3976,13323,23971,44688,1938,275,19931,26527,42529,19491,11428,39110,9260,22446,44253,6705,15033,6461,34878,26687,45248,12739,24518,12359,12745,33738,32590,11856,14802,10451,16697,18904,41273,3547,12637,20190,36575,16096,19937,39652,2278,5616,5235,8570,15100}'::integer[]))
OR (a_partition3.parent_id = ANY
('{50309,37126,36876,41878,9112,43673,10782,1696,11042,11168,34632,31157,7156,49213,29504,7200,38594,38598,5064,8783,27862,38201,473,19687,8296,49641,28394,29803,31597,19313,33395,32244,4348}'::integer[]))
OR (a_partition3.parent_id = ANY
('{33024,9092,28678,29449,15757,36366,39963,46737,17156,39226,25628,14237,13125,17569,10914,39075,49734,40999,41756,8751,45490,29365,9143,5050,2463,35267,1220,12869,20294,24776,16329,5578,46605,30545,3544,37341,37086,28383,42527,45027,44292,35849,46314,2540,19696,21876,49156}'::integer[]))
OR (a_partition3.parent_id = ANY
('{42242,50388,21916,13987,28708,4136,24617,31789,36533,28854,24247,15455,20805,38728,8908,20563,13908,20438,21087,4329,1131,46837,6505,16724,39675,35071}'::integer[]))
OR (a_partition3.parent_id = ANY
('{11522,8964,47879,10380,46970,11278,6543,9489,27283,30958,40214,35076,21023,19746,11044,45605,41259,35245,36911,7344,20276,35126,3257,49134,1091,24388,5447,35017,4042,7627,42061,5582,30419,7508,19278,34778,38394,34153,20714,10860,7917,21614,10223,50801,28629,6782,7765}'::integer[])))
Filter: (((a_partition3.parent_id = ANY
('{14467,30724,3976,13323,23971,44688,1938,275,19931,26527,42529,19491,11428,39110,9260,22446,44253,6705,15033,6461,34878,26687,45248,12739,24518,12359,12745,33738,32590,11856,14802,10451,16697,18904,41273,3547,12637,20190,36575,16096,19937,39652,2278,5616,5235,8570,15100}'::integer[]))
AND (a_partition3.part_key = 1)) OR ((a_partition3.parent_id = ANY
('{50309,37126,36876,41878,9112,43673,10782,1696,11042,11168,34632,31157,7156,49213,29504,7200,38594,38598,5064,8783,27862,38201,473,19687,8296,49641,28394,29803,31597,19313,33395,32244,4348}'::integer[]))
AND (a_partition3.part_key = 2)) OR ((a_partition3.parent_id = ANY
('{33024,9092,28678,29449,15757,36366,39963,46737,17156,39226,25628,14237,13125,17569,10914,39075,49734,40999,41756,8751,45490,29365,9143,5050,2463,35267,1220,12869,20294,24776,16329,5578,46605,30545,3544,37341,37086,28383,42527,45027,44292,35849,46314,2540,19696,21876,49156}'::integer[]))
AND (a_partition3.part_key = 3)) OR ((a_partition3.parent_id = ANY
('{42242,50388,21916,13987,28708,4136,24617,31789,36533,28854,24247,15455,20805,38728,8908,20563,13908,20438,21087,4329,1131,46837,6505,16724,39675,35071}'::integer[]))
AND (a_partition3.part_key = 4)) OR ((a_partition3.parent_id = ANY
('{11522,8964,47879,10380,46970,11278,6543,9489,27283,30958,40214,35076,21023,19746,11044,45605,41259,35245,36911,7344,20276,35126,3257,49134,1091,24388,5447,35017,4042,7627,42061,5582,30419,7508,19278,34778,38394,34153,20714,10860,7917,21614,10223,50801,28629,6782,7765}'::integer[]))
AND (a_partition3.part_key = 5)))
Heap Blocks: exact=7472
Buffers: shared hit=432 read=7682 written=1576
-> BitmapOr (cost=692.99..692.99 rows=42391 width=0)
(actual time=764.238..764.238 rows=0 loops=1)
Buffers: shared hit=432 read=210 written=51
-> Bitmap Index Scan on a_parent_id_idx3
(cost=0.00..138.53 rows=8870 width=0) (actual time=118.907..118.907 rows=0
loops=1)
Index Cond: (a_partition3.parent_id = ANY
('{14467,30724,3976,13323,23971,44688,1938,275,19931,26527,42529,19491,11428,39110,9260,22446,44253,6705,15033,6461,34878,26687,45248,12739,24518,12359,12745,33738,32590,11856,14802,10451,16697,18904,41273,3547,12637,20190,36575,16096,19937,39652,2278,5616,5235,8570,15100}'::integer[]))
Buffers: shared hit=91 read=52 written=7
-> Bitmap Index Scan on a_parent_id_idx3
(cost=0.00..97.27 rows=6228 width=0) (actual time=26.656..26.656 rows=0
loops=1)
Index Cond: (a_partition3.parent_id = ANY
('{50309,37126,36876,41878,9112,43673,10782,1696,11042,11168,34632,31157,7156,49213,29504,7200,38594,38598,5064,8783,27862,38201,473,19687,8296,49641,28394,29803,31597,19313,33395,32244,4348}'::integer[]))
Buffers: shared hit=71 read=28 written=3
-> Bitmap Index Scan on a_parent_id_idx3
(cost=0.00..225.13 rows=13517 width=0) (actual time=393.115..393.115
rows=7594 loops=1)
Index Cond: (a_partition3.parent_id = ANY
('{33024,9092,28678,29449,15757,36366,39963,46737,17156,39226,25628,14237,13125,17569,10914,39075,49734,40999,41756,8751,45490,29365,9143,5050,2463,35267,1220,12869,20294,24776,16329,5578,46605,30545,3544,37341,37086,28383,42527,45027,44292,35849,46314,2540,19696,21876,49156}'::integer[]))
Buffers: shared hit=107 read=74 written=25
-> Bitmap Index Scan on a_parent_id_idx3
(cost=0.00..76.64 rows=4907 width=0) (actual time=36.979..36.979 rows=0
loops=1)
Index Cond: (a_partition3.parent_id = ANY
('{42242,50388,21916,13987,28708,4136,24617,31789,36533,28854,24247,15455,20805,38728,8908,20563,13908,20438,21087,4329,1131,46837,6505,16724,39675,35071}'::integer[]))
Buffers: shared hit=56 read=22 written=9
-> Bitmap Index Scan on a_parent_id_idx3
(cost=0.00..138.53 rows=8870 width=0) (actual time=188.574..188.574 rows=0
loops=1)
Index Cond: (a_partition3.parent_id = ANY
('{11522,8964,47879,10380,46970,11278,6543,9489,27283,30958,40214,35076,21023,19746,11044,45605,41259,35245,36911,7344,20276,35126,3257,49134,1091,24388,5447,35017,4042,7627,42061,5582,30419,7508,19278,34778,38394,34153,20714,10860,7917,21614,10223,50801,28629,6782,7765}'::integer[]))
Buffers: shared hit=107 read=34 written=7
-> Bitmap Heap Scan on public.a_partition4
(cost=709.71..64604.81 rows=8273 width=1470) (actual time=268.111..543.570
rows=5211 loops=1)
Output: a_partition4.id, a_partition1.tmstmp
Recheck Cond: ((a_partition4.parent_id = ANY
('{14467,30724,3976,13323,23971,44688,1938,275,19931,26527,42529,19491,11428,39110,9260,22446,44253,6705,15033,6461,34878,26687,45248,12739,24518,12359,12745,33738,32590,11856,14802,10451,16697,18904,41273,3547,12637,20190,36575,16096,19937,39652,2278,5616,5235,8570,15100}'::integer[]))
OR (a_partition4.parent_id = ANY
('{50309,37126,36876,41878,9112,43673,10782,1696,11042,11168,34632,31157,7156,49213,29504,7200,38594,38598,5064,8783,27862,38201,473,19687,8296,49641,28394,29803,31597,19313,33395,32244,4348}'::integer[]))
OR (a_partition4.parent_id = ANY
('{33024,9092,28678,29449,15757,36366,39963,46737,17156,39226,25628,14237,13125,17569,10914,39075,49734,40999,41756,8751,45490,29365,9143,5050,2463,35267,1220,12869,20294,24776,16329,5578,46605,30545,3544,37341,37086,28383,42527,45027,44292,35849,46314,2540,19696,21876,49156}'::integer[]))
OR (a_partition4.parent_id = ANY
('{42242,50388,21916,13987,28708,4136,24617,31789,36533,28854,24247,15455,20805,38728,8908,20563,13908,20438,21087,4329,1131,46837,6505,16724,39675,35071}'::integer[]))
OR (a_partition4.parent_id = ANY
('{11522,8964,47879,10380,46970,11278,6543,9489,27283,30958,40214,35076,21023,19746,11044,45605,41259,35245,36911,7344,20276,35126,3257,49134,1091,24388,5447,35017,4042,7627,42061,5582,30419,7508,19278,34778,38394,34153,20714,10860,7917,21614,10223,50801,28629,6782,7765}'::integer[])))
Filter: (((a_partition4.parent_id = ANY
('{14467,30724,3976,13323,23971,44688,1938,275,19931,26527,42529,19491,11428,39110,9260,22446,44253,6705,15033,6461,34878,26687,45248,12739,24518,12359,12745,33738,32590,11856,14802,10451,16697,18904,41273,3547,12637,20190,36575,16096,19937,39652,2278,5616,5235,8570,15100}'::integer[]))
AND (a_partition4.part_key = 1)) OR ((a_partition4.parent_id = ANY
('{50309,37126,36876,41878,9112,43673,10782,1696,11042,11168,34632,31157,7156,49213,29504,7200,38594,38598,5064,8783,27862,38201,473,19687,8296,49641,28394,29803,31597,19313,33395,32244,4348}'::integer[]))
AND (a_partition4.part_key = 2)) OR ((a_partition4.parent_id = ANY
('{33024,9092,28678,29449,15757,36366,39963,46737,17156,39226,25628,14237,13125,17569,10914,39075,49734,40999,41756,8751,45490,29365,9143,5050,2463,35267,1220,12869,20294,24776,16329,5578,46605,30545,3544,37341,37086,28383,42527,45027,44292,35849,46314,2540,19696,21876,49156}'::integer[]))
AND (a_partition4.part_key = 3)) OR ((a_partition4.parent_id = ANY
('{42242,50388,21916,13987,28708,4136,24617,31789,36533,28854,24247,15455,20805,38728,8908,20563,13908,20438,21087,4329,1131,46837,6505,16724,39675,35071}'::integer[]))
AND (a_partition4.part_key = 4)) OR ((a_partition4.parent_id = ANY
('{11522,8964,47879,10380,46970,11278,6543,9489,27283,30958,40214,35076,21023,19746,11044,45605,41259,35245,36911,7344,20276,35126,3257,49134,1091,24388,5447,35017,4042,7627,42061,5582,30419,7508,19278,34778,38394,34153,20714,10860,7917,21614,10223,50801,28629,6782,7765}'::integer[]))
AND (a_partition4.part_key = 5)))
Heap Blocks: exact=5153
Buffers: shared hit=410 read=5362 written=1118
-> BitmapOr (cost=709.71..709.71 rows=48654 width=0)
(actual time=267.028..267.028 rows=0 loops=1)
Buffers: shared hit=410 read=209 written=50
-> Bitmap Index Scan on a_parent_id_idx4
(cost=0.00..153.69 rows=10908 width=0) (actual time=60.586..60.586 rows=0
loops=1)
Index Cond: (a_partition4.parent_id = ANY
('{14467,30724,3976,13323,23971,44688,1938,275,19931,26527,42529,19491,11428,39110,9260,22446,44253,6705,15033,6461,34878,26687,45248,12739,24518,12359,12745,33738,32590,11856,14802,10451,16697,18904,41273,3547,12637,20190,36575,16096,19937,39652,2278,5616,5235,8570,15100}'::integer[]))
Buffers: shared hit=90 read=52 written=11
-> Bitmap Index Scan on a_parent_id_idx4
(cost=0.00..107.91 rows=7659 width=0) (actual time=47.041..47.041 rows=0
loops=1)
Index Cond: (a_partition4.parent_id = ANY
('{50309,37126,36876,41878,9112,43673,10782,1696,11042,11168,34632,31157,7156,49213,29504,7200,38594,38598,5064,8783,27862,38201,473,19687,8296,49641,28394,29803,31597,19313,33395,32244,4348}'::integer[]))
Buffers: shared hit=64 read=35 written=7
-> Bitmap Index Scan on a_parent_id_idx4
(cost=0.00..153.69 rows=10908 width=0) (actual time=54.352..54.352 rows=0
loops=1)
Index Cond: (a_partition4.parent_id = ANY
('{33024,9092,28678,29449,15757,36366,39963,46737,17156,39226,25628,14237,13125,17569,10914,39075,49734,40999,41756,8751,45490,29365,9143,5050,2463,35267,1220,12869,20294,24776,16329,5578,46605,30545,3544,37341,37086,28383,42527,45027,44292,35849,46314,2540,19696,21876,49156}'::integer[]))
Buffers: shared hit=101 read=40 written=8
-> Bitmap Index Scan on a_parent_id_idx4
(cost=0.00..130.39 rows=8273 width=0) (actual time=54.690..54.690 rows=5220
loops=1)
Index Cond: (a_partition4.parent_id = ANY
('{42242,50388,21916,13987,28708,4136,24617,31789,36533,28854,24247,15455,20805,38728,8908,20563,13908,20438,21087,4329,1131,46837,6505,16724,39675,35071}'::integer[]))
Buffers: shared hit=56 read=40 written=13
-> Bitmap Index Scan on a_parent_id_idx4
(cost=0.00..153.69 rows=10908 width=0) (actual time=50.353..50.353 rows=0
loops=1)
Index Cond: (a_partition4.parent_id = ANY
('{11522,8964,47879,10380,46970,11278,6543,9489,27283,30958,40214,35076,21023,19746,11044,45605,41259,35245,36911,7344,20276,35126,3257,49134,1091,24388,5447,35017,4042,7627,42061,5582,30419,7508,19278,34778,38394,34153,20714,10860,7917,21614,10223,50801,28629,6782,7765}'::integer[]))
Buffers: shared hit=99 read=42 written=11
Planning time: 8.002 ms
Execution time: 8967.750 ms

When I remove the ID sorting hack from this query, the plan reverts to a nasty
index scan on tmstmp, with the whole WHERE clause applied as a filter
condition rather than an index condition.
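
For concreteness, the "hack" looks roughly like this (table and column names
as used earlier in the thread; the real IN-list is much longer than the
sample ids shown):

```sql
-- Appending id to the ORDER BY means no single-column tmstmp index can
-- deliver rows in the required order, so the planner stops favoring the
-- tmstmp index and uses the parent_id bitmap plan instead.
SELECT *
FROM a
WHERE parent_id IN (15503, 49787, 24595)  -- sample ids only
ORDER BY tmstmp DESC, id DESC
LIMIT 20;
```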

Thanks again for your help!

On Sat, Jul 14, 2018 at 11:25 PM, Jeff Janes <jeff(dot)janes(at)gmail(dot)com> wrote:

> On Tue, Jul 10, 2018 at 11:07 AM, Lincoln Swaine-Moore <
> lswainemoore(at)gmail(dot)com> wrote:
>
>>
>>
>>
>> Something about the estimated row counts (this problem persisted after I
>> tried ANALYZEing)
>>
>
> What is your default_statistics_target? What can you tell us about the
> distribution of parent_id? (exponential, power law, etc?). Can you show
> the results for select * from pg_stats where tablename='a' and
> attname='parent_id' \x\g\x ?
>
>
>> forces usage of the tmstmp index and Merge Append (which seems wise) but
>> also a filter condition on parent_id over an index condition, which is
>> apparently prohibitively slow.
>>
>> I tried creating a multicolumn index like:
>>
>> CREATE INDEX "a_partition1_parent_and_tmstmp_idx" on "a_partition2"
>> USING btree ("parent_id", "tmstmp" DESC);
>>
>> But this didn't help (it wasn't used).
>>
>
> You could try reversing the order and adding a column to be (tmstmp,
> parent_id, id) and keeping the table well vacuumed. This would allow the
> slow plan to still walk the indexes in tmstmp order but do it as an
> index-only scan, so it could omit the extra trip to the table. That trip to
> the table must be awfully slow to explain the numbers you show later in the
> thread.
>
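
A sketch of the covering index suggested here (index name assumed; one such
index would be needed per partition):

```sql
-- Leading with tmstmp keeps the index usable for walking in sort order,
-- while including parent_id and id lets the scan be index-only (no heap
-- visits), provided the visibility map is kept current.
CREATE INDEX a_partition1_tmstmp_parent_id_id_idx
    ON a_partition1 USING btree (tmstmp DESC, parent_id, id);

-- Keep the table well vacuumed so index-only scans stay cheap:
VACUUM ANALYZE a_partition1;
```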
> ...
>
>
>> This query plan (which is the same as when LIMIT is removed) has been a
>> good short term solution when the number of "parent_id"s I'm using is still
>> relatively small, but unfortunately queries grow untenably slow as the
>> number of "parent_id"s involved increases:
>>
>
> What happens when you remove that extra order by phrase that you added?
> The original slow plan should become much faster when the number of
> parent_ids is large (it has to dig through fewer index entries before
> accumulating 20 good ones), so you should try going back to that.
>
> ...
>
>
>> I'd be very grateful for help with one or both of these questions:
>> 1) Why is adding an unnecessary (from the perspective of result
>> correctness) ORDER BY valuable for forcing the parent_id index usage, and
>> does that indicate that there is something wrong with my
>> table/indexes/statistics/etc.?
>>
>
> It finds the indexes on tmstmp to be falsely attractive, as it can walk in
> tmstmp order and so avoid the sort. (Really not the sort itself, but the
> fact that sort has to first read every row to be sorted, while walking an
> index can abort once the LIMIT is satisfied). Adding an extra phrase to
> the ORDER BY means the index is no longer capable of delivering rows in the
> needed order, so it no longer looks falsely attractive. The same thing
> could be obtained by doing a dummy operation, such as ORDER BY tmstmp + '0
> seconds' DESC. I prefer that method, as it is more obviously a tuning
> trick. Adding in "id" looks more like a legitimate desire to break any
> ties that might occasionally occur in tmstmp.
>
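
The dummy-operation variant described above, sketched with the same names
(sample ids only):

```sql
-- tmstmp + '0 seconds' is an expression, not the bare indexed column, so
-- the planner can no longer use a tmstmp index to deliver pre-sorted rows
-- and the falsely attractive plan disappears. Results are unchanged.
SELECT *
FROM a
WHERE parent_id IN (15503, 49787, 24595)  -- sample ids only
ORDER BY tmstmp + '0 seconds' DESC
LIMIT 20;
```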
> As Tom pointed out, there clearly is something wrong with your statistics,
> although we don't know what is causing it to go wrong. Fixing the
> statistics isn't guaranteed to fix the problem, but it would be a good
> start.
>
>
>
>
>> 2) Is there any way I can improve my query time when there are many
>> "parent_id"s involved? I seem to only be able to get the query plan to use
>> at most one of the parent_id index and the tmstmp index at a time. Perhaps
>> the correct multicolumn index would help?
>>
>
> A few things mentioned above might help.
>
> But if they don't, is there any chance you could redesign your
> partitioning so that all parent_id queries together will always be in the
> same partition? And if not, could you just get rid of the partitioning
> altogether? 1e7 rows is not all that many and doesn't generally need
> partitioning. Unless it is serving a specific purpose, it is probably
> costing you more than you are getting.
>
> Finally, could you rewrite it as a join to a VALUES list, rather than as
> an in-list?
>
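
The VALUES-list rewrite mentioned here might look something like this (sample
ids only; the real list is elided):

```sql
-- The planner handles a join to a VALUES list differently from a large
-- IN-list, which can yield better row estimates and plan shapes when the
-- number of ids is large.
SELECT a.*
FROM a
JOIN (VALUES (15503), (49787), (24595)) AS v(parent_id)
  ON a.parent_id = v.parent_id
ORDER BY a.tmstmp DESC
LIMIT 20;
```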
> Cheers,
>
> Jeff
>

--
Lincoln Swaine-Moore
