Re: Changing ORDER BY column slows query dramatically

From: brick pglists <brickpglists(at)gmail(dot)com>
To: sthomas(at)optionshouse(dot)com
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Changing ORDER BY column slows query dramatically
Date: 2013-04-12 20:38:00
Message-ID: CALXym+K1QmKPnvBRs+c2M7J+VPTAdpobDoTrqibG6+9mpW8oaA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Hi Shaun,

On Fri, Apr 12, 2013 at 12:59 PM, Shaun Thomas <sthomas(at)optionshouse(dot)com> wrote:
> On 04/12/2013 11:51 AM, brick pglists wrote:
>
> Since it doesn't show up in your GUC list, you should probably increase your
> default_statistics_target to 400 or more, analyze, and try again. The
> heuristics for the dates aren't complete enough, so it thinks there are few
> matches. If that doesn't work and you want a quick, but ugly fix for this,
> you can create the following index:
>
> CREATE INDEX event_20130406_id_desc_tstamp_utc_idx
> ON event_20130406 (id DESC, tstamp_utc);

Thanks for your suggestions. Bumping up the default_statistics_target
several times all the way to 4000 (ANALYZEd each time) did not help,
however, adding the index you suggested helped with that query. It is
still over a magnitude slower than the version that sorts by
tstamp_utc, but it's a start. I created a similar index (CREATE INDEX
event_20130406_id_desc_tstamp_utc_desc_idx ON event_20130406 (id DESC,
tstamp_utc DESC)) where both columns were sorted DESCm and given the
choice between those two, it chose the latter.
Setting enable_mergejoin to false results in a plan much closer to the
original fast one, and further changing cpu_tuple_cost up to 1 results
in a query about 3x slower than the original fast one.

The ORDER BY e.id query, with the new index, enable_mergejoin
disabled, and cpu_tuple_cost bumped up to 1:

Limit (cost=125386.16..126640.02 rows=100 width=42) (actual
time=220.807..221.864 rows=100 loops=1)
Buffers: shared hit=49171 read=6770
I/O Timings: read=44.980
-> Nested Loop (cost=0.00..7734858.92 rows=616883 width=42)
(actual time=110.718..213.923 rows=10100 loops=1)
Buffers: shared hit=49171 read=6770
I/O Timings: read=44.980
-> Index Scan using
event_20130406_id_desc_tstamp_utc_desc_idx on event_20130406 e
(cost=0.00..2503426.81 rows=1851068 width=34) (actual
time=110.690..139.001 rows=10100 loops=1)
Index Cond: ((tstamp_utc >= '2013-04-06
10:00:00'::timestamp without time zone) AND (tstamp_utc <= '2013-04-06
18:00:00'::timestamp without time zone))
Filter: ((date_utc = '2013-04-06'::date) AND (org_id = 216471))
Rows Removed by Filter: 1554
Buffers: shared hit=8647 read=6770
I/O Timings: read=44.980
-> Index Scan using
notification_counts_by_status_20130406_event_id_org_id_pk on
notification_counts_by_status_20130406 ncbs (cost=0.00..1.83 rows=1
width=16) (actual time=0.003..0.004 rows=1 loops=10100)
Index Cond: (event_id = e.id)
Filter: ((event_creation_tstamp_utc >= '2013-04-06
10:00:00'::timestamp without time zone) AND (event_creation_tstamp_utc
<= '2013-04-06 18:00:00'::timestamp without time zone) AND (status =
'DELIVERED'::text))
Buffers: shared hit=40524
Total runtime: 222.127 ms
(17 rows)

Still not at the ~90ms from the "ORDER BY e.tstamp_utc DESC" version,
but not too bad. Now I need to figure out how I can get the best plan
choice without monkeying around with enable_mergejoin and changing
cpu_tuple_cost too much.

If any more suggestions are forthcoming, I am all ears!

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message Nik Tek 2013-04-12 21:05:40 Recommended Swap space
Previous Message Shaun Thomas 2013-04-12 19:59:03 Re: Changing ORDER BY column slows query dramatically