Re: Better performance possible for a pathological query?

From: Alexis Lê-Quôc <alq(at)datadoghq(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Better performance possible for a pathological query?
Date: 2013-08-07 16:28:45
Message-ID: CAAGz8TPfEP10twyxyWDkv1u2Gvhd9NE9EaXyM9x68-=1O2ip=A@mail.gmail.com
Lists: pgsql-performance

On Wed, Aug 7, 2013 at 12:07 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> Alexis Lê-Quôc <alq(at)datadoghq(dot)com> writes:
> > The query itself is very simple: a primary key lookup on a table of
> > 1.5x10^7 rows. The issue is that we are looking up over 11,000 primary
> > keys at once, causing the db to consume a lot of CPU.
>
> It looks like most of the runtime is probably going into checking the
> c.key = ANY (ARRAY[...]) construct. PG isn't especially smart about that
> if it fails to optimize the construct into an index operation --- I think
> it's just searching the array linearly for each row meeting the other
> restrictions on c.
>
> You could try writing the test like this:
> c.key = ANY (VALUES (1), (17), (42), ...)
> to see if the sub-select code path gives better results than the array
> code path. In a quick check it looked like this might produce a hash
> join, which seemed promising anyway.
>
> regards, tom lane
>

Thank you very much, Tom; your suggestion is spot on. Runtime decreased
100-fold, from 20s to 200ms, with a simple search-and-replace.
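
For anyone finding this thread later, the rewrite looks roughly like
this. It is a sketch only: the table, column, and filter names are taken
from the plan below, the join to x is omitted, and the three literal keys
stand in for the real list of ~11,000:

    -- Before: ANY(ARRAY[...]); the array is searched linearly for
    -- every row that passes the other filters on c.
    SELECT c.*
      FROM dim_context c
     WHERE c.org_id = 1
       AND c.tags @> '{blah}'::text[]
       AND c.key = ANY (ARRAY[1, 17, 42]);

    -- After: ANY(VALUES ...); the planner hashes and de-duplicates the
    -- list, then probes dim_context_pkey once per key.
    SELECT c.*
      FROM dim_context c
     WHERE c.org_id = 1
       AND c.tags @> '{blah}'::text[]
       AND c.key = ANY (VALUES (1), (17), (42));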

Here's the updated plan for the record. As Tom predicted, the sub-select
code path wins: the VALUES list is hashed and de-duplicated (the
HashAggregate node), and each key then probes the primary-key index once,
instead of the array being scanned linearly for every candidate row.

Nested Loop  (cost=168.22..2116.29 rows=148 width=362) (actual time=22.134..256.531 rows=10858 loops=1)
  Buffers: shared hit=44967
  ->  Index Scan using x_pkey on x  (cost=0.00..8.27 rows=1 width=37) (actual time=0.071..0.073 rows=1 loops=1)
        Index Cond: (id = 1)
        Buffers: shared hit=4
  ->  Nested Loop  (cost=168.22..2106.54 rows=148 width=329) (actual time=22.060..242.406 rows=10858 loops=1)
        Buffers: shared hit=44963
        ->  HashAggregate  (cost=168.22..170.22 rows=200 width=4) (actual time=21.529..32.820 rows=11215 loops=1)
              ->  Values Scan on "*VALUES*"  (cost=0.00..140.19 rows=11215 width=4) (actual time=0.005..9.527 rows=11215 loops=1)
        ->  Index Scan using dim_context_pkey on dim_context c  (cost=0.00..9.67 rows=1 width=329) (actual time=0.015..0.016 rows=1 loops=11215)
              Index Cond: (c.key = "*VALUES*".column1)
              Filter: ((c.tags @> '{blah}'::text[]) AND (c.org_id = 1))
              Buffers: shared hit=44963
Total runtime: 263.639 ms
