Better performance possible for a pathological query?

From: Alexis Lê-Quôc <alq(at)datadoghq(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: Better performance possible for a pathological query?
Date: 2013-08-07 15:38:47
Message-ID: CAAGz8TOnHrOJyHwr8Zo2U6smY_UE3GO=aYcyjkgLeTKfM3K41g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Hi,

I've seen a couple of bad queries go through one instance and I'm wondering
whether there's something simple that can be done to help.

Not running the query in the first place is what I am looking to do
ultimately but in the meantime, I'm interested in understanding more about
the plan below.

The query itself is very simple: a primary key lookup on a 1.5x10^7 rows.
The issue is that we are looking up over 11,000 primary keys at once,
causing the db to consume a lot of CPU.

Where I'm not sure I follow, is the discrepancy between the planned and
actual rows.

Bitmap Heap Scan on dim_context c (cost=6923.33..11762.31 rows=1
width=329) (actual time=17128.121..22031.783 rows=10858 loops=1)

Would a sequential scan be more beneficial? The table itself is about 10GB
on a 64GB box, 30% of these 10GB are buffered in RAM from what I can tell.

Thanks for your help,

Alexis

Here are the full details.

explain (analyze, buffers)
SELECT c.key,
c.x_id,
c.tags,
c.source_type_id,
x.api_key
FROM dim_context c
join x on c.x_id = x.id
WHERE c.key = ANY (ARRAY[15368196, (11,000 other keys)])
AND ((c.x_id = 1 AND c.tags @> ARRAY[E'blah']))

Here is the plan, abridged

Nested Loop (cost=6923.33..11770.59 rows=1 width=362) (actual
time=17128.188..22109.283 rows=10858 loops=1)
Buffers: shared hit=83494
-> Bitmap Heap Scan on dim_context c (cost=6923.33..11762.31 rows=1
width=329) (actual time=17128.121..22031.783 rows=10858 loops=1)
Recheck Cond: ((tags @> '{blah}'::text[]) AND (x_id = 1))
Filter: (key = ANY ('{15368196,(a lot more keys
here)}'::integer[]))
Buffers: shared hit=50919
-> BitmapAnd (cost=6923.33..6923.33 rows=269 width=0) (actual
time=132.910..132.910 rows=0 loops=1)
Buffers: shared hit=1342
-> Bitmap Index Scan on dim_context_tags_idx
(cost=0.00..1149.61 rows=15891 width=0) (actual time=64.614..64.614
rows=264777 loops=1)
Index Cond: (tags @> '{blah}'::text[])
Buffers: shared hit=401
-> Bitmap Index Scan on dim_context_x_id_source_type_id_idx
(cost=0.00..5773.47 rows=268667 width=0) (actual time=54.648..54.648
rows=267659 loops=1)
Index Cond: (x_id = 1)
Buffers: shared hit=941
-> Index Scan using x_pkey on x (cost=0.00..8.27 rows=1 width=37)
(actual time=0.003..0.004 rows=1 loops=10858)
Index Cond: (x.id = 1)
Buffers: shared hit=32575
Total runtime: 22117.417 ms

And here are the stats

attname | null_frac | avg_width | n_distinct | correlation
----------------+-----------+-----------+------------+-------------
key | 0 | 4 | -1 | 0.999558
x_id | 0 | 4 | 1498 | 0.351316
h_id | 0.05632 | 4 | 116570 | 0.653092
tags | 0.0544567 | 284 | 454877 | -0.169626
source_type_id | 0 | 4 | 23 | 0.39552
handle | 0 | 248 | -1 | 0.272456
created | 0 | 8 | -0.645231 | 0.999559
modified | 0 | 8 | -0.645231 | 0.999559

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Igor Neyman 2013-08-07 15:50:02 Re: Re: [PERFORM] Sub-optimal plan for a paginated query on a view with another view inside of it.
Previous Message slapo 2013-08-07 15:33:49 RE: [PERFORM] Re: [PERFORM] Sub-optimal plan for a paginated query on a view with another view inside of it.