Re: Data Set Growth causing 26+hour runtime, on what we believe to be very simple SQL

From: Richard Huxton <dev(at)archonet(dot)com>
To: Tory M Blue <tmblue(at)gmail(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Data Set Growth causing 26+hour runtime, on what we believe to be very simple SQL
Date: 2010-01-22 18:25:00
Message-ID: 4B59ED7C.2060908@archonet.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On 22/01/10 18:03, Tory M Blue wrote:
> On Fri, Jan 22, 2010 at 1:42 AM, Richard Huxton<dev(at)archonet(dot)com> wrote:
>
>> On 21/01/10 22:15, Tory M Blue wrote:

>> 2. If it's mostly tagged=true you are interested in you can always use a
>> partial index: CREATE INDEX ... (makeid) WHERE tagged
>> This might be a win even if you need a second index with WHERE NOT tagged.
>>
>
> Partial index doesn't seem to fit here due to the fact that there are 35-40%
> Marked True.
>
> Didn't think about creating a second index for false, may give that a shot.

If you're mostly search tagged=true, try the partial index - it'll mean
the planner is just scanning the index for the one term.

>> Also, either I've not had enough cofee yet, or a bitmap scan is an odd
>> choice for only ~ 13000 rows out of 100 million.
>>
>> * " -> Bitmap Index Scan on
>>>
>>> idx_retargetuserstats_makeidtag (cost=0.00..360.20 rows=13175 width=0)"*
>>>
>>> * " Index Cond: ((makeid =
>>> 'b1mw-ferman'::text)
>>> AND (tagged = true))"*
>>>
>>
>> Otherwise, see what Craig said.
>>
>> I'm assuming this isn't the query that is CPU bound for a long time. Unless
>> your table is horribly bloated, there's no reason for that judging by this
>> plan.
>
> It is, but not always, only when there are 10K more matches. And the explain
> unfortunately is sometimes way high or way low, so the expalin is hit and
> miss.
>
> But the same sql that returns maybe 500 rows is pretty fast, it's the return
> of 10K+ rows that seems to stall and is CPU Bound.

Hmm - might be able to push that cross-over point up a bit by tweaking
various costs, but you've got to be careful you don't end up making all
your other queries worse. It'd be good to figure out what the problem is
first.

Looking at the query there are four stages:
1. Scan the index, build a bitmap of heap pages with matching rows
2. Scan those pages, find the rows that match
3. Run DISTINCT on the uids
4. Count them
I wonder if it could be the DISTINCT. What happens with a count(*) or
count(uid) instead? Also - you might find EXPLAIN ANALYZE more useful
than straight EXPLAIN here. That will show actual times for each stage.

On Craig's branch of this thread, you say you call it 6000 times with
different "makeid"s. Any reason why you can't join to a temp table and
just do it in one query?

--
Richard Huxton
Archonet Ltd

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message Matthew Wakeling 2010-01-22 18:26:50 Re: Data Set Growth causing 26+hour runtime, on what we believe to be very simple SQL
Previous Message Tory M Blue 2010-01-22 18:03:35 Re: Data Set Growth causing 26+hour runtime, on what we believe to be very simple SQL