Re: Slow Count-Distinct Query

From: Christopher Jackson <crjackso(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, sthomas(at)optionshouse(dot)com
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Slow Count-Distinct Query
Date: 2014-03-31 15:53:33
Message-ID: CAN81C19HzHMGxPZj9r+=o6EBg=4ubDP0-eJ1+9Oe_qPVfzWEYQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Tom and Shawn,

Thanks for the feedback. This has been helpful. It's worth noting that
I was spiking this out on my local box using default memory utilization
settings. I'll revisit this once we get our production box set up. It's
good to know what the best practices are around the enable_bitmapscan and
seq_page_cost settings are. Also, it's good to know that my hack wasn't
actually yielding anything. I'll check back in once our production
environment is up and running. For what it's worth, we're using Heroku and
we're thinking of going with the Standard Tengu tier as a start. This will
give us 1.7GB of RAM, so hopefully bumping up the work_mem setting
shouldn't be a problem. Does that make sense?

Thanks for the help,
Chris

On Mon, Mar 31, 2014 at 9:15 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> Christopher Jackson <crjackso(at)gmail(dot)com> writes:
> > tl;dr - How can I speed up my count-distinct query?
>
> EXPLAIN doesn't provide a lot of visibility into what the Aggregate plan
> node is doing, but in this case what it's doing is an internal sort/uniq
> operation to implement the DISTINCT. You didn't say what value of
> work_mem you're using, but it'd need to be probably 50-100MB to prevent
> that sort from spilling to disk (and therefore being slow).
>
> Note that the indexscan is actually *slower* than the seqscan so far as
> the table access is concerned; if the table were big enough to not fit
> in RAM, this would get very much worse. So I'm not impressed with trying
> to force the optimizer's hand as you've done here --- it might be a nice
> plan now, but it's brittle. See if a bigger work_mem improves matters
> enough with the regular plan.
>
> > *I'm concerned about setting the enable_bitmapscan and seq_page_cost
> values
> > because I'm not yet sure what the consequences are. Can anyone enlighten
> > me on the recommended way to speed up this query?*
>
> Turning off enable_bitmapscan globally would be a seriously bad idea.
> Changing the cost settings to these values globally might be all right;
> it would amount to optimizing for all-in-memory cases, which might or
> might not be a good idea for your situation. For that matter, greatly
> increasing work_mem globally is usually not thought to be smart either;
> remember that it's a per-sort-operation setting so you may need to
> provision a considerable multiple of the setting as physical RAM,
> depending on how many queries you expect to run concurrently. So all in
> all you might be well advised to just set special values for this one
> query, whichever solution approach you use.
>
> I doubt you need the "where email=email" hack, in any case. That isn't
> forcing the optimizer's decision in any meaningful fashion.
>
> regards, tom lane
>

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message Niels Kristian Schjødt 2014-03-31 17:16:58 Re: Sudden crazy high CPU usage
Previous Message Scott Marlowe 2014-03-31 14:50:22 Re: Sudden crazy high CPU usage