Skip site navigation (1) Skip section navigation (2)

Re: Performance question 83 GB Table 150 million rows, distinct select

From: Tory M Blue <tmblue(at)gmail(dot)com>
To: Scott Marlowe <scott(dot)marlowe(at)gmail(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Performance question 83 GB Table 150 million rows, distinct select
Date: 2011-11-17 03:16:23
Message-ID: CAEaSS0bzhbBNf5jWHZg01tsp=GBtr1nihrTEzMLta_s_n-9Qbw@mail.gmail.com (view raw or flat)
Thread:
Lists: pgsql-performance
On Wed, Nov 16, 2011 at 7:02 PM, Scott Marlowe <scott(dot)marlowe(at)gmail(dot)com> wrote:
> On Wed, Nov 16, 2011 at 7:42 PM, Tory M Blue <tmblue(at)gmail(dot)com> wrote:
>> On Wed, Nov 16, 2011 at 6:27 PM, Tomas Vondra <tv(at)fuzzy(dot)cz> wrote:
>>> On 17 Listopad 2011, 2:57, Scott Marlowe wrote:
>>>> On Wed, Nov 16, 2011 at 4:59 PM, Tomas Vondra <tv(at)fuzzy(dot)cz> wrote:
>>>>
>>>>> But you're right - you're not bound by I/O (although I don't know what
>>>>> are
>>>>> those 15% - iowait, util or what?). The COUNT(DISTINCT) has to actually
>>>>> keep all the distinct values to determine which are actually distinct.
>>>>
>>>> Actually I meant to comment on this, he is IO bound.  Look at % Util,
>>>> it's at 99 or 100.
>>>>
>>>> Also, if you have 16 cores and look at something like vmstat you'll
>>>> see 6% wait state.  That 6% represents one CPU core waiting for IO,
>>>> the other cores will add up the rest to 100%.
>>>
>>> Aaaah, I keep forgetting about this and I somehow ignored the iostat
>>> results too. Yes, he's obviously IO bound.
>>
>> I'm not so sure on the io-bound. Been battling/reading about it all
>> day. 1 CPU is pegged at 100%, but the disk is not. If I do something
>
> Look here in iostat:
>
>> Device:         rrqm/s   wrqm/s     r/s     w/s   rsec/s   wsec/s
>> avgrq-sz avgqu-sz   await  svctm  %util
>> sda               0.00     3.50 3060.00    2.00 49224.00    20.00
>> 16.08     2.21    0.76   0.33  99.95
>
> See that last column, it's % utilization.   Once it hits 100% you are
> anywhere from pretty close to IO bound to right on past it.
>
> I agree with the previous poster, you should roll these up ahead of
> time into a materialized view for fast reporting.
>
Ya I'm getting mixed opinions on that. avg queue size is nothing and
await and svctime is nothing, so maybe I'm on the edge, but it's not
"at face value", the cause of the slow query times. I think the data
structure is, however as it seems I need to query against all the
data, I'm unclear how to best set that up. Partitioning is not the
answer it seems.

In response to

Responses

pgsql-performance by date

Next:From: Tomas VondraDate: 2011-11-17 03:47:23
Subject: Re: Performance question 83 GB Table 150 million rows, distinct select
Previous:From: Scott MarloweDate: 2011-11-17 03:04:38
Subject: Re: Performance question 83 GB Table 150 million rows, distinct select

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group