Re: Slow query performance

From: Joris Dobbelsteen <joris(at)familiedobbelsteen(dot)nl>
To: Kevin Galligan <kgalligan(at)gmail(dot)com>
Cc: Scott Marlowe <scott(dot)marlowe(at)gmail(dot)com>, pgsql-general(at)postgresql(dot)org
Subject: Re: Slow query performance
Date: 2008-11-02 18:26:05
Message-ID: 490DF0BD.5050903@familiedobbelsteen.nl
Lists: pgsql-general

Kevin Galligan wrote, On 29-10-08 23:35:
> An example of a slow query is...
>
> select count(*) from bigdatatable where age between 22 and 40 and state
> = 'NY';
>
> explain analyze returned the following...
>
> Aggregate (cost=5179639.55..5179639.56 rows=1 width=0) (actual
> time=389529.895..389529.897 rows=1 loops=1)
> -> Bitmap Heap Scan on bigdatatable (cost=285410.65..5172649.63
> rows=2795968 width=0) (actual time=6727.848..387159.175
> rows=2553273 loops=1)
> Recheck Cond: ((state)::text = 'NY'::text)
> Filter: ((age >= 22) AND (age <= 40))
> -> Bitmap Index Scan on idx_jstate (cost=0.00..284711.66
> rows=15425370 width=0) (actual time=6298.950..6298.950
> rows=16821828 loops=1)
> Index Cond: ((state)::text = 'NY'::text)
> Total runtime: 389544.088 ms
>
> It looks like the index scans are around 6 seconds or so each, but then
> the bitmap heap scan and aggregate jump up to 6 minutes.

Indeed. The cause is that PostgreSQL must visit the table rows themselves
to check that each one is visible to your transaction; the index alone
cannot answer that. This means A LOT of data must be retrieved from disk.
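
If you want a feel for just how much data that is, something like the
following should give a rough idea (a quick sketch; only the table and
index names from your query are assumed):

  -- Total on-disk size of the heap the bitmap heap scan has to walk:
  SELECT pg_size_pretty(pg_relation_size('bigdatatable'));

  -- Page and row counts the planner is working from:
  SELECT relname, relpages, reltuples
    FROM pg_class
   WHERE relname IN ('bigdatatable', 'idx_jstate');

With ~2.5 million matching rows spread across a 250-million-row table,
that is a lot of pages to fetch.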

The only real thing you can do is reduce the I/O load, either by reducing
the amount of data that must be traversed or by ensuring the relevant data
is stored closely together on disk (which is really hard to get right).
This requires optimizing your database design for that single goal.
It will not make the query scale any better than it currently does,
however: it remains O(N) in the size of your table, so you need other
techniques to do better than that.
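
As a rough sketch of what that could look like for this particular query
(the index names are made up, and CLUSTER takes an exclusive lock and
rewrites the table, so try it on a copy first):

  -- Let the bitmap index scan narrow on both conditions, not just state:
  CREATE INDEX idx_state_age ON bigdatatable (state, age);

  -- Or, if the 'NY' queries dominate, a smaller partial index:
  CREATE INDEX idx_ny_age ON bigdatatable (age) WHERE state = 'NY';

  -- Store rows of the same state physically close together on disk
  -- (8.3 syntax; older releases use CLUSTER idx_state_age ON bigdatatable):
  CLUSTER bigdatatable USING idx_state_age;
  ANALYZE bigdatatable;

The heap pages still have to be visited for visibility, so this mainly
helps by cutting the number of pages the bitmap heap scan touches, not by
eliminating it.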

Another thing is spending extra money on hardware that can sustain
higher I/O seek rates (more and/or faster spindles).

- Joris

> More detail on the table design and other stuff in a bit...
>
>
> On Wed, Oct 29, 2008 at 6:18 PM, Scott Marlowe <scott(dot)marlowe(at)gmail(dot)com
> <mailto:scott(dot)marlowe(at)gmail(dot)com>> wrote:
>
> On Wed, Oct 29, 2008 at 2:18 PM, Kevin Galligan <kgalligan(at)gmail(dot)com
> <mailto:kgalligan(at)gmail(dot)com>> wrote:
> > I'm approaching the end of my rope here. I have a large database.
> > 250 million rows (ish). Each row has potentially about 500 pieces of
> > data, although most of the columns are sparsely populated.
>
> A couple of notes here. PostgreSQL stores null values as a single bit
> in a bit field, making sparsely populated tables quite efficient as
> long as you store the non-existent values as null and not '' or some
> other real value.
>
> Have you run explain analyze on your queries yet? Pick a slow one,
> run explain analyze on it and post it and we'll see what we can do.
