Re: Poor performance on seq scan

From: Piotr Kołaczkowski <P(dot)Kolaczkowski(at)elka(dot)pw(dot)edu(dot)pl>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: Poor performance on seq scan
Date: 2006-09-12 12:10:02
Message-ID: 200609121410.02710.P.Kolaczkowski@elka.pw.edu.pl
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On Tuesday 12 September 2006 12:47, Heikki Linnakangas wrote:
> Laszlo Nagy wrote:
> > I made another test. I create a file with the identifiers and names of
> > the products:
> >
> > psql#\o products.txt
> > psql#select id,name from product;
> >
> > Then I can search using grep:
> >
> > grep "Mug" products.txt | cut -f1 -d\|
> >
> > There is a huge difference. This command runs within 0.5 seconds. That
> > is, at least 76 times faster than the seq scan. It is the same if I
> > vacuum, backup and restore the database. I thought that the table is
> > stored in one file, and the seq scan will be actually faster than
> > grepping the file. Can you please tell me what am I doing wrong? I'm
> > not sure if I can increase the performance of a seq scan by adjusting
> > the values in postgresql.conf. I do not like the idea of exporting the
> > product table periodically into a txt file, and search with grep. :-)
>
> Is there any other columns besides id and name in the table? How big is
> products.txt compared to the heap file?
>
> > Another question: I have a btree index on product(name). It contains
> > all product names and the identifiers of the products. Wouldn't it be
> > easier to seq scan the index instead of seq scan the table? The index
> > is only 66MB, the table is 1123MB.
>
> Probably, but PostgreSQL doesn't know how to do that. Even if it did, it
> depends on how many matches there is. If you scan the index and then
> fetch the matching rows from the heap, you're doing random I/O to the
> heap. That becomes slower than scanning the heap sequentially if you're
> going to get more than a few hits.

Why match rows from the heap if ALL required data are in the index itself?
Why look at the heap at all?

This is the same performance problem in PostgreSQL I noticed when doing
some "SELECT count(*)" queries. Look at this:

explain analyze select count(*) from transakcja where data > '2005-09-09' and
miesiac >= (9 + 2005 * 12) and kwota < 50;

QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=601557.86..601557.87 rows=1 width=0) (actual
time=26733.479..26733.484 rows=1 loops=1)
-> Bitmap Heap Scan on transakcja (cost=154878.00..596928.23 rows=1851852
width=0) (actual time=9974.208..18796.060 rows=1654218 loops=1)
Recheck Cond: ((miesiac >= 24069) AND (kwota < 50::double precision))
Filter: (data > '2005-09-09 00:00:00'::timestamp without time zone)
-> Bitmap Index Scan on idx_transakcja_miesiac_kwota
(cost=0.00..154878.00 rows=5555556 width=0) (actual time=9919.967..9919.967
rows=1690402 loops=1)
Index Cond: ((miesiac >= 24069) AND (kwota < 50::double
precision))
Total runtime: 26733.980 ms
(7 rows)

The actual time retrieving tuples from the index is less than 10 seconds, but
the system executes needless heap scan that takes up additional 16 seconds.

Best regards,
Peter

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Laszlo Nagy 2006-09-12 12:15:30 Re: Poor performance on seq scan
Previous Message Luke Lonergan 2006-09-12 10:50:56 Re: Poor performance on seq scan