Quick Links

Re: Poor performance on seq scan

From:	"Guido Neitzer" <guido(dot)neitzer(at)gmail(dot)com>
To:	Piotr Kołaczkowski <P(dot)Kolaczkowski(at)elka(dot)pw(dot)edu(dot)pl>
Cc:	pgsql-performance(at)postgresql(dot)org
Subject:	Re: Poor performance on seq scan
Date:	2006-09-18 08:50:43
Message-ID:	fbbe50e0609180150k66e1462fg6591db841e490dc1@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-performance

Because there is no MVCC information in the index.

cug

2006/9/12, Piotr Kołaczkowski <P(dot)Kolaczkowski(at)elka(dot)pw(dot)edu(dot)pl>:
> On Tuesday 12 September 2006 12:47, Heikki Linnakangas wrote:
> > Laszlo Nagy wrote:
> > > I made another test. I create a file with the identifiers and names of
> > > the products:
> > >
> > > psql#\o products.txt
> > > psql#select id,name from product;
> > >
> > > Then I can search using grep:
> > >
> > > grep "Mug" products.txt | cut -f1 -d\|
> > >
> > > There is a huge difference. This command runs within 0.5 seconds. That
> > > is, at least 76 times faster than the seq scan. It is the same if I
> > > vacuum, backup and restore the database. I thought that the table is
> > > stored in one file, and the seq scan will be actually faster than
> > > grepping the file. Can you please tell me what am I doing wrong? I'm
> > > not sure if I can increase the performance of a seq scan by adjusting
> > > the values in postgresql.conf. I do not like the idea of exporting the
> > > product table periodically into a txt file, and search with grep. :-)
> >
> > Is there any other columns besides id and name in the table? How big is
> > products.txt compared to the heap file?
> >
> > > Another question: I have a btree index on product(name). It contains
> > > all product names and the identifiers of the products. Wouldn't it be
> > > easier to seq scan the index instead of seq scan the table? The index
> > > is only 66MB, the table is 1123MB.
> >
> > Probably, but PostgreSQL doesn't know how to do that. Even if it did, it
> > depends on how many matches there is. If you scan the index and then
> > fetch the matching rows from the heap, you're doing random I/O to the
> > heap. That becomes slower than scanning the heap sequentially if you're
> > going to get more than a few hits.
>
> Why match rows from the heap if ALL required data are in the index itself?
> Why look at the heap at all?
>
> This is the same performance problem in PostgreSQL I noticed when doing
> some "SELECT count(*)" queries. Look at this:
>
> explain analyze select count(*) from transakcja where data > '2005-09-09' and
> miesiac >= (9 + 2005 * 12) and kwota < 50;
>
> QUERY PLAN
> ------------------------------------------------------------------------------------------------------------------------------------------------------------------
> Aggregate (cost=601557.86..601557.87 rows=1 width=0) (actual
> time=26733.479..26733.484 rows=1 loops=1)
> -> Bitmap Heap Scan on transakcja (cost=154878.00..596928.23 rows=1851852
> width=0) (actual time=9974.208..18796.060 rows=1654218 loops=1)
> Recheck Cond: ((miesiac >= 24069) AND (kwota < 50::double precision))
> Filter: (data > '2005-09-09 00:00:00'::timestamp without time zone)
> -> Bitmap Index Scan on idx_transakcja_miesiac_kwota
> (cost=0.00..154878.00 rows=5555556 width=0) (actual time=9919.967..9919.967
> rows=1690402 loops=1)
> Index Cond: ((miesiac >= 24069) AND (kwota < 50::double
> precision))
> Total runtime: 26733.980 ms
> (7 rows)
>
> The actual time retrieving tuples from the index is less than 10 seconds, but
> the system executes needless heap scan that takes up additional 16 seconds.
>
> Best regards,
> Peter
>
>
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 9: In versions below 8.0, the planner will ignore your desire to
> choose an index scan if your joining column's datatypes do not
> match
>

--
PostgreSQL Bootcamp, Big Nerd Ranch Europe, Nov 2006
http://www.bignerdranch.com/news/2006-08-21.shtml

In response to

Re: Poor performance on seq scan at 2006-09-12 12:10:02 from Piotr Kołaczkowski

Browse pgsql-performance by date

	From	Date	Subject
Next Message	Jérôme BENOIS	2006-09-18 14:30:51	Re: High CPU Load
Previous Message	Markus Schaber	2006-09-18 08:22:00	Re: Poor performance on seq scan