Re: Performance improvement hints + measurement

From: devik(at)cdi(dot)cz
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)hub(dot)org
Subject: Re: Performance improvement hints + measurement
Date: 2000-09-13 13:37:24
Message-ID: 39BF8314.8A790C34@cdi.cz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

> > But indexscan always lookups actual record in heap even if
> > all needed attributes are contained in the index.
> > Oracle and even MSSQL reads attributes directly from index
> > without looking for actual tuple at heap.
>
> Doesn't work in Postgres' storage management scheme --- the heap
> tuple must be consulted to see if it's still valid.

yes, I just spent another day by looking into sources and
it seems that we need xmin, xmax stuff.
What do you think about this approach:

1) add all validity & tx fields from heap tuple into
index tuple too
2) when generating plan for index scan try to determine
whether we can satisfy target list using only data
from index tuples, if yes then compute cost without
accounting random heap page reads - it will lead into
much lower cost
3) whenever you update/delete heap tuple's tx fields, update
then also in indices (you don't have to delete them from
index)

It will cost more storage space and slightly more work when
updating indices but should give excelent performance when
index is used.

Measurements:
I've table with about 2 mil. rows declared as
bigrel(namex varchar(50),cnt integer,sale datetime).
I regulary need to run this query against it:
select nazev,sum(cnt) from bigrel group by name;
It took (in seconds):

Server\Index YES NO
pg7.01 linux 58 264
MSSQL7 winnt 17 22

I compared on the same machine (PII/375,128RAM) using
WINNT under VMWARE and native linux 2.2. pq was
vaccum analyzed.
Why is pgsql so slow ? The mssql plan without index uses
hash aggregating but pg sorts while relation.
With index, in pg there is a big overhead of heap tuple
reading - mssql uses data directly from scanned index.

Also I noticed another problem, when I added
where nazev<'0' it took 110ms on pg when I used
set enable_seqscan=on;.
Without is, planner still tried to use seqscan+sort
which took 27s in this case.

I'm not sure how complex the proposed changes are. Another
way would be to implement another aggregator like HashAgg
which will use hashing.
But it could be even more complicated as one has to use
temp relation to store all hash buckets ..

Still I think that direct index reads should give us huge
speed improvement for all indexed queries.
I'm prepared to implement it but I'd like to know your
hints/complaints.

Regards, devik

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Thomas Lockhart 2000-09-13 13:42:40 Re: current is broken
Previous Message The Hermit Hacker 2000-09-13 13:31:13 Re: man, I feel like a beginner ...