Re: [HACKERS] custom types and optimization

From: Bruce Momjian <maillist(at)candle(dot)pha(dot)pa(dot)us>
To: brett(at)work(dot)chicken(dot)org (Brett McCormick)
Cc: pgsql-hackers(at)hub(dot)org, pgsql-general(at)postgresql(dot)org (PostgreSQL-general)
Subject: Re: [HACKERS] custom types and optimization
Date: 1998-05-31 19:38:11
Message-ID: 199805311938.PAA14879@candle.pha.pa.us
Lists: pgsql-general pgsql-hackers


[ CC'ing the general list so you can see what we are working on, and my
plea for help in getting the word out about PostgreSQL's speed and
features. Replies will go to the hackers list because we don't want
long discussions like this in the general list.]

First, let me say you are thinking exactly like me. I agree 100% with
your ideas and your analysis of the issues.


> I know that custom types account for a portion of overhead, and I'm
> not by any means advocating their removal. I also know that the
> efficiency of postgres has improved greatly since the early days, and
> I'm wondering what more can be done.

Good question, and a question I have been asking myself.

> For instance, would it be possible to cache results of the
> input/output functions for the types? i.e. if we've already called
> foobar_out for a piece of data, why call it again? We could store the
> previous result in a hash, and then use that.

Not sure if that would help. We cache system table lookups and data
blocks. gprof does not show a huge problem in the type extensibility
area, at least in my tests.

gprof is your friend. Try compiling with profiling enabled, then run
the backend and analyze gmon.out (see the FAQ for details). That
usually tells me quite a bit.

> Note that I know next to nothing about how the query node tree gets
> executed (I'm reading up on it now) so this may not be possible or
> could even introduce extra overhead.

Also, I hope people are reading the developers FAQ, because I think that
can help people get started with coding.

> I'd like to get postgres up to speed. I know it is a great database,
> and I tell all my friends this, but there is too much pg bashing
> because of the early days. People think mysql rocks because it is so
> fast, but in reality, well.. It's all IMHO, and the right tool for
> the right job.

Yes, this has frustrated me too. Why aren't we getting more favorable
mentions from people? I think we can now be classified as the 'most
advanced' free database. Can we do something about mentioning that to
others? We are certainly growing market share, but I guess I would like
to see more migrations from other databases.

The highly-biased MySQL comparison page hurts us too, but other people
explaining real issues can counter that.

> So my real question is: have we hit the limit on optimization and
> reduction of overhead, or is there more work to be done? Or should we
> concentrate on other aspects such as inheritance issues? I'm not
> quite as interested in ANSI compliance.

Not sure. I just removed exec(), so that saves us about 0.01 seconds on
startup, which is pretty major. We can move some of the initialization
that is done in every backend into the postmaster, but that will only
be a major speedup for backends that start up, run a short query, and
exit. Longer queries and long-running backends won't see much change.

I have tested the throughput of a sequential table scan, and it runs
pretty quickly: almost as fast as dd on the same file, and faster than
wc on my system. That is pretty good.

So why are we considered slow? First, historically, performance was not
a major concern: it wasn't at Berkeley(?), and later there were so many
other problems that we did not have the resources to concentrate on it.
Only in the past nine months have there been real improvements, and it
takes time to get the word out.

Second, it is our features that make us slower. Transactions, the type
system, and the optimizer all add to the slowness. We are very modular
and have a large call overhead moving in and out of modules, though
profiling has enabled us to reduce this.

MySQL also has certain limitations that allow it to be faster, like
allowing indexes to be specified ONLY at table creation time, so their
indexes are stored in with the data. They use ISAM, which doesn't grow
well, but does provide good performance because the data is effectively
pre-sorted on disk. Our CLUSTER command now performs a similar
function, without the problems of ISAM.

I am glad David Gould and others are involved, because I am starting to
run out of tricks to speed things up. I need new ideas and perhaps
redesigned modules to get better performance.

--
Bruce Momjian | 830 Blythe Avenue
maillist(at)candle(dot)pha(dot)pa(dot)us | Drexel Hill, Pennsylvania 19026
+ If your life is a hard drive, | (610) 353-9879(w)
+ Christ can be your backup. | (610) 853-3000(h)
