Re: Oracle vs PostgreSQL in real life : NEWS!!!

From: Hannu Krosing <hannu(at)krosing(dot)net>
To: Jean-Paul ARGUDO <jean-paul(dot)argudo(at)IDEALX(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org, sdinot(at)IDEALX(dot)com, dbarth(at)IDEALX(dot)com
Subject: Re: Oracle vs PostgreSQL in real life : NEWS!!!
Date: 2002-03-02 22:21:56
Message-ID: 1015107716.3512.89.camel@rh72.home.ee
Lists: pgsql-hackers

On Fri, 2002-03-01 at 23:44, Jean-Paul ARGUDO wrote:
> > > Oracle on NT 4: 45 minutes to go, 1200 tps (yes, one thousand two hundred
> > > tps)
> > >
> > > Linux Red Hat 7.2 with PostgreSQL 7.2: hours to go (statistically, 45 hours),
> > > 80 tps (eighty tps).
>
> Well... Where to start?
>
> We work on a team of two. The other one is a C/C++ senior coder. He
> mailed me a remark about datatypes on the database. Here is what he sent
> me:
>
> Our database has different datatypes; here is a count of columns of each
> datatype across all tables:
>
> 197 numeric(x)
> 19 numeric(x,2)
> 2 varchar(x)
> 61 char(x)
> 36 datetime
>
> He asked me about numeric(x) and questioned me about how PG manages
> the NUMERIC types.
>
> I gave him a pointer on "numeric.c" in the PG srcs.
>
> I analyzed this source and found that NUMERIC types are much more
> expensive than simple INTEGER.
>
> I really fell on the floor.. :-( I was sure that, given how good PG's
> quality is, when NUMERIC(x) columns are declared they would be translated
> to INTEGER (int2, 4 or 8, whatever...).

Postgres does not do any silent type replacement based on a data type's
declared maximum length.
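
To illustrate (a sketch only - the table and column names are made up): a
column declared NUMERIC(10,0) stays an arbitrary-precision NUMERIC
internally; if you want int4 storage and arithmetic, you have to declare it
that way yourself, e.g.

CREATE TABLE accounts_numeric (
    id      numeric(10,0),   -- stored and computed as arbitrary-precision NUMERIC
    balance numeric(12,2)
);

CREATE TABLE accounts_int (
    id      integer,         -- plain int4: much cheaper arithmetic and comparisons
    balance numeric(12,2)
);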

> So, I made a pg_dump of the current database and made some perl
> replacements of NUMERIC(x,0) to INTEGER.
>
> I loaded the database and launched the treatments: the results are REALLY
> IMPRESSIVE: here is what I have:
>
> ((it is a port of Oracle/WinNT stuff to PostgreSQL/Red Hat 7.2)):
>
>                               Oracle          PG72 with NUMERICs   PG72 with INTEGERS
> --------------------------------------------------------------------------------------
> sample connect by query
> ported to PG (thanks to       350ms           750ms                569ms
> OpenACS code!)

Did you rewrite your CONNECT BY queries as recursive functions or did
you use varbit tree position pointers?
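
For reference, a sketch of the tree position pointer idea with a hypothetical
table; a text path column is shown for readability where a varbit column would
be denser. Each row stores its position in the tree, so fetching a whole
subtree becomes a prefix scan instead of a recursive traversal:

CREATE TABLE tree (
    id     integer,
    path   text,    -- e.g. '0001.0003.0002' = 2nd child of the 3rd child of node 1
    label  text
);
-- with C locale (or a *_pattern_ops opclass) this index serves the LIKE below
CREATE INDEX tree_path_idx ON tree (path);

-- all descendants of the node at path '0001.0003', in depth-first order:
SELECT * FROM tree
 WHERE path LIKE '0001.0003.%'
 ORDER BY path;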

> --------------------------------------------------------------------------------------
> sample "big" query with
> connect bys and many          3min 30s        8min 40s             5min 1s
> sub-queries

Could you give more information on this query - I suspect it can be made
at least as fast as Oracle :)

> --------------------------------------------------------------------------------------
> Big Batch treatment           1300 queries/s  80 queries/s         250 queries/s
> queries
>
> PRO*C ported to ECPG          45 min to go    ~4 to 6 DAYS to go   not yet
>                                                                    tested fully
>
> Ratio                         1:1             1:21                 not yet ..
>                                               (21 times slower!)

Did you run a concurrent vacuum for both PG results?

From my limited testing it seems that such a vacuum is absolutely needed
for big batches consisting mostly of updates.
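
Concretely this just means firing a plain, non-FULL vacuum from a second
session every few seconds while the batch runs (a sketch; the table name is
made up). It does not take an exclusive lock, so the updates keep going while
dead row versions get reclaimed:

-- run in a second session, repeated every few seconds
VACUUM accounts;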

And btw 45min * 21 = 15h45 not 4-6 days :)

> --------------------------------------------------------------------------
> ((but this batch will yet be re-written in pure C + libpq + SPI,
> so we think we'll have better results again))

You probably will get better results :)

I reran my test (5000 transactions of 20 updates each on a random unique key
between 1 and 768, with a concurrent vacuum running every 4 sec), moving the
inner loop of 20 random updates to the server, first without SPI prepared
statements and then using prepared statements.
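
The server-side function is roughly like this (a sketch only; the table,
column, and function names are made up, not the actual abench code). The
client calls it once per transaction instead of sending 20 separate UPDATEs,
and the SD[] dictionary keeps the SPI plan prepared once and reused on every
later call:

CREATE FUNCTION update_random_rows(integer) RETURNS integer AS '
import random
if "plan" not in SD:
    # prepare the UPDATE once; later calls reuse the saved plan
    SD["plan"] = plpy.prepare(
        "UPDATE accounts SET hits = hits + 1 WHERE i = $1", ["integer"])
for n in range(args[0]):
    plpy.execute(SD["plan"], [random.randint(1, 768)])
return args[0]
' LANGUAGE plpythonu;

-- one call does a whole inner loop of 20 updates server side:
SELECT update_random_rows(20);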

Test hardware - Athlon 859, IDE disk, 512MB RAM

update of random row i=1..768
all queries sent from client  2:02 = 820 updates/sec
[hannu(at)taru abench]$ time ./abench.py
real 2m1.522s
user 0m20.260s
sys 0m3.320s
[hannu(at)taru abench]$ time ./abench.py
real 2m2.320s
user 0m19.830s
sys 0m3.490s

using plpython without prepared statements 1:35 = 1052 updates/sec
[hannu(at)taru abench]$ time ./abenchplpy2.py
real 1m34.587s
user 0m1.280s
sys 0m0.400s
[hannu(at)taru abench]$ time ./abenchplpy2.py
real 1m36.919s
user 0m1.350s
sys 0m0.450s

using plpython with SPI prepared statements 1:06.30 = 1503 updates/sec
[hannu(at)taru abench]$ time ./abenchplpy.py
real 1m6.134s
user 0m1.400s
sys 0m0.720s
[hannu(at)taru abench]$ time ./abenchplpy.py
real 1m7.186s
user 0m1.580s
sys 0m0.570s

plpython with SPI prepared statements, but matching no rows
(update where i=1024, outside the 1..768 key range)  0:17.65 = 5666 non-updates/sec
[hannu(at)taru abench]$ time ./abenchplpy.py
real 0m17.650s
user 0m0.990s
sys 0m0.290s

> So as you see, DATA TYPES are REALLY important, as I wrote in a
> techdocs article (I should have thought of this earlier)

Yes they are.

But running a concurrent vacuum is _much_ more important if the number of
updates is much bigger than the number of records (thanks Tom!)

------------------
Hannu
