Re: Air-traffic benchmark

From: "Gurgel, Flavio" <flavio(at)4linux(dot)com(dot)br>
To: Lefteris <lsidir(at)gmail(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Air-traffic benchmark
Date: 2010-01-07 13:24:24
Message-ID: 15201928.35841262870664489.JavaMail.root@mail.4linux.com.br
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Hello

----- "Lefteris" <lsidir(at)gmail(dot)com> escreveu:
> Hi all,
>
> following the simple but interesting air-traffic benchmark published
> at:
> http://www.mysqlperformanceblog.com/2009/10/02/analyzing-air-traffic-performance-with-infobright-and-monetdb/

Quite interesting test, if you have the time to download all that raw data.

> of 5 minutes while the slow ones in the order of 40 minutes. I came
> to
> the conclusion that some parameters are not set correctly. I give all

I do think so too.

> The hardware characteristics are:
> Platform Intel(R) Core(TM)2 Quad CPU Q6600 @ 2.40GHz with 8GB RAM and
> ample disk space (2x 500 GB SATA disk @ 7200 RPM as SW-RAID-0)

SATA disks are not the best for a "benchmark" like yours, but the results were so deeply different from expected that I'm sure that you have hardware enough.

> I used postgresql-8.4.2. Source download and compiled by hand, no
> special parameters where passed to the configure phase.
>
> -- DETAILS on loading, analyze and index creation on postgresql-8.4.2
> -- loading time was 77m15.976s
> -- total rows: 119790558
> -- total size of pgdata/base: 46G
> ANALYZE;
> -- 219698.365ms aka 3m39.698s
> CREATE INDEX year_month_idx ON ontime ("Year", "Month");
> -- 517481.632ms aka 8m37.481s
> CREATE INDEX DepDelay_idx ON ontime ("DepDelay");
> -- 1257051.668ms aka 20m57.051s

Here comes my first question:
Did you ANALYZE your database (or at least related tables) _after_ index creation?
If not, try to do so. PostgreSQL needs statistics of the database when everything is in its place.

> airtraffic=# EXPLAIN ANALYZE SELECT "DayOfWeek", count(*) AS c FROM
> ontime WHERE "Year" BETWEEN 2000 AND 2008 GROUP BY "DayOfWeek" ORDER
> BY c DESC;
> QUERY
> PLAN
> ------------------------------------------------------------------------------------------------------------------------------------------
> Sort (cost=7407754.12..7407754.13 rows=4 width=2) (actual
> time=371188.821..371188.823 rows=7 loops=1)
> Sort Key: (count(*))
> Sort Method: quicksort Memory: 25kB
> -> HashAggregate (cost=7407754.04..7407754.08 rows=4 width=2)
> (actual time=371163.316..371163.320 rows=7 loops=1)
> -> Seq Scan on ontime (cost=0.00..7143875.40 rows=52775727
> width=2) (actual time=190938.959..346180.079 rows=52484047 loops=1)
> Filter: (("Year" >= 2000) AND ("Year" <= 2008))
> Total runtime: 371201.156 ms
> (7 rows)

You'll see here that PostgreSQL is not using the index you just created.
ANALYZE VERBOSE ontime;
should solve this.

> I understand that the problem here is the memory used by the sort
> method. *But*, I already changed the work_mem parameter to 6gigs:)

If you look to you explain you'll see that you don't need that much of memory.
You have 8GB of total RAM, if you use that much for sorting you'll start to swap.

> which means that of course work_mem == sort_mem, as
> such, shouldn't be the case that the sort algorithm should have used
> much more memory?

sort_mem is a part of the work_mem in recent versions of PostgreSQL.
No, the sort algorithm doesn't need that at all.

> I also attach the output of 'show all;' so you can advice me in any
> other configuration settings that I might need to change to perform
> better.

Let's take a look.

> geqo | on

If in doubt, turn this off. Geqo is capable of making bad execution plans for you.
But this is not the most important to change.

> shared_buffers | 32MB

Here it is. The default setting for shared_buffer doesn't give space for the buffercache.
I would recomend you to increase this to, at least, 40% of your total RAM.
Don't forget to restart PostgreSQL after changing here, and it's possible that you'll need to increase some system V parameters in Fedora. Read PostgreSQL documentation about it.

> effective_cache_size | 8MB

Increase here to, at least, shared_buffers + any caches that you have in your hardware (e.g. in the RAID controller)

> wal_buffers | 64kB

Increasing here can make your bulk load faster. 8MB is a good number.
This will not make your SELECT queries faster.

> wal_sync_method | fdatasync

fdatasync is the recommended method for Solaris. I would use open_sync. It's not important for SELECT too.

> work_mem | 6GB
Start here with 10MB. If you have temp files when executing your SELECTs try to increase just a bit.

Let us know what happens in your new tests.
Best regards

Flavio Henrique A. Gurgel
tel. 55-11-2125.4765
fax. 55-11-2125.4777
www.4linux.com.br
FREE SOFTWARE SOLUTIONS

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message A. Kretschmer 2010-01-07 13:32:25 Re: Air-traffic benchmark
Previous Message Leo Mannhart 2010-01-07 13:14:52 Re: Massive table (500M rows) update nightmare