Re: Air-traffic benchmark

From: "Gurgel, Flavio" <flavio(at)4linux(dot)com(dot)br>
To: Lefteris <lsidir(at)gmail(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org, Ivan Voras <ivoras(at)freebsd(dot)org>
Subject: Re: Air-traffic benchmark
Date: 2010-01-07 15:45:16
Message-ID: 32176465.41521262879116337.JavaMail.root@mail.4linux.com.br
Lists: pgsql-performance

----- "Lefteris" <lsidir(at)gmail(dot)com> escreveu:
> > Did you ever try increasing shared_buffers to what was suggested (around
> > 4 GB) and see what happens (I didn't see it in your posts)?
>
> No I did not to that yet, mainly because I need the admin of the
> machine to change the shmmax of the kernel and also because I have no
> multiple queries running. Does Seq scan uses shared_buffers?

Having multiple queries running is *not* the only reason you need lots of shared_buffers.
Think of shared_buffers as a page cache: data in PostgreSQL is organized in pages.
Even within a single query execution, if one step brings a page into the buffer cache, that alone can speed up a later step (and even change the execution plan), since data access in memory is (usually) faster than disk.
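For reference, the change itself is small. This is only a sketch: the 4GB value is just the one suggested earlier in the thread, the kernel parameters assume Linux, and the exact numbers are illustrative, not a recommendation:

    # postgresql.conf -- needs a server restart
    shared_buffers = 4GB

    # Linux kernel: SysV shared memory must be large enough to hold it,
    # e.g. in /etc/sysctl.conf (then reload with "sysctl -p")
    kernel.shmmax = 17179869184      # 16GB, in bytes
    kernel.shmall = 4194304          # in pages (usually 4kB each)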

> > help performance very much on multiple executions of the same query.

This is also true.
This kind of test should, and will, give different results in subsequent executions.
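An easy way to see it is to run the same query twice and compare the timings. The table and column names below are just placeholders for the air-traffic data, since I don't have the exact schema at hand:

    EXPLAIN ANALYZE
    SELECT count(*) FROM ontime WHERE year BETWEEN 2000 AND 2009;

    -- Run the exact same statement again: the second execution usually
    -- reports a lower total runtime because many of the pages it needs
    -- are already sitting in shared_buffers and/or the OS cache.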

> > From the description of the data ("...from years 1988 to 2009...") it
> > looks like the query for "between 2000 and 2009" pulls out about half of
> > the data. If an index could be used instead of seqscan, it could be
> > perhaps only 50% faster, which is still not very comparable to others.

The use of an index instead of the seqscan has to be tested. I don't agree with the 50% estimate: simple integers stored in a B-tree have a very good chance of being retrieved in the required order, and the discarded data will be discarded quickly too, so the gain has to be measured.

I bet that an index scan will be a lot faster, but it's just a bet :)
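If you want to try it, something like this would do (again, the table and column names are only assumptions about the schema):

    CREATE INDEX ontime_year_idx ON ontime (year);
    ANALYZE ontime;

    -- Let the planner choose first...
    EXPLAIN ANALYZE
    SELECT count(*) FROM ontime WHERE year BETWEEN 2000 AND 2009;

    -- ...then force it away from the seqscan and compare
    SET enable_seqscan = off;
    EXPLAIN ANALYZE
    SELECT count(*) FROM ontime WHERE year BETWEEN 2000 AND 2009;
    RESET enable_seqscan;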

> > The table is very wide, which is probably why the tested databases can
> > deal with it faster than PG. You could try and narrow the table down
> > (for instance: remove the Div* fields) to make the data more
> > "relational-like". In real life, speedups in this circumstances would
> > probably be gained by normalizing the data to make the basic table
> > smaller and easier to use with indexing.

Ugh. I don't think so. That's why indexes were invented. PostgreSQL is smart enough to "jump" over columns using byte offsets.
A better option for this table is to partition it into year (or year/month) chunks.

45GB is not such a huge table compared to others I have seen. I have systems where each partition is around 10 or 20GB and the data is very fast to access, even with aggregation queries.
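Just to sketch the idea with inheritance-based partitioning (table and column names are made up, and only two years are shown):

    -- Parent table holds no data itself
    CREATE TABLE ontime (
        year     integer,
        month    integer,
        depdelay integer
        -- ... the remaining air-traffic columns
    );

    -- One child per year; the CHECK constraint lets the planner skip
    -- partitions that cannot match the WHERE clause
    CREATE TABLE ontime_2008 (CHECK (year = 2008)) INHERITS (ontime);
    CREATE TABLE ontime_2009 (CHECK (year = 2009)) INHERITS (ontime);

    SET constraint_exclusion = partition;

    -- This then reads only ontime_2009 (plus the empty parent)
    SELECT count(*) FROM ontime WHERE year = 2009;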

Flavio Henrique A. Gurgel
tel. 55-11-2125.4765
fax. 55-11-2125.4777
www.4linux.com.br
FREE SOFTWARE SOLUTIONS
