From: | "Gurgel, Flavio" <flavio(at)4linux(dot)com(dot)br> |
---|---|
To: | Lefteris <lsidir(at)gmail(dot)com> |
Cc: | pgsql-performance(at)postgresql(dot)org, Ivan Voras <ivoras(at)freebsd(dot)org> |
Subject: | Re: Air-traffic benchmark |
Date: | 2010-01-07 15:45:16 |
Message-ID: | 32176465.41521262879116337.JavaMail.root@mail.4linux.com.br |
Lists: | pgsql-performance |
----- "Lefteris" <lsidir(at)gmail(dot)com> wrote:
> > Did you ever try increasing shared_buffers to what was suggested (around
> > 4 GB) and see what happens (I didn't see it in your posts)?
>
> No I did not do that yet, mainly because I need the admin of the
> machine to change the shmmax of the kernel and also because I have no
> multiple queries running. Does seq scan use shared_buffers?
Having multiple queries running is *not* the only reason you need lots of shared_buffers.
Think of shared_buffers as a page cache: data in PostgreSQL is organized in pages.
If a single step of a query execution brings a page into the buffer cache, that alone can speed up a later step and even change the execution plan, since access to data in memory is (usually) faster than disk.
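As a rough sketch of what raising shared_buffers involves (values here are illustrative for a machine with around 16GB of RAM, not a recommendation for this workload):

```
# postgresql.conf -- requires a server restart
shared_buffers = 4GB

# On Linux, the kernel's SysV shared memory limit must allow this
# (run as root; number must exceed the shared memory segment size):
#   sysctl -w kernel.shmmax=4800000000
# and persist it in /etc/sysctl.conf
```

That is why the machine's admin is needed: shmmax is a kernel parameter, not something PostgreSQL can change by itself.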
> > help performance very much on multiple executions of the same query.
This is also true.
This kind of test should, and will, give different results in subsequent executions.
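A quick way to see this, assuming the benchmark table is called ontime with a "Year" column (names taken from the air-traffic dataset, adjust to yours), is to run the same aggregate twice under psql's timing:

```sql
\timing on
-- first run: pages come mostly from disk
SELECT count(*) FROM ontime WHERE "Year" = 2005;
-- second run: pages come from shared_buffers / OS cache, usually much faster
SELECT count(*) FROM ontime WHERE "Year" = 2005;
```

The gap between the two timings is a rough measure of how much the caches are helping.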
> > From the description of the data ("...from years 1988 to 2009...") it
> > looks like the query for "between 2000 and 2009" pulls out about half of
> > the data. If an index could be used instead of seqscan, it could be
> > perhaps only 50% faster, which is still not very comparable to others.
The use of an index over a seqscan has to be tested. I don't agree with the 50% estimate: simple integers stored in a B-tree have a good chance of being retrieved in the required order, and the discarded rows will be discarded quickly too, so the gain has to be measured.
I bet that an index scan will be a lot faster, but it's just a bet :)
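One way to settle the bet, again assuming an ontime table with a "Year" column (hypothetical names), is to build the index and compare plans side by side:

```sql
CREATE INDEX ontime_year_idx ON ontime ("Year");
ANALYZE ontime;

-- planner's free choice (likely seqscan for ~50% of the table):
EXPLAIN ANALYZE
SELECT count(*) FROM ontime WHERE "Year" BETWEEN 2000 AND 2009;

-- force it to consider the index, just for comparison:
SET enable_seqscan = off;
EXPLAIN ANALYZE
SELECT count(*) FROM ontime WHERE "Year" BETWEEN 2000 AND 2009;
SET enable_seqscan = on;
```

EXPLAIN ANALYZE reports actual execution times for both plans, so the comparison is measured rather than guessed.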
> > The table is very wide, which is probably why the tested databases can
> > deal with it faster than PG. You could try and narrow the table down
> > (for instance: remove the Div* fields) to make the data more
> > "relational-like". In real life, speedups in these circumstances would
> > probably be gained by normalizing the data to make the basic table
> > smaller and easier to use with indexing.
Ugh. I don't think so. That's why indexes were invented. PostgreSQL is smart enough to "jump" over columns using byte offsets.
A better option for this table is to partition it in year (or year/month) chunks.
45GB is not such a huge table compared to others I have seen. I have systems where each partition is 10 or 20GB and data access is very fast even with aggregation queries.
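A minimal sketch of year-based partitioning using table inheritance and CHECK constraints (the mechanism available in PostgreSQL of this era; table and column names are assumptions from the air-traffic dataset):

```sql
-- child tables, one per year, inheriting the parent's columns
CREATE TABLE ontime_2008 (CHECK ("Year" = 2008)) INHERITS (ontime);
CREATE TABLE ontime_2009 (CHECK ("Year" = 2009)) INHERITS (ontime);

-- with constraint exclusion on, a query filtering on "Year" skips
-- partitions whose CHECK constraint rules them out:
SET constraint_exclusion = on;
EXPLAIN SELECT count(*) FROM ontime WHERE "Year" = 2009;
```

Loading would then go into the child tables (directly or via an insert trigger on the parent), so each yearly chunk stays in the 10-20GB range mentioned above.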
Flavio Henrique A. Gurgel
tel. 55-11-2125.4765
fax. 55-11-2125.4777
www.4linux.com.br
FREE SOFTWARE SOLUTIONS