Re: Vacuum Verbose output

From: Robert Treat <xzilla(at)users(dot)sourceforge(dot)net>
To: pgsql-admin(at)postgresql(dot)org
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Scott Marlowe <smarlowe(at)g2switchworks(dot)com>, "Tomeh, Husam" <htomeh(at)firstam(dot)com>
Subject: Re: Vacuum Verbose output
Date: 2005-11-02 19:00:04
Message-ID: 200511021400.05346.xzilla@users.sourceforge.net
Lists: pgsql-admin

On Monday 31 October 2005 22:59, Tom Lane wrote:
> Scott Marlowe <smarlowe(at)g2switchworks(dot)com> writes:
> > On Mon, 2005-10-31 at 16:34, Tomeh, Husam wrote:
> >> Pre-allocating space will prevent extending the datafile during
> >> loading massive data (batch processing) and would improve the overall
> >> batch write performance.
> >
> > Have you got any file system benchmarks that back up this assertion? I
> > would love to see something that shows one way or the other if that
> > really makes any difference.
>
> Barring some pretty solid evidence, you're unlikely to attract any
> enthusiasm among pghackers for this sort of thing. We are generally
> disinclined to reinvent functionality that properly belongs to the
> kernel or filesystem layer. "Oracle does it" cuts no ice in this
> connection, because Oracle is designed around a twenty-year-old
> assumption that the database is smarter than the kernel, and the world
> has changed a lot since then.
>
> In short: show us some numbers that prove this is worth our attention.
>

I'm not terribly excited about the idea, but it might be worth hearing a
better argument. (FWIW I think this is somewhat debunkable too, but it gives
one something to think about)

"PostgreSQL unlike other commercial databases does not allow database files to
pregrow to certain sizes. So if you are loading multiple tables via different
connections there are two things that hurt scalability: One is the semaphore
locking which it needs to perform IO to the database files and second is file
fragmentation since it creates all tables in the same file system and grows
them as needed. So if both the tables are loaded then both files are growing
at "same" time which typically is serialized as blocks are allocated to each
of the files one at a time which means they will be dispersed and not
contiguous. How this hurts? Well if you do total row scans and compare the
time you can easily see huge degradations. (I have seen about 50% degradations).
This means you have to load 1 table at a time. However if there was a way to
increase the space for the tables (pre-grow them) then it will be a bit
easier to load multiple tables simultaneously. (Of course the semaphore
problem is still there and that needs to be more granular also). Duh.. I
forgot the workaround here.. TABLESPACES are finally available in PostgreSQL
8. But semaphore problems are still existing and pre-growing files will still
help a lot since "growing" the files will be in your "1" process connection
timeline. "

taken from an interesting post at
http://blogs.sun.com/roller/page/jkshah?anchor=postgres_what_needs_to_be
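
To make the fragmentation claim in that quote concrete, here is a toy sketch in
Python. It assumes a naive allocator that hands out the next free block to
whichever file asks first, which is the behaviour the post describes, not how
any particular filesystem actually works (many reserve or delay allocation
precisely to avoid this). The names allocate() and extents() are made up for
the illustration.

```python
def allocate(request_order):
    """Hand out consecutive block numbers in the order requests arrive.

    request_order is a list of file names, one entry per block requested.
    Returns a dict mapping each file name to the block numbers it received.
    Assumption: a naive "next free block" allocator, for illustration only.
    """
    layout = {}
    next_block = 0
    for name in request_order:
        layout.setdefault(name, []).append(next_block)
        next_block += 1
    return layout


def extents(blocks):
    """Count contiguous runs in an ascending list of block numbers."""
    if not blocks:
        return 0
    runs = 1
    for prev, cur in zip(blocks, blocks[1:]):
        if cur != prev + 1:
            runs += 1
    return runs


BLOCKS_PER_TABLE = 8

# Two loads running at once: the tables take turns extending by one block.
interleaved = allocate(["table_a", "table_b"] * BLOCKS_PER_TABLE)

# Pre-grown (or loaded one at a time): each table gets its blocks in one go.
pregrown = allocate(["table_a"] * BLOCKS_PER_TABLE + ["table_b"] * BLOCKS_PER_TABLE)

print("interleaved extents for table_a:", extents(interleaved["table_a"]))
print("pre-grown extents for table_a:  ", extents(pregrown["table_a"]))
```

Under that assumption the concurrent load leaves each table in as many pieces
as it has blocks, while the pre-grown layout is a single run. Whether a real
filesystem behaves anything like this is exactly the sort of thing the numbers
Tom asked for would have to show.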

--
Robert Treat
Build A Brighter Lamp :: Linux Apache {middleware} PostgreSQL
