Jean-David Beyer wrote:
> This means, of course, that the things I think of as transactions have been
> bunched into a much smaller number of what postgreSQL thinks of as large
> transactions, since there is only one per file rather than one per record.
> Now if a file has several thousand records, this seems to work out just great.
With the small transactions, you were limited by the speed at which your
hard disk can flush the commit WAL records to disk. For transactions
that small, the bottleneck is not bandwidth but the latency of the
drive. Using larger transactions helps because you get more work done
per commit, so that flush latency is amortized over many records.
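A minimal sketch of that batching, assuming a hypothetical table called `records`:

```sql
-- One transaction per file instead of one per record:
-- only a single WAL flush is needed, at COMMIT time.
BEGIN;
INSERT INTO records (id, payload) VALUES (1, 'first row of the file');
INSERT INTO records (id, payload) VALUES (2, 'second row of the file');
-- ... the remaining thousands of rows from the same file ...
COMMIT;
```

With one commit per row, each INSERT would pay the full disk-flush latency; here the whole file pays it once.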
The upcoming 8.3 release will have a feature called "asynchronous
commit", which should speed up small transactions like that
dramatically, if you don't want to batch them into larger transactions
as you did.
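The setting behind that feature is `synchronous_commit`; a sketch of how it might be used in a session (the table is hypothetical):

```sql
-- With asynchronous commit, COMMIT returns without waiting for the WAL
-- flush. A crash can lose the last few commits, but the database stays
-- consistent -- acceptable here, since a failed load is simply redone.
SET synchronous_commit TO off;

INSERT INTO records (id, payload) VALUES (1, 'one small transaction');
INSERT INTO records (id, payload) VALUES (2, 'another one');
```

Each INSERT above is its own transaction, but no longer blocks on the disk.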
> But what is the limitation on such a thing? In this case, I am just
> populating the database and there are no other users at such a time. I am
> willing to lose the whole insert of a file if something goes wrong -- I
> would fix whatever went wrong and start over anyway.
> But at some point, disk IO would have to be done. Is this just a function of
> how big /pgsql/data/postgresql.conf's shared_buffers is set to? Or does it
> have to do with wal_buffers and checkpoint_segments?
Well, you have to do the I/O eventually, regardless of shared_buffers.
Common wisdom is that increasing wal_buffers from the default helps with
bulk loading like that, up to a point. Increasing checkpoint_segments
helps as well. After you've done all that, you're going to be limited by
either the bandwidth of your I/O system, or the speed of your CPU,
depending on your hardware. Using COPY instead of INSERTs will help if
you're limited by CPU rather than I/O.
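For reference, the knobs mentioned above live in postgresql.conf; the values below are only illustrative, not recommendations, and the table and file path are hypothetical:

```sql
-- postgresql.conf fragment for a bulk load:
--   wal_buffers = 1MB            -- raised from the small default
--   checkpoint_segments = 32     -- fewer, larger checkpoints during the load

-- Loading a whole file in one command instead of per-row INSERTs:
COPY records FROM '/path/to/datafile.csv' WITH CSV;
```

COPY parses the file server-side in a single statement, which cuts both per-row parsing overhead and per-row commit overhead.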
(Posted to the pgsql-performance mailing list.)