Re: Benchmark Data requested

From: Matthew <matthew(at)flymine(dot)org>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: Benchmark Data requested
Date: 2008-02-05 14:51:29
Message-ID: Pine.LNX.4.64.0802051439300.20402@aragorn.flymine.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Apologies for the blank email - mailer problems. I lost all my nicely
typed stuff, too.

On Tue, 5 Feb 2008, Dimitri Fontaine wrote:
>> Multiple table loads ( 1 per table) spawned via script is bit better
>> but hits wal problems.
>
> pgloader will too hit the WAL problem, but it still may have its benefits, or
> at least we will soon (you can already if you take it from CVS) be able to
> measure if the parallel loading at the client side is a good idea perf. wise.

You'll have to be careful here. Depending on the filesystem, writing large
amounts of data to two files simultaneously can results in the blocks
being interleaved to some degree on the disc, which can cause performance
problems later on.

As for the WAL, I have an suggestion, but I'm aware that I don't know how
PG actually does it, so you'll have to tell me if it is valid.

My impression is that the WAL is used to store writes in a transactional
manner, for writes that can't be written in a transactional manner
directly to the data files. Hence the suggestion for restoring database
dumps to run the whole restore in one transaction, which means that the
table creation is in the same transaction as loading the data into it.
Since the table is not visible to other backends, the writes to it do not
need to go to the WAL, and PG is clever enough to do this.

My suggestion is to extend that slightly. If there is a large chunk of
data to be written to a table, which will be entirely to empty pages or
appended to the of the data file, then there is no risk of corruption of
existing data, and that write could be made directly to the table. You
would have to write a WAL entry reserving the space in the data file, and
then write the data to the file. Then when that WAL entry is checkpointed,
no work would be required.

This would improve the performance of database restores and large writes
which expand the table's data file. So, would it work?

Matthew

--
If pro is the opposite of con, what is the opposite of progress?

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message Matthew 2008-02-05 14:53:25 Re: Benchmark Data requested
Previous Message Richard Huxton 2008-02-05 14:43:36 Re: Benchmark Data requested