Re: Slow dump with pg_dump/pg_restore ? How to improve ?

From: Soeren Gerlach <soeren(at)all-about-shift(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Slow dump with pg_dump/pg_restore ? How to improve ?
Date: 2004-06-30 22:08:46
Message-ID: 200407010008.46744.soeren@all-about-shift.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general


> > So...the dump in the above format needs some 14 minutes, the restore 10
> > minutes. This seems to be very slow as it means something like 100K/sec
> > for dumping and restoring. The drive is cappable of 40 Meg/seconds, so
> > thats not the bottleneck ,-) Anyhow postmaster and pg_dump seem to max
> > out the CPU cycles as it's running at nearly 100% while dumping and
> > restoring.
>
> What datatypes have you got in the large tables? Also, what character
> set encoding are you using?

datestyle = 'ISO,European' ?

> The only reason I can think of for dump to be that slow is if conversion
> of the data to text is a big time sink. This would involve the
> datatype's own output routine plus possibly a character set conversion.
> You should at least make sure that no character set conversion needs to
> happen (offhand I think this would only be an issue if pg_dump is
> invoked with PGCLIENTENCODING set in its environment).

This are the schemas of the two big tables mentioned:

-------------------------------------------------------------------------
CREATE TABLE public.agent_action_history
(
aah_tag_id bigserial NOT NULL,
aah_action_type int2 NOT NULL DEFAULT 0,
aah_quantity float4 NOT NULL DEFAULT 0,
aah_price float4 NOT NULL DEFAULT 0,
aah_sim_flag bool NOT NULL DEFAULT false,
aah_timestamp timestamp NOT NULL,
aah_action_pl float4 NOT NULL DEFAULT 0
) WITHOUT OIDS;

CREATE TABLE public.tick_history
(
tkh_id serial NOT NULL,
tkh_cdt_id int8 NOT NULL,
tkh_price float4 NOT NULL,
tkh_price_type int2 NOT NULL,
tkh_volume float4 NOT NULL,
tkh_system_time timestamp(6) NOT NULL,
tkh_exchange_time timestamp(6) NOT NULL,
CONSTRAINT "PK_tick_history" PRIMARY KEY (tkh_id)
) WITHOUT OIDS;
-------------------------------------------------------------------------

> Also I trust you are using dump with the default COPY-style output,
> not dump-as-INSERTs?

Yes. I'm dumping using "-c -Fc -v" as options. I just rerun the whole:

* Total rows: 904,000
* Dumping plain format: 21min == 1260 sec == 717 rows/sec
* Dumping "-c -Fc" format: 26min == 1560 sec == 580 rows/sec
* pg_dump takes 2/3 of the [single] CPU, postmaster the other 1/3 for both
dumps
* the resulting "-Fc" dump is 1/10 of the size of the plain dump

It would be interesting to know from people with DBs > 1 Gig how fast they
can dump in terms of rows/second.

Another info: I just noticed that I've 7.4.1 running not 7.4.3. So I'll
retest this tomorrow again and provide you with numbers for 7.4.3 too.

I'm likely to install Sybase as a reference to Postgres next week. But I can
remember to got there something like 5-10,000 rows/sec some one year ago
when I tested a 11.5 version for this issue too.

Thanks,
Soeren Gerlach

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Alvaro Herrera 2004-06-30 22:10:26 Re: Internationalization
Previous Message Chris Browne 2004-06-30 21:43:02 Re: what's best: opteron or xeon for pg 7.4.3?