AW: Re: Backup and Recovery

From: Zeugswetter Andreas SB <ZeugswetterA(at)wien(dot)spardat(dot)at>
To: "'pgsql-hackers(at)postgresql(dot)org'" <pgsql-hackers(at)postgresql(dot)org>
Subject: AW: Re: Backup and Recovery
Date: 2001-07-06 09:04:32
Message-ID: 11C1E6749A55D411A9670001FA687963368369@sdexcsrv1.f000.d0188.sd.spardat.at
Lists: pgsql-hackers


> > > Also, isn't the WAL format rather bulky to archive hours and hours of?
> >
> > If it were actually too bulky, then it needs to be made less so, since
> > that directly affects overall performance :-)
>
> ISTM that WAL record size trades off against lots of things, including
> (at least) complexity of recovery code, complexity of WAL generation
> code, usefulness in fixing corrupt table images, and processing time
> it would take to produce smaller log entries.
>
> Complexity is always expensive, and CPU time spent "pre-sync" is a lot
> more expensive than time spent in background. That is, time spent
> generating the raw log entries affects latency and peak capacity,
> where time in background mainly affects average system load.
>
> For a WAL, the balance seems to be far to the side of simple-and-bulky.
> For other uses, the balance is sure to be different.

I do not agree with the conclusions you draw above.
The limiting factor on the WAL is almost always the IO bottleneck.
How long startup rollforward takes after a crash is mainly determined
by the checkpoint interval and IO. Thus it is worth spending additional
CPU on reducing WAL size whenever that yields a substantial reduction.
Keep in mind, though, that thanks to TOAST, long column values that do not
change already do not need to be written to the WAL, so the potential
savings are not as large as they might seem.
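A minimal sketch of the trade-off being argued here, using an invented record layout and Python's zlib purely for illustration: CPU spent shrinking a bulky record before it is written reduces the volume hitting the log device, which is the usual bottleneck.

```python
import zlib

# Hypothetical bulky WAL payload: an 8 KB full page image that is mostly
# unchanged bytes, as a simple-and-bulky log format would write it.
page_image = bytearray(8192)
page_image[100:110] = b"new value!"   # only a few bytes actually changed

# Spending CPU "pre-sync" to shrink the record cuts the I/O volume.
compressed = zlib.compress(bytes(page_image), level=1)

print(len(page_image), len(compressed))
```

The point of the sketch is only the size ratio: when the redundancy is this large, even a cheap compression pass pays for itself on an I/O-bound log.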

> > > > I would expect high-level transaction redo records to be much more
> > > > compact; mixed into the WAL, such records shouldn't make the WAL
> > > > grow much faster.
> >
> > All redo records have to be at the tuple level, so what higher-level
> > are you talking about ? (statement level redo records would not be
> > able to reproduce the same resulting table data (keyword: transaction
> > isolation level))
>
> Statement-level redo records would be nice, but as you note they are
> rarely practical if done by the database.

The point is that the database cannot do it unless it allows only
serializable access and no user-defined functions with external
or runtime dependencies.
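A small sketch of why statement-level replay breaks down, with an invented stand-in for a statement that has a runtime dependency (think "UPDATE t SET x = random()"): re-executing the statement during recovery sees different runtime state, while tuple-level records simply reinstate the values that were actually produced.

```python
import random

# Hypothetical one-column table.
table = [1, 2, 3]

def run_statement(rows):
    # Stand-in for a statement with a runtime dependency,
    # e.g. "UPDATE t SET x = random()".
    return [random.randrange(1_000_000) for _ in rows]

# Original execution: the tuple-level log records the actual new values.
result = run_statement(table)
tuple_level_log = list(result)

# Statement-level "replay" re-executes the statement under different
# runtime state, so the replayed data need not match the original.
replayed = run_statement(table)

# Tuple-level replay reinstates the logged values and always matches.
assert tuple_level_log == result
```

The same divergence arises from concurrent non-serializable execution: replaying the statements in commit order can see different row visibility than the original interleaving did.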

>
> Redo records that contain whole blocks may be much bulkier
> than records of whole tuples.

What is written in whole pages is the physical log, and yes those pages can
be stripped before the log is copied to the backup location.
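A sketch of that stripping step, with a record format invented for illustration: full page images from the physical log serve local crash recovery only, so a filter can drop them before the log is copied to the backup location.

```python
# Hypothetical log stream mixing tuple-level records with physical-log
# records (whole page images). Record layout is invented: (kind, payload).
log = [
    ("tuple", b"update t1 row 7"),
    ("page",  b"\x00" * 8192),       # full page image: crash recovery only
    ("tuple", b"insert t2 row 3"),
]

# Strip the page images before copying the log to the backup location.
archived = [rec for rec in log if rec[0] != "page"]

total_before = sum(len(payload) for _, payload in log)
total_after = sum(len(payload) for _, payload in archived)
print(total_before, total_after)
```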

> Redo records of whole tuples may be much bulkier than those that just
> identify changed fields.

Yes, that might help in some cases, but as I said above, if it actually
makes a substantial difference, it is best done before the WAL is written
in the first place.
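To make the size argument concrete, a sketch with an invented tuple layout comparing a whole-tuple redo record against one that logs only the changed fields: the saving is large exactly when wide, unchanged fields dominate the tuple.

```python
# Hypothetical tuple: one wide unchanged field, one small changed field.
old_tuple = {"id": 7, "name": "x" * 200, "counter": 1}
new_tuple = {"id": 7, "name": "x" * 200, "counter": 2}

# Whole-tuple redo record: every field of the new version is logged.
whole_record = repr(new_tuple).encode()

# Field-level redo record: only the fields that actually changed.
changed = {k: v for k, v in new_tuple.items() if old_tuple[k] != v}
delta_record = repr(changed).encode()

print(len(whole_record), len(delta_record))
```

The counterpoint in the text above still applies: if this saving matters, it belongs in the record format written to the WAL itself, not in a post-processing step.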

> Bulky logs mean more-frequent snapshot backups, and bulky log formats
> are less suitable for network transmission, and therefore less useful
> for replication.

Any reasonably flexible replication that is based on the WAL will need to
preprocess the WAL files (or buffers) before transmission anyway.

Andreas
