Re: PostgreSQL data loads - turn off WAL

From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: hartrc <rhart2(at)mt(dot)gov>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: PostgreSQL data loads - turn off WAL
Date: 2012-09-26 17:04:57
Message-ID: CAMkU=1zv9Jd7hrwgeGXJA+6MZejpOM6OqLXfsEMwUjQKo=tdEg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On Tue, Sep 25, 2012 at 9:09 AM, hartrc <rhart2(at)mt(dot)gov> wrote:
> My version: PostgreSQL v9.1.5
> Version string: "PostgreSQL 9.1.5 on x86_64-unknown-linux-gnu, compiled by
> gcc (SUSE Linux) 4.3.4 [gcc-4_3-branch revision 152973], 64-bit"
>
> Basically my question is:
> Is there currently any way to avoid wal generation during data load for
> given tables and then have point in time recovery after that?
>
> Background and blurb
> The situation i'm referring to here is for a development environment. I
> require point in time recovery because if there is crash etc I don't want to
> lose up to a days work for 12 developers.

What is it that is being developed? If you are developing
applications that use postgres for storage, then with a crash of that
database you should not lose the developers' work, unless you are
using the same cluster as both the source-code repository and the
database to which the test applications being developed connect (which
seems like a bad idea)

Or are you not documenting your database schema and code anywhere
other than in the database itself? That too seems like a bad idea, at
least for a busy multi-person development shop.

> A developer did a data load yesterday of approximately 5GB of data into a
> new schema. This generated approximately 7GB of wal. The situation arises
> where if something is incorrect in the data load the data load may need to
> be repeated 2 or 3 times (thus generating 20GB +of WAL).

Is the problem the IO generated by this, or the short-term storage, or
the long term storage of it? If it is the storage, you could do a
specially-scheduled backup as soon as the load is done, and then
delete the WAL soon. Are you compressing your WAL? If not, then
doing that might be enough to solve the problem.

It sounds like the main thing your group "produces" is code
development, and from what you describe the database is a "production"
database since losing it loses your product. If that is the case, you
really need a "development" database as well. This could run in
archive_mode=off, and then the 5GB would not go to "production" until
they are fairly confident it will work and not have to be repeated.
So that would cut out 2/3 of the WAL.

Or, they could create the table as "unlogged" until they know the bulk
load works, then recreate it as logged and repeat the load. (I think
that currently there is no way to make an unlogged table then promote
it to logged, that would probably be ideal for you, but someday....)

> For a data load i
> don't want wal to be generated. I accept the fact before there was nothing
> and from the point of the next pg_basebackup there was everything. It is
> from the point i say ok that is everything (the next backup) that i want
> point in time recovery to apply to that table.
> It is doesn't seem practical, and appears very risky to turn off wal_archive
> during the data load.

How long does the bulk load plus one backup plus two database restarts
take? That would probably be far less than one day, so the risk of
turning off archive_mode temporarily seems like it should be minimal.

If your hard drives were to die horribly, how long would it take you
replace them and restore the database from the last backup and roll it
forward? That could calibrate how much past work you are willing to
lose in the event it comes to that.

Cheers,

Jeff

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Chris Ernst 2012-09-26 18:10:03 Re: trigger and plpgsq help needed
Previous Message SUNDAY A. OLUTAYO 2012-09-26 16:42:41 trigger and plpgsq help needed