Re: Feature: give pg_dump a WHERE clause expression

From: Davy Durham <pubaddr5(at)davyandbeth(dot)com>
To: pgsql-patches(at)postgresql(dot)org
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: Feature: give pg_dump a WHERE clause expression
Date: 2008-06-01 19:08:47
Message-ID: 1212347327.17810.52.camel@ubuntu
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-patches

On Sun, 2008-06-01 at 10:43 -0400, Tom Lane wrote:
> Davy Durham <pubaddr5(at)davyandbeth(dot)com> writes:
> > I have added a new parameter, -w/--where=EXPR
> > This lets you specify an expression that will be used in a WHERE
> > clause when the data is dumped.
>
> This seems pretty poorly thought out. It can hardly work in a dump
> of more than one table, which means that there's not any real reason
> to use pg_dump at all. Just do a COPY (SELECT ...) TO somefile.
>
> regards, tom lane
>

Well, my primary reason for writing the patch was to have a standard SQL
file using INSERT statements in order to load the some of a table's data
into a database other than postgresql which does not support the COPY
statement.

I'll admit that the single where clause would often not be applicable
across all tables in a database, but when pg_dump is told specific
tables to dump (a nice existing feature of pg_dump for doing something
specialized other than a simple entire database backup), then it can be
useful.

My particular case is that I have several tables that are simple event
logs. Each table has a timestamp column. I'm periodically bringing
these tables into sync on another database and I only want to pull rows
newer than since the last sync.. So, a where-clause of..
'ts > $last_sync'
..works for me. However, I'm sure there are other uses too..

== Thinking Further ==

Beyond serving my own needs, I'm trying to generically extend the
general idea that pg_dump already supports:
1) pg_dump can be made to dump an entire database
2) pg_dump can be made to dump only requested tables
3) [my addition] pg_dump can be made to dump only requested rows
from requested tables

However, it's no SO generic in that the where clause applies to all
tables.

So, if this patch is not acceptable as-is, what would you feel about
this:
I could enhance the -t/--table=NAME option to accept more than a
simple NAME. Rather it could accept something in the form:

--table=<table_name>:<where-clause expression>

For example, pg_dump --table='foo:col1 > 10 AND f2 < 14'

Currently, the user can specify -t/--table multiple times to
have more than one table dumped. Or the user can use a pattern
to a single -t option to request multiple tabes.

This way, a user could specify a WHERE clause per table he has
requested to dump. Granted, the WHERE clause may then apply to
multiple tables if a pattern was used, but that may very well be
desirable to the user.

Unless you disagree, this is a more generic solution (than what my patch
contains) to allowing the user of pg_dump to further refine what they
wish to dump.

Thoughts?

Thanks for the feedback

In response to

Responses

Browse pgsql-patches by date

  From Date Subject
Next Message Tom Lane 2008-06-01 19:37:20 Re: [BUGS] BUG #4203: perform dblink() in begin/exception returns wrong SQLSTATE code
Previous Message Joe Conway 2008-06-01 19:03:41 Re: [BUGS] BUG #4203: perform dblink() in begin/exception returns wrong SQLSTATE code