Re: Add --include-table-data-where option to pg_dump, to export only a subset of table data

From: Euler Taveira <euler(at)timbira(dot)com(dot)br>
To: Carter Thaxton <carter(dot)thaxton(at)gmail(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Add --include-table-data-where option to pg_dump, to export only a subset of table data
Date: 2018-05-22 00:13:12
Message-ID: CAHE3wgjQ0sDSiWydn3AeT4cRcjQaOuChMVtAXv+98_BejMV-Cw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

2018-05-20 20:48 GMT-03:00 Carter Thaxton <carter(dot)thaxton(at)gmail(dot)com>:
> Many times I've wanted to export a subset of a database, using some sort of
> row filter condition on some of the large tables. E.g. copying a production
> database to a staging environment, but with some time series data only from
> the past month.
>
How would you handle foreign keys? It seems easier to produce a dump
that won't restore.

> We have the existing options:
> --include-table=table (and its -t synonym)
> --exclude-table=table
> --exclude-table-data=table
>
> I propose a new option:
> --include-table-data-where=table:filter_clause
>
I remembered an old thread [1]. At that time pg_dump was not so
decoupled from the backend. We are far from being decoupled in a way
that someone can write his own pg_dump using only calls from a
library. I'm not sure pg_dump is the right place to add another ETL
parameter. We already have too much parameters that could break a
restore (flexibility is always welcome but too much is not so good).

> One would use this option as follows:
>
> pg_dump --include-table-data-where=largetable:"created_at >= '2018-05-01'"
> database_name
>
How would you check that that expression is correct? Every parameter
could quote its value. It means that your parameter have to escape the
quote in '2018-05-01'. Another problem is that your spec does not show
us how you would handle tables like Foo.Bar or "foo:bar" (colon have
to be escaped)?

> The filter_clause is used as the contents of a WHERE clause when querying
> the data to generate the COPY statement produced by pg_dump.
>
You are forgetting about --inserts parameter. Could I use
--include-table-data-where and --inserts?

[1] https://www.postgresql.org/message-id/1212299813.17810.17.camel%40ubuntu

--
Euler Taveira Timbira -
http://www.timbira.com.br/
PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Mike Blackwell 2018-05-22 01:07:03 [PATCH] Clear up perlcritic 'missing return' warning
Previous Message Thomas Munro 2018-05-22 00:07:17 Re: [HACKERS] kqueue