Re: COPY enhancements

From: Josh Berkus <josh(at)agliodbs(dot)com>
To: Emmanuel Cecchet <manu(at)asterdata(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: COPY enhancements
Date: 2009-09-10 21:05:28
Message-ID: 4AA96A18.9030403@agliodbs.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Emmanuel, Hackers,

Thank you for tackling this very long-time TODO.

> Error logging is described here:
> http://wiki.postgresql.org/wiki/Error_logging_in_COPY

Questions & Comments:

A) Why would someone want to turn error_logging on, but leave
error_logging_skip_tuples off? The pg_log already logs errors which
copy throws by default.

B) As I mentioned earlier, we'll want to provide the option of logging
to a file instead of to a table. That's not a reason to reject this
patch, but probably a TODO for 8.5.

C) Are we sure we want to handle this via GUCs rather than extensions to
COPY syntax? It seems like fairly often users would want to log
different COPY sources to different tables/files.

D) These GUCs are userset, I hope? (haven't dug into the code far
enough to tell yet).

E) What is error_logging_tuple_label for? You don't explain/give
examples. And how is error_logging_tuple_partition_key used?

F) Rawdata for rejected tuples is presumably BYTEA?

G) We should probably have a default for error_logging_table_name, such
as pg_copy_errors. Does that table get automatically created if it
doesn't exist?

H) Finally, one request of the TODO is some way to halt import after a
specified number of bad tuples because it probably means you have the
wrong file or wrong table. Do we still want that?

> Autopartitioning is described here:
> http://wiki.postgresql.org/wiki/Auto-partitioning_in_COPY

M) tuple_routing_in_copy should take "on" or "off", not 0 or 1.

N) Have you measured the overhead & speed of this kind of COPY as
opposed to COPY into a single table? Have you checked the overhead if
tuple_routing_in_copy is on, but you are not loading into a partitioned
table?

O) Is this capable of dealing with partitioning by more than one column,
or by an expression?

Finally, I'm going to suggest different names for the GUCs, as the names
you've chosen don't group well and would likely cause confusion. Here
are my suggestions, which all begin with "copy_" for prefix matching:

error_logging --> probaby not needed, see able
error_logging_skip_tuples --> copy_skip_bad_rows
error_logging_schema_name --> copy_logging_schema_name
error_logging_relation_name --> copy_logging_table_name
error_logging_tuple_label --> don't know what this is for, see above
error_logging_tuple_partition_key --> don't know what this is for, see above

tuple_routing_in_copy --> copy_partitioning
tuple_routing_cache_size --> copy_partitioning_cache_size

--
Josh Berkus
PostgreSQL Experts Inc.
www.pgexperts.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Hannu Krosing 2009-09-10 21:08:45 Re: RfD: more powerful "any" types
Previous Message Robert Haas 2009-09-10 20:56:58 Re: RfD: more powerful "any" types