Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features)

From: torikoshia <torikoshia(at)oss(dot)nttdata(dot)com>
To: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Cc: jian he <jian(dot)universality(at)gmail(dot)com>, Alena Rybakina <lena(dot)ribackina(at)yandex(dot)ru>, Damir Belyalov <dam(dot)bel07(at)gmail(dot)com>, zhihuifan1213(at)163(dot)com, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Daniel Gustafsson <daniel(at)yesql(dot)se>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>, anisimow(dot)d(at)gmail(dot)com, HukuToc(at)gmail(dot)com, pgsql-hackers(at)lists(dot)postgresql(dot)org, Andrei Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>
Subject: Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features)
Date: 2023-12-18 02:41:42
Message-ID: 51328f620d58dcb5e554dbe32d1c0533@oss.nttdata.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 2023-12-15 05:48, Masahiko Sawada wrote:

Thanks for joining this discussion!

> I've read this thread and the latest patch. IIUC with SAVE_ERROR
> option, COPY FROM creates an error table for the target table and
> writes error information there.
>
> While I agree that the final shape of this feature would be something
> like that design, I'm concerned some features are missing in order to
> make this feature useful in practice. For instance, error logs are
> inserted to error tables without bounds, meaning that users who want
> to tolerate errors during COPY FROM will have to truncate or drop the
> error tables periodically, or the database will grow with error logs
> without limit. Ideally such maintenance work should be done by the
> database. There might be some users who want to log such conversion
> errors in server logs to avoid such maintenance work. I think we
> should provide an option for where to write, at least. Also, since the
> error tables are normal user tables internally, error logs are also
> replicated to subscribers if there is a publication FOR ALL TABLES,
> unlike system catalogs. I think some users would not like such
> behavior.
>
> Looking at SAVE_ERROR feature closely, I think it consists of two
> separate features. That is, it enables COPY FROM to load data while
> (1) tolerating errors and (2) logging errors to somewhere (i.e., an
> error table). If we implement only (1), it would be like COPY FROM
> tolerate errors infinitely and log errors to /dev/null. The user
> cannot see the error details but I guess it could still help some
> cases as Andres mentioned[1] (it might be a good idea to send the
> number of rows successfully loaded in a NOTICE message if some rows
> could not be loaded). Then with (2), COPY FROM can log error
> information to somewhere such as tables and server logs and the user
> can select it.

+1.
I may be biased since I wrote some ~v6 patches which just output the
soft errors and number of skipped rows to log, but I think just (1)
would be worth implementing as you pointed out and I like if users could
choose where to log output.

I think there would be situations where it is preferable to save errors
to server log even considering problems which were pointed out in [1],
i.e. manually loading data.

[1]
https://www.postgresql.org/message-id/739953.1699467519%40sss.pgh.pa.us

> feature incrementally. The first step would be something like an
> option to ignore all errors or an option to specify the maximum number
> of errors to tolerate before raising an ERROR. The second step would
> be to support logging destinations such as server logs and tables.
>
> Regards,
>
> [1]
> https://www.postgresql.org/message-id/20231109002600.fuihn34bjqqgmbjm%40awork3.anarazel.de

--
Regards,

--
Atsushi Torikoshi
NTT DATA Group Corporation

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Kapila 2023-12-18 03:30:25 Re: "pgoutput" options missing on documentation
Previous Message Peter Smith 2023-12-18 01:15:33 Re: "pgoutput" options missing on documentation