Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features)

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Damir <dam(dot)bel07(at)gmail(dot)com>
Cc: torikoshia <torikoshia(at)oss(dot)nttdata(dot)com>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>, andres(at)anarazel(dot)de, daniel(at)yesql(dot)se, anisimow(dot)d(at)gmail(dot)com, HukuToc(at)gmail(dot)com, Andrey Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>, Alena Rybakina <lena(dot)ribackina(at)yandex(dot)ru>
Subject: Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features)
Date: 2023-11-08 18:18:39
Message-ID: 739953.1699467519@sss.pgh.pa.us
Lists: pgsql-hackers

Damir <dam(dot)bel07(at)gmail(dot)com> writes:
> [ v7-0002-Add-new-COPY-option-IGNORE_DATATYPE_ERRORS.patch ]

Sorry for being so late to the party, but ... I don't think this
is a well-designed feature as it stands. Simply dropping failed rows
seems like an unusable definition for any application that has
pretensions of robustness. "But", you say, "we're emitting WARNING
messages about it". That's *useless*. For most applications WARNING
messages just go into the bit bucket, or worse they cause memory leaks
(because the app never reads them). An app that tried to read them
would have to cope with all sorts of fun such as translated messages.
Furthermore, as best I can tell from the provided test cases, the
messages completely lack basic context such as which field or line
the problem occurred in. An app trying to use this to understand
which input lines had failed would not get far.

I think an actually usable feature of this sort would involve
copying all the failed lines to some alternate output medium,
perhaps a second table with a TEXT column to receive the original
data line. (Or maybe an array of text that could receive the
broken-down field values?) Maybe we could dump the message info,
line number, field name etc into additional columns.
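
As a rough illustration of that design, here is a minimal Python sketch
(not PostgreSQL code). The names RejectedRow and load_rows, the
tab-delimited input, and the use of ValueError to signal a datatype
failure are all assumptions made for the example. Each failed line is
kept verbatim alongside its line number, the offending field name, and
the error message, instead of being dropped with a warning. The sketch
assumes every line already has the right number of fields.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RejectedRow:
    """One entry in the hypothetical 'rejects' table."""
    line_number: int
    field_name: Optional[str]
    message: str
    raw_line: str  # the original data line, kept verbatim as text


def load_rows(lines, columns):
    """Convert tab-delimited lines using (name, converter) pairs.

    Returns (accepted, rejected): accepted rows are dicts of converted
    values; rejected rows record where and why conversion failed.
    """
    accepted, rejected = [], []
    for line_number, raw_line in enumerate(lines, start=1):
        fields = raw_line.split("\t")
        row = {}
        for (name, convert), value in zip(columns, fields):
            try:
                row[name] = convert(value)
            except ValueError as exc:
                # Record the failure with context, then skip this line.
                rejected.append(
                    RejectedRow(line_number, name, str(exc), raw_line)
                )
                break
        else:
            # No field failed, so the whole row is accepted.
            accepted.append(row)
    return accepted, rejected


columns = [("id", int), ("name", str)]
lines = ["1\talice", "x\tbob", "3\tcarol"]
accepted, rejected = load_rows(lines, columns)
```

With this input, the second line ends up in rejected with its line
number, the field name "id", and the original text, so a caller can
inspect or reload it without parsing any log messages.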

Also it'd be a good idea to have a vision of how the feature
could be extended to cope with lower-level errors, such as
lines that have the wrong number of columns or other problems
with line-level syntax. I don't say we need to cope with that
immediately, but it's going to be something people will want
to add, I think.
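
One way to picture that extension, again as a hypothetical Python sketch
rather than anything in the patch: a line-level check runs before any
per-field conversion, and a line with the wrong number of columns goes
to the same kind of reject record, with no field name because the
problem is not tied to one field. The function name classify_lines, the
dict layout, and the tab delimiter are assumptions for illustration.

```python
def classify_lines(lines, expected_columns, delimiter="\t"):
    """Separate lines with the expected column count from those without.

    Returns (good, rejects): good holds (line_number, fields) pairs;
    rejects holds dicts shaped like rows of a hypothetical error table.
    """
    good, rejects = [], []
    for line_number, raw_line in enumerate(lines, start=1):
        fields = raw_line.split(delimiter)
        if len(fields) == expected_columns:
            good.append((line_number, fields))
        else:
            rejects.append({
                "line_number": line_number,
                "field_name": None,  # line-level error, no single field
                "message": f"expected {expected_columns} columns, "
                           f"got {len(fields)}",
                "raw_line": raw_line,
            })
    return good, rejects


good, rejects = classify_lines(["1\ta", "2", "3\tc\textra"], 2)
```

Because the reject record keeps the raw line as text, the same storage
works whether the failure is at the field level or the line level.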

regards, tom lane
