Re: BUG #16300: Text line order corruption with COPY command

From: "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>
To: Hans Buschmann <buschmann(at)nidsa(dot)net>
Cc: PostgreSQL mailing lists <pgsql-bugs(at)lists(dot)postgresql(dot)org>
Subject: Re: BUG #16300: Text line order corruption with COPY command
Date: 2020-03-13 15:19:49
Message-ID: CAKFQuwZ+rbpG=mdHyRD0-OyKcKmKMSr3hYknQOYVvgzH604k=w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

On Fri, Mar 13, 2020 at 1:50 AM Hans Buschmann <buschmann(at)nidsa(dot)net> wrote:

> ------------------------------
> *Von:* David G. Johnston <david(dot)g(dot)johnston(at)gmail(dot)com>
> *Gesendet:* Donnerstag, 12. März 2020 21:42
> *An:* Hans Buschmann; PostgreSQL mailing lists
> *Betreff:* Re: BUG #16300: Text line order corruption with COPY command
>
>
>
> Thank you for the quick reply.
>
>
>
> When looking into the documentation I find under SQL COPY command:
>
>
>
> "If a column list is specified, COPY TO copies only the data in the
> specified columns to the file.
> For COPY FROM, each field in the file is inserted, in order, into the
> specified column.
> Table columns not specified in the COPY FROM column list will receive
> their default values. "
>
>
>
> So I expected the insertion in order as said above.
>

The above is talking about COLUMNS. You are talking about ROWS.

> In my opinion it is essential to preserve the order of textfile input in
> COPY FROM. This also holds true when copying from another source like a
> program, where often it is not practicable to add an orderable column on
> input.
>
SQL does not do this. You are free to either adapt your processing to
conform to what SQL does provide or choose a different language that better
meets your needs. PostgreSQL is simply adhering to a fundamental property
of the SQL language.

> Not preserving (and rendering on a select) the order makes the COPY FROM
> mostly unusable for cases where the order must be preserved. (Think of XML,
> JSON, Source code, Log files, Disassembly etc.).
>

If you are storing source code, json, xml, etc... into an SQL database
where each line in the original file ends up being a single record in the
database I propose that you are doing something fundamentally wrong.
Either your solution is ill-designed or you've chosen the wrong tool for
the job. So yes, it (copy and SQL generally) is unusable for those cases,
but it is perfectly usable for many others and does so with efficiency.
Trying to make it work for something like this would compromise that
efficiency for use cases that are questionable.

> My concern also goes to pg_dump and friends, which could change such not
> natural orderable tables through pg_dump/pg_restore.
>
If order matters there must be, somewhere, an identifier indicating the
position of each record in that sequence. SQL requires the user to be
explicit in defining that identifier instead of magically creating one. It
provide a sequence feature to bridge the gap.

David J.

In response to

Browse pgsql-bugs by date

  From Date Subject
Next Message PG Bug reporting form 2020-03-14 03:39:16 BUG #16301: unable to connect to server
Previous Message Francisco Olarte 2020-03-13 09:26:16 Re: BUG #16300: Text line order corruption with COPY command