pgsql: Do COPY FROM encoding conversion/verification in larger chunks.

From: Heikki Linnakangas <heikki(dot)linnakangas(at)iki(dot)fi>
To: pgsql-committers(at)lists(dot)postgresql(dot)org
Subject: pgsql: Do COPY FROM encoding conversion/verification in larger chunks.
Date: 2021-04-01 09:25:01
Message-ID: E1lRtZV-0000J2-0K@gemulon.postgresql.org
Lists: pgsql-committers

Do COPY FROM encoding conversion/verification in larger chunks.

This gives a small performance gain by reducing the number of calls
to the conversion/verification function and letting it work with
larger inputs. Also, reorganizing the input pipeline makes it easier
to parallelize the input parsing: after the input has been converted
to the database encoding, the next stage of finding the newlines can
be done in parallel, because there cannot be any newline characters
"embedded" in multi-byte characters in the encodings that we support
as server encodings.

This changes behavior in one corner case: when client and server
encodings were the same single-byte encoding (e.g. latin1), the input
was previously not checked for zero bytes ('\0'), so any field
containing a zero byte was silently truncated at the zero. But if
encoding conversion was needed, the conversion routine threw an error
on the zero. After this commit, the input is always checked for zeros.

Reviewed-by: John Naylor
Discussion: https://www.postgresql.org/message-id/e7861509-3960-538a-9025-b75a61188e01%40iki.fi

Branch
------
master

Details
-------
https://git.postgresql.org/pg/commitdiff/f82de5c46bdf8cd65812a7b04c9509c218e1545d

Modified Files
--------------
src/backend/commands/copyfrom.c | 80 +++--
src/backend/commands/copyfromparse.c | 522 +++++++++++++++++++++++--------
src/include/commands/copyfrom_internal.h | 62 ++--
src/include/mb/pg_wchar.h | 22 +-
4 files changed, 502 insertions(+), 184 deletions(-)
