Re: Add header support to text format and matching feature

From: vignesh C <vignesh21(at)gmail(dot)com>
To: Daniel Verite <daniel(at)manitou-mail(dot)org>
Cc: Rémi Lapeyre <remi(dot)lapeyre(at)lenstra(dot)fr>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Add header support to text format and matching feature
Date: 2020-08-17 12:49:09
Message-ID: CALDaNm07bcwPY2W68Qghvw2QAV9X8NkV6pihGicko5om=txF+Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Thanks for your comments, Please find my thoughts inline.

> In my tests it works fine except for one crash that I can reproduce
> on a fresh build and default configuration with:
>
> $ cat >file.txt
> i
> 1
>
> $ psql
> postgres=# create table x(i int);
> CREATE TABLE
> postgres=# \copy x(i) from file.txt with (header match)
> COPY 1
> postgres=# \copy x(i) from file.txt with (header match)
> COPY 1
> postgres=# \copy x(i) from file.txt with (header match)
> COPY 1
> postgres=# \copy x(i) from file.txt with (header match)
> COPY 1
> postgres=# \copy x(i) from file.txt with (header match)
> COPY 1
> postgres=# \copy x(i) from file.txt with (header match)
> PANIC: ERRORDATA_STACK_SIZE exceeded
> server closed the connection unexpectedly
> This probably means the server terminated abnormally
> before or while processing the request.
> The connection to the server was lost. Attempting reset: Failed.
>

Fixed, replaced PG_TRY/PG_CATCH with strcmp logic to get the header option.

>
> Code comments:
>
>
> +/*
> + * Represents whether the header must be absent, present or present and
> match.
> + */
> +typedef enum CopyHeader
> +{
> + COPY_HEADER_ABSENT,
> + COPY_HEADER_PRESENT,
> + COPY_HEADER_MATCH
> +} CopyHeader;
> +
> /*
> * This struct contains all the state variables used throughout a COPY
> * operation. For simplicity, we use the same struct for all variants of
> COPY,
> @@ -136,7 +146,7 @@ typedef struct CopyStateData
> bool binary; /* binary format? */
> bool freeze; /* freeze rows on loading? */
> bool csv_mode; /* Comma Separated Value
> format? */
> - bool header_line; /* CSV or text header line? */
> + CopyHeader header_line; /* CSV or text header line? */
>
>
> After the redefinition into this enum type, there are still a
> bunch of references to header_line that treat it like a boolean:
>
> 1190: if (cstate->header_line)
> 1398: if (cstate->binary && cstate->header_line)
> 2119: if (cstate->header_line)
> 3635: if (cstate->cur_lineno == 0 && cstate->header_line)
>
> It works fine since COPY_HEADER_ABSENT is 0 as the first value of the enum,
> but maybe it's not good style to count on that.

Fixed. Changed it to cstate->header_line != COPY_HEADER_ABSENT.

>
>
>
> + PG_TRY();
> + {
> + if (defGetBoolean(defel))
> + cstate->header_line =
> COPY_HEADER_PRESENT;
> + else
> + cstate->header_line =
> COPY_HEADER_ABSENT;
> + }
> + PG_CATCH();
> + {
> + char *sval = defGetString(defel);
> +
> + if (!cstate->is_copy_from)
> + PG_RE_THROW();
> +
> + if (pg_strcasecmp(sval, "match") == 0)
> + cstate->header_line =
> COPY_HEADER_MATCH;
> + else
> + ereport(ERROR,
> +
> (errcode(ERRCODE_SYNTAX_ERROR),
> + errmsg("header requires a
> boolean or \"match\"")));
> + }
> + PG_END_TRY();
>
> It seems wrong to use a PG_CATCH block for this. I understand that
> it's because defGetBoolean() calls ereport() on non-booleans, but then
> it should be split into an error-throwing function and a
> non-error-throwing lexical analysis of the boolean, the above code
> calling the latter.
> Besides the comments in elog.h above PG_TRY say that
> "the error recovery code
> can either do PG_RE_THROW to propagate the error outwards, or do a
> (sub)transaction abort. Failure to do so may leave the system in an
> inconsistent state for further processing."
> Maybe this is what happens with the repeated uses of "match"
> eventually failing with ERRORDATA_STACK_SIZE exceeded.
>

Fixed, replaced PG_TRY/PG_CATCH with strcmp logic to get the header option.

>
> - HEADER [ <replaceable class="parameter">boolean</replaceable> ]
> + HEADER { <literal>match</literal> | <literal>true</literal> |
> <literal>false</literal> }
>
> This should be enclosed in square brackets because HEADER
> with no argument is still accepted.
>

Fixed.

>
>
>
> + names from the table. On input, the first line is discarded when set
> + to <literal>true</literal> or required to match the column names if
> set
>
> The elision of "header" as the subject might be misinterpreted as if
> it's the first line that is true. I'd suggest
> "when <literal>header>/literal> is set to ..." to avoid any confusion.
>

Fixed.

Attached v5 patch with the fixes of above comments.
Thoughts?

Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com

Attachment Content-Type Size
v5-0001-Add-header-support-to-COPY-TO-text-format.patch text/x-patch 6.4 KB
v5-0002-Add-header-matching-mode-to-COPY-FROM.patch text/x-patch 12.9 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Dilip Kumar 2020-08-17 12:58:47 Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions
Previous Message Fujii Masao 2020-08-17 12:30:31 Re: track_planning causing performance regression