Re: Add header support to text format and matching feature

From: Rémi Lapeyre <remi(dot)lapeyre(at)lenstra(dot)fr>
To: vignesh C <vignesh21(at)gmail(dot)com>
Cc: Daniel Verite <daniel(at)manitou-mail(dot)org>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Add header support to text format and matching feature
Date: 2020-08-27 14:53:11
Message-ID: 0B55BD07-83E4-439F-AACC-FA2D7CF50532@lenstra.fr
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Thanks Daniel for the review and Vignesh for addressing the comments.

I have two remarks with the state of the current patches:
- DefGetCopyHeader() duplicates a lot of code from defGetBoolean(), should we refactor this so that they can share more of their internals? In the current implementation any change to defGetBoolean() should be made to DefGetCopyHeader() too or their behaviour will subtly differ.
- It is possible to set the header option multiple time:
\copy x(i) from file.txt with (format csv, header off, header on);
In which case the last one is the one kept. I think this is a bug and it should be fixed, but this is already the behaviour in the current implementation so fixing it would not be backward compatible. Do you think users should not do this and I can fix it or that keeping the current behaviour is better for backward compatibility?

Regards,
Rémi

> Le 17 août 2020 à 14:49, vignesh C <vignesh21(at)gmail(dot)com> a écrit :
>
> Thanks for your comments, Please find my thoughts inline.
>
>> In my tests it works fine except for one crash that I can reproduce
>> on a fresh build and default configuration with:
>>
>> $ cat >file.txt
>> i
>> 1
>>
>> $ psql
>> postgres=# create table x(i int);
>> CREATE TABLE
>> postgres=# \copy x(i) from file.txt with (header match)
>> COPY 1
>> postgres=# \copy x(i) from file.txt with (header match)
>> COPY 1
>> postgres=# \copy x(i) from file.txt with (header match)
>> COPY 1
>> postgres=# \copy x(i) from file.txt with (header match)
>> COPY 1
>> postgres=# \copy x(i) from file.txt with (header match)
>> COPY 1
>> postgres=# \copy x(i) from file.txt with (header match)
>> PANIC: ERRORDATA_STACK_SIZE exceeded
>> server closed the connection unexpectedly
>> This probably means the server terminated abnormally
>> before or while processing the request.
>> The connection to the server was lost. Attempting reset: Failed.
>>
>
> Fixed, replaced PG_TRY/PG_CATCH with strcmp logic to get the header option.
>
>>
>> Code comments:
>>
>>
>> +/*
>> + * Represents whether the header must be absent, present or present and
>> match.
>> + */
>> +typedef enum CopyHeader
>> +{
>> + COPY_HEADER_ABSENT,
>> + COPY_HEADER_PRESENT,
>> + COPY_HEADER_MATCH
>> +} CopyHeader;
>> +
>> /*
>> * This struct contains all the state variables used throughout a COPY
>> * operation. For simplicity, we use the same struct for all variants of
>> COPY,
>> @@ -136,7 +146,7 @@ typedef struct CopyStateData
>> bool binary; /* binary format? */
>> bool freeze; /* freeze rows on loading? */
>> bool csv_mode; /* Comma Separated Value
>> format? */
>> - bool header_line; /* CSV or text header line? */
>> + CopyHeader header_line; /* CSV or text header line? */
>>
>>
>> After the redefinition into this enum type, there are still a
>> bunch of references to header_line that treat it like a boolean:
>>
>> 1190: if (cstate->header_line)
>> 1398: if (cstate->binary && cstate->header_line)
>> 2119: if (cstate->header_line)
>> 3635: if (cstate->cur_lineno == 0 && cstate->header_line)
>>
>> It works fine since COPY_HEADER_ABSENT is 0 as the first value of the enum,
>> but maybe it's not good style to count on that.
>
> Fixed. Changed it to cstate->header_line != COPY_HEADER_ABSENT.
>
>>
>>
>>
>> + PG_TRY();
>> + {
>> + if (defGetBoolean(defel))
>> + cstate->header_line =
>> COPY_HEADER_PRESENT;
>> + else
>> + cstate->header_line =
>> COPY_HEADER_ABSENT;
>> + }
>> + PG_CATCH();
>> + {
>> + char *sval = defGetString(defel);
>> +
>> + if (!cstate->is_copy_from)
>> + PG_RE_THROW();
>> +
>> + if (pg_strcasecmp(sval, "match") == 0)
>> + cstate->header_line =
>> COPY_HEADER_MATCH;
>> + else
>> + ereport(ERROR,
>> +
>> (errcode(ERRCODE_SYNTAX_ERROR),
>> + errmsg("header requires a
>> boolean or \"match\"")));
>> + }
>> + PG_END_TRY();
>>
>> It seems wrong to use a PG_CATCH block for this. I understand that
>> it's because defGetBoolean() calls ereport() on non-booleans, but then
>> it should be split into an error-throwing function and a
>> non-error-throwing lexical analysis of the boolean, the above code
>> calling the latter.
>> Besides the comments in elog.h above PG_TRY say that
>> "the error recovery code
>> can either do PG_RE_THROW to propagate the error outwards, or do a
>> (sub)transaction abort. Failure to do so may leave the system in an
>> inconsistent state for further processing."
>> Maybe this is what happens with the repeated uses of "match"
>> eventually failing with ERRORDATA_STACK_SIZE exceeded.
>>
>
> Fixed, replaced PG_TRY/PG_CATCH with strcmp logic to get the header option.
>
>>
>> - HEADER [ <replaceable class="parameter">boolean</replaceable> ]
>> + HEADER { <literal>match</literal> | <literal>true</literal> |
>> <literal>false</literal> }
>>
>> This should be enclosed in square brackets because HEADER
>> with no argument is still accepted.
>>
>
> Fixed.
>
>>
>>
>>
>> + names from the table. On input, the first line is discarded when set
>> + to <literal>true</literal> or required to match the column names if
>> set
>>
>> The elision of "header" as the subject might be misinterpreted as if
>> it's the first line that is true. I'd suggest
>> "when <literal>header>/literal> is set to ..." to avoid any confusion.
>>
>
> Fixed.
>
> Attached v5 patch with the fixes of above comments.
> Thoughts?
>
> Regards,
> Vignesh
> EnterpriseDB: http://www.enterprisedb.com
> <v5-0001-Add-header-support-to-COPY-TO-text-format.patch><v5-0002-Add-header-matching-mode-to-COPY-FROM.patch>

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2020-08-27 15:46:09 Re: ALTER TABLE .. DETACH PARTITION CONCURRENTLY
Previous Message Robert Haas 2020-08-27 14:19:21 Re: factorial function/phase out postfix operators?