Re: WIP Patch: Selective binary conversion of CSV file foreign tables

From: Kohei KaiGai <kaigai(at)kaigai(dot)gr(dot)jp>
To: Etsuro Fujita <fujita(dot)etsuro(at)lab(dot)ntt(dot)co(dot)jp>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: WIP Patch: Selective binary conversion of CSV file foreign tables
Date: 2012-06-26 14:04:41
Message-ID: CADyhKSWfT5WPRAQurQ9dmREPJKDwVY+uxH47VPfMmLjg3-Xw-g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

2012/6/26 Etsuro Fujita <fujita(dot)etsuro(at)lab(dot)ntt(dot)co(dot)jp>:
> Hi Kaigai-san,
>
>> -----Original Message-----
>> From: Kohei KaiGai [mailto:kaigai(at)kaigai(dot)gr(dot)jp]
>> Sent: Monday, June 25, 2012 9:49 PM
>> To: Etsuro Fujita
>> Cc: Robert Haas; pgsql-hackers(at)postgresql(dot)org
>> Subject: Re: [HACKERS] WIP Patch: Selective binary conversion of CSV file
>> foreign tables
>>
>> Fujita-san,
>>
>> The revised patch is almost good for me.
>> Only point I noticed is the check for redundant or duplicated options I
> pointed
>> out on the previous post.
>> So, if you agree with the attached patch, I'd like to hand over this patch for
>> committers.
>
> OK I agree with you.  Thanks for the revision!
>
> Besides the revision, I modified check_selective_binary_conversion() to run
> heap_close() in the whole-row-reference case.  Attached is an updated version of
> the patch.
>
Sorry, I overlooked this code path. It seems to me fair enough.

So, I'd like to take over this patch for committers.

Thanks,

> Thanks.
>
> Best regards,
> Etsuro Fujita
>
>> All I changed is:
>>  --- a/src/backend/commands/copy.c
>>  +++ b/src/backend/commands/copy.c
>>  @@ -122,6 +122,11 @@ typedef struct CopyStateData @@ -217,7 +221,7 @@ index
>> 98bcb2f..0143d81 100644
>>         }
>>  +      else if (strcmp(defel->defname, "convert_binary") == 0)
>>  +      {
>> -+          if (cstate->convert_binary)
>> ++          if (cstate->convert_selectively)
>>  +              ereport(ERROR,
>>  +                      (errcode(ERRCODE_SYNTAX_ERROR),
>>  +                       errmsg("conflicting or redundant options")));
>>
>> Thanks,
>>
>> 2012/6/20 Etsuro Fujita <fujita(dot)etsuro(at)lab(dot)ntt(dot)co(dot)jp>:
>> > Hi KaiGai-san,
>> >
>> > Thank you for the review.
>> >
>> >> -----Original Message-----
>> >> From: pgsql-hackers-owner(at)postgresql(dot)org
>> >> [mailto:pgsql-hackers-owner(at)postgresql(dot)org] On Behalf Of Kohei KaiGai
>> >> Sent: Wednesday, June 20, 2012 1:26 AM
>> >> To: Etsuro Fujita
>> >> Cc: Robert Haas; pgsql-hackers(at)postgresql(dot)org
>> >> Subject: Re: [HACKERS] WIP Patch: Selective binary conversion of CSV
>> >> file foreign tables
>> >>
>> >> Hi Fujita-san,
>> >>
>> >> Could you rebase this patch towards the latest tree?
>> >> It was unavailable to apply the patch cleanly.
>> >
>> > Sorry, I updated the patch.  Please find attached an updated version
>> > of the patch.
>> >
>> >> I looked over the patch, then noticed a few points.
>> >>
>> >> At ProcessCopyOptions(), defel->arg can be NIL, isn't it?
>> >> If so, cstate->convert_binary is not a suitable flag to check
>> >> redundant option. It seems to me cstate->convert_selectively is more
>> >> suitable flag to check it.
>> >>
>> >> +       else if (strcmp(defel->defname, "convert_binary") == 0)
>> >> +       {
>> >> +           if (cstate->convert_binary)
>> >> +               ereport(ERROR,
>> >> +                       (errcode(ERRCODE_SYNTAX_ERROR),
>> >> +                        errmsg("conflicting or redundant
>> >> + options")));
>> >> +           cstate->convert_selectively = true;
>> >> +           if (defel->arg == NULL || IsA(defel->arg, List))
>> >> +               cstate->convert_binary = (List *) defel->arg;
>> >> +           else
>> >> +               ereport(ERROR,
>> >> +                       (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
>> >> +                        errmsg("argument to option \"%s\" must be a
>> >> list of column names",
>> >> +                               defel->defname)));
>> >> +       }
>> >
>> > Yes, defel->arg can be NIL.  defel->arg is a List structure listing
>> > all the columns needed to be converted to binary representation, which
>> > is NIL for the case where no columns are needed to be converted.  For
>> > example,
>> > defel->arg is NIL for SELECT COUNT(*).  In this case, while
>> > cstate->convert_selectively is set to true, no columns are converted
>> > cstate->at
>> > NextCopyFrom().  Most efficient case!  In short,
>> > cstate->convert_selectively represents whether to do selective binary
>> > conversion at NextCopyFrom(), and
>> > cstate->convert_binary represents all the columns to be converted at
>> > NextCopyFrom(), that can be NIL.
>> >
>> >> At NextCopyFrom(), this routine computes default values if configured.
>> >> In case when these values are not referenced, it might be possible to
>> >> skip unnecessary calculations.
>> >> Is it unavailable to add logic to avoid to construct cstate->defmap
>> >> on unreferenced columns at  BeginCopyFrom()?
>> >
>> > I think that we don't need to add the above logic because file_fdw
>> > does
>> > BeginCopyFrom() with attnamelist = NIL, in which case, BeginCopyFrom()
>> > doesn't construct cstate->defmap at all.
>> >
>> > I fixed a bug plus some minor optimization in
>> > check_binary_conversion() that is renamed to
>> > check_selective_binary_conversion() in the updated version, and now
>> > file_fdw gives up selective binary conversion for the following
>> > cases:
>> >
>> >  a) BINARY format
>> >  b) CSV/TEXT format and whole row reference
>> >  c) CSV/TEXT format and all the user attributes needed
>> >
>> >
>> > Best regards,
>> > Etsuro Fujita
>> >
>> >> Thanks,
>> >>
>> >> 2012/5/11 Etsuro Fujita <fujita(dot)etsuro(at)lab(dot)ntt(dot)co(dot)jp>:
>> >> >> -----Original Message-----
>> >> >> From: Robert Haas [mailto:robertmhaas(at)gmail(dot)com]
>> >> >> Sent: Friday, May 11, 2012 1:36 AM
>> >> >> To: Etsuro Fujita
>> >> >> Cc: pgsql-hackers(at)postgresql(dot)org
>> >> >> Subject: Re: [HACKERS] WIP Patch: Selective binary conversion of
>> >> >> CSV file foreign tables
>> >> >>
>> >> >> On Tue, May 8, 2012 at 7:26 AM, Etsuro Fujita
>> >> > <fujita(dot)etsuro(at)lab(dot)ntt(dot)co(dot)jp>
>> >> >> wrote:
>> >> >> > I would like to propose to improve parsing efficiency of
>> >> >> > contrib/file_fdw by selective parsing proposed by Alagiannis et
>> >> >> > al.[1], which means that for a CSV/TEXT file foreign table,
>> >> >> > file_fdw performs binary conversion only for the columns needed
>> >> >> > for query processing.  Attached is a WIP patch implementing the
> feature.
>> >> >>
>> >> >> Can you add this to the next CommitFest?  Looks interesting.
>> >> >
>> >> > Done.
>> >> >
>> >> > Best regards,
>> >> > Etsuro Fujita
>> >> >
>> >> >> --
>> >> >> Robert Haas
>> >> >> EnterpriseDB: http://www.enterprisedb.com The Enterprise
>> >> >> PostgreSQL Company
>> >> >>
>> >> >
>> >> >
>> >> >
>> >> > --
>> >> > Sent via pgsql-hackers mailing list (pgsql-hackers(at)postgresql(dot)org)
>> >> > To make changes to your subscription:
>> >> > http://www.postgresql.org/mailpref/pgsql-hackers
>> >>
>> >>
>> >>
>> >> --
>> >> KaiGai Kohei <kaigai(at)kaigai(dot)gr(dot)jp>
>> >>
>> >> --
>> >> Sent via pgsql-hackers mailing list (pgsql-hackers(at)postgresql(dot)org) To
>> >> make changes to your subscription:
>> >> http://www.postgresql.org/mailpref/pgsql-hackers
>> >>
>> >
>> >
>> >
>> >
>>
>>
>>
>> --
>> KaiGai Kohei <kaigai(at)kaigai(dot)gr(dot)jp>

--
KaiGai Kohei <kaigai(at)kaigai(dot)gr(dot)jp>

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2012-06-26 14:06:08 Re: [PATCH 01/16] Overhaul walsender wakeup handling
Previous Message Robert Haas 2012-06-26 14:01:26 Re: [PATCH 01/16] Overhaul walsender wakeup handling