Re: Improvements in Copy From

From: vignesh C <vignesh21(at)gmail(dot)com>
To: Peter Smith <smithpb2250(at)gmail(dot)com>
Cc: David Rowley <dgrowleyml(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Improvements in Copy From
Date: 2020-09-10 11:21:22
Message-ID: CALDaNm1MhZFSFEuuMTHyBziJ5N5JUB337na5e9fpkVqPG29e9A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Sep 9, 2020 at 12:24 PM Peter Smith <smithpb2250(at)gmail(dot)com> wrote:
>
> My basic understanding of first part of your patch is that by
> adjusting the "minread" it now allows it to loop multiple times
> internally within the CopyGetData rather than calling CopyLoadRawBuf
> for every N lines. There doesn't seem to be much change to what other
> code gets executed so the saving is essentially whatever is the cost
> of making 2 x function calls (CopyLoadRawBuff + CopyGetData) x N. Is
> that understanding correct?

Yes you are right, we will avoid the function calls and try to get as
many records as possible from the buffer & insert it to the relation.

> But with that change there seems to be opportunity for yet another
> tiny saving possible. IIUC, now you are processing a lot more data
> within the CopyGetData so it is now very likely that you will also
> gobble the COPY_NEW_FE's 'c' marker. So cstate->reached_eof will be
> set. So this means the calling code of CopyReadLineText may no longer
> need to call the CopyLoadRawBuf one last time just to discover there
> are no more bytes to read - something that it already knows if
> cstate->reached_eof == true.
>
> For example, with your change can't you also modify CopyReadLineText like below:
>
> BEFORE
> if (!CopyLoadRawBuf(cstate))
> hit_eof = true;
>
> AFTER
> if (cstate->reached_eof)
> {
> cstate->raw_buf[0] = '\0';
> cstate->raw_buf_index = cstate->raw_buf_len = 0;
> hit_eof = true;
> }
> else if (!CopyLoadRawBuf(cstate))
> {
> hit_eof = true;
> }
>
> Whether such a micro-optimisation is worth doing is another question.
Yes, what you suggested can also be done, but even I have the same
question as you. Because we will reduce just one function call, the
eof check is present immediately in the function, Should we include
this or not?

Regards,
VIgnesh
EnterpriseDB: http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Thomas Munro 2020-09-10 11:40:22 Re: Two fsync related performance issues?
Previous Message Pavel Borisov 2020-09-10 11:19:10 Re: Yet another fast GiST build