Re: [PATCH] Initial progress reporting for COPY command

From: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
To: Josef Šimánek <josef(dot)simanek(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Subject: Re: [PATCH] Initial progress reporting for COPY command
Date: 2020-06-23 12:52:01
Message-ID: CALj2ACVN18+z-RS1yKSE8ewD2dFMKpiLMN9HjvSQ093jJxBYBQ@mail.gmail.com
Lists: pgsql-hackers

> On Mon, 15 Jun 2020 at 7:34, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com> wrote:
>>
>> > I'm using ftell to get current position in file to populate file_bytes_processed without error handling (ftell can return -1L and also populate errno on problems).
>> >
>> > 1. Is that a good way to get progress of file processing?
>>
>> IMO, it's better to handle the error cases. One possible case where
>> ftell can return -1 and set errno is when the total bytes processed is
>> more than LONG_MAX.
>>
>> Will your patch handle file_bytes_processed reporting for COPY FROM
>> STDIN cases? For this case, ftell can't be used.
>>
>> Instead of using ftell and worrying about the errors, a simple
>> approach could be to have a uint64 variable in CopyStateData to track
>> the number of bytes read whenever CopyGetData is called. This approach
>> can also handle the case of COPY FROM STDIN.
>
>
> Thanks for the suggestion. I used this approach, and the latest patch now supports both STDIN and STDOUT.
>

Thanks.

It would be good to see the performance of the COPY command (probably
with a few GB of data) with and without the patch, for both csv/text
and binary files.

For COPY FROM, CopyGetData is called once for every RAW_BUF_SIZE (64KB)
of input, and so is the CopyUpdateBytesProgress function. For binary
format files, however, CopyGetData is called for each field/column of
every row/tuple.

Can we make CopyUpdateBytesProgress() a macro or an inline function
(probably by using pg_attribute_always_inline) to reduce function call
overhead, as it contains just two statements?
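
To illustrate what I mean, here is a rough standalone sketch, not the
patch's code. The struct, the sketch_* names and the bytes_reported
field (a stand-in for whatever the real function publishes as
progress) are all made up for this mail; only the 64KB RAW_BUF_SIZE
figure comes from the discussion above. The update_calls counter exists
only to show how often the update runs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_RAW_BUF_SIZE 65536	/* 64KB, the figure quoted above */

/* Hypothetical stand-in for the relevant fields of CopyStateData. */
typedef struct CopyProgressSketch
{
	uint64_t	bytes_processed;	/* running total of bytes read */
	uint64_t	bytes_reported;		/* stand-in for the published progress value */
	uint64_t	update_calls;		/* instrumentation: number of updates made */
} CopyProgressSketch;

/* Inline update: bump the counter and publish it, no ftell involved. */
static inline void
sketch_update_bytes_progress(CopyProgressSketch *st, size_t nbytes)
{
	st->bytes_processed += nbytes;
	st->bytes_reported = st->bytes_processed;
	st->update_calls++;
}

/* Simulate reading 'total' bytes in reads of at most 'read_size' bytes. */
static void
sketch_simulate_reads(CopyProgressSketch *st, uint64_t total, size_t read_size)
{
	assert(read_size > 0);
	while (total > 0)
	{
		size_t		n = (total < read_size) ? (size_t) total : read_size;

		sketch_update_bytes_progress(st, n);
		total -= n;
	}
}
```

Feeding 1MB through sketch_simulate_reads with SKETCH_RAW_BUF_SIZE
reads runs the update 16 times, while 8-byte reads (an arbitrary
example of a small binary field) run it 131072 times for the same
total, which is why the per-call cost matters for the binary case.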

I tried to apply the patch on commit
7ce461560159948ba0c802c767e42c5f5ae08b4a, and git apply reports a
whitespace warning:

bharath:postgres$ git apply /mnt/hgfs/Downloads/copy-progress-v2.diff
/mnt/hgfs/Downloads/copy-progress-v2.diff:277: trailing whitespace.
* for counting tuples inserted by an INSERT command. Update
warning: 1 line adds whitespace errors.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com
