Re: [Proposal] Progress bar for pg_dump/pg_restore

From: Craig Ringer <craig(at)2ndquadrant(dot)com>
To: Taiki Kondo <tai-kondo(at)yk(dot)jp(dot)nec(dot)com>
Cc: "andres(at)anarazel(dot)de" <andres(at)anarazel(dot)de>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Akio Iwaasa <aki-iwaasa(at)vt(dot)jp(dot)nec(dot)com>
Subject: Re: [Proposal] Progress bar for pg_dump/pg_restore
Date: 2015-06-22 02:45:54
Message-ID: CAMsr+YEwOVwVoibmCAEnJc5J_0gi9fD0kzqfVNZMZObcozdKtQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 19 June 2015 at 16:45, Taiki Kondo <tai-kondo(at)yk(dot)jp(dot)nec(dot)com> wrote:
> Hi, andres
>
> Thank you for your comment, and sorry for late response.
>
>> The question is how to actually get useful estimates. As there's no
>> progress report for indvidiual COPY and CREATE INDEX commands you'll, in
>> many cases, have very irregular progress updates. In many many cases
>> most of the time is spent on a very small subset of the total objects.
>
> When dumping, I think number of tuples can be got from pg_class.reltuples, therefore I want pg_dump to run "select reltuples" to get it, and then pg_dump will calculate estimated time to execute "COPY FROM" command in getting each tuples.

It'd need to be a bit smarter than that, since it'd have to take some
account of average tuple size, etc, but it's an interesting idea to
use the stats to guestimate copy times.

> For restoring, I think it's better to record above information (number of tuples) into pg_dump file to estimate time to restore tables.

Since we generally suggest that people use a pg_dump and pg_restore
from the server version they're going to be restoring to, that should
be OK. It'd create some new entries in the pg_restore file manifest
that older pg_restore versions wouldn't understand.

> And, I also understood your concern about "CREATE INDEX", but we have no way to get progress information of "CREATE INDEX".
> At present, I think it may be good to refer to the same time as estimated time to execute "COPY TO".

You could probably get a handwave-quality estimate by looking at the
average column widths for the cols included in the index plus the
number of tuples in the table. It'd be useless for expression indexes,
partial indexes, etc, but you can't have everything...

Interesting idea to explore.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Craig Ringer 2015-06-22 03:18:17 Re: proposal: row_to_array function
Previous Message Craig Ringer 2015-06-22 02:32:03 Re: Is Postgres database server works fine if there is a change in system time?