Re: [HACKERS] CLUSTER command progress monitor

From: Rafia Sabih <rafia(dot)pghackers(at)gmail(dot)com>
To: Tatsuro Yamada <yamada(dot)tatsuro(at)lab(dot)ntt(dot)co(dot)jp>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, Peter Geoghegan <pg(at)bowt(dot)ie>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Subject: Re: [HACKERS] CLUSTER command progress monitor
Date: 2019-03-18 11:42:23
Message-ID: CA+FpmFeMbXHpOW3oX83OU=76eiE05Kw=qBMWcPPG84LYKsM35g@mail.gmail.com
Lists: pgsql-hackers

On Fri, 8 Mar 2019 at 09:14, Tatsuro Yamada
<yamada(dot)tatsuro(at)lab(dot)ntt(dot)co(dot)jp> wrote:
>
> On 2019/03/06 15:38, Tatsuro Yamada wrote:
> > On 2019/03/05 17:56, Tatsuro Yamada wrote:
> >> On 2019/03/05 11:35, Robert Haas wrote:
> >>> On Mon, Mar 4, 2019 at 5:38 AM Tatsuro Yamada
> >>> <yamada(dot)tatsuro(at)lab(dot)ntt(dot)co(dot)jp> wrote:
> >>>> === Current design ===
> >>>>
> >>>> The CLUSTER command uses either an Index Scan or a Seq Scan to scan
> >>>> the heap. Depending on which one is chosen, the command proceeds
> >>>> through the following sequence of phases:
> >>>>
> >>>> * Scan method: Seq Scan
> >>>> 0. initializing (*2)
> >>>> 1. seq scanning heap (*1)
> >>>> 3. sorting tuples (*2)
> >>>> 4. writing new heap (*1)
> >>>> 5. swapping relation files (*2)
> >>>> 6. rebuilding index (*2)
> >>>> 7. performing final cleanup (*2)
> >>>>
> >>>> * Scan method: Index Scan
> >>>> 0. initializing (*2)
> >>>> 2. index scanning heap (*1)
> >>>> 5. swapping relation files (*2)
> >>>> 6. rebuilding index (*2)
> >>>> 7. performing final cleanup (*2)
> >>>>
> >>>> The VACUUM FULL command proceeds through the following sequence of phases:
> >>>>
> >>>> 1. seq scanning heap (*1)
> >>>> 5. swapping relation files (*2)
> >>>> 6. rebuilding index (*2)
> >>>> 7. performing final cleanup (*2)
> >>>>
> >>>> (*1): the value in the heap_tuples_scanned column increases
> >>>> (*2): only the phase column is updated
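The phase lists above can be captured as data. The following is a minimal C sketch, not code from the patch: the enum, array, and function names are hypothetical, and the numbers simply mirror the phase IDs used in the design. It records which phases each command passes through and which phases are (*1) phases that advance heap_tuples_scanned, as opposed to (*2) phases that only update the phase column.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical phase IDs mirroring the numbering in the design above
 * (illustrative names, not the identifiers used by the patch). */
typedef enum
{
    PHASE_INITIALIZING = 0,
    PHASE_SEQ_SCAN_HEAP = 1,
    PHASE_INDEX_SCAN_HEAP = 2,
    PHASE_SORT_TUPLES = 3,
    PHASE_WRITE_NEW_HEAP = 4,
    PHASE_SWAP_REL_FILES = 5,
    PHASE_REBUILD_INDEX = 6,
    PHASE_FINAL_CLEANUP = 7
} ClusterPhase;

typedef enum
{
    CMD_CLUSTER_SEQSCAN,
    CMD_CLUSTER_INDEXSCAN,
    CMD_VACUUM_FULL
} ClusterCommand;

/* Phase sequences exactly as listed in the design. */
static const ClusterPhase seqscan_phases[] = {
    PHASE_INITIALIZING, PHASE_SEQ_SCAN_HEAP, PHASE_SORT_TUPLES,
    PHASE_WRITE_NEW_HEAP, PHASE_SWAP_REL_FILES, PHASE_REBUILD_INDEX,
    PHASE_FINAL_CLEANUP
};
static const ClusterPhase indexscan_phases[] = {
    PHASE_INITIALIZING, PHASE_INDEX_SCAN_HEAP, PHASE_SWAP_REL_FILES,
    PHASE_REBUILD_INDEX, PHASE_FINAL_CLEANUP
};
static const ClusterPhase vacuum_full_phases[] = {
    PHASE_SEQ_SCAN_HEAP, PHASE_SWAP_REL_FILES, PHASE_REBUILD_INDEX,
    PHASE_FINAL_CLEANUP
};

#define LENGTHOF(a) (sizeof(a) / sizeof((a)[0]))

/* Return the phase sequence for a command and store its length in *n. */
static const ClusterPhase *
phase_sequence(ClusterCommand cmd, size_t *n)
{
    switch (cmd)
    {
        case CMD_CLUSTER_SEQSCAN:
            *n = LENGTHOF(seqscan_phases);
            return seqscan_phases;
        case CMD_CLUSTER_INDEXSCAN:
            *n = LENGTHOF(indexscan_phases);
            return indexscan_phases;
        case CMD_VACUUM_FULL:
            *n = LENGTHOF(vacuum_full_phases);
            return vacuum_full_phases;
    }
    assert(false);              /* not reached for valid commands */
    *n = 0;
    return NULL;
}

/* (*1) phases advance heap_tuples_scanned; all others only set "phase". */
static bool
phase_counts_tuples(ClusterPhase phase)
{
    return phase == PHASE_SEQ_SCAN_HEAP ||
           phase == PHASE_INDEX_SCAN_HEAP ||
           phase == PHASE_WRITE_NEW_HEAP;
}
```

Keeping the numbering global (so the index-scan path simply skips 1, 3 and 4) means one phase column can be decoded the same way whichever path a given CLUSTER took.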
> >>>
> >>> All of that sounds good.
> >>>
> >>>> The view provides the following details on CLUSTER command progress:
> >>>> # \d pg_stat_progress_cluster
> >>>>               View "pg_catalog.pg_stat_progress_cluster"
> >>>>           Column           |  Type   | Collation | Nullable | Default
> >>>> ---------------------------+---------+-----------+----------+---------
> >>>>  pid                       | integer |           |          |
> >>>>  datid                     | oid     |           |          |
> >>>>  datname                   | name    |           |          |
> >>>>  relid                     | oid     |           |          |
> >>>>  command                   | text    |           |          |
> >>>>  phase                     | text    |           |          |
> >>>>  cluster_index_relid       | bigint  |           |          |
> >>>>  heap_tuples_scanned       | bigint  |           |          |
> >>>>  heap_tuples_vacuumed      | bigint  |           |          |
> >>>
> >>> Still not sure if we need heap_tuples_vacuumed. We could try to
> >>> report heap_blks_scanned and heap_blks_total like we do for VACUUM, if
> >>> we're using a Seq Scan.
> >>
> >> I have no strong opinion on adding heap_tuples_vacuumed, so I'll remove it in
> >> the next patch.
> >>
> >> Regarding heap_blks_scanned and heap_blks_total, I suppose we can get
> >> those from initscan(). I'll investigate further. The call chain is:
> >>
> >> cluster.c
> >>   copy_heap_data()
> >>     heap_beginscan()
> >>       heap_beginscan_internal()
> >>         initscan()
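If heap_blks_scanned and heap_blks_total are reported the way pg_stat_progress_vacuum reports them, a monitoring client can derive a completion percentage for the seq-scan path. Below is a minimal C sketch of that arithmetic. It assumes both counters arrive as 64-bit integers; the function name and the -1 sentinel for a not-yet-known total are assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Percent of heap blocks scanned, in the range [0, 100].
 * Returns -1.0 while heap_blks_total is still unknown (reported as 0).
 */
static double
heap_scan_percent(int64_t heap_blks_scanned, int64_t heap_blks_total)
{
    assert(heap_blks_scanned >= 0);

    if (heap_blks_total <= 0)
        return -1.0;

    /* Defensive clamp: never report more than 100%. */
    if (heap_blks_scanned > heap_blks_total)
        heap_blks_scanned = heap_blks_total;

    return 100.0 * (double) heap_blks_scanned / (double) heap_blks_total;
}
```

For example, `heap_scan_percent(50, 200)` yields 25.0. With an index scan the heap is visited in index order rather than block order, so a block-based percentage is less meaningful there than a tuple counter.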
> >>
> >>
> >>
> >>>> === Discussion points ===
> >>>>
> >>>> - Progress counter for "3. sorting tuples" phase
> >>>>   - Should we add pgstat_progress_update_param() in tuplesort.c, like
> >>>>     "trace_sort"?
> >>>>     Thanks to Peter Geoghegan for the useful advice!
> >>>
> >>> How would we avoid an abstraction violation?
> >>
> >> Hmm... What do you mean by an abstraction violation?
> >> If it is difficult to solve, I'd rather not add a progress counter for the sorting-tuples phase.
> >>
> >>
> >>>> - Progress counter for "6. rebuilding index" phase
> >>>>   - Should we add "index_vacuum_count" to the view, like the VACUUM progress monitor?
> >>>>     If yes, I'll add pgstat_progress_update_param() to reindex_relation() in index.c.
> >>>>     However, I'm not sure whether that is acceptable.
> >>>
> >>> Doesn't seem unreasonable to me.
> >>
> >> I see, I'll add it later.
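An index_vacuum_count-style counter would be bumped once per rebuilt index. The sketch below is self-contained and hypothetical: progress_params[] and update_progress_param() stand in for the backend's progress slots and for pgstat_progress_update_param(int index, int64 val), while the slot number and the helper names are invented for illustration. It assumes the loop sits roughly where reindex_relation() iterates over the relation's indexes.

```c
#include <assert.h>
#include <stdint.h>

#define N_PROGRESS_PARAMS 10
#define PARAM_INDEX_VACUUM_COUNT 8      /* hypothetical slot number */

/* Stand-in for the backend's per-command progress parameter array. */
static int64_t progress_params[N_PROGRESS_PARAMS];

/* Stand-in for pgstat_progress_update_param(int index, int64 val). */
static void
update_progress_param(int index, int64_t val)
{
    assert(index >= 0 && index < N_PROGRESS_PARAMS);
    progress_params[index] = val;
}

/* Hypothetical placeholder for rebuilding a single index. */
static void
rebuild_one_index(unsigned int index_oid)
{
    (void) index_oid;           /* the real work is elided in this sketch */
}

/* Rebuild each index, publishing the number completed after each one. */
static int
rebuild_all_indexes(const unsigned int *index_oids, int nindexes)
{
    int done = 0;

    update_progress_param(PARAM_INDEX_VACUUM_COUNT, 0);
    for (int i = 0; i < nindexes; i++)
    {
        rebuild_one_index(index_oids[i]);
        done++;
        update_progress_param(PARAM_INDEX_VACUUM_COUNT, done);
    }
    return done;
}
```

Updating the counter after each index completes (rather than once at the end) is what lets a concurrent reader of the view see progress during a long rebuild phase.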
> >
> >
> > The attached file is a revised WIP patch that includes:
> >
> > - Remove heap_tuples_vacuumed
> > - Add heap_blks_scanned and heap_blks_total
> > - Add index_vacuum_count
> >
> > I tried adding the heap_blks_scanned and heap_blks_total columns, and I realized
> > that the heap_tuples_scanned column is a suitable counter for both index scans
> > and seq scans, because CLUSTER works on a per-tuple basis.
>
>
> The attached file is the patch rebased on current HEAD.
> I changed the status. :)
>
>
Looks like the patch needs a rebase. I was on commit
fb5806533f9fe0433290d84c9b019399cd69e9c2.

PFA the reject file in case you want to have a look.

--
Regards,
Rafia Sabih

Attachment: cluster.c.rej (text/x-reject, 1.0 KB)
