Re: [Proposal] Allow users to specify multiple tables in VACUUM commands

From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: "Bossart, Nathan" <bossartn(at)amazon(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>
Subject: Re: [Proposal] Allow users to specify multiple tables in VACUUM commands
Date: 2017-09-25 05:41:53
Message-ID: CAB7nPqS7bBVCqHXfRAx9Ztc5ppfgK1jKHZm10vDfd8hhoSKSRw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sat, Sep 23, 2017 at 12:56 AM, Bossart, Nathan <bossartn(at)amazon(dot)com> wrote:
> On 9/21/17, 9:55 PM, "Michael Paquier" <michael(dot)paquier(at)gmail(dot)com> wrote:
>> I still think that ExecVacuum() should pass a list of VacuumRelation
>> objects to vacuum(), and get_rel_oids() should take in input a list,
>> and return a completed lists. This way the decision-making of doing
>> everything in the same transaction should happens once in vacuum().
>> And actually, if several relations are defined with VACUUM, your patch
>> would not use one transaction per table as use_own_xacts would be set
>> to false. I think that Tom meant that relations whose processed has
>> finished have to be committed immediately. Per your patch, the commit
>> happens once all relations are committed.
>
> Sorry, I must have misunderstood. I've attached an updated patch that
> looks more like what you described. I also cleaned up the test cases
> a bit.
>
> IIUC the only time use_own_xacts will be false is when we are only
> doing ANALYZE and at least one of the following is true:
>
> 1. We are in a transaction block.
> 2. We are processing only one relation.

Yes.

> From the code, it appears that vacuum_rel() always starts and commits a
> new transaction for each relation:
>
> * vacuum_rel expects to be entered with no transaction active; it will
> * start and commit its own transaction. But we are called by an SQL

Yes.

> So, by ensuring that get_rel_oids() returns a list whenever multiple
> tables are specified, we are making sure that commands like
>
> ANALYZE table1, table2, table3;
>
> create transactions for each processed relation (as long as they are
> not inside a transaction block).

Yes.

> I suppose the alternative would be
> to call vacuum() for each relation and to remove the restriction that
> we must be processing more than one relation for use_own_xacts to be
> true.

The main point of my comment is that like ExecVacuum(), vacuum()
should be a high-level function where is decided if multiple
transactions should be run or not. By calling vacuum() multiple times
you break this promise. vacuum_rel should be the one working with
individual transactions.

Here is the diff between v19 and v21 that matters here:
/* Now go through the common routine */
- if (vacstmt->rels == NIL)
- vacuum(vacstmt->options, NULL, &params, NULL, isTopLevel);
- else
- {
- ListCell *lc;
- foreach(lc, vacstmt->rels)
- vacuum(vacstmt->options, lfirst_node(VacuumRelation, lc),
- &params, NULL, isTopLevel);
- }
+ vacuum(vacstmt->options, vacstmt->rels, &params, NULL, isTopLevel);
If you do so, for an ANALYZE with multiple relations you would finish
by using the same transaction for all relations. I think that we had
better be consistent with VACUUM when not using an outer transaction
so as tables are analyzed and committed one by one. This does not
happen here: a unique transaction is used when using a list of
non-partitioned tables.

On Sun, Sep 24, 2017 at 4:37 AM, Bossart, Nathan <bossartn(at)amazon(dot)com> wrote:
> Here is a version of the patch without the switch to AutovacMemCxt in
> autovacuum_do_vac_analyze(), which should no longer be necessary after
> 335f3d04.

Thanks for the new version.

+ if (!IsAutoVacuumWorkerProcess())
+ ereport(WARNING,
+ (errmsg("skipping \"%s\" --- relation no longer exists",
+ relation->relname)));
I like the use of WARNING here, but we could use as well a LOG to be
consistent when a lock obtention is skipped.

+ * going to commit this transaction and begin a new one between now
+ * and then.
+ */
+ relid = RangeVarGetRelid(relinfo->relation, NoLock, false);
We will likely have to wait that the matters discussed in
https://www.postgresql.org/message-id/25023.1506107590@sss.pgh.pa.us
are settled.

+VACUUM FULL vactst, vactst, vactst, vactst;
This is actually a waste of cycles.

I think I don't have much other comments about this patch.
--
Michael

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2017-09-25 05:42:52 Re: Setting pd_lower in GIN metapage
Previous Message Amit Kapila 2017-09-25 05:26:07 Re: Setting pd_lower in GIN metapage