Re: Partitioned tables and relfilenode

From: Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>
To: Robert Haas <robertmhaas(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Partitioned tables and relfilenode
Date: 2017-02-17 07:42:34
Message-ID: 38d14945-b84c-0769-da36-b1f67c65395d@lab.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 2017/02/16 23:40, Robert Haas wrote:
> On Thu, Feb 16, 2017 at 6:32 AM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
>> Why not vacuum all partitions?
>> Why not analyze all partitions?
>> Truncate all partitions
>
> I agree. But, we need to be careful that a database-wide VACUUM or
> ANALYZE doesn't hit the partitions multiple times, once for the parent
> and again for each child. Actually, a database-wide VACUUM should hit
> each partition individually and do nothing for the parents, but a

This is what would happen even without the patch. Patch only modifies
what happens when a partitioned table is specified in the vacuum command.
It emits a warning:

WARNING: skipping "%s" --- cannot vacuum partitioned tables

It seems both you and Simon agree that instead of this warning, we should
recurse to process the leaf partitions (ignoring any partitioned tables in
the hierarchy for which there is nothing to do). If that's right, I will
modify the patch to do that.

> database-wide ANALYZE should process the parents and do nothing for
> the children, so that the inheritance statistics get updated.

Currently vacuum() processes the following relations:

/*
* Process all plain relations and materialized views listed in
* pg_class
*/

while ((tuple = heap_getnext(scan, ForwardScanDirection)) != NULL)
{
Form_pg_class classForm = (Form_pg_class) GETSTRUCT(tuple);

if (classForm->relkind != RELKIND_RELATION &&
classForm->relkind != RELKIND_MATVIEW)
continue;

Do you mean that if database-wide analyze is to be run, we should also
exclude those RELKIND_RELATION relations that are partitions?

So the only way to update a partition's statistics is to directly specify
it in the command or by autovacuum.

Truncate already recurses to partitions by way of inheritance recursion
that's already in place. The patch simply teaches ExecuteTruncate() to
ignore partitioned tables when we get to the part where relfilenodes are
manipulated.

>>> - ATRewriteTables() skips on the AlteredTableInfo entries for partitioned
>>> tables, because there is nothing to be done.
>>>
>>> - Since we cannot create indexes on partitioned tables anyway, there is
>>> no need to handle cluster and reindex (they throw a meaningful error
>>> already due to the lack of indexes.)
>>
>> Create index on all partitions
>
> That one's more complicated, per what I wrote in
> https://www.postgresql.org/message-id/CA+TgmoZUwj=QYnaK+F7xEf4w_e2g3XxdMnSNZMZjuinHRcOB8A@mail.gmail.com

I agree that it would be nice to fix this usability issue. But, it's also
important as you say in the quoted email that we should not underestimate
the amount of work that might be required to properly implement it.

>> (It also seems like wasted effort to try to remove the overhead caused
>> by a parent table for partitioning. Why introduce a special case to
>> save a few bytes? Premature optimization, surely?)
>
> I don't think it's wasted effort, actually. My concern isn't so much
> the empty file on disk (which is stupid, but maybe harmless) as
> eliminating the dummy scan from the query plan. I believe the
> do-nothing scan can actually be a noticeable drag on performance in
> some cases - e.g. if the scan of the partitioned table is on the
> inside of a nested loop, so that instead of repeatedly doing an index
> scan on each of 4 partitions, you repeatedly do an index scan on each
> of 4 partitions and a sequential scan of an empty table. A zero-page
> sequential scan is pretty fast, but not free. An even bigger problem
> is that the planner may think that always-empty parent can contain
> some rows, throwing planner estimates off and messing up the whole
> plan. We've been living with that problem for a long time, but now
> that we have an opportunity to fix it, it would be good to do so.

Agreed. I would have reconsidered if it were a more invasive patch.

Thanks,
Amit

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Thomas Munro 2017-02-17 07:45:19 Re: Measuring replay lag
Previous Message Thomas Munro 2017-02-17 07:41:50 Re: Measuring replay lag