Re: Partitioning versus autovacuum

From: Amit Langote <amitlangote09(at)gmail(dot)com>
To: Greg Stark <stark(at)mit(dot)edu>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Partitioning versus autovacuum
Date: 2019-10-04 02:13:59
Message-ID: CA+HiwqEeZQ-H2OVbHZ=n2RNNPF84Hygi1HC-MDwC-VnBjpA1=Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi Greg,

On Tue, Oct 1, 2019 at 4:03 AM Greg Stark <stark(at)mit(dot)edu> wrote:
>
> Actually -- I'm sorry to followup to myself (twice) -- but that's
> wrong. That Todo item predates the modern partitioning code. It came
> from when the partitioned statistics were added for inheritance trees.
> The resulting comment almost doesn't make sense any more since it
> talks about updates to the parent table and treats them as distinct
> from updates to the children.
>
> In any case it's actually not true any more as updates to the parent
> table aren't even tracked any more -- see below. My modest proposal is
> that we should count any updates that arrive through the parent table
> as mods for both the parent and child.

Yeah, we need to teach autovacuum to consider analyzing partitioned
tables. That is still a TODO for declarative partitioning.

We do need to weigh the trade-offs here. In the thread quoted in your
previous email, Tom expresses a concern [1] about ending up doing
excessive work, because partitions would be scanned twice -- first to
collect their own statistics and then to collect the parent's when the
parent table is analyzed. Maybe if we find a way to calculate
parent's stats from the partitions' stats without scanning the
partitions, that would be great.

Another thing to consider is that users now (as of v11) have the
option of using partitionwise plans. Consider joining two huge
partitioned tables. If they are identically partitioned, Postgres
planner considers joining pairs of matching partitions and appending
the outputs of these smaller joins. In this case, even if the
non-partitionwise join couldn't use hash join, individual smaller
joins could, because partition stats would be up to date. The
requirements that the tables being joined be identically partitioned
(or be partitioned at all) might be a bit too restrictive though.

> A more ambitious proposal would have updates to the children also
> count against the parent somehow but I'm not sure exactly how. And I'm
> not sure we shouldn't be updating the parent statistics whenever we
> run analyze on a child anyways but again I'm not sure how.

As I mentioned above, we could try to figure out a way to "merge" the
individual partitions' statistics when they're refreshed into the
parent's stats.

Thanks,
Amit

[1] https://www.postgresql.org/message-id/489.1276114285%40sss.pgh.pa.us

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Joe Nelson 2019-10-04 03:23:50 Re: Change atoi to strtol in same place
Previous Message Stephen Frost 2019-10-04 01:42:39 Re: Transparent Data Encryption (TDE) and encrypted files