Re: pg_upgrade and statistics

From: Daniel Farina <daniel(at)heroku(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: pg_upgrade and statistics
Date: 2012-03-13 07:35:14
Message-ID: CAAZKuFa0KUwWVjjFpWLyme07uUJ=h4zswM3wK2fzFK0A=qZs9A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Mar 12, 2012 at 9:12 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Bruce Momjian <bruce(at)momjian(dot)us> writes:
>> Copying the statistics from the old server is on the pg_upgrade TODO
>> list.  I have avoided it because it will add an additional requirement
>> that will make pg_upgrade more fragile in case of major version changes.
>
>> Does anyone have a sense of how often we change the statistics data
>> between major versions?
>
> I don't think pg_statistic is inherently any more stable than any other
> system catalog.

Agreed, but it would appear that in practice that a fair amount of it
carries forward. If someone ripped up the statistics system and did
them all over in such a way that the old fields had no meaning on
future costing metrics, that'd probably be reasonable cause for a
caveat involving full-blown reanalyze...still, that doesn't seem to
happen every year.

> We've whacked it around significantly just last week,
> which might color my perception a bit, but there are other changes on
> the to-do list.  (For one example, see nearby complaints about
> estimating TOAST-related costs, which we could not fix without adding
> more stats data.)

Is accruing additional statistics likely going to be a big problem? I
noticed the addition of the new anyarray (presumably for
array-selectivity) features; would planning with an "empty" assumption
be disastrous vs. the old behavior, which had no concept of those at
all?

I don't think it's necessary to make statistics porting a feature of
pg_upgrade in all circumstances, but it would be "nice" when possible.
That having been said, perhaps there are other ways for pg_upgrade to
be better invested in....or, best of all and somewhat unrelatedly,
full blown logical replication.

Although this conversation has taken focus on "how do we move stats
forward", I am about as interested in "how do I run statements (like
ANALYZE) more 'nicely'". The same general problem pervades many
background task issues, including autovacuum and large physical
reorganizations of data.

--
fdr

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2012-03-13 08:42:03 Re: initdb and fsync
Previous Message Daniel Farina 2012-03-13 07:33:09 Re: pg_upgrade and statistics