From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Alexander Kuzmenkov <a(dot)kuzmenkov(at)postgrespro(dot)ru>
Cc: David Gould <daveg(at)sonic(dot)net>, pgsql-hackers(at)lists(dot)postgresql(dot)org, Alina Alexeeva <alexeeva(at)adobe(dot)com>, Ullas Lakkur Raghavendra <lakkurra(at)adobe(dot)com>
Subject: Re: [patch] BUG #15005: ANALYZE can make pg_class.reltuples inaccurate.
Alexander Kuzmenkov <a(dot)kuzmenkov(at)postgrespro(dot)ru> writes:
> On 01.03.2018 06:23, David Gould wrote:
>> In theory the sample pages analyze uses should represent the whole table
>> fairly well. We rely on this to generate pg_statistic and it is a key
>> input to the planner. Why should we not believe in it as much only for
>> reltuples? If the analyze sampling does not work, the fix would be to improve
>> that, not to disregard it piecemeal.
> Well, that sounds reasonable. But the problem with the moving average
> calculation remains. Suppose you run vacuum and not analyze. If the
> updates are random enough, vacuum won't be able to reclaim all the
> pages, so the number of pages will grow. Again, we'll have the same
> thing where the number of pages grows, the real number of live tuples
> stays constant, and the estimated reltuples grows after each vacuum run.
You claimed that before, with no more evidence than this time, and I still
don't follow your argument. The number of pages may indeed bloat but the
number of live tuples per page will fall. Ideally, at least, the estimate
would remain on-target. If it doesn't, there's some other effect that
you haven't explained. It doesn't seem to me that the use of a moving
average would prevent that from happening. What it *would* do is smooth
out errors from the inevitable sampling bias in any one vacuum or analyze
run, and that seems like a good thing.
> I did some more calculations on paper to try to understand this. If we
> average reltuples directly, instead of averaging tuple density, it
> converges like it should. The error with this density calculation seems
> to be that we're effectively multiplying the old density by the new
> number of pages. I'm not sure why we even work with tuple density. We
> could just estimate the number of tuples based on analyze/vacuum, and
> then apply moving average to it. The calculations would be shorter, too.
> What do you think?
I think you're reinventing the way we used to do it. Perhaps consulting
the git history in the vicinity of this code would be enlightening.
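
For comparison, one possible reading of the "average reltuples directly" proposal (again only an illustrative sketch, not code from any patch or from the old implementation) would extrapolate each scan to a whole-table tuple count first and smooth the counts themselves rather than the per-page densities:

#include <math.h>

/*
 * Illustrative sketch: extrapolate this scan to a whole-table tuple count,
 * then apply the moving average to the counts instead of to the densities.
 */
static double
direct_reltuples(double old_rel_tuples,
                 double total_pages, double scanned_pages,
                 double scanned_tuples)
{
    double scan_estimate = (scanned_tuples / scanned_pages) * total_pages;
    double multiplier = scanned_pages / total_pages;

    return floor(old_rel_tuples + (scan_estimate - old_rel_tuples) * multiplier + 0.5);
}

Swapping this into the loop of the earlier sketch in place of smoothed_reltuples() shows how the two formulations diverge as relpages changes.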
regards, tom lane