| From: | Tender Wang <tndrwang(at)gmail(dot)com> |
|---|---|
| To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
| Cc: | Joel Jacobson <joel(at)compiler(dot)org>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
| Subject: | Re: [BUG?] estimate_hash_bucket_stats uses wrong ndistinct for avgfreq |
| Date: | 2026-02-27 23:57:22 |
| Message-ID: | CAHewXNknzZ0+trb0RzvTBpzr9R+42W0xLYzDDsueJ11N1mbqNg@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> 于2026年2月28日周六 03:15写道:
>
> Tender Wang <tndrwang(at)gmail(dot)com> writes:
> > I added Tom to the cc list. He may know more about this.
>
> Hmm, git blame says I originated this function 25 years ago
> (f905d65ee), but I don't claim to remember that.
>
> Looking at it now, though, I think that bd3e3e9e5 is indeed
> wrong but not in the way Joel suggests. The longstanding
> way to compute mcv_freq is
>
> /*
> * The first MCV stat is for the most common value.
> */
> if (sslot.nnumbers > 0)
> *mcv_freq = sslot.numbers[0];
>
> *This number is a fraction measured on the raw relation.*
> (Necessarily so, because it's just a number computed by ANALYZE.)
> Then bd3e3e9e5 added
>
> /*
> * If there are no recorded MCVs, but we do have a histogram, then
> * assume that ANALYZE determined that the column is unique.
> */
> if (vardata.rel && vardata.rel->rows > 0)
> *mcv_freq = 1.0 / vardata.rel->rows;
>
> This is a pure thinko. rel->rows is the estimated number
> of filtered rows. What I should have used is rel->tuples,
> which is the estimated raw relation size, so as to get a
> number that is commensurate with the longstanding way
> of calculating mcv_freq. Then that also matches up with
> computing avgfreq on the raw relation.
>
> So I think the correct fix is basically
>
> - if (vardata.rel && vardata.rel->rows > 0)
> - *mcv_freq = 1.0 / vardata.rel->rows;
> + if (vardata.rel && vardata.rel->tuples > 0)
> + *mcv_freq = 1.0 / vardata.rel->tuples;
>
Yeah, in my last email, I said I tried this way. But I worried that
rel->tuples may be zero for an empty relation.
> and I wonder if that will wind up in reverting a lot of the plan
> choice changes seen in bd3e3e9e5.
Yes, a lot plan diff in partition_join.sql.
--
Thanks,
Tender Wang
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Chao Li | 2026-02-28 00:12:03 | Re: doc: Clarify that empty COMMENT string removes the comment |
| Previous Message | Melanie Plageman | 2026-02-27 23:56:48 | Re: All-visible pages with valid prune xid are confusing |