From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | Mark Frost <FROSTMAR(at)uk(dot)ibm(dot)com> |
Cc: | "pgsql-performance(at)lists(dot)postgresql(dot)org" <pgsql-performance(at)lists(dot)postgresql(dot)org> |
Subject: | Re: Poor row estimates from planner, stat `most_common_elems` sometimes missing for a text[] column |
Date: | 2025-07-18 21:54:05 |
Message-ID: | 2194366.1752875645@sss.pgh.pa.us |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-performance |
I wrote:
> Well, we don't have a most common element in this scenario --- the
> whole point is that the occurrence counts resulting from the lossy
> counting algorithm are too low to be trustworthy. However, what we
> do have is the cutoff frequency, and it seems to me that we could use
> that as the estimate of the maximum frequency of the non-MCEs.
Here's a less crude patch for that. The array_typanalyze.c changes
are the same as before, but I reconsidered what to do about this
stumbling block:
> Assuming we like this direction, the main thing that makes this a hack
> not a polished patch is that I had to strongarm the code into storing
> a zero-length values array. What update_attstats would normally do
> is leave the values column of the MCELEM stats slot NULL, which then
> causes get_attstatsslot to throw a that-shouldn't-be-null error.
> An alternative answer is to change get_attstatsslot to allow a null,
> but I'm not sure that that's any cleaner. Either way it seems like
> there's a hazard of breaking some code that isn't expecting the case.
I concluded that letting get_attstatsslot accept nulls is too risky;
there is probably code that assumes that get_attstatsslot would've
rejected that. Instead, this version changes update_attstats' rule
for when to store an array rather than a null. Now it will do so
if the passed-in stavalues pointer isn't null, even if numvalues
is zero. I think that this doesn't change the outcome for any
existing analyze routines because they wouldn't pass a data pointer
if they have no values; and even if it does, storing an empty array
not a null seems like it should be pretty harmless.
> An alternative that feels cleaner but a good bit more invasive
> is to get rid of the convention of storing the min/max/nulls
> frequencies as extra entries in the MCELEM numbers entry ---
> which surely is a hack too --- and put them into some new slot
> type. I'm not sure if that's enough nicer to be worth the
> conversion pain.
A year ago I would have seriously considered doing it that way.
But now that we have code to dump-n-restore stats, that code would
have to be adjusted to convert the old representation. It's not
worth it for this case.
Hence, v1 attached, now with a commit message.
regards, tom lane
Attachment | Content-Type | Size |
---|---|---|
v1-0001-Track-the-maximum-possible-frequency-of-non-MCE-a.patch | text/x-diff | 4.3 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Xuan Chen | 2025-07-19 23:03:14 | Question: Is it valid for a parent node's total cost to be lower than a child's total cost in EXPLAIN? |
Previous Message | Jerry Brenner | 2025-07-18 13:20:03 | Re: Is there a way to identify a plan generated by GECO? |