| From: | Kenichiro Tanaka <kenichirotanakapg(at)gmail(dot)com> |
|---|---|
| To: | Yugo Nagata <nagata(at)sraoss(dot)co(dot)jp> |
| Cc: | vellaipandiyan sm <vellaipandiyan(dot)sm(at)gmail(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org |
| Subject: | Re: Can we use Statistics Import and Export feature to perforamance testing? |
| Date: | 2026-06-29 23:21:36 |
| Message-ID: | CALyBiZ+oHRJgCSdMQBnEMyzzW1B_Y-2E8GoQAebjoW8JV27dxg@mail.gmail.com |
| Lists: | pgsql-hackers |
Hi, apologies for jumping in. Please bear with me while I share some observations,
and point out anything I have misunderstood.
Summary:
Regarding the purpose, I have three concerns:
1) "relation sizes" is ambiguous
2) "Index OIDs" might be misleading
3) The histogram bounds behavior is further evidence that a plan cannot be guaranteed
Possible alternative:
since factors such as actual relation file sizes, the creation order
of indexes, actual data range in indexes, or configuration parameters
may affect planner behavior.
Reasons:
1) "relation sizes" is ambiguous
I think "relation sizes" is ambiguous and could be confused with
pg_class.relpages. I would suggest using "actual relation file sizes"
instead. Would that be clearer?
2) "Index OIDs" might be misleading
I initially misread this as meaning that statistics are restored by
OID, but I think the intent is different. If two indexes have fuzzily
equal costs, the planner may choose between them based on the order
in which they are listed internally, which follows the order of their
OIDs. Since index OIDs are not guaranteed to be the same after
dump/restore, the chosen index could differ between environments even
with identical statistics.
If that is the correct interpretation, I would suggest rephrasing to
something like "the creation order of indexes" to avoid the
misunderstanding that statistics are identified or restored by OID.
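To illustrate the tie-breaking described above, here is a toy sketch in Python. It is not the planner's code: the function names, the index names, the costs, and the 1% fuzz factor are all assumptions chosen for illustration. It only shows that when two candidates are fuzzily equal in cost, whichever comes first in the list wins, so a different list order gives a different choice from identical cost inputs.

```python
# Toy model only: names, costs, and the fuzz factor are assumptions.
FUZZ = 1.01  # assumed: a candidate must be more than 1% cheaper to win


def fuzzily_cheaper(cost_a, cost_b, fuzz=FUZZ):
    """Return True only if cost_a beats cost_b by more than the fuzz factor."""
    return cost_a * fuzz < cost_b


def pick_index(candidates):
    """Pick an index from (name, cost) pairs given in internal list order.

    The incumbent is kept unless a later candidate is clearly cheaper,
    so fuzzily equal candidates are resolved by list order.
    """
    best = candidates[0]
    for cand in candidates[1:]:
        if fuzzily_cheaper(cand[1], best[1]):
            best = cand
    return best[0]


# Same two indexes with the same costs, listed in a different order
# (for example because they were created in a different order).
source_env = [("idx_a", 100.0), ("idx_b", 100.5)]
target_env = [("idx_b", 100.5), ("idx_a", 100.0)]

print(pick_index(source_env))  # idx_a
print(pick_index(target_env))  # idx_b
```

A candidate that is clearly cheaper still wins regardless of position; only near-ties depend on the order.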
3) Histogram bounds behavior (get_actual_variable_range())
The planner uses the actual data range read from an index when a WHERE
condition value falls outside the histogram bounds. This is done via
get_actual_variable_range() in selfuncs.c. Since this reads live index
data rather than restored statistics, the same restored statistics can
produce different selectivity estimates if the actual data range in the
target environment differs.
I think this behavior is a strong example of why restoring statistics
alone cannot guarantee the same plan.
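As a rough illustration of this point, the following Python sketch models an equi-depth histogram estimate for `col > value`. It is a simplified toy, not the logic in selfuncs.c: the function name, the `actual_max` parameter (standing in for the value read from the live index), and the sample bounds are all assumptions.

```python
from bisect import bisect_right


def frac_greater_than(bounds, value, actual_max=None):
    """Estimate the fraction of rows with col > value.

    bounds: equi-depth histogram boundaries (restored statistics).
    actual_max: hypothetical stand-in for the live maximum read from an
    index; when given, it replaces the top histogram bound.
    """
    b = list(bounds)
    if actual_max is not None:
        b[-1] = actual_max
    if value <= b[0]:
        return 1.0
    if value >= b[-1]:
        return 0.0
    nbuckets = len(b) - 1
    i = bisect_right(b, value) - 1            # bucket containing value
    within = (value - b[i]) / (b[i + 1] - b[i])  # linear interpolation
    return 1.0 - (i + within) / nbuckets


restored = [0, 25, 50, 75, 100]  # identical restored statistics

# Environment A: the live data still ends at 100.
print(frac_greater_than(restored, 110, actual_max=100))
# Environment B: the live data has grown to 200.
print(frac_greater_than(restored, 110, actual_max=200))
```

With the same restored bounds, the estimate for `col > 110` is zero in the first environment and roughly 0.18 in the second, which is the kind of divergence described above.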
Any thoughts on this?
Kenichiro Tanaka
On Fri, Jun 5, 2026 at 17:21, Yugo Nagata <nagata(at)sraoss(dot)co(dot)jp> wrote:
>
> On Wed, 27 May 2026 05:54:06 +0000
> vellaipandiyan sm <vellaipandiyan(dot)sm(at)gmail(dot)com> wrote:
>
> > I prepared a small documentation follow-up patch adding a cross-reference to the planner statistics documentation section from the statistics manipulation warning.
> >
> > The patch builds cleanly with:
> >
> > `make -C doc/src/sgml html`
> >
> > I will send the patch to the mailing list thread as well.
>
> I may have missed it, but I couldn't find the patch in the thread.
>
> Could you please point me to it?
>
> Regards,
> Yugo Nagata
>
> --
> Yugo Nagata <nagata(at)sraoss(dot)co(dot)jp>
>
>