| From: | Tatsuya Kawata <kawatatatsuya0913(at)gmail(dot)com> |
|---|---|
| To: | Fujii Masao <masao(dot)fujii(at)gmail(dot)com> |
| Cc: | "samimseih(at)gmail(dot)com" <samimseih(at)gmail(dot)com>, Chao Li <li(dot)evan(dot)chao(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org |
| Subject: | Re: [PATCH] Add sampling statistics to autoanalyze log output |
| Date: | 2026-01-19 16:33:21 |
| Message-ID: | CAHza6qfeiPDNZmb-T+gUYm6ZWQhZWFSf64YxMjQr7ZfPuF38cQ@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
Hi Fujii-san, Chao-san,
Sorry for the late reply.
I have addressed your comments in the attached v4 patch.
> child_sampling_stats is only used inside if (childblocks > 0), so it
would be better to just define it there.
Fixed.
> SamplingStats is a very generic name, maybe rename to
AnalyzeSamplingStats or something else.
Fixed. I changed the struct name.
> > With this change, we would end up reporting nearly the same sampling
> > information twice, for example:
> >
> > INFO: analyzing "public.ft"
> > INFO: "ft": table contains 10 rows, 10 rows in sample
> > INFO: finished analyzing table "postgres.public.ft"
> > sampling: 10 rows in sample, 10 estimated total rows
> >
> > Wouldn't it be less confusing to avoid reporting the second sampling
line?
>
> I agree, that would be better. I'll fix this.
Fixed. I removed the duplicated messages.
> > I'm not sure it's acceptable to change the FDW API and require
> > FDW authors to update their extensions, especially since
> > the benefit on the FDW side seems limited at this point.
>
> I agree that it's premature to require FDW-side changes at this stage.
I'll remove those modifications.
Fixed. I reverted the FDW API changes, so no modifications are required for
the FDW API.
> > You may need to update src/tools/pgindent/typedefs.list.
>
> Thank you! I'll check and update it.
Done. Thank you for pointing this out!
> > Based on my testing, the aggregated values look incorrect.
> > In the example below, both t_0 and t_1 report 5,000,000 live rows,
> > but the aggregated result is 6,779,052. Is that the expected
> > aggregation behavior?
>
> I had misunderstood the sampling behavior for inheritance tables. I
initially thought it would be straightforward to display the sum of child
table values for the parent table. However, the parent table's sampling
logic proportionally distributes samples from inherited child tables based
on their relative block counts, and stores that as the parent's statistics.
Therefore, simply changing the log output to show summed values would imply
a change in the sampling methodology itself. While such a modification
might be possible in the future, it would deviate from the original purpose
of this patch, which is to align the log output between ANALYZE VERBOSE and
autoanalyze. For now, I'll exclude this from the current patch.
I have excluded this modification from v4 patch. With v4, the log output
now shows per-child sampling statistics for inheritance trees and
partitioned tables, which matches the current behavior.
I also conducted several patterns of tests to ensure new log output matches
current output. I have attached a SETUP SQL script(setup_tables.sql) and
test results(CURRENT:verboselog_current.log, APPLIED
PATCH:verboselog_after_patch.log) demonstrating that the ANALYZE VERBOSE
output which is the same as the autoanalyze log format.
Regards,
Tatsuya Kawata
| Attachment | Content-Type | Size |
|---|---|---|
| verboselog_after_patch.log | application/octet-stream | 10.4 KB |
| verboselog_current.log | application/octet-stream | 10.5 KB |
| setup_tables.sql | application/octet-stream | 4.7 KB |
| v4-0001-Add-sampling-statistics-to-autoanalyze-log-output.patch | application/octet-stream | 12.3 KB |
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Rahila Syed | 2026-01-19 17:31:17 | Re: Enhancing Memory Context Statistics Reporting |
| Previous Message | Antonin Houska | 2026-01-19 16:31:19 | Re: Adding REPACK [concurrently] |