Calculage avg. width when operator = is missing

From: "Shulgin, Oleksandr" <oleksandr(dot)shulgin(at)zalando(dot)de>
To: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Calculage avg. width when operator = is missing
Date: 2015-09-22 16:16:46
Message-ID: CACACo5QGXRc5daiUytveu99qfM=xkV=2SxEkZwuqCOqoz1wwaA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi Hackers,

I've recently stumbled upon a problem with table bloat estimation in case
there are columns of type JSON.

The quick bloat estimation queries use sum over pg_statistic.stawidth of
table's columns, but in case of JSON the corresponding entry is never
created by the ANALYZE command due to equality comparison operator
missing. I understand why there is no such operator defined for this
particular type, but shouldn't we still try to produce meaningful average
width estimation?

In my case the actual bloat is around 40% as verified with pgstattuple,
while the bloat reported by quick estimate can be between 75% and 95%(!) in
three instances of this problem. We're talking about some hundreds of GB
of miscalculation.

Attached patch against master makes the std_typanalyze still try to compute
the minimal stats even if there is no "=" operator. Makes sense?

I could also find this report in archives that talks about similar problem,
but due to all values being over the analyze threshold:

http://www.postgresql.org/message-id/flat/12480(dot)1389370514(at)sss(dot)pgh(dot)pa(dot)us#12480(dot)1389370514@sss.pgh.pa.us

I think we could try harder, otherwise any estimate relying on average
width can be way off in such cases.

--
Alex

Attachment Content-Type Size
still-analyze-when-no-eqopr.patch text/x-patch 6.2 KB

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2015-09-22 17:15:38 Re: 9.5: Can't connect with PGSSLMODE=require on Windows
Previous Message Alvaro Herrera 2015-09-22 16:04:35 Re: Improving test coverage of extensions with pg_dump