Re: Collect frequency statistics for arrays

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
Cc: Noah Misch <noah(at)leadboat(dot)com>, Nathan Boley <npboley(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Collect frequency statistics for arrays
Date: 2012-02-29 21:09:51
Message-ID: 7833.1330549791@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Alexander Korotkov <aekorotkov(at)gmail(dot)com> writes:
> On Thu, Mar 1, 2012 at 12:39 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> I am starting to look at this patch now. I'm wondering exactly why the
>> decision was made to continue storing btree-style statistics for arrays,

> Probably, btree statistics really does matter for some sort of arrays? For
> example, arrays representing paths in the tree. We could request a subtree
> in a range query on such arrays.

That seems like a pretty narrow, uncommon use-case. Also, to get
accurate stats for such queries that way, you'd need really enormous
histograms. I doubt that the existing parameters for histogram size
will permit meaningful estimation of more than the first array entry
(since we don't make the histogram any larger than we do for a scalar
column).

The real point here is that the fact that we're storing btree-style
stats for arrays is an accident, backed into by having added btree
comparators for arrays plus analyze.c's habit of applying default
scalar-oriented analysis functions to any type without an explicit
typanalyze entry. I don't recall that we ever thought hard about
it or showed that those stats were worth anything.

regards, tom lane

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Alexander Korotkov 2012-02-29 21:19:03 Re: Collect frequency statistics for arrays
Previous Message Tom Lane 2012-02-29 20:59:35 Re: controlling the location of server-side SSL files