Re: WIP: collect frequency statistics for arrays

From: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP: collect frequency statistics for arrays
Date: 2011-06-13 19:10:36
Message-ID: BANLkTikOowSvYoZWUE8b4uS7JdOZ=A-y4w@mail.gmail.com
Lists: pgsql-hackers

On Mon, Jun 13, 2011 at 8:16 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:

> If the data type is hashable, you could consider building a hash table
> on the MCVs and then do a probe for each element in the array. I
> think that's better than the other way around because there can't be
> more than 10k MCVs, whereas the input constant could be arbitrarily
> long. I'm not entirely sure whether this case is important enough to
> be worth spending a lot of code on, but then again it might not be
> that much code.
>
Unfortunately, the most time-consuming operation isn't element comparison.
The cost is dominated by the complex computations in the calc_distr function.

> Another option is to bound the number of operations you're willing to
> perform to some reasonable limit, say, 10 * default_statistics_target.
> Work out ceil((10 * default_statistics_target) /
> number-of-elements-in-const) and consider at most that many MCVs.
> When this limit kicks in you'll get a less-accurate selectivity
> estimate, but that's a reasonable price to pay for not blowing out
> planning time.

Good option. I'm going to add such a condition to my patch.

------
With best regards,
Alexander Korotkov.
