Re: Collect frequency statistics for arrays

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
Cc: Noah Misch <noah(at)leadboat(dot)com>, Nathan Boley <npboley(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Collect frequency statistics for arrays
Date: 2012-03-03 18:04:16
Message-ID: 13539.1330797856@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

... BTW, could you explain exactly how that "Fill histogram by hashtab"
loop works? It's way too magic for my taste, and does in fact have bugs
in the currently submitted patch. I've reworked it to this:

/* Fill histogram by hashtab. */
delta = analyzed_rows - 1;
count_item_index = 0;
frac = sorted_count_items[0]->frequency * (num_hist - 1);
for (i = 0; i < num_hist; i++)
{
while (frac <= 0)
{
count_item_index++;
Assert(count_item_index < count_items_count);
frac += sorted_count_items[count_item_index]->frequency * (num_hist - 1);
}
hist[i] = sorted_count_items[count_item_index]->count;
frac -= delta;
}
Assert(count_item_index == count_items_count - 1);

The asserts don't fire in any test case I've tried, which seems to
indicate that it *does* work in the sense that the first histogram entry
is always the smallest count and the last histogram entry is always
the largest one. But it's extremely unclear why it manages to stop
exactly at the last count_items array entry, or for that matter why it's
generating a representative histogram at all. I'm suspicious that the
"-1" bits represent off-by-one bugs.

I also don't especially like the fact that "frac" is capable of
overflowing (since worst case frequency is 300 * 10000 and worst case
num_hist is 10000, with the current limits on statistics_target).
We could work around that by doing the frac arithmetic in int64, but I
wonder whether that couldn't be avoided. In any case, first I'd like
an explanation why this code works at all.

regards, tom lane

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Kevin Grittner 2012-03-03 18:37:53 Re: Command Triggers, patch v11
Previous Message Thom Brown 2012-03-03 16:21:33 Re: Command Triggers, patch v11