Re: gistchoose vs. bloat

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
Cc: Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Jeff Davis <pgsql(at)j-davis(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: gistchoose vs. bloat
Date: 2013-01-24 21:07:36
Message-ID: 25278.1359061656@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Heikki Linnakangas <hlinnakangas(at)vmware(dot)com> writes:
> BTW, one thing that I wondered about this: How expensive is random()?
> I'm assuming not very, but I don't really know. Alexander's patch called
> random() for every tuple on the page, while I call it only once for each
> equal-penalty tuple. If it's really cheap, I think my/Tom's patch could
> be slightly simplified by always initializing keep_current_best with
> random() at top of the function, and eliminating the -1 "haven't decided
> yet" state.

I thought about that too, and concluded that random() is probably
expensive enough that we don't want to call it unnecessarily.

> OTOH, if random() is expensive, I note that we only need one
> bit of randomness, so we could avoid calling random() so often if we
> utilized all the bits from each random() call.

Meh. That would hard-wire the decision that the probability of keeping
a best tuple is exactly 0.5. I'd rather keep the flexibility to tune it
later. The way your patch is set up, it seems unlikely that it will
call random() very many times per page, so I'm not that concerned about
minimizing the number of calls further. (Also, in the probably-common
case that there are no exactly equally good alternatives, this would
actually be a net loss since it would add one unnecessary call.)

So basically, Alexander's method requires more random() calls and fewer
penalty() calls than yours, at least in typical cases. It's hard to say
which is faster without benchmarking, and it would matter which opclass
you were testing on.

regards, tom lane

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2013-01-24 21:51:04 Re: [HACKERS] BUG #6572: The example of SPI_execute is bogus
Previous Message Andrew Dunstan 2013-01-24 21:04:04 Re: BUG #6510: A simple prompt is displayed using wrong charset