Re: Gsoc2012 idea, tablesample

From: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To: "Qi Huang" <huangqiyx(at)hotmail(dot)com>,"Florian Pflug" <fgp(at)phlo(dot)org>
Cc: <josh(at)agliodbs(dot)com>,<andres(at)anarazel(dot)de>, <alvherre(at)commandprompt(dot)com>, <ants(at)cybertec(dot)at>, <heikki(dot)linnakangas(at)enterprisedb(dot)com>, <cbbrowne(at)gmail(dot)com>, <neil(dot)conway(at)gmail(dot)com>, <daniel(at)heroku(dot)com>, <pgsql-hackers(at)postgresql(dot)org>, <sfrost(at)snowman(dot)net>
Subject: Re: Gsoc2012 idea, tablesample
Date: 2012-05-10 14:28:17
Message-ID: 4FAB8A310200002500047ABA@gw.wicourts.gov
Lists: pgsql-hackers

Florian Pflug <fgp(at)phlo(dot)org> wrote:

> One problem I see with this approach is that its efficiency
> depends on the average tuple length, at least with a naive
> approach to random ctid generation. The simplest way to generate
> those randomly without introducing bias is to generate a random
> page index between 0 and the relation's size in pages,

Right.

> and then generate a random tuple index between 0 and
> MaxHeapTuplesPerPage, which is 291 on x86-64 assuming the standard
> page size of 8k.

I think we can do better than that without moving too far from "the
simplest way". It seems like we should be able to get a more
accurate calculation of a minimum base tuple size based on the table
definition, and calculate the maximum number of those which could
fit on a page. On the other hand, ctid uses a line pointer, doesn't
it? Do we need to worry about dead line pointers allowing higher
tuple indexes than the calculated maximum number of tuples per page?
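
For concreteness, here is a rough standalone sketch (plain C, not
backend code) of the kind of generator under discussion; max_offset
stands in for whichever bound we end up using, either the worst-case
MaxHeapTuplesPerPage or a tighter per-table figure:

    /* Rough sketch only: pick a random (block, offset) pair, ignoring
     * modulo bias.  nblocks is the relation's size in pages; max_offset
     * is the largest line pointer index we ever expect on a page.
     * Whether the result points at a live tuple can only be checked
     * after the page has been read. */
    #include <stdint.h>
    #include <stdlib.h>

    typedef struct
    {
        uint32_t    block;      /* 0 .. nblocks - 1 */
        uint16_t    offset;     /* 1 .. max_offset */
    } SampleTid;

    static SampleTid
    random_ctid(uint32_t nblocks, uint16_t max_offset)
    {
        SampleTid   tid;

        tid.block = (uint32_t) (random() % nblocks);
        tid.offset = (uint16_t) (random() % max_offset) + 1;
        return tid;
    }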

> The current toasting threshold (TOAST_TUPLE_THRESHOLD) is
> approximately 2k, so having tables with an average heap tuple size
> of a few hundred bytes doesn't seem unlikely. Now, assume the
> average tuple length is 128 bytes, i.e. on average you'll have ~
> 8k/128 = 64 live tuples / page if the fill factor is 100% and all
> tuples are live. To account for lower fill factors and dead
> tuples, let's thus say there are 50 live tuples / page. Then, on
> average, only every 6th randomly generated ctid will point to a
> live tuple. But whether or not it does can only be decided after
> reading the page from disk, so you end up with a rate of 6
> random-access reads per returned tuple.
>
> IIRC, the cutoff point where an index scan loses compared to a
> sequential scan is somewhere around 10% of the table read, i.e. if
> a predicate selects more than 10% of the available rows, a
> sequential scan is more efficient than an index scan.

That ratio *might* be better for a ctid scan, since you don't have
the index access in the mix.
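
To spell out the arithmetic in Florian's example: with offsets drawn
from the full MaxHeapTuplesPerPage range, the hit rate is roughly the
live tuples per page divided by 291, which is where the one-in-six
figure comes from. A trivial back-of-the-envelope version, using his
assumed numbers:

    /* Assumed values from the example above; 8k pages on x86-64. */
    #include <stdio.h>

    int
    main(void)
    {
        double  max_tuples_per_page = 291.0;   /* MaxHeapTuplesPerPage */
        double  live_tuples_per_page = 50.0;   /* assumed average */

        double  hit_rate = live_tuples_per_page / max_tuples_per_page;
        double  reads_per_tuple = 1.0 / hit_rate;

        printf("hit rate %.2f, ~%.1f random page reads per returned tuple\n",
               hit_rate, reads_per_tuple);
        return 0;
    }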

> Scaling that with the 1/6-th success rate from above means that
> Kevin's approach would only beat a sequential scan if the sampling
> percentage isn't much larger than 1%, assuming an average row size
> of 128 bytes.
>
> The algorithm still seems like a good choice for very small
> sampling percentages, though.

Yeah, even with a maximum tuple count calculated per page, there are
certainly going to be cases where another approach will be faster,
especially where the sample is a relatively high percentage of the
table. It would be good to have multiple plans compete on costs, if
possible. It would not surprise me at all if the typical break-even
point between the two techniques was somewhere on the order of a 1%
sample size, but one would hope we could get there on the basis of
estimated costs rather than using arbitrary rules or forcing the
user to guess and code a choice explicitly.
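
Purely as a sketch of the sort of cost competition I have in mind
(the names and the use of random_page_cost/seq_page_cost as stand-ins
are mine, not a worked-out cost model):

    /* Rough sketch: let the ctid-probe approach and a sequential scan
     * compete on estimated cost.  Not the planner's real cost model;
     * the inputs are whatever estimates we can come up with. */
    #include <stdbool.h>

    static bool
    ctid_sampling_wins(double nblocks,          /* pages in the relation */
                       double tuples_per_page,  /* estimated live tuples/page */
                       double hit_rate,         /* live ctids / generated ctids */
                       double sample_fraction,  /* requested sample, 0..1 */
                       double random_page_cost, /* default 4.0 */
                       double seq_page_cost)    /* default 1.0 */
    {
        double  sample_tuples = nblocks * tuples_per_page * sample_fraction;
        double  probe_cost = (sample_tuples / hit_rate) * random_page_cost;
        double  seqscan_cost = nblocks * seq_page_cost;

        return probe_cost < seqscan_cost;
    }

Where the break-even lands depends on the cost parameters, but the
point is that the choice falls out of the estimates rather than a
hard-coded percentage.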

-Kevin
