Quick Links

Re: Gsoc2012 idea, tablesample

From:	Florian Pflug <fgp(at)phlo(dot)org>
To:	Qi Huang <huangqiyx(at)hotmail(dot)com>
Cc:	<ants(at)cybertec(dot)at>, <heikki(dot)linnakangas(at)enterprisedb(dot)com>, <josh(at)agliodbs(dot)com>, <pgsql-hackers(at)postgresql(dot)org>, <andres(at)anarazel(dot)de>, <alvherre(at)commandprompt(dot)com>, <neil(dot)conway(at)gmail(dot)com>, <daniel(at)heroku(dot)com>, <cbbrowne(at)gmail(dot)com>, <kevin(dot)grittner(at)wicourts(dot)gov>, <sfrost(at)snowman(dot)net>
Subject:	Re: Gsoc2012 idea, tablesample
Date:	2012-05-10 11:43:01
Message-ID:	798891F2-2E56-41CB-89A3-5BF5F332F77C@phlo.org
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

On May10, 2012, at 10:43 , Qi Huang wrote:
> 2. use TIDSCAN to directly access tuples. The below way of using ctid proposed by Kevin looks good.
>
> -One technique which might be suitably random without reading the
> -whole table would be to figure out a maximum block number and tuple
> -ID for the table, and generate a series of random ctid values to
> -read. If the tuple doesn't exist or is not visible to the snapshot,
> -you ignore it and continue, until you have read the requisite number
> -of rows. You could try to generate them in advance and sort them by
> -block number, but then you need to solve the problems of what to do
> -if that set of ctids yields too many rows or too few rows, both of
> -which have sticky issues.

> I think this technique could be considered as an implementation algo for BERNOULLI method. It looks that it could still reduce a lot of cost compared to just assign random number to every tuple and then retrieve.

One problem I see with this approach is that its efficiency depends on the average tuple length, at least with a naive approach to random ctid generator. The simplest way to generate those randomly without introducing bias is to generate a random page index between 0 and the relation's size in pages, and then generate random tuple index between 0 and MaxHeapTuplesPerPage, which is 291 on x86-64 assuming the standard page size of 8k.

The current toasting threshold (TOAST_TUPLE_THRESHOLD) is approximately 2k, so having tables with an average heap tuple size of a few hundred bytes doesn't seem unlikely. Now, assume the average tuple length is 128 bytes, i.e. on average you'll have ~ 8k/128 = 64 live tuples / page if the fill factor is 100% and all tuples are live. To account for lower fill factors and dead tuples, let's thus say there are 50 live tuples / page. Then, on average, only every 6th randomly generated ctid will point to a live tuple. But whether or not it does can only be decided after reading the page from disk, so you end up with a rate of 6 random-access reads per returned tuple.

IIRC, the cutoff point where an index scan loses compared to a sequential scan is somewhere around 10% of the table read, i.e. if a predicate selects more than 10% of the available rows, a sequential scan is more efficient than an index scan. Scaling that with the 1/6-th success rate from above means that Kevin's approach would only beat a sequential scan if the sampling percentage isn't much larger than 1%, assuming an average row size of 128 bytes.

The algorithm still seems like a good choice for very small sampling percentages, though.

best regards,
Florian Pflug

In response to

Re: Gsoc2012 idea, tablesample at 2012-05-10 08:43:10 from Qi Huang

Responses

Re: Gsoc2012 idea, tablesample at 2012-05-10 14:28:17 from Kevin Grittner

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	MauMau	2012-05-10 11:45:31	Re: Can pg_trgm handle non-alphanumeric characters?
Previous Message	Andrew Dunstan	2012-05-10 11:25:35	Re: Draft release notes complete