Re: Gsoc2012 idea, tablesample

From: Sandro Santilli <strk(at)keybit(dot)net>
To: Ants Aasma <ants(at)cybertec(dot)at>, Qi Huang <huangqiyx(at)hotmail(dot)com>, heikki(dot)linnakangas(at)enterprisedb(dot)com, josh(at)agliodbs(dot)com, pgsql-hackers(at)postgresql(dot)org, andres(at)anarazel(dot)de, alvherre(at)commandprompt(dot)com, neil(dot)conway(at)gmail(dot)com, daniel(at)heroku(dot)com, cbbrowne(at)gmail(dot)com, kevin(dot)grittner(at)wicourts(dot)gov
Subject: Re: Gsoc2012 idea, tablesample
Date: 2012-04-24 07:31:36
Message-ID: 20120424073136.GF7891@gnash
Lists: pgsql-hackers

On Tue, Apr 24, 2012 at 08:49:26AM +0200, Sandro Santilli wrote:
> On Mon, Apr 23, 2012 at 08:34:44PM +0300, Ants Aasma wrote:

> > SELECT (SELECT reservoir_sample(some_table, 50) AS samples
> > FROM some_table WHERE ctid =~ ANY (rnd_pgtids))
> > FROM random_pages('some_table', 50) AS rnd_pgtids;
>
> But I don't understand the reservoir_sample call. What is it supposed to do?

OK, got it. That was probably to avoid:

ERROR: more than one row returned by a subquery used as an expression
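
For illustration, the error is easy to reproduce without any of the tables
above. This is a minimal sketch using only generate_series and array_agg;
array_agg is just a stand-in for any aggregate (such as the reservoir_sample
in the quoted query, whose definition is not shown here) that collapses many
rows into a single value:

```sql
-- Scalar subquery that yields three rows: fails with the error above.
SELECT (SELECT g FROM generate_series(1, 3) AS g);

-- Wrapping the rows in an aggregate leaves exactly one row, so this runs.
SELECT (SELECT array_agg(g) FROM generate_series(1, 3) AS g);
```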

But this also works nicely:

SELECT * FROM lots_of_points
WHERE ctid = ANY ( ARRAY[(SELECT random_tids('lots_of_points', 100000))] );

and it still uses a TID scan.
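
One way to check the plan, sketched with a throwaway temp table standing in
for lots_of_points and a literal tid array standing in for the output of
random_tids (both are assumptions made for this example, not the real data
or function):

```sql
-- Small stand-in table; the real lots_of_points is not shown in this thread.
CREATE TEMP TABLE lots_of_points AS
SELECT g AS id FROM generate_series(1, 1000) AS g;

-- Literal TIDs replace the random_tids() call for the purpose of the plan check.
EXPLAIN
SELECT * FROM lots_of_points
WHERE ctid = ANY (ARRAY['(0,1)', '(0,2)', '(0,3)']::tid[]);
```

The plan should show a Tid Scan node with the array in its TID Cond, rather
than a Seq Scan.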

The advanced TID operator would let random_tids return only pages rather
than full TIDs...

--strk;

,------o-.
| __/ | Delivering high quality PostGIS 2.0 !
| / 2.0 | http://strk.keybit.net - http://vizzuality.com
`-o------'
