Re: Gsoc2012 idea, tablesample

From: Sandro Santilli <strk(at)keybit(dot)net>
To: Qi Huang <huangqiyx(at)hotmail(dot)com>
Cc: heikki(dot)linnakangas(at)enterprisedb(dot)com, josh(at)agliodbs(dot)com, pgsql-hackers(at)postgresql(dot)org, andres(at)anarazel(dot)de, alvherre(at)commandprompt(dot)com, neil(dot)conway(at)gmail(dot)com, daniel(at)heroku(dot)com, cbbrowne(at)gmail(dot)com, kevin(dot)grittner(at)wicourts(dot)gov
Subject: Re: Gsoc2012 idea, tablesample
Date: 2012-04-23 13:37:42
Message-ID: 20120423133742.GO26868@gnash
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sat, Apr 21, 2012 at 02:28:52PM +0800, Qi Huang wrote:
>
> Hi, Heikki
...
> > Another idea that Robert Haas suggested was to add support doing a TID
> > scan for a query like "WHERE ctid< '(501,1)'". That's not enough work
> > for GSoC project on its own, but could certainly be a part of it.
>
> the first one and the last one are still not clear.

The last one was the TID scan on filters like ctid < '(501,1)'.
TID "scans" are the fastest access method as they directly access
explicitly referenced addresses. Starting from this observation a sampling
function may select random pages and tuples within pages and directly
access them, optimizing accesses by grouping tuples within the same
page so to fetch them all togheter.

This is what the ANALYZE command already does when providing samples
for the type analyzers.

Unfortunately it looks like at SQL level only the equality operator triggers
a TID scan, so things like "WHERE ctid < '(501,1)'" won't be as fast as
fetching all visible tuples in the first 501 pages.

I think that's what Heikki was referring about.

I'd love to see enhanced CTID operators, to fetch all visible tuples in a page
using a tidscan. Something like: WHERE ctid =~ '(501,*)' or a ctidrange.

--strk;

,------o-.
| __/ | Delivering high quality PostGIS 2.0 !
| / 2.0 | http://strk.keybit.net - http://vizzuality.com
`-o------'

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Krzysztof Nienartowicz 2012-04-23 13:42:40 Namespace of array of user defined types is confused by the parser in insert?
Previous Message Marc Cousin 2012-04-23 13:08:32 Re: [PATCH] lock_timeout and common SIGALRM framework