Re: Gsoc2012 idea, tablesample

From: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To: "Robert Haas" <robertmhaas(at)gmail(dot)com>
Cc: <josh(at)agliodbs(dot)com>,<andres(at)anarazel(dot)de>, <alvherre(at)commandprompt(dot)com>, <ants(at)cybertec(dot)at>, <heikki(dot)linnakangas(at)enterprisedb(dot)com>, <cbbrowne(at)gmail(dot)com>, <neil(dot)conway(at)gmail(dot)com>, <daniel(at)heroku(dot)com>, <huangqiyx(at)hotmail(dot)com>, "Florian Pflug" <fgp(at)phlo(dot)org>, <pgsql-hackers(at)postgresql(dot)org>, <sfrost(at)snowman(dot)net>, "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: Gsoc2012 idea, tablesample
Date: 2012-05-11 15:36:16
Message-ID: 4FACEBA00200002500047BA4@gw.wicourts.gov
Lists: pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> wrote:

> The trouble is, AFAICS, that you can't bound M very well without
> scanning the whole table. I mean, it's bounded by a theoretical
> limit, but that's it.

What would the theoretical limit be? (block size - page header size
- minimum size of one tuple) / item pointer size? So, on an 8KB
page, somewhere in the neighborhood of 1350? Hmm. If that's right,
that would mean a 1% random sample would need 13.5 probes per page,
meaning there wouldn't tend to be a lot of pages missed. Still, the
technique for getting a random sample seems sound, unless someone
suggests something better. Maybe we just want to go straight to a
seqscan to get to the pages we want to probe rather than reading
just the ones on the "probe list" in physical order?
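The probe-then-visit approach above can be sketched roughly like this (a deliberately simplified illustration, not PostgreSQL code: the `read_page` callback, the handling of missed probes, and all the parameter names are assumptions made for the sketch; real heap pages would need live-tuple and visibility checks):

```python
import random

def sample_by_probing(n_pages, max_lines, fraction, read_page, rng=random):
    """Sketch of TID-probe sampling: generate random (page, line) pairs
    up to the theoretical line-pointer ceiling per page, then visit the
    needed pages in physical (sequential) order and keep only the probes
    that hit a live tuple.

    read_page(p) must return the set of live line numbers on page p; it
    stands in for an actual heap page read."""
    n_probes = int(n_pages * max_lines * fraction)
    probes = [(rng.randrange(n_pages), rng.randrange(1, max_lines + 1))
              for _ in range(n_probes)]
    # Group probes by page so each needed page is read exactly once.
    by_page = {}
    for page, line in probes:
        by_page.setdefault(page, []).append(line)
    sample = []
    for page in sorted(by_page):      # physical order, as in the mail
        live = read_page(page)
        sample.extend((page, line) for line in by_page[page]
                      if line in live)
    return sample
```

Note that because this sketch simply drops probes that miss (point at a nonexistent line pointer), it undersamples whenever pages hold far fewer tuples than the theoretical ceiling M; that is exactly the difficulty with bounding M that the thread is discussing.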

-Kevin
