Re: Gsoc2012 idea, tablesample

From: Qi Huang <huangqiyx(at)hotmail(dot)com>
To: <sfrost(at)snowman(dot)net>, <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: <josh(at)agliodbs(dot)com>, <pgsql-hackers(at)postgresql(dot)org>, <andres(at)anarazel(dot)de>, <alvherre(at)commandprompt(dot)com>, <neil(dot)conway(at)gmail(dot)com>, <daniel(at)heroku(dot)com>, <cbbrowne(at)gmail(dot)com>, <kevin(dot)grittner(at)wicourts(dot)gov>
Subject: Re: Gsoc2012 idea, tablesample
Date: 2012-04-17 15:21:24
Message-ID: BAY159-W2914D3B136B8733DCFD1A7A33F0@phx.gbl
Lists: pgsql-hackers


> > 2. It's not very useful if it's just a dummy replacement for "WHERE
> > random() < ?". It has to be more advanced than that. Quality of the
> > sample is important, as is performance. There was also an
> > interesting idea of implementing monetary unit sampling.
>
> In reviewing this, I got the impression (perhaps mistaken..), that
> different sampling methods are defined by the SQL standard and that it
> would simply be up to us to implement them according to what the standard
> requires.
>
> > I think this would be a useful project if those two points are taken
> > care of.
>
> Doing it 'right' certainly isn't going to be simply taking what Neil did
> and updating it, and I understand Tom's concerns about having this be
> more than a hack on seqscan, so I'm a bit nervous that this would turn
> into something bigger than a GSoC project.
>

As Christopher Browne mentioned, this sampling method (monetary unit sampling) is not possible without scanning the whole data set. It improves the sampling quality but increases the sampling cost. I think it should be used only for some special sampling types, not for general sampling. The general sampling methods, as in the SQL standard, should be only SYSTEM and BERNOULLI.
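
To make the difference between the two standard methods concrete, here is a rough sketch in Python. It is not PostgreSQL code and not a proposed implementation; the table layout (a list of pages, each a list of rows) and the function names are assumptions made only for illustration. BERNOULLI decides row by row, so every row is visited, while SYSTEM decides page by page, so unselected pages need not be read at all.

```python
import random

def bernoulli_sample(pages, fraction, rng):
    """Row-level sampling: visit every row, keep each with probability `fraction`."""
    return [row for page in pages for row in page if rng.random() < fraction]

def system_sample(pages, fraction, rng):
    """Block-level sampling: keep or skip each page as a whole."""
    sample = []
    for page in pages:
        if rng.random() < fraction:
            # A selected page contributes all of its rows.
            sample.extend(page)
    return sample

# Hypothetical table: 100 pages of 10 rows each, rows numbered 0..999.
pages = [[p * 10 + r for r in range(10)] for p in range(100)]
rng = random.Random(42)

b = bernoulli_sample(pages, 0.1, rng)
s = system_sample(pages, 0.1, rng)
print(len(b), "rows from BERNOULLI,", len(s), "rows from SYSTEM")
```

In this sketch the SYSTEM sample always arrives in whole pages, which is cheaper but makes the rows in the sample correlated with their physical placement; the BERNOULLI sample is independent per row but costs a full scan.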

Best Regards and Thanks
Huang Qi Victor
Computer Science of National University of Singapore
