Re: Gsoc2012 idea, tablesample

From: Stephen Frost <sfrost(at)snowman(dot)net>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: Joshua Berkus <josh(at)agliodbs(dot)com>, Qi Huang <huangqiyx(at)hotmail(dot)com>, pgsql-hackers(at)postgresql(dot)org, andres(at)anarazel(dot)de, alvherre(at)commandprompt(dot)com, neil conway <neil(dot)conway(at)gmail(dot)com>, daniel(at)heroku(dot)com, cbbrowne(at)gmail(dot)com, kevin grittner <kevin(dot)grittner(at)wicourts(dot)gov>
Subject: Re: Gsoc2012 idea, tablesample
Date: 2012-04-17 13:49:49
Message-ID: 20120417134949.GR1267@tamriel.snowman.net
Lists: pgsql-hackers

* Heikki Linnakangas (heikki(dot)linnakangas(at)enterprisedb(dot)com) wrote:
> 1. We probably don't want the SQL syntax to be added to the grammar.
> This should be written as an extension, using custom functions as
> the API, instead of extra SQL syntax.

Err, I missed that, and don't particularly agree with it. Is there a
serious issue with the grammar defined in the SQL standard? Do the other
DBs which provide this use the SQL grammar or something else?

I'm not sure that I particularly *like* the SQL grammar, but if we're
going to implement this, we should really do it 'right'.

> 2. It's not very useful if it's just a dummy replacement for "WHERE
> random() < ?". It has to be more advanced than that. Quality of the
> sample is important, as is performance. There was also an
> interesting idea of implementing monetary unit sampling.

In reviewing this, I got the impression (perhaps mistaken) that
different sampling methods are defined by the SQL standard and that it
would simply be up to us to implement them according to what the
standard requires.
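
For reference, the SQL standard's TABLESAMPLE clause names two methods:
BERNOULLI, where each row is chosen independently, and SYSTEM, which is
implementation-dependent and typically picks whole pages. The Python below
is only a rough sketch of that difference, assuming a table modelled as a
list of pages. The function names and the toy table are made up for
illustration and say nothing about how PostgreSQL would implement either
method.

```python
import random


def make_table(n_pages, rows_per_page):
    """Toy table: a list of pages, each page a list of row ids (hypothetical layout)."""
    return [
        [page * rows_per_page + i for i in range(rows_per_page)]
        for page in range(n_pages)
    ]


def bernoulli_sample(pages, fraction, rng):
    """Row-level sampling: every row is kept independently with probability `fraction`.

    Every page still has to be visited, much like "WHERE random() < ?".
    """
    return [row for page in pages for row in page if rng.random() < fraction]


def system_sample(pages, fraction, rng):
    """Page-level sampling: each page is kept or skipped as a whole.

    Skipped pages are never read, but rows arrive in page-sized clusters.
    """
    return [row for page in pages if rng.random() < fraction for row in page]


if __name__ == "__main__":
    rng = random.Random(2012)  # fixed seed so repeated runs give the same sample
    table = make_table(n_pages=1000, rows_per_page=100)
    print("bernoulli rows:", len(bernoulli_sample(table, 0.1, rng)))
    print("system rows:   ", len(system_sample(table, 0.1, rng)))
```

The row-level version gives a sample whose quality does not depend on how
rows are laid out on disk, while the page-level version trades some of that
quality for far less I/O, which is the quality-versus-performance tension
raised in the quoted point above.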

> I think this would be a useful project if those two points are taken
> care of.

Doing it 'right' certainly isn't going to be simply taking what Neil did
and updating it, and I understand Tom's concerns about having this be
more than a hack on seqscan, so I'm a bit nervous that this would turn
into something bigger than a GSoC project.

> Another idea that Robert Haas suggested was to add support for doing
> a TID scan for a query like "WHERE ctid < '(501,1)'". That's not
> enough work for a GSoC project on its own, but could certainly be a
> part of it.

I don't think Robert's suggestion would be part of a 'tablesample'
patch. Perhaps it belongs in a completely different project geared towards
allowing hidden columns to be used in various ways in a WHERE clause.
Of course, we'd need someone to actually define that; I don't think
someone relatively new to the project is going to know what experienced
hackers want to do with system columns.
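
As a small illustration of why a predicate like "WHERE ctid < '(501,1)'"
lends itself to a range scan: a ctid is a (block, offset) pair, with blocks
numbered from 0 and offsets within a block from 1, so an upper bound on ctid
is also an upper bound on the blocks that need reading. This is only a
sketch under that assumption; `Tid` and `blocks_to_read` are hypothetical
names, not PostgreSQL internals.

```python
from typing import NamedTuple


class Tid(NamedTuple):
    """Hypothetical stand-in for a ctid value."""
    block: int   # 0-based heap block number
    offset: int  # 1-based item position within the block


def blocks_to_read(upper: Tid) -> range:
    """Blocks that could hold a tuple whose ctid is strictly below `upper`."""
    # Offsets start at 1, so (b, 1) is the lowest possible TID in block b;
    # a bound of (b, 1) therefore excludes block b entirely.
    last_block = upper.block if upper.offset > 1 else upper.block - 1
    return range(0, last_block + 1)


if __name__ == "__main__":
    bound = Tid(501, 1)
    needed = blocks_to_read(bound)
    print("blocks to read:", len(needed))
    # NamedTuple values compare lexicographically, like (block, offset) pairs.
    print("(500, 37) qualifies:", Tid(500, 37) < bound)
    print("(501, 1) qualifies: ", Tid(501, 1) < bound)
```

With this bound only blocks 0 through 500 can contain matching tuples, so a
scan could stop there instead of reading the whole table.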

Thanks,

Stephen
