Re: TABLESAMPLE patch

From: Petr Jelinek <petr(at)2ndquadrant(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: TABLESAMPLE patch
Date: 2014-12-10 23:29:37
Message-ID: 5488D761.3090506@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 11/12/14 00:24, Petr Jelinek wrote:
> Hello,
>
> Attached is a basic implementation of TABLESAMPLE clause. It's SQL
> standard clause and couple of people tried to submit it before so I
> think I don't need to explain in length what it does - basically returns
> "random" sample of a table using a specified sampling method.
>
> I implemented both SYSTEM and BERNOULLI sampling as specified by SQL
> standard. The SYSTEM sampling does block level sampling using same
> algorithm as ANALYZE, BERNOULLI scans whole table and picks tuple randomly.
>
> There is API for sampling methods which consists of 4 functions at the
> moment - init, end, nextblock and nexttuple. I added catalog which maps
> the sampling method to the functions implementing this API. The grammar
> creates new TableSampleRange struct that I added for sampling. Parser
> then uses the catalog to load information about the sampling method into
> TableSampleClause which is then attached to RTE. Planner checks for if
> this parameter is present in the RTE and if it finds it it will create
> plan with just one path - SampleScan. SampleScan implements standard
> executor API and calls the sampling method API as needed.
>
> It is possible to write custom sampling methods. The sampling method
> parameters are not limited to just percent number as in standard but
> dynamic list of expressions which is checked against the definition of
> the init function in a similar fashion (although much simplified) as
> function calls are.
>
> Notable lacking parts are:
> - proper costing and returned row count estimation - given the dynamic
> nature of parameters I think for we'll need to let the sampling method
> do this, so there will have to be fifth function in the API.
> - ruleutils support (it needs a bit of code in get_from_clause_item
> function)
> - docs are sparse at the moment
>

Forgot the obligatory:

The research leading to these results has received funding from the
European Union's Seventh Framework Programme (FP7/2007-2013) under grant
agreement n° 318633.

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2014-12-10 23:46:35 Re: Casting issues with domains
Previous Message Petr Jelinek 2014-12-10 23:24:49 TABLESAMPLE patch