From: | Petr Jelinek <petr(at)2ndquadrant(dot)com> |
---|---|
To: | pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: TABLESAMPLE patch |
Date: | 2014-12-10 23:29:37 |
Message-ID: | 5488D761.3090506@2ndquadrant.com |
Lists: | pgsql-hackers |
On 11/12/14 00:24, Petr Jelinek wrote:
> Hello,
>
> Attached is a basic implementation of the TABLESAMPLE clause. It's an SQL
> standard clause, and a couple of people have tried to submit it before, so I
> don't think I need to explain at length what it does: it basically returns
> a "random" sample of a table using a specified sampling method.
>
> I implemented both SYSTEM and BERNOULLI sampling as specified by the SQL
> standard. SYSTEM sampling does block-level sampling using the same
> algorithm as ANALYZE; BERNOULLI scans the whole table and picks tuples
> randomly.
>
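
To make the difference between the two methods concrete, here is a minimal C sketch over a toy table of fixed-size blocks. It is only an illustration and not code from the patch: the table layout constants, the `Rng` type, and the function names `bernoulli_sample` and `system_sample` are all hypothetical, and the block selection uses an independent coin flip per block rather than the ANALYZE block-sampling algorithm mentioned above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy table: NBLOCKS blocks of TUPLES_PER_BLOCK tuples each. */
enum { NBLOCKS = 100, TUPLES_PER_BLOCK = 50 };

/* Small xorshift64 PRNG so the sketch is deterministic for a given seed.
 * The seed (state) must be non-zero. */
typedef struct { uint64_t state; } Rng;

/* Returns a double in [0, 1). */
static double rng_uniform(Rng *rng)
{
    uint64_t x = rng->state;
    x ^= x << 13;
    x ^= x >> 7;
    x ^= x << 17;
    rng->state = x;
    /* Keep the top 53 bits and scale by 2^53. */
    return (double)(x >> 11) / 9007199254740992.0;
}

/* BERNOULLI-style: read every block, keep each tuple with probability p. */
static size_t bernoulli_sample(Rng *rng, double p, size_t *blocks_read)
{
    size_t kept = 0;

    assert(p >= 0.0 && p <= 1.0);
    *blocks_read = 0;
    for (int b = 0; b < NBLOCKS; b++) {
        (*blocks_read)++;               /* the whole table is scanned */
        for (int t = 0; t < TUPLES_PER_BLOCK; t++)
            if (rng_uniform(rng) < p)
                kept++;
    }
    return kept;
}

/* SYSTEM-style (simplified assumption): keep each block with probability p
 * and return every tuple in the chosen blocks. */
static size_t system_sample(Rng *rng, double p, size_t *blocks_read)
{
    size_t kept = 0;

    assert(p >= 0.0 && p <= 1.0);
    *blocks_read = 0;
    for (int b = 0; b < NBLOCKS; b++) {
        if (rng_uniform(rng) < p) {
            (*blocks_read)++;           /* only chosen blocks are read */
            kept += TUPLES_PER_BLOCK;
        }
    }
    return kept;
}
```

In this sketch the BERNOULLI-style function always reads all blocks, while the SYSTEM-style one reads only the chosen blocks and returns tuples in whole-block clusters, which is the I/O versus randomness trade-off between the two methods.
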
> There is an API for sampling methods, which currently consists of four
> functions: init, end, nextblock and nexttuple. I added a catalog that maps
> each sampling method to the functions implementing this API. The grammar
> creates a new TableSampleRange struct that I added for sampling. The parser
> then uses the catalog to load information about the sampling method into
> a TableSampleClause, which is then attached to the RTE. The planner checks
> whether the TableSampleClause is present in the RTE and, if so, creates a
> plan with just one path, SampleScan. SampleScan implements the standard
> executor API and calls the sampling method API as needed.
>
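
The shape of such a four-function API can be sketched as a table of callbacks driven by a scan loop. The following is a self-contained C illustration under stated assumptions, not the patch's actual definitions: the `SamplingMethod` struct, the callback signatures, the `-1` end-of-data convention, the toy "stride" method, and the `sample_scan` driver are all hypothetical names invented for this example.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical callback table mirroring init / nextblock / nexttuple / end. */
typedef struct SamplingMethod {
    void *(*init)(int nblocks, int tuples_per_block, double percent);
    int   (*nextblock)(void *state);              /* block number, or -1 when done */
    int   (*nexttuple)(void *state, int blockno); /* tuple offset, or -1 when done */
    void  (*end)(void *state);
} SamplingMethod;

/* Toy deterministic method: visit every stride-th block, return all its tuples.
 * This is neither SYSTEM nor BERNOULLI; it only exercises the API shape. */
typedef struct StrideState {
    int nblocks;
    int tuples_per_block;
    int stride;      /* distance between visited blocks */
    int next_block;  /* next block number to hand out */
    int next_tuple;  /* next tuple offset within the current block */
} StrideState;

static void *stride_init(int nblocks, int tuples_per_block, double percent)
{
    StrideState *st = malloc(sizeof *st);

    assert(st != NULL);
    assert(percent >= 1.0 && percent <= 100.0);
    st->nblocks = nblocks;
    st->tuples_per_block = tuples_per_block;
    st->stride = (int)(100.0 / percent);   /* e.g. 10% -> every 10th block */
    st->next_block = 0;
    st->next_tuple = 0;
    return st;
}

static int stride_nextblock(void *state)
{
    StrideState *st = state;
    int blockno;

    if (st->next_block >= st->nblocks)
        return -1;
    blockno = st->next_block;
    st->next_block += st->stride;
    st->next_tuple = 0;                    /* restart tuple iteration */
    return blockno;
}

static int stride_nexttuple(void *state, int blockno)
{
    StrideState *st = state;

    (void) blockno;                        /* unused by this toy method */
    if (st->next_tuple >= st->tuples_per_block)
        return -1;
    return st->next_tuple++;
}

static void stride_end(void *state)
{
    free(state);
}

static const SamplingMethod stride_method = {
    .init = stride_init,
    .nextblock = stride_nextblock,
    .nexttuple = stride_nexttuple,
    .end = stride_end,
};

/* Driver playing the role of the scan node: it knows nothing about the
 * method beyond the four callbacks. Returns the number of tuples visited. */
static int sample_scan(const SamplingMethod *method, int nblocks,
                       int tuples_per_block, double percent)
{
    int   returned = 0;
    int   blockno;
    void *state = method->init(nblocks, tuples_per_block, percent);

    while ((blockno = method->nextblock(state)) >= 0)
        while (method->nexttuple(state, blockno) >= 0)
            returned++;
    method->end(state);
    return returned;
}
```

Because the driver only goes through the callback table, a different method can be plugged in by supplying another `SamplingMethod` value, which is the property a catalog mapping method names to functions relies on.
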
> It is possible to write custom sampling methods. The sampling method
> parameters are not limited to a single percentage, as in the standard, but
> are a dynamic list of expressions that is checked against the definition of
> the init function, in a similar (although much simplified) fashion to how
> function calls are checked.
>
> Notable missing parts are:
> - proper costing and returned row count estimation - given the dynamic
> nature of the parameters, I think we'll need to let the sampling method
> do this, so there will have to be a fifth function in the API.
> - ruleutils support (it needs a bit of code in the get_from_clause_item
> function)
> - docs are sparse at the moment
>
Forgot the obligatory:
The research leading to these results has received funding from the
European Union's Seventh Framework Programme (FP7/2007-2013) under grant
agreement n° 318633.
--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services