Re: TABLESAMPLE patch is really in pretty sad shape

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Petr Jelinek <petr(at)2ndquadrant(dot)com>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: TABLESAMPLE patch is really in pretty sad shape
Date: 2015-07-23 00:01:39
Message-ID: 15899.1437609699@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

I wrote:
> We could alternatively provide two scan-initialization callbacks,
> one that analyzes the parameters before we do heap_beginscan,
> and another that can do additional setup afterwards. Actually,
> that second call would really not be meaningfully different from
> the ReScan call, so another solution would be to just automatically
> invoke ReScan after we've created the HeapScanDesc. TSMs could work
> around not having this by complicating their NextBlock function a bit
> (so that it would do some initialization on first call) but perhaps
> it would be easier to have the additional init call.

Actually, there's another reason why this has to be redesigned: evaluating
the parameter expressions of TABLESAMPLE during ExecInitSampleScan is
not OK. Consider this:

regression=# select * from (values (0::float4),(100)) v(pct), lateral (select count(*) from tenk1 tablesample bernoulli (pct)) ss;
pct | count
-----+-------
0 | 0
100 | 0
(2 rows)

Surely the second execution of the subquery should've given me 10000.
It doesn't because the TABLESAMPLE parameters aren't recomputed during a
rescan. Even if you're minded to claim that that's all right, this is
definitely not acceptable:

regression=# select * from (values (0::float4),(100)) v(pct), lateral (select * from tenk1 tablesample bernoulli (pct)) ss;
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.

That one's crashing because "pct" is a Var that refers to an outer scan
tuple that doesn't exist yet. I'm not real sure how come the case with
an aggregate manages not to crash, actually.

This needs to work more like LIMIT, which doesn't try to compute the
limit parameters until the first fetch. So what we need is an Init
function that does very darn little indeed (maybe we don't even need
it at all), and then a ParamInspect function that is called at first fetch
or during a ReScan, and that one is the one that gets to look at the
evaluated parameter values.

If we wanted to let the method inspect the HeapScanDesc during setup
it would need still a third callback. I'm not exactly sure if that's
worth the trouble.

regards, tom lane

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2015-07-23 00:06:11 Re: ALTER TABLE .. ADD PRIMARY KEY .. USING INDEX has dump-restore hazard
Previous Message Heikki Linnakangas 2015-07-22 22:31:55 Re: could not truncate directory "pg_subtrans": apparent wraparound