Safer and faster get_attstatsslot()

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: pgsql-hackers(at)postgreSQL(dot)org
Subject: Safer and faster get_attstatsslot()
Date: 2017-05-11 16:41:02
Message-ID: 16364.1494520862@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Monday's round of security patches was a lot more exciting than I would
have liked, because code that worked fine for Peter and me failed
erratically in the buildfarm. What eventually emerged was that I'd added
some missing free_attstatsslot() calls in rangetypes_selfuncs.c, and
naively copied the first argument (atttype) from the matching
get_attstatsslot() calls. One of those atttype values was in fact
wrong for the slot in question; this had been missed for years because
get_attstatsslot() doesn't actually do anything with that argument.
I think that at one point we had, or at least in the original conception
intended to have, an Assert that the atttype matched the actual stats
array element type found in the pg_statistic row; but we had to remove
it because in some cases the type in pg_statistic is only
binary-compatible with the datatype the applied operator is expecting.

So the existing API for get_attstatsslot()/free_attstatsslot() is just
seriously bug-prone. It would be better if the caller did not have
to supply any type information; indeed really what we'd want is for
get_attstatsslot() to pass back the actual type it found in pg_statistic.

I also realized as I looked at the code that it's exceedingly inefficient
if the array element type is pass-by-reference --- then it'll incur a
separate palloc, copy, and pfree for each element. We'd be a lot better
off to copy the stats array as a whole, especially since that would come
for free in the probably-common case that the array has to be detoasted.
This code was written with very small stats targets in mind, like about
10, and it just doesn't look very good when you're imagining 1000 or more
entries in the stats array.

So attached is a proposed redesign that makes the API for
get_attstatsslot()/free_attstatsslot() simpler and hopefully more
foolproof. I've not made any particular attempt to performance-test
it, but it really ought to be a significant win for pass-by-ref element
types. It will add an array-copying step that wasn't there before when
the element type is pass-by-value and the array isn't toasted, but that
seems like an acceptable price.

BTW, this patch makes the skewColType and skewColTypmod fields of Hash
plan nodes unnecessary, as the only reason for them was to satisfy
get_attstatsslot(). I didn't remove them here but it would make sense
to do so.

Comments? Is this something that'd be OK to push now, or had I better
sit on it till v11? Being freshly burned, I kinda want to fix it now,
but I recognize that my judgment may be colored by that.

regards, tom lane

Attachment Content-Type Size
fix-get-free-attstatsslot-api-1.patch text/x-diff 93.2 KB

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2017-05-11 16:45:09 Re: eval_const_expresisions and ScalarArrayOpExpr
Previous Message Heikki Linnakangas 2017-05-11 16:20:55 Re: eval_const_expresisions and ScalarArrayOpExpr