Re: Parallelized polymorphic aggs, and aggtype vs aggoutputtype

From: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Parallelized polymorphic aggs, and aggtype vs aggoutputtype
Date: 2016-06-20 07:06:09
Message-ID: CAKJS1f8cVf6ZuKdsJ=MLC-XiagEjXmOksfNOhuwYa8pBiNBOKA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 18 June 2016 at 05:45, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> The situation is much direr as far as serialize/deserialize are concerned.
> These basically don't work at all for polymorphic transtypes: if you try
> to declare "deserialize(bytea) returns anyarray", the type system won't
> let you. Perhaps that's not an issue because you shouldn't really need
> serialize/deserialize for anything except INTERNAL transtype. However,
> there's a far bigger problem which is that "deserialize(bytea) returns
> internal" is a blatant security hole, which I see you ignored the defense
> against in opr_sanity.sql. The claim in the comment there that it's okay
> if we check for aggregate context is a joke --- or haven't you heard that
> we allow users to create aggregates? That's not nearly good enough to
> prevent unsafe usage of such functions. Not to mention that CREATE
> FUNCTION won't allow creation of such functions, so extensions are locked
> out of using this feature.

This is an unfortunate oversight, and it is my fault.

> This has to be redesigned or else reverted entirely. I'm not going to
> take no for an answer.

I don't think anyone is against fixing any of the said issues. I
personally was committed to other non-work related things Friday -
Sunday and I was unable to do anything with this.

> A possible solution is to give deserialize an extra dummy argument, along
> the lines of "deserialize(bytea, internal) returns internal", thereby
> ensuring it can't be called in any non-system-originated contexts. This
> is still rather dangerous if the other argument is variable, as somebody
> might be able to abuse an internal-taking function by naming it as the
> deserialize function for a maliciously-designed aggregate. What I'm
> inclined to do to lock it down further is to drop the "serialtype"
> argument to CREATE AGGREGATE, which seems rather pointless (what else
> would you ever use besides bytea?). Instead, insist that
> serialize/deserialize apply *only* when the transtype is INTERNAL, and
> their signatures are exactly "serialize(internal) returns bytea" and
> "deserialize(bytea, internal) returns internal", never anything else.

This is also the only way that I can think of to fix this issue. If we
can agree that the fix should be to insist that the deserialisation
function take an additional 2nd parameter of INTERNAL, then I can
write a patch to fix this, and include a patch for the document
section 35.10 to explain better about parallelising user defined
aggregates.

> A different way of locking things down, which might be cleaner in the
> long run, is to invent a new pseudo-type for the sole purpose of being
> the serialization type, that is we'd insist on the signatures being
> "serialize(internal) returns serialized_internal" and
> "deserialize(serialized_internal) returns internal", where
> serialized_internal has the representation properties of bytea but is
> usable for no other purpose than this. Not sure it's worth the trouble
> though.

I agree this idea has merit, but seems like quite a large project for
9.6 at this stage. Adding the extra parameter to the deserialiation
function seems more reasonable at this stage, although once we get
that we're likely stuck with it even if we devise a better way to
handle all this later.

> Anyway, setting aside the security angle, it doesn't seem like there is
> anything wrong with the high-level design for polymorphic cases. I'm now
> thinking the problem I saw is just a garden variety implementation bug
> (or bugs). I'm still not very happy with the confusion around aggtype
> though, not least because I suspect it contributed directly to this bug.
> I also feel that both the code comments and the user-facing documentation
> for this feature are well below project standards, eg where's the
> discussion in section 35.10?

Thank you for committing the fix for the originally reported problem
with the polymorphic aggregates.

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Langote 2016-06-20 07:17:20 Re: Actuall row count of Parallel Seq Scan in EXPLAIN ANALYZE .
Previous Message Satoshi Nagayasu 2016-06-20 07:06:04 Re: Actuall row count of Parallel Seq Scan in EXPLAIN ANALYZE .