Re: Parallel Seq Scan

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Kouhei Kaigai <kaigai(at)ak(dot)jp(dot)nec(dot)com>, Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com>, Gavin Flower <GavinFlower(at)archidevsys(dot)co(dot)nz>, Jeff Davis <pgsql(at)j-davis(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, Amit Langote <amitlangote09(at)gmail(dot)com>, Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>, Fabrízio Mello <fabriziomello(at)gmail(dot)com>, Thom Brown <thom(at)linux(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Parallel Seq Scan
Date: 2015-09-24 02:33:35
Message-ID: CA+TgmoZevF_DAhqbo4j7VhwmEzSZT3wprZviqW4zvao1qgC_wA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Sep 3, 2015 at 6:21 AM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> [ new patches ]

Still more review comments:

+ /* Allow space for terminating zero-byte */
+ size = add_size(size, 1);

This is pointless. The length is already stored separately, and if it
weren't, this wouldn't be adequate anyway because a varlena can
contain NUL bytes. It won't if it's text, but it could be bytea or
numeric or whatever.

RestoreBoundParams is broken, because it can do unaligned reads, which
will core dump on some architectures (and merely be slow on others).
If there are two or more parameters, and the first one is a varlena
with a length that is not a multiple of MAXIMUM_ALIGNOF, the second
SerializedParamExternData will be misaligned.

Also, it's pretty lame that we send the useless pointer even for a
pass-by-reference data type and then overwrite the bad pointer with a
good one a few lines later. It would be better to design the
serialization format so that we don't send the bogus pointer over the
wire in the first place.

It's also problematic in my view that there is so much duplicated code
here. SerializedParamExternData and SerializedParamExecData are very
similar and there are large swaths of very similar code to handle each
case. Both structures contain Datum value, Size length, bool isnull,
and Oid ptype, albeit not in the same order for some reason. The only
difference is that SerializedParamExternData contains uint16 pflags
and SerializedParamExecData contains int paramid. I think we need to
refactor this code to get rid of all this duplication. I suggest that
we decide to represent a datum here in a uniform fashion, perhaps like
this:

First, store a 4-byte header word. If this is -2, the value is NULL
and no data follows. If it's -1, the value is pass-by-value and
sizeof(Datum) bytes follow. If it's >0, the value is
pass-by-reference and the value gives the number of following bytes
that should be copied into a brand-new palloc'd chunk.

Using a format like this, we can serialize and restore datums in
various contexts - including bind and exec params - without having to
rewrite the code each time. For example, for param extern data, you
can dump an array of all the ptypes and paramids and then follow it
with all of the params one after another. For param exec data, you
can dump an array of all the ptypes and paramids and then follow it
with the values one after another. The code that reads and writes the
datums in both cases can be the same. If we need to send datums in
other contexts, we can use the same code for it.

The attached patch - which I even tested! - shows what I have in mind.
It can save and restore the ParamListInfo (bind parameters). I
haven't tried to adapt it to the exec parameters because I don't quite
understand what you are doing there yet, but you can see that the
datum-serialization logic is separated from the stuff that knows about
the details of ParamListInfo, so datumSerialize() should be reusable
for other purposes. This also doesn't have the other problems
mentioned above.

Thoughts?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachment Content-Type Size
datum-and-param-serialize.patch application/x-patch 9.4 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Alvaro Herrera 2015-09-24 02:41:46 Re: No Issue Tracker - Say it Ain't So!
Previous Message Amir Rohan 2015-09-24 02:26:38 Re: Support for N synchronous standby servers - take 2