Re: Postgresql network transmission overhead

From: "Michael J(dot) Baars" <mjbaars1977(dot)pgsql-hackers(at)cyberfiber(dot)eu>
To: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Postgresql network transmission overhead
Date: 2021-02-26 06:41:15
Message-ID: af82af191e41bfd4ec5a29371a38ae83dbf188ac.camel@cyberfiber.eu
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, 2021-02-24 at 19:18 -0600, Justin Pryzby wrote:
> On Wed, Feb 24, 2021 at 09:14:19AM +0100, Michael J. Baars wrote:
> > I've written this function to insert several rows at once, and noticed a certain postgresql overhead as you can see from the log file. A lot more data than
> > the
> > user data is actually sent over the net. This has a certain noticeable impact on the user transmission speed.
> >
> > I noticed that a libpq query always has a number of arguments of the following form:
> >
> > Oid paramt[cols] = { 1082, 701, 701, 701, 701, 701, 20, 701 };
> > int paraml[cols] = { 4, 8, 8, 8, 8, 8, 8, 8 };
> > int paramf[cols] = { 1, 1, 1, 1, 1, 1, 1, 1 };
> >
> > result = PQexecParams(psql_cnn, (char* ) &statement, 1, paramt, (const char** ) paramv, paraml, paramf, 1);
> >
> > I think the 'paramf' is completely redundant. The data mode, text or binary, is already specified in the last argument to 'PQexecParams' and does not have
> > to be
> > repeated for every value. Am I correct?
>
> The last argument is the *result* format.
> The array is for the format of the *input* bind parameters.
>

Yes, but we are reading from and writing to the same table here. Why specify different formats for the input and the output exactly?

Is this vector being sent over the network?

In the logfile you can see that the effective user data being written is only 913kb, while the actual being transmitted over the network is 7946kb when writing
one row at a time. That is an overhead of 770%!

> Regarding the redundancy:
>
> https://www.postgresql.org/docs/current/libpq-exec.html
> > nParams
> > The number of parameters supplied; it is the length of the arrays paramTypes[], paramValues[], paramLengths[], and paramFormats[]. (The array pointers
> > can be NULL when nParams is zero.)
> > paramTypes[]
> > Specifies, by OID, the data types to be assigned to the parameter symbols. If paramTypes is NULL, or any particular element in the array is zero, the
> > server infers a data type for the parameter symbol in the same way it would do for an untyped literal string.
> > paramValues[]
> > ...
> > paramLengths[]
> > Specifies the actual data lengths of binary-format parameters. It is ignored for null parameters and text-format parameters. The array pointer can be
> > null when there are no binary parameters.
> > paramFormats[]
> > Specifies whether parameters are text (put a zero in the array entry for the corresponding parameter) or binary (put a one in the array entry for the
> > corresponding parameter). If the array pointer is null then all parameters are presumed to be text strings.
>
> nParams specifies the length of the arrays: if you pass an array of length
> greater than nParams, then the rest of the array is being ignored.
>
> You don't *have* to specify Types, and Lengths and Formats can be specified as
> NULL for text format params.
>
> > semi-prepared
>
> What does semi-prepared mean ?
>

I'm writing a total of 4096+ rows of each n columns to this table. Some preparation is in order such that the timer can be placed around the actual network
transmission only, preparing the statement, i.e. the string of input arguments and the input data structures, is being done before the timer is started.

Also I noticed that when more columns are used, prepared statements with too many rows * columns cannot be loaded into postgresql, probably because the size of
the prepared statements are limited to a certain size in memory. It does not return an error of the sorts from PQprepare, only during execution of the
statement.

Thanks,
Mischa.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2021-02-26 06:49:48 Add --tablespace option to reindexdb
Previous Message Greg Nancarrow 2021-02-26 06:35:14 Re: Parallel INSERT (INTO ... SELECT ...)