Re: Why is pq_begintypsend so slow?

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Jack Christensen <jack(at)jncsoftware(dot)com>, David Fetter <david(at)fetter(dot)org>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Why is pq_begintypsend so slow?
Date: 2020-06-03 15:30:42
Message-ID: CA+TgmoaQ4ga3cQLsKEmReHFHxBeH8_+rfgGXhA5speccKzJ-0g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Jun 2, 2020 at 9:56 PM Andres Freund <andres(at)anarazel(dot)de> wrote:
> The biggest problem after that is that we waste a lot of time memcpying
> stuff around repeatedly. There is:
> 1) send function: datum -> per datum stringinfo
> 2) printtup: per datum stringinfo -> per row stringinfo
> 3) socket_putmessage: per row stringinfo -> PqSendBuffer
> 4) send(): PqSendBuffer -> kernel buffer
>
> It's obviously hard to avoid 1) and 4) in the common case, but the
> number of other copies seem pretty clearly excessive.

I too have seen recent benchmarking data where this was a big problem.
Basically, you need a workload where the server doesn't have much or
any actual query processing to do, but is just returning a lot of
stuff to a really fast client - e.g. a locally connected client.
That's not necessarily the most common case but, if you have it, all
this extra copying is really pretty expensive.

My first thought was to wonder about changing all of our send/output
functions to write into a buffer passed as an argument rather than
returning something which we then have to copy into a different
buffer, but that would be a somewhat painful change, so it is probably
better to first pursue the idea of getting rid of some of the other
copies that happen in more centralized places (e.g. printtup). I
wonder if we could replace the whole
pq_beginmessage...()/pq_send....()/pq_endmessage...() system with
something a bit better-designed. For instance, suppose we get rid of
the idea that the caller supplies the buffer, and we move the
responsibility for error recovery into the pqcomm layer. So you do
something like:

my_message = xyz_beginmessage('D');
xyz_sendint32(my_message, 42);
xyz_endmessage(my_message);

Maybe what happens here under the hood is we keep a pool of free
message buffers sitting around, and you just grab one and put your
data into it. When you end the message we add it to a list of used
message buffers that are waiting to be sent, and once we send the data
it goes back on the free list. If an error occurs after
xyz_beginmessage() and before xyz_endmessage(), we put the buffer back
on the free list. That would allow us to merge (2) and (3) into a
single copy. To go further, we could allow send/output functions to
opt in to receiving a message buffer rather than returning a value,
and then we could get rid of (1) for types that participate. (4) seems
unavoidable AFAIK.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Pavel Stehule 2020-06-03 15:32:47 significant slowdown of HashAggregate between 9.6 and 10
Previous Message Euler Taveira 2020-06-03 15:08:56 Re: More tests with USING INDEX replident and dropped indexes