Re: Why is pq_begintypsend so slow?

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: David Fetter <david(at)fetter(dot)org>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Why is pq_begintypsend so slow?
Date: 2020-05-18 16:38:05
Message-ID: 6648.1589819885@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Andres Freund <andres(at)anarazel(dot)de> writes:
>> FWIW, I've also observed, in another thread (the node func generation
>> thing [1]), that inlining enlargeStringInfo() helps a lot, especially
>> when inlining some of its callers. Moving e.g. appendStringInfo() inline
>> allows the compiler to sometimes optimize away the strlen. But if
>> e.g. an inlined appendBinaryStringInfo() still calls enlargeStringInfo()
>> unconditionally, successive appends cannot optimize away memory accesses
>> for ->len/->data.

> With a set of patches doing so, int4send itself is not a significant
> factor for my test benchmark [1] anymore.

This thread seems to have died out, possibly because the last set of
patches that Andres posted was sufficiently complicated and invasive
that nobody wanted to review it. I thought about this again after
seeing that [1] is mostly about pq_begintypsend overhead, and had
an epiphany: there isn't really a strong reason for pq_begintypsend
to be inserting bits into the buffer at all. The bytes will be
filled by pq_endtypsend, and nothing in between should be touching
them. So I propose 0001 attached. It's poking into the stringinfo
abstraction a bit more than I would want to do if there weren't a
compelling performance reason to do so, but there evidently is.

With 0001, pq_begintypsend drops from being the top single routine
in a profile of a test case like [1] to being well down the list.
The next biggest cost compared to text-format output is that
printtup() itself is noticeably more expensive. A lot of the extra
cost there seems to be from pq_sendint32(), which is getting inlined
into printtup(), and there probably isn't much we can do to make that
cheaper. But eliminating a common subexpression as in 0002 below does
help noticeably, at least with the rather old gcc I'm using.

For me, the combination of these two eliminates most but not quite
all of the cost penalty of binary over text output as seen in [1].

regards, tom lane

[1] https://www.postgresql.org/message-id/CAMovtNoHFod2jMAKQjjxv209PCTJx5Kc66anwWvX0mEiaXwgmA%40mail.gmail.com

Attachment Content-Type Size
0001-fix-pq_begintypsend.patch text/x-diff 890 bytes
0002-minor-printtup-fix.patch text/x-diff 710 bytes

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Alvaro Herrera 2020-05-18 17:03:23 Re: Missing grammar production for WITH TIES
Previous Message Luke Porter 2020-05-18 16:21:35 PostgresSQL project