Performance improvements for src/port/snprintf.c

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Performance improvements for src/port/snprintf.c
Date: 2018-08-17 18:32:59
Message-ID: 11787.1534530779@sss.pgh.pa.us
Lists: pgsql-hackers

Over in the what-about-%m thread, we speculated about replacing the
platform's *printf functions if they didn't support %m, which would
basically mean using src/port/snprintf.c on all non-glibc platforms,
rather than only on Windows as happens right now (ignoring some
obsolete platforms with busted snprintf's).

I've been looking into the possible performance consequences of that,
in particular comparing snprintf.c to the library versions on macOS,
FreeBSD, OpenBSD, and NetBSD. While it held up well in simpler cases,
I noted that it was significantly slower on long format strings, which
I traced to two separate problems:

1. Our implementation always scans the format string twice, so that it
can sort out argument-ordering options (%n$). Everybody else is bright
enough to do that only for formats that actually use %n$, and it turns
out that it doesn't really cost anything extra to do so: you can just
perform the extra scan when and if you first find a dollar specifier.
(Perhaps there's an arguable downside to this, with invalid format
strings that have non-dollar conversion specs followed by dollar ones:
with this approach we might fetch some arguments before realizing that
the format is broken. But a wrong format can cause indefinitely bad
results already, so that seems like a pretty thin objection to me,
especially if all other implementations share the same hazard.)

2. Our implementation is shoving simple data characters in the format
out to the result buffer one at a time. More common is to skip to the
next % as fast as possible, and then dump anything skipped over using
the string-output code path, reducing the overhead of buffer overrun
checking.
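
As a rough illustration of the two changes above (this is not the code in
the attached patch), the sketch below walks a format string, hands each run
of literal characters to the output routine in a single call, and triggers
the argument-ordering pre-scan only when it first meets a %n$ spec. The Sink
struct and the functions sink_write(), spec_has_dollar(), and scan_format()
are hypothetical names invented for this example. The pre-scan is reduced to
a counter, and conversion specs are skipped rather than formatted.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical output target; cap must be at least 1. */
typedef struct
{
    char   *buf;
    size_t  len;
    size_t  cap;
    int     literal_writes;   /* bulk writes of literal text */
    int     arg_scans;        /* times the %n$ pre-scan was triggered */
} Sink;

/* Append n bytes with one bounds check, truncating if the buffer is full. */
static void
sink_write(Sink *s, const char *data, size_t n)
{
    size_t room = s->cap - 1 - s->len;

    if (n > room)
        n = room;
    memcpy(s->buf + s->len, data, n);
    s->len += n;
    s->buf[s->len] = '\0';
    s->literal_writes++;
}

/* p points just past '%'; true if the spec starts with digits and '$'. */
static int
spec_has_dollar(const char *p)
{
    const char *q = p;

    while (isdigit((unsigned char) *q))
        q++;
    return q > p && *q == '$';
}

static void
scan_format(Sink *s, const char *fmt)
{
    const char *p = fmt;
    int         have_dollar = 0;

    while (*p != '\0')
    {
        if (*p != '%')
        {
            /* Skip to the next '%' (or the end), then dump the whole run. */
            const char *start = p;

            while (*p != '\0' && *p != '%')
                p++;
            sink_write(s, start, (size_t) (p - start));
            continue;
        }
        p++;                    /* step over '%' */
        if (*p == '%')
        {
            sink_write(s, p, 1);    /* "%%" emits a literal percent sign */
            p++;
            continue;
        }
        if (!have_dollar && spec_has_dollar(p))
        {
            /* A real implementation would scan the whole format here to
             * collect argument types; this sketch only counts the event. */
            have_dollar = 1;
            s->arg_scans++;
        }
        /* Skip flags, width, precision, and length modifiers ... */
        while (*p != '\0' && strchr("0123456789$*.-+ #hlLjzt", *p) != NULL)
            p++;
        if (*p != '\0')
            p++;                /* ... and the conversion character */
    }
}
```

Given "abc %d defg %s", scan_format() performs two bulk writes and never
triggers the pre-scan; given "%1$d xyz %2$s!", it triggers the pre-scan
exactly once, at the first dollar spec.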

The attached patch fixes both of those things, and also does some
micro-optimization hacking to avoid loops around dopr_outch() as well
as unnecessary use of pass-by-ref arguments. This version stacks up
pretty well against all the libraries I compared it to. The remaining
weak spot is that floating-point conversions are consistently 30%-50%
slower than the native libraries, which is not terribly surprising
considering that our implementation involves calling the native sprintf
and then massaging the result. Perhaps there's a way to improve that
without writing our own floating-point conversion code, but I'm not
seeing an easy way offhand. I don't think that's a showstopper though.
This code is now faster than the native code for very many other cases,
so on average it should cause no real performance problem.
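
To make the floating-point overhead concrete, here is a minimal sketch of
the "call the native function, then massage the result" pattern. It is not
the snprintf.c code: format_double_via_libc(), the 512-byte scratch buffer,
and the precision clamp are all assumptions made for the example, and the
massaging step is reduced to a truncating copy.

```c
#include <stdio.h>
#include <string.h>

/*
 * Format a double by delegating to the platform's snprintf into a scratch
 * buffer, then copying the result into the caller's buffer.  Returns the
 * number of characters stored, not counting the terminating NUL.
 */
static size_t
format_double_via_libc(char *dst, size_t dstsize, double value,
                       int precision, char conv)
{
    char    tmp[512];       /* assumed large enough for clamped precision */
    int     n;
    size_t  len;

    if (dstsize == 0)
        return 0;
    if (precision < 0)
        precision = 6;      /* the C default precision */
    if (precision > 100)
        precision = 100;    /* arbitrary clamp chosen for this sketch */

    switch (conv)
    {
        case 'e':
            n = snprintf(tmp, sizeof(tmp), "%.*e", precision, value);
            break;
        case 'g':
            n = snprintf(tmp, sizeof(tmp), "%.*g", precision, value);
            break;
        default:
            n = snprintf(tmp, sizeof(tmp), "%.*f", precision, value);
            break;
    }
    if (n < 0)
        n = 0;

    /* Second pass over the data: this copy is the extra cost. */
    len = (size_t) n;
    if (len >= sizeof(tmp))
        len = sizeof(tmp) - 1;
    if (len > dstsize - 1)
        len = dstsize - 1;
    memcpy(dst, tmp, len);
    dst[len] = '\0';
    return len;
}
```

Every conversion pays for the native formatting plus a second pass over the
output, which is one plausible reason such a wrapper cannot beat the library
it calls.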

I've attached both the patch and a simple performance testbed in case
anybody wants to do their own measurements. For reference's sake,
these are the specific test cases I looked at:

snprintf(buffer, sizeof(buffer),
"%2$.*3$f %1$d\n",
42, 123.456, 2);

snprintf(buffer, sizeof(buffer),
"%.*g", 15, 123.456);

snprintf(buffer, sizeof(buffer),
"%d %d", 15, 16);

snprintf(buffer, sizeof(buffer),
"%10d", 15);

snprintf(buffer, sizeof(buffer),
"%s",
"0123456789001234567890012345678900123456789001234567890012345678900123456789001234567890012345678900123456789001234567890012345678900123456789001234567890012345678900123456789001234567890012345678900123456789001234567890");

snprintf(buffer, sizeof(buffer),
"%d 0123456789001234567890012345678900123456789001234567890012345678900123456789001234567890012345678900123456789001234567890012345678900123456789001234567890012345678900123456789001234567890012345678900123456789001234567890",
42);

snprintf(buffer, sizeof(buffer),
"%1$d 0123456789001234567890012345678900123456789001234567890012345678900123456789001234567890012345678900123456789001234567890012345678900123456789001234567890012345678900123456789001234567890012345678900123456789001234567890",
42);
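
For anyone who wants to time one of these cases without the attachment, a
loop along the following lines would do. This is only a sketch and not the
attached timeprintf.c; the function name time_two_ints() and the use of
clock() are assumptions made for the example.

```c
#include <stdio.h>
#include <time.h>

/*
 * Run the "%d %d" test case the given number of times and return the CPU
 * time consumed, in seconds.
 */
static double
time_two_ints(long iterations, char *buffer, size_t size)
{
    clock_t start = clock();
    long    i;

    for (i = 0; i < iterations; i++)
        snprintf(buffer, size, "%d %d", 15, 16);

    return (double) (clock() - start) / CLOCKS_PER_SEC;
}
```

Calling it with a large iteration count, once linked against the native
snprintf and once against the replacement, gives a crude per-case
comparison.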

A couple of other notes of interest:

* The skip-to-next-% searches could alternatively be implemented with
strchr(), although then you need a strlen() call if there isn't another %.
glibc's version of strchr() is fast enough to make that a win, but since
we're not contemplating using this atop glibc, that's not a case we care
about. On other platforms the manual loop mostly seems to be faster.

* NetBSD seems to have a special fast path for the case that the format
string is exactly "%s". I did not adopt that idea here, reasoning that
checking for it would add overhead to all other cases, making it probably
a net loss overall. I'm prepared to listen to arguments otherwise,
though. It is a common case, I just doubt it's common enough (and
other library authors seem to agree).
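
The two notes above can be sketched as follows. These are illustrations
only, with hypothetical function names; the last one shows the general
shape of a "%s" check and is not NetBSD's actual code.

```c
#include <string.h>

/* Manual loop: stop at the next '%' or at the terminating NUL. */
static const char *
skip_to_percent_loop(const char *p)
{
    while (*p != '\0' && *p != '%')
        p++;
    return p;
}

/*
 * strchr() variant: when there is no further '%', a strlen() call is
 * needed to find the end of the string.
 */
static const char *
skip_to_percent_strchr(const char *p)
{
    const char *next = strchr(p, '%');

    return (next != NULL) ? next : p + strlen(p);
}

/* Fast-path test: is the format exactly "%s"? */
static int
format_is_plain_s(const char *fmt)
{
    return fmt[0] == '%' && fmt[1] == 's' && fmt[2] == '\0';
}
```

Both skip functions return the same pointer for any input. The fast-path
test costs up to three character comparisons on every call, which is the
overhead weighed against its benefit above.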

I'll add this to the upcoming CF.

regards, tom lane

Attachment                 Content-Type  Size
snprintf-speedups-1.patch  text/x-diff   23.8 KB
timeprintf.c               text/x-c      1.0 KB
