Re: performance for high-volume log insertion

From: david(at)lang(dot)hm
To: Stephen Frost <sfrost(at)snowman(dot)net>
Cc: Glenn Maynard <glennfmaynard(at)gmail(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: performance for high-volume log insertion
Date: 2009-04-23 04:56:51
Message-ID: alpine.DEB.1.10.0904222156090.12662@asgard.lang.hm
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On Wed, 22 Apr 2009, Stephen Frost wrote:

> * Glenn Maynard (glennfmaynard(at)gmail(dot)com) wrote:
>> On Wed, Apr 22, 2009 at 5:51 PM, Stephen Frost <sfrost(at)snowman(dot)net> wrote:
>>> For a single column table, I wouldn't expect much either.  With more
>>> columns I think it would be a larger improvement.
>>
>> Maybe. I'm not sure why parsing "(1,2,3,4,5)" in an EXECUTE parameter
>> should be faster than parsing the exact same thing in an INSERT,
>> though.
>
> Erm.. Prepared queries is about using PQexecPrepared(), not about
> sending a text string as an SQL EXECUTE(). PQexecPrepared takes an
> array of arguments. That gets translated into a Bind command in the
> protocol with a defined number of parameters and a length for each
> parameter being passed. That removes any need for scanning/parsing the
> string sent to the backend. That's the savings I'm referring to.

are you sure? I thought that what goes out over the wire is always text.

David Lang

> If you weren't using PQexecPrepared() (and using psql, you wouldn't
> be..), then the difference you saw was more likely planning cost.
>
>> Of course, you still need to get it in that format. Be careful to
>> include any parsing you're doing to create the binary date in the
>> benchmarks. Inevitably, at least part of the difference will be costs
>> simply moving from the psql process to your own.
>
> Sure. What I recall from when I was working on this is that it wasn't
> terribly hard to go from unix timestamps (epoch from 1970) to a PG
> timestamp format (and there was nice example code in the backend) in
> terms of CPU time.
>
> Thanks,
>
> Stephen
>
>From pgsql-performance-owner(at)postgresql(dot)org Thu Apr 23 08:04:40 2009
Received: from localhost (unknown [200.46.204.183])
by mail.postgresql.org (Postfix) with ESMTP id 4789B63254D
for <pgsql-performance-postgresql(dot)org(at)mail(dot)postgresql(dot)org>; Thu, 23 Apr 2009 08:04:38 -0300 (ADT)
Received: from mail.postgresql.org ([200.46.204.86])
by localhost (mx1.hub.org [200.46.204.183]) (amavisd-maia, port 10024)
with ESMTP id 78570-10
for <pgsql-performance-postgresql(dot)org(at)mail(dot)postgresql(dot)org>;
Thu, 23 Apr 2009 08:04:36 -0300 (ADT)
X-Greylist: from auto-whitelisted by SQLgrey-1.7.6
Received: from tamriel.snowman.net (tamriel.snowman.net [72.66.115.51])
by mail.postgresql.org (Postfix) with ESMTP id 8FBA66323BD
for <pgsql-performance(at)postgresql(dot)org>; Thu, 23 Apr 2009 08:04:36 -0300 (ADT)
Received: by tamriel.snowman.net (Postfix, from userid 1000)
id A581622238; Thu, 23 Apr 2009 07:04:34 -0400 (EDT)
Date: Thu, 23 Apr 2009 07:04:34 -0400
From: Stephen Frost <sfrost(at)snowman(dot)net>
To: david(at)lang(dot)hm
Cc: Glenn Maynard <glennfmaynard(at)gmail(dot)com>,
pgsql-performance(at)postgresql(dot)org
Subject: Re: performance for high-volume log insertion
Message-ID: <20090423110434(dot)GM8123(at)tamriel(dot)snowman(dot)net>
References: <20090421192531(dot)GC8123(at)tamriel(dot)snowman(dot)net> <alpine(dot)DEB(dot)1(dot)10(dot)0904211709030(dot)12662(at)asgard(dot)lang(dot)hm> <20090422121928(dot)GF8123(at)tamriel(dot)snowman(dot)net> <bd36f99e0904221233n191d05abwada64ab8438fc5ae(at)mail(dot)gmail(dot)com> <20090422203703(dot)GH8123(at)tamriel(dot)snowman(dot)net> <bd36f99e0904221420v3f8680c6m65033c4738efa179(at)mail(dot)gmail(dot)com> <20090422215100(dot)GI8123(at)tamriel(dot)snowman(dot)net> <bd36f99e0904221516tb8285bn1c1af6205fa272e0(at)mail(dot)gmail(dot)com> <20090423014831(dot)GK8123(at)tamriel(dot)snowman(dot)net> <alpine(dot)DEB(dot)1(dot)10(dot)0904222156090(dot)12662(at)asgard(dot)lang(dot)hm>
MIME-Version: 1.0
Content-Type: multipart/signed; micalg=pgp-sha1;
protocol="application/pgp-signature"; boundary="7p2pcO0s0ZIbU/PH"
Content-Disposition: inline
In-Reply-To: <alpine(dot)DEB(dot)1(dot)10(dot)0904222156090(dot)12662(at)asgard(dot)lang(dot)hm>
X-Editor: Vim http://www.vim.org/
X-Info: http://www.snowman.net
X-Operating-System: Linux/2.6.26-1-amd64 (x86_64)
X-Uptime: 06:48:25 up 101 days, 11:46, 19 users, load average: 0.01, 0.05,
0.06
User-Agent: Mutt/1.5.18 (2008-05-17)
X-Virus-Scanned: Maia Mailguard 1.0.1
X-Spam-Status: No, hits=0 tagged_above=0 required=5 tests=none
X-Spam-Level:
X-Archive-Number: 200904/371
X-Sequence-Number: 33738

--7p2pcO0s0ZIbU/PH
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline

* david(at)lang(dot)hm (david(at)lang(dot)hm) wrote:
> On Wed, 22 Apr 2009, Stephen Frost wrote:
>> Erm.. Prepared queries is about using PQexecPrepared(), not about
>> sending a text string as an SQL EXECUTE(). PQexecPrepared takes an
>> array of arguments. That gets translated into a Bind command in the
>> protocol with a defined number of parameters and a length for each
>> parameter being passed. That removes any need for scanning/parsing the
>> string sent to the backend. That's the savings I'm referring to.
>
> are you sure? I thought that what goes out over the wire is always text.

Wow, why is there so much confusion and misunderstanding about this?

*psql* sends everything to the backend as text (except perhaps COPY
BINARY.. but that's because the user handles it), but if you're using
libpq, PQexecPrepared, and protocol 3.0 (any recent PG version), it's
going to use the Parse/Bind protocol-level commands. To make it perhaps
more clear, here's a snippet from the libpq code for PQsendQueryGuts(),
which is the work-horse called by PQexecPrepared:

/*
* We will send Parse (if needed), Bind, Describe Portal, Execute, Sync,
* using specified statement name and the unnamed portal.
*/
[...]

/* Construct the Bind message */
if (pqPutMsgStart('B', false, conn) < 0 ||
pqPuts("", conn) < 0 ||
pqPuts(stmtName, conn) < 0)
goto sendFailed;

/* Send parameter formats */
[...]
-- No param formats included, let the backend know
if (pqPutInt(0, 2, conn) < 0)
goto sendFailed;

-- Tell the backend the number of parameters to expect
if (pqPutInt(nParams, 2, conn) < 0)
goto sendFailed;

/* Send parameters */
for (i = 0; i < nParams; i++)
[...]
-- Pull the length from the caller-provided for each param
nbytes = paramLengths[i];
[...]
-- Send the length, then the param, over the wire
if (pqPutInt(nbytes, 4, conn) < 0 ||
pqPutnchar(paramValues[i], nbytes, conn) < 0)
goto sendFailed;
[...]
-- All done, send finish indicator
if (pqPutInt(1, 2, conn) < 0 ||
pqPutInt(resultFormat, 2, conn))
goto sendFailed;
if (pqPutMsgEnd(conn) < 0)
goto sendFailed;

/* construct the Describe Portal message */
if (pqPutMsgStart('D', false, conn) < 0 ||
pqPutc('P', conn) < 0 ||
pqPuts("", conn) < 0 ||
pqPutMsgEnd(conn) < 0)
goto sendFailed;

/* construct the Execute message */
if (pqPutMsgStart('E', false, conn) < 0 ||
pqPuts("", conn) < 0 ||
pqPutInt(0, 4, conn) < 0 ||
pqPutMsgEnd(conn) < 0)
goto sendFailed;

[...]
-- clear everything out
if (pqFlush(conn) < 0)
goto sendFailed;

Any other questions?

Thanks,

Stephen

--7p2pcO0s0ZIbU/PH
Content-Type: application/pgp-signature; name="signature.asc"
Content-Description: Digital signature
Content-Disposition: inline

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (GNU/Linux)

iEYEARECAAYFAknwS0IACgkQrzgMPqB3kiizaQCfcWZ6JbSwo9wgt95YTxJy6awn
VacAnAxdm4gzJsgS0ArWJd+Iii0ZPxdQ
=AqNi
-----END PGP SIGNATURE-----

--7p2pcO0s0ZIbU/PH--

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message Stephen Frost 2009-04-23 11:11:32 Re: performance for high-volume log insertion
Previous Message Glenn Maynard 2009-04-23 02:20:41 Re: performance for high-volume log insertion