Re: performance for high-volume log insertion

From: david(at)lang(dot)hm
To: Glenn Maynard <glennfmaynard(at)gmail(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: performance for high-volume log insertion
Date: 2009-04-22 22:46:30
Message-ID: alpine.DEB.1.10.0904221545320.28211@asgard.lang.hm
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On Wed, 22 Apr 2009, Glenn Maynard wrote:

> On Wed, Apr 22, 2009 at 4:53 PM, James Mansion
> <james(at)mansionfamily(dot)plus(dot)com> wrote:
>> And I'm disagreeing with that.  Single row is a given, but I think you'll
>> find it pays to have one
>> round trip if at all possible and invoking multiple prepared statements can
>> work against this.
>
> You're talking about round-trips to a *local* server, on the same
> system, not a dedicated server with network round-trips, right?

the use-case for a production setup for logging servers would probably
include a network hop.

David Lang

> Blocking round trips to another process on the same server should be
> fairly cheap--that is, writing to a socket (or pipe, or localhost TCP
> connection) where the other side is listening for it; and then
> blocking in return for the response. The act of writing to an FD that
> another process is waiting for will make the kernel mark the process
> as "ready to wake up" immediately, and the act of blocking for the
> response will kick the scheduler to some waiting process, so as long
> as there isn't something else to compete for CPU for, each write/read
> will wake up the other process instantly. There's a task switching
> cost, but that's too small to be relevant here.
>
> Doing 1000000 local round trips, over a pipe: 5.25s (5 *microseconds*
> each), code attached. The cost *should* be essentially identical for
> any local transport (pipes, named pipes, local TCP connections), since
> the underlying scheduler mechanisms are the same.
>
> That's not to say that batching inserts doesn't make a difference--it
> clearly does, and it would probably be a much larger difference with
> actual network round-trips--but round-trips to a local server aren't
> inherently slow. I'd be (casually) interested in knowing what the
> actual costs are behind an SQL command round-trip (where the command
> isn't blocking on I/O, eg. an INSERT inside a transaction, with no I/O
> for things like constraint checks that need to be done before the
> command can return success).
>
>
>From pgsql-performance-owner(at)postgresql(dot)org Wed Apr 22 22:48:35 2009
Received: from localhost (unknown [200.46.204.183])
by mail.postgresql.org (Postfix) with ESMTP id B2E12632E70
for <pgsql-performance-postgresql(dot)org(at)mail(dot)postgresql(dot)org>; Wed, 22 Apr 2009 22:48:34 -0300 (ADT)
Received: from mail.postgresql.org ([200.46.204.86])
by localhost (mx1.hub.org [200.46.204.183]) (amavisd-maia, port 10024)
with ESMTP id 10845-09
for <pgsql-performance-postgresql(dot)org(at)mail(dot)postgresql(dot)org>;
Wed, 22 Apr 2009 22:48:32 -0300 (ADT)
X-Greylist: from auto-whitelisted by SQLgrey-1.7.6
Received: from tamriel.snowman.net (tamriel.snowman.net [72.66.115.51])
by mail.postgresql.org (Postfix) with ESMTP id 55E82632292
for <pgsql-performance(at)postgresql(dot)org>; Wed, 22 Apr 2009 22:48:33 -0300 (ADT)
Received: by tamriel.snowman.net (Postfix, from userid 1000)
id 44F55228C7; Wed, 22 Apr 2009 21:48:31 -0400 (EDT)
Date: Wed, 22 Apr 2009 21:48:31 -0400
From: Stephen Frost <sfrost(at)snowman(dot)net>
To: Glenn Maynard <glennfmaynard(at)gmail(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: performance for high-volume log insertion
Message-ID: <20090423014831(dot)GK8123(at)tamriel(dot)snowman(dot)net>
References: <alpine(dot)DEB(dot)1(dot)10(dot)0904201859460(dot)12662(at)asgard(dot)lang(dot)hm> <49EE1CE2(dot)8020902(at)mansionfamily(dot)plus(dot)com> <20090421192531(dot)GC8123(at)tamriel(dot)snowman(dot)net> <alpine(dot)DEB(dot)1(dot)10(dot)0904211709030(dot)12662(at)asgard(dot)lang(dot)hm> <20090422121928(dot)GF8123(at)tamriel(dot)snowman(dot)net> <bd36f99e0904221233n191d05abwada64ab8438fc5ae(at)mail(dot)gmail(dot)com> <20090422203703(dot)GH8123(at)tamriel(dot)snowman(dot)net> <bd36f99e0904221420v3f8680c6m65033c4738efa179(at)mail(dot)gmail(dot)com> <20090422215100(dot)GI8123(at)tamriel(dot)snowman(dot)net> <bd36f99e0904221516tb8285bn1c1af6205fa272e0(at)mail(dot)gmail(dot)com>
MIME-Version: 1.0
Content-Type: multipart/signed; micalg=pgp-sha1;
protocol="application/pgp-signature"; boundary="12Mx6XhZmCw9eEgg"
Content-Disposition: inline
In-Reply-To: <bd36f99e0904221516tb8285bn1c1af6205fa272e0(at)mail(dot)gmail(dot)com>
X-Editor: Vim http://www.vim.org/
X-Info: http://www.snowman.net
X-Operating-System: Linux/2.6.26-1-amd64 (x86_64)
X-Uptime: 21:28:07 up 101 days, 2:26, 23 users, load average: 0.00, 0.04,
0.05
User-Agent: Mutt/1.5.18 (2008-05-17)
X-Virus-Scanned: Maia Mailguard 1.0.1
X-Spam-Status: No, hits=0 tagged_above=0 required=5 tests=none
X-Spam-Level:
X-Archive-Number: 200904/367
X-Sequence-Number: 33734

--12Mx6XhZmCw9eEgg
Content-Type: text/plain; charset=iso-8859-1
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

* Glenn Maynard (glennfmaynard(at)gmail(dot)com) wrote:
> On Wed, Apr 22, 2009 at 5:51 PM, Stephen Frost <sfrost(at)snowman(dot)net> wrote:
> > For a single column table, I wouldn't expect much either. =A0With more
> > columns I think it would be a larger improvement.
>=20
> Maybe. I'm not sure why parsing "(1,2,3,4,5)" in an EXECUTE parameter
> should be faster than parsing the exact same thing in an INSERT,
> though.

Erm.. Prepared queries is about using PQexecPrepared(), not about
sending a text string as an SQL EXECUTE(). PQexecPrepared takes an
array of arguments. That gets translated into a Bind command in the
protocol with a defined number of parameters and a length for each
parameter being passed. That removes any need for scanning/parsing the
string sent to the backend. That's the savings I'm referring to.

If you weren't using PQexecPrepared() (and using psql, you wouldn't
be..), then the difference you saw was more likely planning cost.

> Of course, you still need to get it in that format. Be careful to
> include any parsing you're doing to create the binary date in the
> benchmarks. Inevitably, at least part of the difference will be costs
> simply moving from the psql process to your own.

Sure. What I recall from when I was working on this is that it wasn't
terribly hard to go from unix timestamps (epoch from 1970) to a PG
timestamp format (and there was nice example code in the backend) in
terms of CPU time.

Thanks,

Stephen

--12Mx6XhZmCw9eEgg
Content-Type: application/pgp-signature; name="signature.asc"
Content-Description: Digital signature
Content-Disposition: inline

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (GNU/Linux)

iEYEARECAAYFAknvyO8ACgkQrzgMPqB3kigw8QCgl44vCriX5XkLYMcML36TAVrv
M0sAnjXQaAj9NDcjwkzopKlcO62U+TnR
=kczC
-----END PGP SIGNATURE-----

--12Mx6XhZmCw9eEgg--

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Stephen Frost 2009-04-23 01:50:36 Re: performance for high-volume log insertion
Previous Message Thomas Kellerer 2009-04-22 22:25:52 Re: performance for high-volume log insertion