Re: performance for high-volume log insertion

From: david(at)lang(dot)hm
To: Glenn Maynard <glennfmaynard(at)gmail(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: performance for high-volume log insertion
Date: 2009-04-22 20:07:34
Message-ID: alpine.DEB.1.10.0904221304070.28211@asgard.lang.hm
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On Wed, 22 Apr 2009, Glenn Maynard wrote:

> On Wed, Apr 22, 2009 at 8:19 AM, Stephen Frost <sfrost(at)snowman(dot)net> wrote:
>> Yes, as I beleive was mentioned already, planning time for inserts is
>> really small.  Parsing time for inserts when there's little parsing that
>> has to happen also isn't all *that* expensive and the same goes for
>> conversions from textual representations of data to binary.
>>
>> We're starting to re-hash things, in my view.  The low-hanging fruit is
>> doing multiple things in a single transaction, either by using COPY,
>> multi-value INSERTs, or just multiple INSERTs in a single transaction.
>> That's absolutely step one.
>
> This is all well-known, covered information, but perhaps some numbers
> will help drive this home. 40000 inserts into a single-column,
> unindexed table; with predictable results:
>
> separate inserts, no transaction: 21.21s
> separate inserts, same transaction: 1.89s

are these done as seperate round trips?

i.e.
begin <send>
insert <send>
insert <send>
..
end <send>

or as one round trip?

i.e.
begin;insert;insert..;end

> 40 inserts, 100 rows/insert: 0.18s
> one 40000-value insert: 0.16s
> 40 prepared inserts, 100 rows/insert: 0.15s

are one of these missing a 0?

> COPY (text): 0.10s
> COPY (binary): 0.10s
>
> Of course, real workloads will change the weights, but this is more or
> less the magnitude of difference I always see--batch your inserts into
> single statements, and if that's not enough, skip to COPY.

thanks for this information, this is exactly what I was looking for.

can this get stored somewhere for reference?

David Lang
>From pgsql-performance-owner(at)postgresql(dot)org Wed Apr 22 17:37:08 2009
Received: from localhost (unknown [200.46.208.211])
by mail.postgresql.org (Postfix) with ESMTP id 7AF82632D82
for <pgsql-performance-postgresql(dot)org(at)mail(dot)postgresql(dot)org>; Wed, 22 Apr 2009 17:37:07 -0300 (ADT)
Received: from mail.postgresql.org ([200.46.204.86])
by localhost (mx1.hub.org [200.46.208.211]) (amavisd-maia, port 10024)
with ESMTP id 59754-08
for <pgsql-performance-postgresql(dot)org(at)mail(dot)postgresql(dot)org>;
Wed, 22 Apr 2009 17:37:01 -0300 (ADT)
X-Greylist: from auto-whitelisted by SQLgrey-1.7.6
Received: from tamriel.snowman.net (tamriel.snowman.net [72.66.115.51])
by mail.postgresql.org (Postfix) with ESMTP id D13AD632CD9
for <pgsql-performance(at)postgresql(dot)org>; Wed, 22 Apr 2009 17:37:05 -0300 (ADT)
Received: by tamriel.snowman.net (Postfix, from userid 1000)
id 0530722108; Wed, 22 Apr 2009 16:37:04 -0400 (EDT)
Date: Wed, 22 Apr 2009 16:37:03 -0400
From: Stephen Frost <sfrost(at)snowman(dot)net>
To: Glenn Maynard <glennfmaynard(at)gmail(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: performance for high-volume log insertion
Message-ID: <20090422203703(dot)GH8123(at)tamriel(dot)snowman(dot)net>
References: <alpine(dot)DEB(dot)1(dot)10(dot)0904201442080(dot)28211(at)asgard(dot)lang(dot)hm> <20090421015515(dot)GR8123(at)tamriel(dot)snowman(dot)net> <alpine(dot)DEB(dot)1(dot)10(dot)0904201859460(dot)12662(at)asgard(dot)lang(dot)hm> <49EE1CE2(dot)8020902(at)mansionfamily(dot)plus(dot)com> <20090421192531(dot)GC8123(at)tamriel(dot)snowman(dot)net> <alpine(dot)DEB(dot)1(dot)10(dot)0904211709030(dot)12662(at)asgard(dot)lang(dot)hm> <20090422121928(dot)GF8123(at)tamriel(dot)snowman(dot)net> <bd36f99e0904221233n191d05abwada64ab8438fc5ae(at)mail(dot)gmail(dot)com>
MIME-Version: 1.0
Content-Type: multipart/signed; micalg=pgp-sha1;
protocol="application/pgp-signature"; boundary="mLR48wkpOFoZrHIA"
Content-Disposition: inline
In-Reply-To: <bd36f99e0904221233n191d05abwada64ab8438fc5ae(at)mail(dot)gmail(dot)com>
X-Editor: Vim http://www.vim.org/
X-Info: http://www.snowman.net
X-Operating-System: Linux/2.6.26-1-amd64 (x86_64)
X-Uptime: 16:34:12 up 100 days, 21:32, 24 users, load average: 0.13, 0.11,
0.06
User-Agent: Mutt/1.5.18 (2008-05-17)
X-Virus-Scanned: Maia Mailguard 1.0.1
X-Spam-Status: No, hits=0.205 tagged_above=0 required=5 tests=AWL=0.205
X-Spam-Level:
X-Archive-Number: 200904/357
X-Sequence-Number: 33724

--mLR48wkpOFoZrHIA
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline

Glenn,

* Glenn Maynard (glennfmaynard(at)gmail(dot)com) wrote:
> This is all well-known, covered information, but perhaps some numbers
> will help drive this home. 40000 inserts into a single-column,
> unindexed table; with predictable results:

Thanks for doing the work. I had been intending to but hadn't gotten to
it yet.

> separate inserts, no transaction: 21.21s
> separate inserts, same transaction: 1.89s
> 40 inserts, 100 rows/insert: 0.18s
> one 40000-value insert: 0.16s
> 40 prepared inserts, 100 rows/insert: 0.15s
> COPY (text): 0.10s
> COPY (binary): 0.10s

What about 40000 individual prepared inserts? Just curious about it.

Also, kind of suprised about COPY text vs. binary. What was the data
type in the table..? If text, that makes sense, if it was an integer or
something else, I'm kind of suprised.

Thanks,

Stephen

--mLR48wkpOFoZrHIA
Content-Type: application/pgp-signature; name="signature.asc"
Content-Description: Digital signature
Content-Disposition: inline

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (GNU/Linux)

iEYEARECAAYFAknvf+8ACgkQrzgMPqB3kigfrgCggjcQ5axBN+Skqg35MaA/EaIb
OUAAn34HJWhYZZhOAJl8UZ2nZ5+iOaAL
=pDo7
-----END PGP SIGNATURE-----

--mLR48wkpOFoZrHIA--

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Tom Lane 2009-04-22 20:49:09 Re: performance for high-volume log insertion
Previous Message Glenn Maynard 2009-04-22 19:33:21 Re: performance for high-volume log insertion