Re: Populating large tables with occasional bad values

From: Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>
To: "John T(dot) Dow" <john(at)johntdow(dot)com>
Cc: "pgsql-jdbc(at)postgresql(dot)org" <pgsql-jdbc(at)postgresql(dot)org>
Subject: Re: Populating large tables with occasional bad values
Date: 2008-06-11 16:20:46
Message-ID: 484FFB5E.8000203@postnewspapers.com.au
Lists: pgsql-jdbc

Sorry for the self-reply, but a correction and an example are really
required.

Craig Ringer wrote:

> OK, so you're waiting for each INSERT to complete before issuing the
> next. That means that over the Internet you add *at* *least* (2*latency)
> to the time it takes to complete each insert, where `latency' is the
> round trip time for a packet between the DB client and server.

Whoops. At least `latency', not 2*latency. I was thinking of one-way latency
and at some point got confused into saying round trip instead. Sorry.

> That gets expensive fast. My latency from my home DSL to my work's DSL
> is 12ms - and I'm in the same city and on the same ISP as work is, as
> well as connected to the same peering point. I regularly see latencies
> of > 150ms to hosts within Australia.

Here's an example. The psql client is running on my desktop, and
connected to work's Pg server.

I set \timing in psql so psql reports the total time from when the local
client issues the query to when it receives the reply.

I then issue an INSERT under EXPLAIN ANALYZE so the server reports how
long the query took to execute on the server side.

test=# \timing
Timing is on
test=# explain analyze insert into dummy (pk, value)
test-# values (101,'thing');
                                     QUERY PLAN
--------------------------------------------------------------------------------
 Result  (cost=0.00..0.01 rows=1 width=0) (actual time=0.001..0.002 rows=1 loops=1)
 Total runtime: 0.090 ms
(2 rows)

Time: 21.914 ms

You can see that the server took only 0.09 ms to execute the query, but to
the client it appeared to take almost 22 ms.
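
In other words (my framing, not part of the measurement itself): the
elapsed time the client sees per statement is roughly the server execution
time plus one network round trip, so here the round trip accounts for
about 21.914 - 0.090 = 21.8 ms of the total.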

If I ssh to the server and connect with psql locally over a unix domain
socket or the loopback interface, I get these results instead:

test=# explain analyze insert into dummy (pk, value)
test-# values (102,'thing');
                                       QUERY PLAN
------------------------------------------------------------------------------------
 Result  (cost=0.00..0.01 rows=1 width=0) (actual time=0.002..0.002 rows=1 loops=1)
 Total runtime: 0.056 ms
(2 rows)

Time: 0.530 ms

... which is 40 times faster from client query issue to client query
completion despite the negligible server execution speed difference.
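
Scaled up (my arithmetic, assuming those per-statement times hold):
loading 100,000 rows one INSERT at a time would take roughly
100,000 * 21.9 ms = 36.5 minutes over the remote link, versus about
100,000 * 0.53 ms = 53 seconds locally, with essentially all of the
difference being time spent waiting on the network.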

Multi-row INSERTs, use of COPY, etc. will all help combat this; a JDBC
batching sketch follows below.
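
Since this list is about JDBC, here is a minimal sketch of statement
batching with the standard JDBC batch API. The connection URL,
credentials, table name, row count and batch size are all illustrative
assumptions, not anything from the original poster's code:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;

public class BatchInsertSketch {
    public static void main(String[] args) throws SQLException {
        // Placeholder connection details -- adjust for your environment.
        try (Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://localhost/test", "user", "password")) {
            conn.setAutoCommit(false);
            try (PreparedStatement ps = conn.prepareStatement(
                    "INSERT INTO dummy (pk, value) VALUES (?, ?)")) {
                for (int pk = 0; pk < 10000; pk++) {
                    ps.setInt(1, pk);
                    ps.setString(2, "thing");
                    ps.addBatch();           // queued client-side; no round trip yet
                    if ((pk + 1) % 1000 == 0) {
                        ps.executeBatch();   // one network flush per 1000 rows
                    }
                }
                ps.executeBatch();           // flush any remaining rows
            }
            conn.commit();
        }
    }
}

With a batch size of 1000 you pay roughly one network flush per 1000 rows
instead of one round trip per row; COPY (via the PostgreSQL driver's
CopyManager) cuts the per-row overhead down further still.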

--
Craig Ringer
