Re: TCP Overhead on Local Loopback

From: Claudio Freire <klaussfreire(at)gmail(dot)com>
To: Ofer Israeli <oferi(at)checkpoint(dot)com>
Cc: Samuel Gendler <sgendler(at)ideasculptor(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Andy <angelflow(at)yahoo(dot)com>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: TCP Overhead on Local Loopback
Date: 2012-04-03 15:38:34
Message-ID: CAGTBQpZyPw2Y=JWOZf=tgWS7V_cw9_FMc1d90cg0ZbiPF0yPig@mail.gmail.com
Lists: pgsql-performance

On Tue, Apr 3, 2012 at 12:24 PM, Ofer Israeli <oferi(at)checkpoint(dot)com> wrote:
> On Sun, Apr 2, 2012 at 11:25 AM, Samuel Gendler < sgendler(at)ideasculptor(dot)com >  wrote:
>> But suggesting moving away from TCP/IP with no actual evidence that it is network overhead that is the problem is a little premature, regardless.
>
> Agreed, that's why I'd like to understand what tools / methodologies are available in order to test whether TCP is the issue.

As was already pointed out, if you run 60,000 x (5+1+2) "select 1"
queries you'll effectively be measuring TCP overhead, since planning
and execution time will be negligible.
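
For a baseline that leaves PostgreSQL out of the picture entirely, the raw
loopback round-trip cost can be timed with a tiny echo exchange. This is only
a rough sketch: the function names, the 1-byte message and the iteration count
are made up for illustration, and a real "select 1" loop would add protocol
and backend time on top of whatever this reports.

```python
import socket
import threading
import time


def echo_server(listener):
    """Accept one connection and echo every byte back until the peer closes."""
    conn, _ = listener.accept()
    with conn:
        while True:
            data = conn.recv(64)
            if not data:
                break
            conn.sendall(data)


def mean_roundtrip_us(n=2000):
    """Return the mean time, in microseconds, of n one-byte request/reply pairs."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(1)
    server = threading.Thread(target=echo_server, args=(listener,), daemon=True)
    server.start()

    client = socket.create_connection(listener.getsockname())
    start = time.perf_counter()
    for _ in range(n):
        client.sendall(b"x")  # the "query"
        client.recv(64)       # block until the "reply" arrives
    elapsed = time.perf_counter() - start

    client.close()            # makes the server's recv() return b"" and exit
    server.join()
    listener.close()
    return elapsed / n * 1e6


if __name__ == "__main__":
    print(f"mean loopback round trip: {mean_roundtrip_us():.1f} us")
```

Comparing this figure with the per-query time seen from the real client gives
a first idea of how much of it is plain transport.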

>> What, exactly, are the set of operations that each update is performing and is there any way to batch them into fewer statements
>> within the transaction.  For example, could you insert all 60,000 records into a temporary table via COPY, then run just a couple of queries to do
>> bulk inserts and bulk updates into the destination table via joins to the temp table?
>
> I don't see how a COPY can be faster here as I would need to both run the COPY into the temp table and then UPDATE all the columns in the real table.
> Are you referring to saving the time where all the UPDATEs would be performed via a stored procedure strictly in the db domain without networking back and forth?

You'll be saving a lot of planning and parsing time, as COPY is
significantly simpler to plan and parse, and the complex UPDATEs and
INSERTs required to move data from the temp table will only incur a
one-time planning cost. In general, doing it that way is significantly
faster than 480,000 separate queries. But it does depend on the
operations themselves.
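
A minimal sketch of the COPY method, assuming a psycopg2 cursor inside an open
transaction. The table and column names (events, events_stage, id, status) are
hypothetical, and a real schema with 25 columns would need a correspondingly
wider staging table and SET list.

```python
import csv
import io

# Hypothetical table and column names, for illustration only.
STAGE_DDL = "CREATE TEMP TABLE events_stage (id integer, status text) ON COMMIT DROP"
COPY_SQL = "COPY events_stage (id, status) FROM STDIN WITH (FORMAT csv)"
MERGE_SQL = (
    "UPDATE events AS e SET status = s.status "
    "FROM events_stage AS s WHERE e.id = s.id"
)


def rows_to_csv(rows):
    """Serialize rows into an in-memory CSV file suitable for COPY ... FROM STDIN."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerows(rows)
    buf.seek(0)
    return buf


def bulk_update(cur, rows):
    """Stage all rows with one COPY, then apply them with one joined UPDATE.

    `cur` is assumed to be a psycopg2 cursor; copy_expert() is psycopg2-specific.
    """
    cur.execute(STAGE_DDL)
    cur.copy_expert(COPY_SQL, rows_to_csv(rows))
    cur.execute(MERGE_SQL)
```

The whole batch then costs three statements instead of one or more per row;
whether the 5 SELECTs and the DELETE per event can be folded into similar
set-based statements depends on what they actually do.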

>> 60,000 rows updated with 25 columns, 1 indexed in 3ms is not exactly slow.  That's a not insignificant quantity of data which must be transferred from client to server,
>> parsed, and then written to disk, regardless of TCP overhead.  That is happening via at least 60,000 individual SQL statements that are not even prepared statements.  I don't
>> imagine that TCP overhead is really the problem here.  Regardless, you can reduce both statement parse time and TCP overhead by doing bulk inserts
>> (COPY) followed by multi-row selects/updates into the final table.  I don't know how much below 3ms you are going to get, but that's going to be as fast
>> as you can possibly do it on your hardware, assuming the rest of your configuration is as efficient as possible.
>
> The 3ms is per each event processing, not the whole 60K batch.  Each event processing includes:
> 5 SELECTs
> 1 DELETE
> 2 UPDATEs
> where each query performed involves TCP connections, that is, the queries are not grouped in a stored procedure or such.

If you run the 480,000 queries in a single transaction, you are
already using a single connection. So you only have transmission
overhead, without the TCP handshake. You still might gain a bit by
disabling Nagle's algorithm (if that's possible on Windows), which is
often the main source of latency for small TCP messages. But that's
very low-level tinkering.
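
For reference, at the socket level Nagle's algorithm is turned off with the
TCP_NODELAY option, which is part of the standard sockets API and also exists
in Winsock. The sketch below only shows the option on a bare socket; the
helper name is made up, and whether a given database driver exposes or already
sets this option is something to check in that driver's documentation.

```python
import socket


def make_nodelay_socket():
    """Create a TCP socket with Nagle's algorithm disabled (TCP_NODELAY set)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return sock


if __name__ == "__main__":
    s = make_nodelay_socket()
    # A non-zero value means small writes are sent without waiting to coalesce.
    print(s.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY) != 0)
    s.close()
```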

> For all these queries does 3ms sound like a reasonable time?  If so, do you have an estimation of how long the network portion would be here?

You perform at least 8 round trips per event, so 3ms works out to
375us per query. That doesn't look like much. It's probably Nagle and
task-switching time; I don't think you can get it much lower than
that without issuing fewer queries (i.e. using the COPY method).
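
The arithmetic behind these figures, spelled out using only the numbers quoted
in this thread (the variable names are just for illustration):

```python
EVENTS = 60_000
QUERIES_PER_EVENT = 5 + 1 + 2   # 5 SELECTs, 1 DELETE, 2 UPDATEs
MS_PER_EVENT = 3.0              # reported processing time per event

total_queries = EVENTS * QUERIES_PER_EVENT              # round trips in the batch
us_per_query = MS_PER_EVENT * 1000 / QUERIES_PER_EVENT  # time budget per round trip
total_seconds = EVENTS * MS_PER_EVENT / 1000            # whole batch at 3 ms/event

if __name__ == "__main__":
    print(total_queries, us_per_query, total_seconds)
```

That is 480,000 round trips at 375us each, or about 180 seconds for the full
batch if every event takes the reported 3ms.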
