Why is there a tenfold difference between Postgres's alleged query execution time and packet transmission time?

From: Rollo Konig-Brock <rollokb(at)gmail(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: Why is there a tenfold difference between Postgres's alleged query execution time and packet transmission time?
Date: 2021-04-14 23:10:40
Message-ID: CAOkhyB7Cm=5GuRYr6LJQcxw_XKkHDeV_FVsCk4LDzN-Yig5wrA@mail.gmail.com
Lists: pgsql-performance

Hey all,

I've been pulling my hair out over this for days now, as I'm trying to
build a low-latency application. Databases should be fast, but I cannot
work out why so much latency is added between the actual database process
and the application code. For simple queries that should take less than a
millisecond, this mystery latency is by far the biggest performance hit.

For example, I have a very simple table on a local Postgres database, which
takes 324us to query. This is absolutely fine.

postgres=# EXPLAIN (ANALYZE, COSTS OFF, TIMING OFF)
select count("ID") from example t;
-[ RECORD 1 ]------------------------------------------------------
QUERY PLAN | Aggregate (actual rows=1 loops=1)
-[ RECORD 2 ]------------------------------------------------------
QUERY PLAN | -> Seq Scan on example t (actual rows=4119 loops=1)
-[ RECORD 3 ]------------------------------------------------------
QUERY PLAN | Planning Time: 0.051 ms
-[ RECORD 4 ]------------------------------------------------------
QUERY PLAN | Execution Time: 0.324 ms

But then if I want to connect to the database in Python, getting the actual
data takes over 2500us! The loopback interface is not the problem here, I
sanity checked that (it adds 100us at a stretch).
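To rule out the loopback interface I timed raw TCP round-trips with a toy
echo server. This is a minimal sketch of that check (the message size and
iteration count are arbitrary choices for illustration, not what matters):

```python
import socket
import threading
import time

def echo_server(srv):
    # Accept one connection and echo everything back until EOF.
    conn, _ = srv.accept()
    with conn:
        while True:
            data = conn.recv(4096)
            if not data:
                break
            conn.sendall(data)

srv = socket.socket()
srv.bind(("127.0.0.1", 0))  # any free port on loopback
srv.listen(1)
port = srv.getsockname()[1]
threading.Thread(target=echo_server, args=(srv,), daemon=True).start()

cli = socket.create_connection(("127.0.0.1", port))
cli.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)

msg = b"x" * 64
cli.sendall(msg)                      # warm up the connection
while len(cli.recv(4096)) < len(msg):
    pass

n = 1000
start = time.perf_counter()
for _ in range(n):
    cli.sendall(msg)
    got = 0
    while got < len(msg):             # TCP may split the echo
        got += len(cli.recv(4096))
elapsed = time.perf_counter() - start
rtt_us = elapsed / n * 1e6
print(f"mean loopback RTT: {rtt_us:.1f} us")
cli.close()
```

On my machine a round-trip like this stays well under the gap I'm seeing,
which is why I'm confident the interface itself isn't the culprit.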

In [1]: %time cursor.execute("select count(\"ID\") from example;"); r = cursor.fetchmany()
CPU times: user 316 µs, sys: 1.12 ms, total: 1.44 ms
Wall time: 2.73 ms
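Note the gap between total CPU time (1.44 ms) and wall time (2.73 ms):
roughly 1.3 ms of that call is spent blocked, not executing Python. The
same effect is easy to reproduce in isolation (a toy sketch using a sleep
as a stand-in for waiting on a socket):

```python
import time

wall_start = time.perf_counter()
cpu_start = time.process_time()

time.sleep(0.002)  # blocked in a syscall, like waiting on a socket

wall = time.perf_counter() - wall_start
cpu = time.process_time() - cpu_start

# Wall time covers the sleep; CPU time barely moves, because the
# process was asleep rather than computing.
print(f"wall: {wall * 1e3:.2f} ms, cpu: {cpu * 1e3:.2f} ms")
```

So most of my 2.73 ms is waiting on something, which is what sent me to
Wireshark next.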

Investigating further, I opened up WireShark to see how long the packets
themselves take.

[image: Wireshark screenshot] <https://i.stack.imgur.com/ItoSu.png>

From this I can tell that Postgres took 2594us to send the data. This kind
of overhead for a loopback IPC call is way beyond what I would expect. Why
is this? And how can I optimise it away?

This is a problem for the software ecosystem, because people start thinking
databases are too slow to do simple lookups, and they start caching queries
in Redis that take 300us to execute, but 3ms to actually fetch back to the
client. It adds up to an awful lot of latency.

As an aside, I've been really miffed by how bad database IPC performance
has been in general. Oracle has a problem where each additional ~1500
bytes of payload adds roughly 800us to the receive time (on a gigabit
network).
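For scale, the raw wire time of a full-size frame at gigabit speed is tiny
compared to that 800us, so the overhead must be in the software stack, not
the link. Rough arithmetic, assuming standard Ethernet framing:

```python
# Back-of-the-envelope: time on the wire for one full-size Ethernet
# frame at 1 Gbit/s, versus the ~800 us per ~1500 bytes observed.
FRAME_BYTES = 1538          # 1500-byte MTU plus Ethernet framing overhead
LINK_BPS = 1_000_000_000    # gigabit link

wire_time_us = FRAME_BYTES * 8 / LINK_BPS * 1e6
print(f"wire time per frame: {wire_time_us:.1f} us")  # ≈ 12.3 us
```

That's a factor of ~65 between what the link costs and what the client
actually pays per frame.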

I'd really love to know what's happening here.

Rollo
