Re: Fwd: [HACKERS] client performance v.s. server statistics

From: Han Zhou <zhouhan(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Fwd: [HACKERS] client performance v.s. server statistics
Date: 2012-02-16 01:44:53
Message-ID: CADtzDCnTP=mfOHLB87aJZN=agmzH0fAG3USVAt+W6LEH1+mudQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers pgsql-performance

Hi Andres,

Good hint!

DBRNWHSB=# COPY (SELECT * FROM my_large) TO '/tmp/somefile';
COPY 73728
Time: 1405.976 ms
DBRNWHSB=# COPY (SELECT * FROM my_large) TO '/tmp/somefile_binary' BINARY ;
COPY 73728
Time: 840.987 ms
DBRNWHSB=# EXPLAIN ANALYZE SELECT * FROM my_large;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------
Seq Scan on my_large (cost=0.00..1723.78 rows=80678 width=59)
(actual time=0.036..114.400 rows=73728 loops=1)
Total runtime: 171.561 ms
(2 rows)

Time: 172.523 ms
DBRNWHSB=# SELECT * FROM my_large;
...
Time: 1513.274 ms

In this test the record number is 73728, each with tens of bytes. The
size of somefile is 5,455,872, and the size of somefile_binary is even
more: 6,782,997. However, BINARY COPY to memory file costs lower, so
it means something else, e.g. result preparing is taking CPU time. But
even the BINARY COPY still takes much more time than the ANALYZE:
840ms v.s. 172ms. So I guess most time is spent in preparing +
transferring result in backend, and this part of time is not counted
in the ANALYZE or pg_stat_statement statistics.

If this assumption is true, then is it possible to optimise towards
the result preparing and transferring in backend? Or is there any
"bulk" output operation already supported in some existing PostgreSQL
options?

Best regards,
Han

On Wed, Feb 15, 2012 at 7:36 PM, Andres Freund <andres(at)anarazel(dot)de> wrote:
> On Wednesday, February 15, 2012 12:33:13 PM Han Zhou wrote:
>> Hi,
>>
>> To be more specific, I list my calculation here:
>> The timing shown in psql may include: plan + execution + copying to
>> result set in backend (does this step exist?) + transferring data to
>> client via socket.
> Correct.
>
>> Then I want to know what's the time shown in pg_stat_statement and
>> EXPLAIN ANALYZE in terms of the above mentioned parts. And why are the
>> gap is almost 10 times (100 ms v.s. 1 second)? As a comparison,
>> transferring same amount of data with unix domain socket should cost
>> only a very small fraction of this (almost negligible), according to
>> my other performance tests.
> Yea, you proved my quick theory wrong.
>
>> And I don't think the plan time plays an important role here in
>> EXPLAIN ANALYZE, because the command itself costs similar time to the
>> "Total runtime" as shown in psql (timing on), which means the plan is
>> too simple to take any significant part of time in this case.
> Sounds like that.
>
> It would be interesting to see the time difference between:
> COPY (SELECT * FROM blub) TO '/tmp/somefile';
> COPY (SELECT * FROM blub) TO '/tmp/somefile' BINARY;
> EXPLAIN ANALYZE SELECT * FROM blub;
>
> Andres

--
Best regards,
Han

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Kyotaro HORIGUCHI 2012-02-16 05:24:19 Re: Speed dblink using alternate libpq tuple storage
Previous Message Dann Corbit 2012-02-16 00:37:40 Re: CUDA Sorting

Browse pgsql-performance by date

  From Date Subject
Next Message Markus Innerebner 2012-02-16 06:54:37 Re: Optimizer is not choosing index
Previous Message Tom Lane 2012-02-15 23:11:20 Re: Optimizer is not choosing index