Re: [PATCH] Push limit to sort through a subquery

From: Douglas Doole <dougdoole(at)gmail(dot)com>
To: Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCH] Push limit to sort through a subquery
Date: 2017-08-17 15:36:02
Message-ID: CADE5jYJKLYQ4BmLsFd-5wm_QtEuzgbxq=yzzX6a_aEUHeBB_Fw@mail.gmail.com
Lists: pgsql-hackers

I completely agree. The further a limit can be pushed down, the better.

The patch looks good to me.
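
For anyone skimming the thread, the case this patch targets is easy to
sketch (hypothetical table name "big"; the inner ORDER BY prevents the
subquery from being flattened, so the plan is Limit -> SubqueryScan ->
Sort, and without the patch the Sort never sees the bound and cannot
switch to a top-N heapsort):

    SELECT *
    FROM (SELECT * FROM big ORDER BY v) AS s
    LIMIT 10;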

On Thu, Aug 17, 2017 at 8:27 AM Konstantin Knizhnik <
k(dot)knizhnik(at)postgrespro(dot)ru> wrote:

>
>
> On 29.04.2017 00:13, Douglas Doole wrote:
>
>> If you add this to the commitfest app, more people might look at it when
>> the next commitfest opens.
>
>
> I have added it. https://commitfest.postgresql.org/14/1119/
>
>> Also, it might help if you can provide a query or queries with numbers
>> where this optimization shows improvement.
>>
>
> I can't provide the real queries where we encountered the problem because
> they are internal. However, I showed a simplified version of the queries
> in my first post.
>
> On our queries, the change made quite a difference - execution time
> dropped from 31.4 seconds to 7.2 seconds. EXPLAIN ANALYZE also shows that
> memory use dropped significantly and we didn't have to spill the sort to
> disk.
>
> From:
>
>  -> Sort (cost=989.95..1013.27 rows=9326 width=30)
>          (node_startup_time/loop=31328.891, node_total_time/loop: 31329.756 rows=2001 loops=1)
>        Buffers: temp read=772 written=11201 lsm_bufmgr hits=3392
>        Sort Key: ***
>        Sort Method: external merge  Sort Space Used: 89592  Sort Space Type: Disk
>
> To:
>
>  -> Sort (cost=989.95..1013.27 rows=9326 width=30)
>          (node_startup_time/loop=7123.275, node_total_time/loop: 7123.504 rows=2001 loops=1)
>        Buffers: lsm_bufmgr hits=3387
>        Sort Key: ***
>        Sort Method: top-N heapsort  Sort Space Used: 3256  Sort Space Type: Memory
>
>
> Attached please find yet another small patch, which pushes the LIMIT down
> to ForeignScan.
> I should note that the Postgres optimizer currently uses "Merge Append"
> and fetches only the required number of tuples from the remote nodes.
> So even without LIMIT pushdown, postgres_fdw will not pull the whole
> table from the remote host.
> postgres_fdw uses a cursor to fetch data from the remote server. The
> default fetch size is 100, so even without a limit the remote query will
> fetch no more than 100 rows at the remote site.
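>
> As an aside, the batch size is tunable via the standard postgres_fdw
> fetch_size option. A quick illustration, using the server and table
> names from the example below:
>
> ALTER SERVER shard1 OPTIONS (ADD fetch_size '1000');
> ALTER FOREIGN TABLE t_fdw1 OPTIONS (ADD fetch_size '1000');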
>
> Consider the following example:
>
> postgres=# create extension postgres_fdw;
> postgres=# create server shard1 FOREIGN DATA WRAPPER postgres_fdw
> options(dbname 'postgres', host 'localhost', port '5432');
> postgres=# create server shard2 FOREIGN DATA WRAPPER postgres_fdw
> options(dbname 'postgres', host 'localhost', port '5432');
> postgres=# CREATE USER MAPPING for $user SERVER shard1 options (user
> '$user');
> postgres=# CREATE USER MAPPING for $user SERVER shard2 options (user
> '$user');
> postgres=# CREATE TABLE t(u integer primary key, v integer);
> postgres=# CREATE TABLE t1(u integer primary key, v integer);
> postgres=# CREATE TABLE t2(u integer primary key, v integer);
> postgres=# insert into t1 values (generate_series(1,100000),
> random()*100000);
> postgres=# insert into t2 values (generate_series(1,100000),
> random()*100000);
> postgres=# CREATE FOREIGN TABLE t_fdw1() inherits (t) server shard1
> options(table_name 't1');
> postgres=# CREATE FOREIGN TABLE t_fdw2() inherits (t) server shard2
> options(table_name 't2');
>
>
> postgres=# explain analyze select * from t order by u limit 1;
>                                                       QUERY PLAN
> -----------------------------------------------------------------------------------------------------------------------
>  Limit (cost=200.15..200.20 rows=1 width=8) (actual time=2.010..2.010 rows=1 loops=1)
>    -> Merge Append (cost=200.15..449.39 rows=5121 width=8) (actual time=2.009..2.009 rows=1 loops=1)
>          Sort Key: t.u
>          -> Index Scan using t_pkey on t (cost=0.12..8.14 rows=1 width=8) (actual time=0.005..0.005 rows=0 loops=1)
>          -> Foreign Scan on t_fdw2 (cost=100.00..193.92 rows=2560 width=8) (actual time=1.074..1.074 rows=1 loops=1)
>          -> Foreign Scan on t_fdw1 (cost=100.00..193.92 rows=2560 width=8) (actual time=0.928..0.928 rows=1 loops=1)
>  Planning time: 0.769 ms
>  Execution time: 6.837 ms
> (8 rows)
>
> As you can see, the foreign scan fetches only one row from each remote
> node.
>
> Still, pushing down the LIMIT can have a positive effect on performance,
> especially when a Sort can be replaced with a top-N heapsort.
> I got the following results (time in seconds):
>
> Query                                  original   limit push down
> ------------------------------------------------------------------
> select * from t order by u limit 1    2.276      1.777
> select * from t order by v limit 1    100        42
>
> There is an index on "u", so the records with the smallest "u" values can
> be fetched without sorting; hence the times are similar.
> But when sorting by "v", pushing down the limit allows a top-1 heapsort to
> be used instead of a global sort, which reduces query execution time by
> more than a factor of two.
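>
> For illustration, the remote query issued through the cursor changes
> roughly as follows (the shape of it, not the exact SQL that postgres_fdw
> deparses):
>
> -- without LIMIT pushdown: the remote server sorts the whole table
> SELECT u, v FROM public.t1 ORDER BY v ASC;
>
> -- with LIMIT pushdown: the remote server can use a top-1 heapsort
> SELECT u, v FROM public.t1 ORDER BY v ASC LIMIT 1;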
>
> --
>
> Konstantin Knizhnik
> Postgres Professional: http://www.postgrespro.com
> The Russian Postgres Company
>
>
