Re: [PATCH] Incremental sort (was: PoC: Partial sort)

From: James Coleman <jtc331(at)gmail(dot)com>
To: pgsql-hackers(at)lists(dot)postgresql(dot)org
Cc: Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>
Subject: Re: [PATCH] Incremental sort (was: PoC: Partial sort)
Date: 2018-09-06 16:51:53
Message-ID: 153625271389.4078.18405235923135713189.pgcf@coridan.postgresql.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

The following review has been posted through the commitfest application:
make installcheck-world: tested, failed
Implements feature: tested, passed
Spec compliant: not tested
Documentation: not tested

A fairly common planning problem for us is what we call "most recent first" queries; i.e., "the 50 most recent <table> rows for a <foreign key>".

Here's a basic setup:

-- created_at has very high cardinality
create table foo(pk serial primary key, owner_fk integer, created_at timestamp);
create index idx_foo_on_owner_and_created_at on foo(owner_fk, created_at);

-- technically this data guarantees unique created_at values,
-- but there's no reason it couldn't be modified to have a few
-- random non-unique values to prove the point
insert into foo(owner_fk, created_at)
select i % 100, now() - (i::text || ' minutes')::interval
from generate_series(1, 1000000) t(i);

And here's the naive query to get the results we want:

select *
from foo
where owner_fk = 23
-- pk is only here to disambiguate/guarantee a stable sort
-- in the rare case that there are collisions in the other
-- sort field(s)
order by created_at desc, pk desc
limit 50;

On stock Postgres this ends up being pretty terrible for cases where the fk filter represents a large number of rows, because the planner generates a sort node under the limit node and therefore fetches all matches, sorts them, and then applies the limit. Here's the plan:

Limit (cost=61386.12..61391.95 rows=50 width=16) (actual time=187.814..191.653 rows=50 loops=1)
-> Gather Merge (cost=61386.12..70979.59 rows=82224 width=16) (actual time=187.813..191.647 rows=50 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Sort (cost=60386.10..60488.88 rows=41112 width=16) (actual time=185.639..185.642 rows=42 loops=3)
Sort Key: created_at DESC, pk DESC
Sort Method: top-N heapsort Memory: 27kB
Worker 0: Sort Method: top-N heapsort Memory: 27kB
Worker 1: Sort Method: top-N heapsort Memory: 27kB
-> Parallel Bitmap Heap Scan on foo (cost=3345.24..59020.38 rows=41112 width=16) (actual time=25.150..181.804 rows=33333 loops=3)
Recheck Cond: (owner_fk = 23)
Heap Blocks: exact=18014
-> Bitmap Index Scan on idx_foo_on_owner_and_created_at (cost=0.00..3320.57 rows=98668 width=0) (actual time=16.992..16.992 rows=100000 loops=1)
Index Cond: (owner_fk = 23)
Planning Time: 0.384 ms
Execution Time: 191.704 ms

I have a recursive CTE that implements the algorithm:
- Find first n+1 results
- If result at n+1’s created_at value differs from the n’s value, return first n values.
- If those equal, gather more results until a new created_at value is encountered.
- Sort all results by created_at and a tie-breaker (e.g., pk) and return the first n values.
But nobody wants to use/write that normally (it's quite complex).

This patch solves the problem presented; here's the plan:

Limit (cost=2.70..2.76 rows=50 width=16) (actual time=0.233..0.367 rows=50 loops=1)
-> Incremental Sort (cost=2.70..111.72 rows=98668 width=16) (actual time=0.232..0.362 rows=50 loops=1)
Sort Key: created_at DESC, pk DESC
Presorted Key: created_at
Sort Method: quicksort Memory: 26kB
Sort Groups: 2
-> Index Scan Backward using idx_foo_on_owner_and_created_at on foo (cost=0.56..210640.79 rows=98668 width=16) (actual time=0.054..0.299 rows=65 loops=1)
Index Cond: (owner_fk = 23)
Planning Time: 0.428 ms
Execution Time: 0.393 ms

While check world fails, the only failure appears to be a plan output change in test/isolation/expected/drop-index-concurrently-1.out that just needs to be updated (incremental sort is now used in this plan); I don't see any functionality breakage.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2018-09-06 17:39:47 Re: [PATCH] Incremental sort (was: PoC: Partial sort)
Previous Message Tom Lane 2018-09-06 16:25:34 Re: *_to_xml() should copy SPI_processed/SPI_tuptable