Re: Fetching last n records from Posgresql

From: Francisco Olarte <folarte(at)peoplecall(dot)com>
To: "Deole, Pushkar (Pushkar)" <pdeole(at)avaya(dot)com>
Cc: "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
Subject: Re: Fetching last n records from Posgresql
Date: 2016-03-30 18:36:53
Message-ID: CA+bJJbx+NU1AhVLAe5T1M7toPi9uinvGJ4UrKkp9=fJJQM0nZg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Hi Pushkar:

On Wed, Mar 30, 2016 at 8:40 AM, Deole, Pushkar (Pushkar)
<pdeole(at)avaya(dot)com> wrote:
> Does PostgreSQL support a query to fetch last ‘n’ records that match the
> selection criteria. I am trying to fetch records from a table with start
> date that falls in last 30 days, however, I want to fetch the oldest ‘n’
> records and not the recent ones. I know there is a LIMIT clause which I can
> use but it will fetch the first ‘n’ records.
>
> I came across an approach which says that I can reverse the order and then
> use LIMIT and then order the records back using timestamp as below, but
> looking at the execution plan, it has to do a sort twice which may affect
> the performance of query if ‘n’ is large number:

To get at the last N records you generally have to approaches, read all of them

Do you have indexes on the record date? Because in this case it seems
that could be solved by a reverse index scan, In my case with a
somehow big table:

$ \d carrier_cdrs_201603
Table "public.carrier_cdrs_201603"
Column | Type | Modifiers
---------+--------------------------+-----------
...
setup | timestamp with time zone |
...
Indexes:
"idx_carrier_cdrs_201603_setup" btree (setup)
...
$ explain select * from carrier_cdrs_201603 order by setup desc limit 1000;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.42..46.25 rows=1000 width=81)
-> Index Scan Backward using idx_carrier_cdrs_201603_setup on
carrier_cdrs_201603 (cost=0.42..33585.03 rows=732883 width=81)
(2 rows)

$ explain select * from ( select * from carrier_cdrs_201603 order by
setup desc limit 1000 ) last_1000 order by setup;
QUERY
PLAN
----------------------------------------------------------------------------------------------------------------------------------------
Sort (cost=106.08..108.58 rows=1000 width=81)
Sort Key: carrier_cdrs_201603.setup
-> Limit (cost=0.42..46.25 rows=1000 width=81)
-> Index Scan Backward using idx_carrier_cdrs_201603_setup
on carrier_cdrs_201603 (cost=0.42..33585.03 rows=732883 width=81)
(4 rows)

$ explain with last_1000 as ( select * from carrier_cdrs_201603 order
by setup desc limit 1000 ) select * from last_1000 order by setup;
QUERY
PLAN
------------------------------------------------------------------------------------------------------------------------------------------
Sort (cost=116.08..118.58 rows=1000 width=184)
Sort Key: last_1000.setup
CTE last_1000
-> Limit (cost=0.42..46.25 rows=1000 width=81)
-> Index Scan Backward using idx_carrier_cdrs_201603_setup
on carrier_cdrs_201603 (cost=0.42..33585.03 rows=732883 width=81)
-> CTE Scan on last_1000 (cost=0.00..20.00 rows=1000 width=184)
(6 rows)

The faster for me seems to be the subquery way, with timings and
usaing 10k records it says:

$ explain analyze select * from ( select * from carrier_cdrs_201603
order by setup desc limit 10000 ) last_10000 order by setup;

QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Sort (cost=1223.09..1248.09 rows=10000 width=81) (actual
time=29.646..35.780 rows=10000 loops=1)
Sort Key: carrier_cdrs_201603.setup
Sort Method: quicksort Memory: 1791kB
-> Limit (cost=0.42..458.70 rows=10000 width=81) (actual
time=0.015..20.707 rows=10000 loops=1)
-> Index Scan Backward using idx_carrier_cdrs_201603_setup
on carrier_cdrs_201603 (cost=0.42..33627.38 rows=733773 width=81)
(actual time=0.013..8.835 rows=10000 loops=1)
Total runtime: 41.913 ms
(6 rows)

And I fear its scanning and feeding into the sort, and accounting for
a part of the scan time in the sort phase as just the inner query
gives:

$ explain analyze select * from carrier_cdrs_201603 order by setup
desc limit 10000 ;

QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.42..458.70 rows=10000 width=81) (actual
time=0.015..20.938 rows=10000 loops=1)
-> Index Scan Backward using idx_carrier_cdrs_201603_setup on
carrier_cdrs_201603 (cost=0.42..33627.38 rows=733773 width=81)
(actual time=0.013..8.803 rows=10000 loops=1)
Total runtime: 27.020 ms
(3 rows)

So, 14 ms to sort 10k records seems like a reasonable price to pay.

As you see, only one sort, in whichever order I do it, and postgres
sorts really fast. This is very difficult to avoid. A smarter
optimizer could turn the sort into a reverse, but it seems difficult.
Or you could try to use a cursor, goto to the last record, and then
skip N backwards and go fro there, but IMHO it's not worth the
complexity, and, at least in my case, it is slower for 1000 records,
but YMMV

$ begin;
BEGIN
Time: 61.229 ms
$ declare last_1000 scroll cursor for select * from
carrier_cdrs_201603 order by setup;
DECLARE CURSOR
Time: 61.025 ms
$ move last in last_1000;
MOVE 1
Time: 282.142 ms
$ move backward 1000 in last_1000;
MOVE 1000
Time: 61.969 ms
$ fetch all from last_1000;
Time: 248.071 ms
$ close last_1000;
CLOSE CURSOR
Time: 60.922 ms
$ commit;
COMMIT
Time: 60.814 ms

Note how once you account for my 60ms RTT It's taking 220 ms to go to
the end, and 188 to fetch the result, while:

cdrs=# select * from ( select * from carrier_cdrs_201603 order by
setup desc limit 1000 ) last_1000 order by setup;
Time: 248.566 ms

I can do the select in just 188 too. ( This are just 1000 records, but
without explain analyze a nice chunk of the time is spent sending them
over my 60 ms RTT connection ).

Anyway, try things, measure, post results so we know what happens.

Francisco Olarte.

In response to

Browse pgsql-general by date

  From Date Subject
Next Message John Stewart 2016-03-30 20:09:26 Please advise on this trigger function
Previous Message Amit Langote 2016-03-30 16:51:51 Re: [GENERAL] pg_restore casts check constraints differently