From: | Francisco Olarte <folarte(at)peoplecall(dot)com> |
---|---|
To: | "Deole, Pushkar (Pushkar)" <pdeole(at)avaya(dot)com> |
Cc: | "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org> |
Subject: | Re: Fetching last n records from Posgresql |
Date: | 2016-03-30 18:36:53 |
Message-ID: | CA+bJJbx+NU1AhVLAe5T1M7toPi9uinvGJ4UrKkp9=fJJQM0nZg@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general |
Hi Pushkar:
On Wed, Mar 30, 2016 at 8:40 AM, Deole, Pushkar (Pushkar)
<pdeole(at)avaya(dot)com> wrote:
> Does PostgreSQL support a query to fetch last ‘n’ records that match the
> selection criteria. I am trying to fetch records from a table with start
> date that falls in last 30 days, however, I want to fetch the oldest ‘n’
> records and not the recent ones. I know there is a LIMIT clause which I can
> use but it will fetch the first ‘n’ records.
>
> I came across an approach which says that I can reverse the order and then
> use LIMIT and then order the records back using timestamp as below, but
> looking at the execution plan, it has to do a sort twice which may affect
> the performance of query if ‘n’ is large number:
To get at the last N records you generally have to approaches, read all of them
Do you have indexes on the record date? Because in this case it seems
that could be solved by a reverse index scan, In my case with a
somehow big table:
$ \d carrier_cdrs_201603
Table "public.carrier_cdrs_201603"
Column | Type | Modifiers
---------+--------------------------+-----------
...
setup | timestamp with time zone |
...
Indexes:
"idx_carrier_cdrs_201603_setup" btree (setup)
...
$ explain select * from carrier_cdrs_201603 order by setup desc limit 1000;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.42..46.25 rows=1000 width=81)
-> Index Scan Backward using idx_carrier_cdrs_201603_setup on
carrier_cdrs_201603 (cost=0.42..33585.03 rows=732883 width=81)
(2 rows)
$ explain select * from ( select * from carrier_cdrs_201603 order by
setup desc limit 1000 ) last_1000 order by setup;
QUERY
PLAN
----------------------------------------------------------------------------------------------------------------------------------------
Sort (cost=106.08..108.58 rows=1000 width=81)
Sort Key: carrier_cdrs_201603.setup
-> Limit (cost=0.42..46.25 rows=1000 width=81)
-> Index Scan Backward using idx_carrier_cdrs_201603_setup
on carrier_cdrs_201603 (cost=0.42..33585.03 rows=732883 width=81)
(4 rows)
$ explain with last_1000 as ( select * from carrier_cdrs_201603 order
by setup desc limit 1000 ) select * from last_1000 order by setup;
QUERY
PLAN
------------------------------------------------------------------------------------------------------------------------------------------
Sort (cost=116.08..118.58 rows=1000 width=184)
Sort Key: last_1000.setup
CTE last_1000
-> Limit (cost=0.42..46.25 rows=1000 width=81)
-> Index Scan Backward using idx_carrier_cdrs_201603_setup
on carrier_cdrs_201603 (cost=0.42..33585.03 rows=732883 width=81)
-> CTE Scan on last_1000 (cost=0.00..20.00 rows=1000 width=184)
(6 rows)
The faster for me seems to be the subquery way, with timings and
usaing 10k records it says:
$ explain analyze select * from ( select * from carrier_cdrs_201603
order by setup desc limit 10000 ) last_10000 order by setup;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Sort (cost=1223.09..1248.09 rows=10000 width=81) (actual
time=29.646..35.780 rows=10000 loops=1)
Sort Key: carrier_cdrs_201603.setup
Sort Method: quicksort Memory: 1791kB
-> Limit (cost=0.42..458.70 rows=10000 width=81) (actual
time=0.015..20.707 rows=10000 loops=1)
-> Index Scan Backward using idx_carrier_cdrs_201603_setup
on carrier_cdrs_201603 (cost=0.42..33627.38 rows=733773 width=81)
(actual time=0.013..8.835 rows=10000 loops=1)
Total runtime: 41.913 ms
(6 rows)
And I fear its scanning and feeding into the sort, and accounting for
a part of the scan time in the sort phase as just the inner query
gives:
$ explain analyze select * from carrier_cdrs_201603 order by setup
desc limit 10000 ;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.42..458.70 rows=10000 width=81) (actual
time=0.015..20.938 rows=10000 loops=1)
-> Index Scan Backward using idx_carrier_cdrs_201603_setup on
carrier_cdrs_201603 (cost=0.42..33627.38 rows=733773 width=81)
(actual time=0.013..8.803 rows=10000 loops=1)
Total runtime: 27.020 ms
(3 rows)
So, 14 ms to sort 10k records seems like a reasonable price to pay.
As you see, only one sort, in whichever order I do it, and postgres
sorts really fast. This is very difficult to avoid. A smarter
optimizer could turn the sort into a reverse, but it seems difficult.
Or you could try to use a cursor, goto to the last record, and then
skip N backwards and go fro there, but IMHO it's not worth the
complexity, and, at least in my case, it is slower for 1000 records,
but YMMV
$ begin;
BEGIN
Time: 61.229 ms
$ declare last_1000 scroll cursor for select * from
carrier_cdrs_201603 order by setup;
DECLARE CURSOR
Time: 61.025 ms
$ move last in last_1000;
MOVE 1
Time: 282.142 ms
$ move backward 1000 in last_1000;
MOVE 1000
Time: 61.969 ms
$ fetch all from last_1000;
Time: 248.071 ms
$ close last_1000;
CLOSE CURSOR
Time: 60.922 ms
$ commit;
COMMIT
Time: 60.814 ms
Note how once you account for my 60ms RTT It's taking 220 ms to go to
the end, and 188 to fetch the result, while:
cdrs=# select * from ( select * from carrier_cdrs_201603 order by
setup desc limit 1000 ) last_1000 order by setup;
Time: 248.566 ms
I can do the select in just 188 too. ( This are just 1000 records, but
without explain analyze a nice chunk of the time is spent sending them
over my 60 ms RTT connection ).
Anyway, try things, measure, post results so we know what happens.
Francisco Olarte.
From | Date | Subject | |
---|---|---|---|
Next Message | John Stewart | 2016-03-30 20:09:26 | Please advise on this trigger function |
Previous Message | Amit Langote | 2016-03-30 16:51:51 | Re: [GENERAL] pg_restore casts check constraints differently |