getting most recent row efficiently

From: Fran Fabrizio <ffabrizio(at)mmrd(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: getting most recent row efficiently
Date: 2001-12-17 21:30:18
Message-ID: 3C1E63EA.D46B8B58@mmrd.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general


Hello,

I've got a table that keeps a log of a person's favorites over time
(what follows is a simplified example)

person_id favorite_color
1 red
1 blue
2 green
3 yellow
3 purple

I want the set of most recent entries for each person. So assuming the
order they appear above is the order they were inserted, I want:

person_id favorite_color
1 blue
2 green
3 purple

Is there any way to retrieve "the most recent row for each distinct
person_id?" without making a timestamp column for when the row was
inserted and then grouping them by person_id?

What I include below is the real world example. There's a table,
doctor_favorites, that functions like my sample table above. There's a
view, current_doctor_favorites, which contains the most recent row for
each doctor. This view has a high cost as you can see. There's a
second view, doctor_metrics, which combines the data from the
current_doctor_favorites view with that of another table, and that is my
ultimate goal. It's taking a loooong time to run, I think because of
the sequential scan when creating the current_doctor_favorites view from
the doctor_favorites table. I'm wondering if there's a way to perhaps
use an OID or something to determine the most recent rows from
doctor_favorites rather than the timestamp and the subselect. I tried
to create an index on the local_ts timestamp field of the underlying
doctor_favorites table, but it's hard to index something that's
basically unique each time you insert a row anyhow.

monitor-prod=# \d doctor_favorites
Table "doctor_favorites"
Attribute | Type | Modifier
-----------------+--------------------------+------------------------
favorites | integer |
remote_ts | timestamp with time zone | not null
local_ts | timestamp with time zone | not null default now()
med_practice_id | integer |
doctor_id | integer |
Indices: docid_index,
docid_medpracid_index,
localts_index,
medpracid_index

monitor-test=# \d current_doctor_favorites
View "current_doctor_favorites"
Attribute | Type | Modifier
-----------------+---------+----------
doctor_id | integer |
med_practice_id | integer |
favorites | integer |
View definition: SELECT df.doctor_id, df.med_practice_id, df.favorites
FROM doctor_favorites df WHERE (df.local_ts = (SELECT
max(doctor_favorites.local_ts) AS max FROM doctor_favorites WHERE
((doctor_favorites.doctor_id = df.doctor_id) AND
(doctor_favorites.med_practice_id = df.med_practice_id))));

monitor-test=# explain SELECT df.doctor_id, df.med_practice_id,
df.favorites FROM doctor_favorites df WHERE (df.local_ts = (SELECT
max(doctor_favorites.local_ts) AS max FROM doctor_favorites WHERE
((doctor_favorites.doctor_id = df.doctor_id) AND
(doctor_favorites.med_practice_id = df.med_practice_id))));
NOTICE: QUERY PLAN:

Seq Scan on doctor_favorites df (cost=0.00..9564.75 rows=47 width=12)
SubPlan
-> Aggregate (cost=2.02..2.02 rows=1 width=8)
-> Index Scan using docid_medpracid_index on
doctor_favorites (cost=0.00..2.02 rows=1 width=8)

EXPLAIN
monitor-test=#

monitor-prod=# \d doctor_metrics
View "doctor_metrics"
Attribute | Type | Modifier
-----------------+---------+----------
doctor_id | integer |
med_practice_id | integer |
lab_requests | integer |
lab_results | integer |
rx_auth | integer |
transcriptions | integer |
omnidocs | integer |
phone | integer |
favorites | integer |
View definition: SELECT CASE WHEN (doctor_tasks.doctor_id NOTNULL) THEN
doctor_tasks.doctor_id ELSE current_doctor_favorites.doctor_id END AS
doctor_id, CASE WHEN (doctor_tasks.med_practice_id NOTNULL) THEN
doctor_tasks.med_practice_id ELSE
current_doctor_favorites.med_practice_id END AS med_practice_id,
doctor_tasks.lab_requests, doctor_tasks.lab_results,
doctor_tasks.rx_auth, doctor_tasks.transcriptions,
doctor_tasks.omnidocs, doctor_tasks.phone,
current_doctor_favorites.favorites FROM (doctor_tasks FULL JOIN
current_doctor_favorites USING (doctor_id, med_practice_id));

monitor-prod=# explain SELECT CASE WHEN (doctor_tasks.doctor_id NOTNULL)
THEN do
ctor_tasks.doctor_id ELSE current_doctor_favorites.doctor_id END AS
doctor_id, CASE WHEN (doctor_tasks.med_practice_id NOTNULL) THEN
doctor_tasks.med_practice_id ELSE
current_doctor_favorites.med_practice_id END AS med_practice_id,
doctor_tasks.lab_requests, doctor_tasks.lab_results,
doctor_tasks.rx_auth, doctor_tasks.transcriptions,
doctor_tasks.omnidocs, doctor_tasks.phone,
current_doctor_favorites.favorites FROM (doctor_tasks FULL JOIN
current_doctor_favorites USING (doctor_id, med_practice_id));
NOTICE: QUERY PLAN:

Merge Join (cost=9566.05..9640.75 rows=1000 width=44)
-> Index Scan using doctor_tasks_pkey on doctor_tasks
(cost=0.00..59.00 rows=1000 width=32)
-> Sort (cost=9566.05..9566.05 rows=47 width=12)
-> Subquery Scan current_doctor_favorites (cost=0.00..9564.75
rows=47 width=12)
-> Seq Scan on doctor_favorites df (cost=0.00..9564.75
rows=47 width=12)
SubPlan
-> Aggregate (cost=2.02..2.02 rows=1 width=8)
-> Index Scan using docid_medpracid_index
on doctor_favorites (cost=0.00..2.02 rows=1 width=8)

EXPLAIN
monitor-prod=#

Thanks for your feedback,
Fran

Responses

Browse pgsql-general by date

  From Date Subject
Next Message SHELTON,MICHAEL (Non-HP-Boise,ex1) 2001-12-17 22:11:03 Re: getting most recent row efficiently
Previous Message Darren Ferguson 2001-12-17 21:10:13 Re: Query Help