Re: efficient data reduction (and deduping)

From: Claudio Freire <klaussfreire(at)gmail(dot)com>
To: Alessandro Gagliardi <alessandro(at)path(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: efficient data reduction (and deduping)
Date: 2012-03-01 18:35:26
Message-ID: CAGTBQpYfx9Vt6zMK40jziMcHHRLZABDYQiTwq1SNkkE4F4k03w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On Thu, Mar 1, 2012 at 3:27 PM, Alessandro Gagliardi
<alessandro(at)path(dot)com> wrote:
> INSERT INTO hourly_activity
>     SELECT DISTINCT date_trunc('hour', hr_timestamp) AS activity_hour,
> activity_unlogged.user_id,
>                     client_ip, hr_timestamp, locale, log_id, method,
> server_ip, uri, user_agent
>         FROM activity_unlogged,
>             (SELECT user_id, MAX(hr_timestamp) AS last_timestamp
>                 FROM activity_unlogged GROUP BY user_id, date_trunc('hour',
> hr_timestamp)) AS last_activity
>     WHERE activity_unlogged.user_id = last_activity.user_id AND
> activity_unlogged.hr_timestamp = last_activity.last_timestamp;

Try

INSERT INTO hourly_activity
SELECT ... everything from au1 ...
FROM activity_unlogged au1
LEFT JOIN activity_unlogged au2 ON au2.user_id = au1.user_id
AND
date_trunc('hour', au2.hr_timestamp) = date_trunc('hour',
au1.hr_timestamp)
AND
au2.hr_timestamp < au1.hr_timestamp
WHERE au2.user_id is null;

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Craig James 2012-03-01 18:35:27 Re: efficient data reduction (and deduping)
Previous Message Alessandro Gagliardi 2012-03-01 18:27:27 efficient data reduction (and deduping)