Re: efficient data reduction (and deduping)

From: Alessandro Gagliardi <alessandro(at)path(dot)com>
To: Peter van Hardenberg <pvh(at)pvh(dot)ca>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: efficient data reduction (and deduping)
Date: 2012-03-01 19:28:32
Message-ID: CAAB3BBKS2ncPZROej1OtE_k-DAEsRpJ2JP8w7zhoVJ8+i3EfDA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

I was thinking of adding an index, but thought it would be pointless since
I would only be using the index once before dropping the table (after its
loaded into hourly_activity). I assumed it would take longer to create the
index and then use it than to just seq scan once or twice. Am I wrong in
that assumption?

On Thu, Mar 1, 2012 at 10:40 AM, Peter van Hardenberg <pvh(at)pvh(dot)ca> wrote:

> On Thu, Mar 1, 2012 at 10:27 AM, Alessandro Gagliardi
> <alessandro(at)path(dot)com> wrote:
> > Now, I want to reduce that data to get the last activity that was
> performed
> > by each user in any given hour. It should fit into a table like this:
> >
>
> How about:
>
> 1) Create an expression based index on date_trunc('hour', hr_timestamp)
> 2) Create a view on that showing the last value
> 3) If you want to throw away the data use CREATE TABLE AS on the
> results of the view.
>
> You may also want to investigate window functions.
>
> -p
>
> --
> Peter van Hardenberg
> San Francisco, California
> "Everything was beautiful, and nothing hurt." -- Kurt Vonnegut
>

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message Alessandro Gagliardi 2012-03-01 19:29:00 Re: efficient data reduction (and deduping)
Previous Message Kääriäinen Anssi 2012-03-01 19:06:40 Re: Large insert and delete batches