Re: Is there a faster way to do this?

From: Richard Huxton <dev(at)archonet(dot)com>
To: Patrick Hatcher <PHatcher(at)macys(dot)com>
Cc: pgsql-sql(at)postgresql(dot)org
Subject: Re: Is there a faster way to do this?
Date: 2004-06-15 20:17:05
Message-ID: 40CF5941.3090301@archonet.com
Lists: pgsql-sql

Patrick Hatcher wrote:
>
> Scenario: Each day I update a column in a table with an internal
> percentile value. To arrive at this value, I'll get a count of records
> with sales > 0 and then divide this count by the total number of tile
> groups I want. So for example:
> Total records w/sales > 0 = 730,000
> tile# = 100
> rows per percentile group (730,000 / 100) = 7,300
>
> Now that I have the group size, I cycle through my recordset, grab the
> key field and the current percentile number, and stuff the values into
> a temp table. I mod the current row counter against the group size; if
> the result is 0 then I add 1 to the current percentile number. After
> inserting the records into the temp table I then update the main table.
>
> Using the example above, the first 7300 records would get a percentile
> number of 1, the next 7300 records would get a percentile number of 2,
> then 3,4,5,etc.
>
> Unfortunately, I am going record by record in a loop, and the process
> takes upwards of 20 minutes. Is there a faster way to do this? I thought
> about using LIMIT and OFFSET, but I'm not sure how I would accomplish it.
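
For reference, the loop being described is presumably something like this
(an untested reconstruction; all names are invented, and it assumes a
holding table temp_tiles(id, tile)):

CREATE OR REPLACE FUNCTION set_tiles() RETURNS integer AS '
DECLARE
    r     RECORD;
    i     integer := 0;   -- running row counter
    grp   integer := 1;   -- current percentile number
BEGIN
    FOR r IN SELECT my_id_column FROM my_table
             WHERE val > 0
             ORDER BY val, my_id_column LOOP
        -- one INSERT per row is where the time goes
        INSERT INTO temp_tiles (id, tile) VALUES (r.my_id_column, grp);
        i := i + 1;
        IF i % 7300 = 0 THEN
            grp := grp + 1;
        END IF;
    END LOOP;
    RETURN grp;
END;
' LANGUAGE plpgsql;

The per-row work in that loop (and the matching per-row update afterwards)
is what the ideas below try to eliminate.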

Idea #1:
Well, if you order by something unique, you can use ORDER BY, LIMIT and
OFFSET. UPDATE won't take them directly, so they go in a sub-select
(untested):

UPDATE my_table SET my_group = 1
WHERE my_id_column IN (
    SELECT my_id_column FROM my_table
    WHERE val > 0
    ORDER BY val, my_id_column
    LIMIT 7300 OFFSET 0
);
-- then my_group = 2 with OFFSET 7300, my_group = 3 with OFFSET 14600, etc.

You'll need to make sure the ordering is fully determined when "val" is
the same though - hence my_id_column as the tie-breaker.

Now, the problem with this is that every OFFSET re-scans all the preceding
groups: by the time you get to group 99, you've had to scan over group 1
98 times - not good. If you knew where the previous group ended (or how
many rows there were for any given "val") you could restrict it much more
though.
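
For example (untested; :prev_val and :prev_id stand in for the values
remembered from the last row of the previous group):

UPDATE my_table SET my_group = 2
WHERE my_id_column IN (
    SELECT my_id_column FROM my_table
    WHERE val > :prev_val
       OR (val = :prev_val AND my_id_column > :prev_id)
    ORDER BY val, my_id_column
    LIMIT 7300
);

With an index on (val, my_id_column) each update can then start where the
last one left off instead of re-reading the groups already done.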

Idea #2:
Turn your function around. Declare a cursor on the sorted SELECT of
target-rows, MOVE forward 7300 rows at a time, and read the key at each
boundary. Use those boundary values to issue one update covering all the
rows between two of them. That's only 100 updates issued rather than one
for each target row.
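
Something along these lines (untested; same invented columns as Idea #1):

BEGIN;
DECLARE tiles CURSOR FOR
    SELECT val, my_id_column FROM my_table
    WHERE val > 0
    ORDER BY val, my_id_column;
FETCH 1 FROM tiles;            -- first row of group 1
MOVE FORWARD 7299 IN tiles;    -- skip the rest of group 1
FETCH 1 FROM tiles;            -- first row of group 2, and so on
COMMIT;

Each FETCH hands back a boundary (val, key) pair, and each group then
becomes a single ranged update - e.g. with :v2/:k2 standing in for group
2's boundary:

UPDATE my_table SET my_group = 1
WHERE val > 0
  AND (val < :v2 OR (val = :v2 AND my_id_column < :k2));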

Idea #3:
Someone might well be able to come up with a clever idea involving a join
against a set-returning function, but I'm not sure about guaranteeing the
order of the join vs the returned set (and it's getting late here). Any
ideas, people?
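
One way to sidestep the ordering question (untested sketch; assumes the
key is an integer): number the sorted rows once into a temp table with a
serial column, then derive each group from the row number in one joined
UPDATE. This leans on the serial defaults being assigned in the SELECT's
output order, which a plain single-backend INSERT ... SELECT does give
you.

CREATE TEMP TABLE ranked (rownum serial, id integer);
INSERT INTO ranked (id)
    SELECT my_id_column FROM my_table
    WHERE val > 0
    ORDER BY val, my_id_column;

-- integer division: rows 1..7300 get group 1, 7301..14600 get 2, ...
UPDATE my_table SET my_group = ((ranked.rownum - 1) / 7300) + 1
FROM ranked
WHERE my_table.my_id_column = ranked.id;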

Maybe one of those is of some use.
--
Richard Huxton
Archonet Ltd
