Re: Is there a faster way to do this?

From: Edmund Bacon <ebacon(at)onesystem(dot)com>
To: Patrick Hatcher <PHatcher(at)macys(dot)com>
Cc: pgsql-sql(at)postgresql(dot)org
Subject: Re: Is there a faster way to do this?
Date: 2004-06-15 20:20:09
Message-ID: 40CF59F9.8030006@onesystem.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-sql

Perhaps something along the lines of

CREATE TEMP SEQUENCE pctile_seq;

SELECT ... , ((nextval('pctile_seq')/(rec_per_group+1) +1)
FROM (SELECT ... WHERE sales > 0 ORDER BY SALES) sales;

DROP TEMP SEQUENCE pctile_seq;

This doesn't add in the 'sales = 0 are in percentile 0, but that
shouldn't be too hard. I think, with some playing around, one might be
able to to an UPDATE from the above SELECT, which will eliminate the
read/update loop.

Patrick Hatcher wrote:
>
>
>
> pg: 7.4.2
> RedHat 7.2
>
> Can I get some advice on a possible faster way of doing this:
>
> Scenario: Each day I update a column in a table with an internal
> percentile value. To arrive at this value, I'll get a count of records
> with sales > 0 and then divide this count by the total number of tile
> groups I want. So for example:
> Total records w/sales > 0 = 730,000
> tile# = 100
> total percentile groups (730,000 / 100):7300
>
> Now that I have the total number of groups I need, I cycle through my
> recordset, grab the key field and the current percentile number and stuff
> the values into a temp table. I mod the current row counter against the
> total percentile group number. If it is 0 then I add 1 to the current
> percentile number . After inserting records into the temp file I then
> update the main table.
>
> Using the example above, the first 7300 records would get a percentile
> number of 1, the next 7300 records would get a percentile number of 2,
> then 3,4,5,etc.
>
> Unfortunately, I am going record by record in a loop and the process takes
> upwards of 20mins. Is there a faster way to do this? I thought about
> using limit and offset, but I'm not sure how I would accomplish it.
>
> Below is the function I currently use. Thanks for any help provided
>
> CREATE OR REPLACE FUNCTION cdm.percentile_calc()
> RETURNS text AS
> 'DECLARE
> v_interval int4;
> v_tile int4;
> v_percentile int4;
> v_check int4;
> v_count int4;
> v_rowcount int4;
> myRec RECORD;
>
> BEGIN
> v_count:=0;
> v_tile:= 100;
> v_percentile:=1;
> v_rowcount :=1;
> v_check:=0;
>
>
> /* Get count of records with val_purch > 0 */
> select into v_count count(*) from cdm.cdm_indiv_mast where
> val_purch_com >0;
>
> /* this number will be used as part of our MOD to tell when to add one
> to our percentile */
> v_interval := v_count / v_tile;
>
> CREATE TEMP TABLE cdmperct (f1 int8, f2 int2);
>
>
> FOR myRec IN select indiv_key from cdm.cdm_indiv_mast where
> val_purch_com >0 order by val_purch_com desc LOOP
> INSERT INTO cdmperct values (myRec.indiv_key,v_percentile);
> v_check = mod(v_rowcount,v_interval);
> IF v_check = 0 THEN
> v_percentile:=v_percentile+1;
> END IF;
> v_rowcount:= v_rowcount+1;
> END LOOP;
>
>
> UPDATE cdm.cdm_indiv_mast SET percentiler = f2 from cdmperct where
> indiv_key = f1;
>
>
> DROP TABLE cdmperct;
> RETURN \'DONE\';
> END; '
> LANGUAGE 'plpgsql' IMMUTABLE;
>
> Patrick Hatcher
> Macys.Com
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 2: you can get off all lists at once with the unregister command
> (send "unregister YourEmailAddressHere" to majordomo(at)postgresql(dot)org)

--
Edmund Bacon <ebacon(at)onesystem(dot)com>

In response to

Browse pgsql-sql by date

  From Date Subject
Next Message Jean-Luc Lachance 2004-06-15 20:38:21 Re: Is there a faster way to do this?
Previous Message Richard Huxton 2004-06-15 20:17:05 Re: Is there a faster way to do this?