Re: Is there a faster way to do this?

From: "Greg Sabino Mullane" <greg(at)turnstep(dot)com>
To: pgsql-sql(at)postgresql(dot)org
Subject: Re: Is there a faster way to do this?
Date: 2004-06-17 03:03:31
Message-ID: a2a9e91124547cf39eefa84ee9b6dca0@biglumber.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-sql


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1


Hard to imagine it taking that long, even for a table that large,
but we don't know the details of your system. I got a sample
table with 800,000 records down to 92 seconds using the function
below. See how it does for you. Notes follow.

CREATE OR REPLACE FUNCTION percentile_calc(INTEGER) RETURNS text AS '
DECLARE
v_tile ALIAS FOR $1;
v_interval integer;
v_record record;
v_rowcount integer := 0;
v_percentile integer := 1;

BEGIN

SELECT COUNT(*)/v_tile FROM cdm_indiv_mast WHERE val_purch_com > 0
INTO v_interval;

FOR v_record IN
SELECT ctid FROM cdm_indiv_mast WHERE val_purch_com>0
ORDER BY val_purch_com DESC
LOOP
v_rowcount := v_rowcount + 1;
UPDATE cdm_indiv_mast SET percentiler=v_percentile WHERE ctid = v_record.ctid;
IF v_rowcount >= v_interval THEN
v_percentile := v_percentile + 1;
v_rowcount := 0;
END IF;
END LOOP;

RETURN \'DONE\';
END;
' LANGUAGE plpgsql STABLE STRICT;

CREATE OR REPLACE FUNCTION percentile_calc() RETURNS text
AS 'SELECT new10(100);' LANGUAGE SQL;

Notes:

Since 100 seemed to be hard-coded into the original function, there was
no need for v_count. Instead, I made "v_tile" a variable, with a default
of "100" if the function is called with no argument.

There may be a false assumption here. If the values of val_purch_com are
not unique, then two items with the same val_purch_com may have different
percentiler values. If this is the case, you may want to at least enforce
some ordering of these values by adding more to the ORDER BY clause.
Without knowing the full table structure, I can't recommend what columns
to add there.

To really speed this up, make sure that you do not have any indexes on
the table. By using tids, we neatly avoid having to use any indexes in the
function itself. Unless you are using oids and really need them (highly
unlikely because of the "indiv_key" column), you should remove them:

ALTER TABLE cdm_indiv_mast SET WITHOUT OIDS;

Of course, vacuuming completely and often is recommended for a table this
size as well, especially when updating this many rows at once. I'd
recommend a VACUUM FULL immediately before running it.

- --
Greg Sabino Mullane greg(at)turnstep(dot)com
PGP Key: 0x14964AC8 200406162303

-----BEGIN PGP SIGNATURE-----

iD8DBQFA0QpdvJuQZxSWSsgRAk88AKDtGrs6+/ypaaNU6DWqvhCgtsrM0gCg+2ve
J8JKOPgxp42c54Nx/rzHdxs=
=sNFW
-----END PGP SIGNATURE-----

In response to

Browse pgsql-sql by date

  From Date Subject
Next Message Kris Jurka 2004-06-17 04:30:03 Re: Prepare Statement
Previous Message Jie Liang 2004-06-17 00:53:07 Re: Prepare Statement