Re: Capacity questions

From: Federico Di Gregorio <fog(at)dndg(dot)it>
To: Bill House <wch-tech(at)house-grp(dot)net>
Cc: psycopg(at)postgresql(dot)org
Subject: Re: Capacity questions
Date: 2012-06-18 10:04:03
Message-ID: 4FDEFD13.2040206@dndg.it
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: psycopg

On 17/06/12 17:32, Bill House wrote:
> I have a written a script which does what I want it to do on a table
> with 999 records.
>
> I was wondering if there is anything I need to consider if I run this
> script on a table with 1.1 million records?
>
> The purpose of the script is to flag records representing files if they
> are redundant (duplicate md5sum).
>
> Later this table will be used to move or remove the files represented by
> the flagged records.
>
> The script performs the following actions
>
> 1 resets the del column to false for all records
>
> 2 creates a temporary table consisting of aggregate records of
> identical md5sums which count more than 1.
>
> 3 iterates down the cursor and with this information
>
> 4 updates the del column to True on all records in a batch of
> identical md5sums except for one.

If your recno column is a primary key, you can do this with a single SQL
UPDATE, without the initial update, temporary table or psycopg (with the
right indices this is blazing fast even on a gazzilion rows):

UPDATE files_test SET del = T.n > 1
FROM (SELECT recno,
row_number() OVER (PARTITION BY md5sum) AS n
FROM files_test) T
WHERE files_test.recno = T.recno;

Hope this helps,

federico (that *loves* UPDATE/FROM and window functions)

--
Federico Di Gregorio federico(dot)digregorio(at)dndg(dot)it
Studio Associato Di Nunzio e Di Gregorio http://dndg.it
- Ma cos'ha il tuo pesce rosso, l'orchite?
- Si, ha un occhio solo, la voce roca e mangia gli altri pesci.

In response to

Responses

Browse psycopg by date

  From Date Subject
Next Message Federico Di Gregorio 2012-06-18 10:27:02 Re: Patches to add support for tox and Travis CI tools to psycopg2
Previous Message Daniele Varrazzo 2012-06-18 09:28:22 Re: Capacity questions