Re: Network Flow Schema + Bulk Import/Updates

From: Tony Wasson <ajwasson(at)gmail(dot)com>
To: "Michael L(dot) Artz" <dragon(at)october29(dot)net>
Cc: pgsql-general(at)postgresql(dot)org, miker(at)purplefrog(dot)com
Subject: Re: Network Flow Schema + Bulk Import/Updates
Date: 2005-09-21 19:04:20
Message-ID: 6d8daee30509211204548f314f@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On 9/20/05, Michael L. Artz <dragon(at)october29(dot)net> wrote:
> I'm fairly new at Postgres and had some basic design questions. My
> problem is basically that I want to do large bulk imports (millions of
> rows) into a large DB (billions of rows) where there might already be
> data that needs to be updated instead of inserting a new row. I read a
> similar post a few days ago, but I just had a couple more questions.

You can use the merge trigger below to do this. You'll need to add
some code to update the count. You may also benefit from using the new
constraint exclusion (table partitioning) in PostgreSQL 8.1. I am not
sure if CE works against the inet datatype -- if not, try converting
the IP to an integer.

This merge function is from an earlier version by Mike Rylander I got
from here:
http://archives.postgresql.org/pgsql-sql/2004-05/msg00135.php

His version worked fine, all mistakes are my own. I wanted to allow
multiple key columns.

--
-- Merge on INSERT functionallity for Postgres 8.0+
--
-- Original Author: miker ( at ) purplefrog ( dot ) com / 5-14-04
-- ajwasson (at) gmail (dot) cot -- Added support for multiple key
columns 8-20-05
--
-- CAVEAT EMPTOR: Uses table locks to avoid concurrency issues,
-- so it WILL slow down heavily loaded tables.
-- This effecivly puts the table into
-- TRANSACTION ISOLATION LEVEL SERIALIZABLE mode.
--
-- NOTE: You don't want to use key columns that are NULLable.

CREATE OR REPLACE FUNCTION add_merge_on_insert2(
tablename TEXT, -- table name
keycols TEXT[], -- key columns
updatecols TEXT[] -- columns to update is key columns match
) RETURNS TEXT
AS $BODY$

DECLARE
trig TEXT;
upclause TEXT := '';
keyclause TEXT := '';
out_msg TEXT;
BEGIN
-- setup the where clause with all key columns (probably primary keys)
FOR a IN 1 .. array_upper(keycols,1) LOOP
keyclause := keyclause || quote_ident(keycols[a]) || ' = NEW.'
|| quote_ident(keycols[a]) || ' AND ';
END LOOP;
--trim the last AND
keyclause := trim(trailing ' AND ' FROM keyclause);

-- setup the columns to UPDATE
FOR i IN 1 .. array_upper(updatecols,1) LOOP
upclause := upclause || quote_ident(updatecols[i])
|| ' = COALESCE(NEW.' || quote_ident(updatecols[i])
|| ', orig.' || quote_ident(updatecols[i]) || '), ';
END LOOP;
--trim the last comma and space
upclause := trim(trailing ', ' FROM upclause);

----- put together the function now
EXECUTE 'CREATE FUNCTION "' || tablename || '_merge_on_insert_f" ()
RETURNS TRIGGER AS $$
DECLARE
orig ' || quote_ident(tablename) || '%ROWTYPE;
BEGIN
-- NOTE: This function was dynamically built by add_merge_on_insert2
LOCK TABLE ' || quote_ident(tablename) || ' IN ROW EXCLUSIVE MODE;

SELECT INTO orig * FROM ' || quote_ident(tablename) || ' WHERE '
|| keyclause || ';

IF NOT FOUND THEN
RETURN NEW;
END IF;

UPDATE ' || quote_ident(tablename) || ' SET ' || upclause || '
WHERE ' || keyclause || ';

RETURN NULL;
END;
$$ LANGUAGE plpgsql
'; -- end of execute

EXECUTE 'CREATE TRIGGER "' || tablename || '_merge_on_insert_t" BEFORE INSERT
ON ' || quote_ident(tablename) || ' FOR EACH ROW
EXECUTE PROCEDURE "' || tablename || '_merge_on_insert_f" ();
'; -- end of execute

out_msg := 'FUNCTION ' || tablename || '_merge_on_insert_f ();
TRIGGER ' || tablename || '_merge_on_insert_t';
RETURN out_msg;
END;
$BODY$ LANGUAGE 'plpgsql';

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Jason Tesser 2005-09-21 19:22:22 returning the primary key value
Previous Message Philip Hallstrom 2005-09-21 18:46:33 Re: Fetching column names for a table