Buld Insert and Index use.

From: Rudi Starcevic <tech(at)wildcash(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: Buld Insert and Index use.
Date: 2004-08-12 00:00:02
Message-ID: 411AB302.809@wildcash.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Hi,

This email is picking up a thread from yesterday on INSERTS and INDEXES.

In this case the question is to use and index or a sequential scan.

I have included the db DDL and SELECT query.

For each month I have a csv data dump of council property data.

So the First CD will have almost all unique records.
From there most properties will already be in the db.

Each line of the csv file populates three tables.

gc_prop
gc_value
gc_owner

gc_prop has a one to many relationship with gc_value and gc_owner.

For-each line we first SELECT from the gc_prop table.
If there is no record we INSERT into the gc_prop,gc_value and gc_owner
tables.

If there is a matching property we do not need to INSERT into the
gc_prop table only the gc_value and gc_owner tables.

So each row will require either:

a) One SELECT + three INSERTS
or
b) One SELECT + two INSERTS

CREATE TABLE gc_prop
(
gc_id serial PRIMARY KEY,
acc_no numeric(9,0),
master_code varchar(32),
no_prop numeric(4,0),
lot_no numeric(9,0),
plan_no varchar(32),
volume_no numeric(8,0),
folio_no numeric(8,0),
street_no varchar(32),
street_name varchar(30),
suburb_name varchar(30),
suburb_postcode numeric(4,0),
address1_txt varchar(80),
address2_txt varchar(80),
address3_txt varchar(80),
address4_txt varchar(80),
address5_txt varchar(80),
address6_txt varchar(80),
owner_postcode text,
planning_zone varchar(80),
waterfront_code varchar(80),
vacant_land varchar(32),
property_area text,
improvemeny_type varchar(20),
bup_indicator varchar(4),
property_name varchar(30),
lot_quantity text,
pool_indicator varchar(32),
owner_occupied varchar(32),
prsh_text varchar(30),
vg_no varchar(20),
property_part varchar(20),
lot_no_txt varchar(32),
plan_qualifier1 varchar(80),
plan_qualifier2 varchar(80),
owner_trans_date date,
legal_entity varchar(32),
"1st_surname_com" varchar(80),
"1st_given_name" varchar(64),
"2nd_surname" varchar(64),
"2nd_given_name" varchar(64),
easement_flag varchar(32),
trans_code varchar(32)
);

CREATE TABLE gc_value
(
gc_v_id serial PRIMARY KEY,
gc_id integer NOT NULL, -- foreign key to gc_prop table
current_valuation numeric(10,0),
valuation_date date,
last_sale_date date,
annual_rates numeric(7,0),
sale_amount numeric(9,0),
type_sale varchar(32),
date_sale date,
pre_val_date date,
pre_val_amount numeric(10,0),
pre_vg_no varchar(20),
future_val_date varchar(32),
fut_val_amount numeric(10,0),
fut_val_no varchar(32)
);

CREATE TABLE gc_owner
(
gc_o_id serial PRIMARY KEY,
gc_id integer NOT NULL, -- foreign key to gc_prop table
pre_legal_entity varchar(32),
pre_surname_com varchar(76),
pre_given_name varchar(32),
pre_2nd_surname varchar(40),
pre_2nd_given varchar(32),
orig_lot_no numeric(9,0),
orig_lot_txt varchar(32),
orig_prop_txt text,
orig_plan_qualifier1 varchar(32),
orig_plan_qualifier2 varchar(32),
orig_plan_no varchar(32)
);

CREATE INDEX gc_prop_check ON gc_prop (
acc_no,no_prop,lot_no,plan_no,street_no,street_name,suburb_postcode );

//check if this property already exists using gc_prop_check index
$sql = "
SELECT
gp.gc_id,
gp.acc_no,
gp.no_prop,
gp.lot_no,
gp.plan_no,
gp.street_no,
gp.street_name,
gp.suburb_postcode,
gp.owner_trans_date
FROM gc_prop gp
WHERE
gp.acc_no = $acc_no,
AND
gp.no_prop = $no_prop,
AND
gp.lot_no = $lot_no
AND
gp.plan_no = '$plan_no'
AND
gp.street_no = '$street_no'
AND
gp.street_name = '$street_name'
AND
gp.suburb_postcode = $suburb_postcode
";

Do you think an Index or Seq. scan should be used?

Thanks.
Regards,
Rudi.

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Gaetano Mendola 2004-08-12 00:06:31 Re: [HACKERS] fsync vs open_sync
Previous Message Russell Smith 2004-08-11 23:52:20 Re: Hardware upgrade for a high-traffic database