Re: Importing Large Amounts of Data

From: Curt Sampson <cjs(at)cynic(dot)net>
To: Christopher Kings-Lynne <chriskl(at)familyhealth(dot)com(dot)au>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Importing Large Amounts of Data
Date: 2002-04-15 08:19:20
Message-ID: Pine.NEB.4.43.0204151655080.439-100000@angelic.cynic.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, 15 Apr 2002, Christopher Kings-Lynne wrote:

> OK, well now it depends on what kind of selects you're doing. Do you
> regularly select over a certain subset of the data, in which case using
> partial indices might give you significant speedup.

I believe from the information I've been given that we will indeed
be regularly selecting over certain subsets, based on day. (One of
the test queries I've been asked to use selects based on user_id
and a date range.) But I was intending to partition the tables
based on date range (to keep the index rebuild time from getting
completely out of hand), so that will handily take care of that
requirement anyway.

> Do you select functions of columns?

No.

> It depends on your definition. You have to accept a certain overhead if
> you're to have data integrity and MVCC. If you can't handle that overhead,
> then you can't have data integrity and vice versa.

Well, a few points:

a) I am not convinced that data integrity should cost a five-fold
decrease in performance,

b) In fact, at times I don't need that data integrity. I'm prefectly
happy to risk the loss of a table during import, if it lets me do the
import more quickly, especially if I'm taking the database off line
to do the import anyway. MS SQL server in fact allows me to specify
relaxed integrity (with attendant risks) when doing a BULK IMPORT; it
would be cool if Postgres allowed that to.

> BTW, instead of:
>
> CREATE UNIQUE INDEX bigone_pkey ON bigone (rec_no);
>
> do:
>
> ALTER TABLE bigone ADD PRIMARY KEY(rec_no);
>
> And remember to run "VACUUM ANALYZE bigone;" or just "ANALYZE bigone;" after
> the COPY and before trying to use the table. I'm not sure if it's better to
> analyze before or after the indexes are added, but it's definitely better to
> vaccum before the indexes are added.

Thanks. This is the kind of useful information I'm looking for. I
was doing a vacuum after, rather than before, generating the indices.

cjs
--
Curt Sampson <cjs(at)cynic(dot)net> +81 90 7737 2974 http://www.netbsd.org
Don't you know, in this new Dark Age, we're all light. --XTC

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Christopher Kings-Lynne 2002-04-15 08:24:36 Re: Importing Large Amounts of Data
Previous Message Christopher Kings-Lynne 2002-04-15 07:53:51 Re: Importing Large Amounts of Data