Help with extracting large volumes of records across related tables

From: "Damien Dougan" <damien(dot)dougan(at)mobilecohesion(dot)com>
To: <pgsql-performance(at)postgresql(dot)org>
Subject: Help with extracting large volumes of records across related tables
Date: 2004-09-13 11:38:05
Message-ID: 004b01c49986$228c3d60$6e01a8c0@a31p005


Hi All,

I am having a performance problem extracting a large volume of data from
Postgres 7.4.2, and was wondering if there was a more cunning way to get
the data out of the DB...

This isn't a performance problem with any particular PgSQL operation;
it's more a question of strategy for getting large volumes of related
records out of the DB whilst preserving the relations between them.

Basically we have a number of tables, which are exposed as two public
views (say PvA and PvB). For each row in PvA there are a number of
related rows in PvB. This number is arbitrary, which is one of the
reasons it can't be expressed as additional columns in PvA - so we
really need two sets of tables, which leads to two sets of extract
calls, interwoven to associate PvA with PvB.
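
For the sketches further down, assume the two views share a key column
- I'll call it pva_id, though that name (and the payload/detail columns
below) are invented here, since the real view definitions don't matter
for the question:

    # Hypothetical shape of the two public views:
    #
    #   PvA(pva_id, payload)   -- one row per parent
    #   PvB(pva_id, detail)    -- zero or more rows per parent
    #
    PARENT_SQL = "SELECT pva_id, payload FROM PvA"
    CHILD_SQL = "SELECT detail FROM PvB WHERE pva_id = %s"  # psycopg2-style placeholder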

The problem is that for extraction, we ultimately want to grab a row
from PvA, and then all the related rows from PvB and store them together
offline (e.g. in XML).

However, the number of rows in the DB at any time is likely to be in
the millions, with perhaps 25% of them being suitable for extraction in
any given batch run (i.e. several hundred thousand to several million
rows per run).

Currently, the proposal is to grab several hundred rows from PvA (thus
avoiding issues with the result set being very large), and then process
each of them by selecting the related rows in PvB (again, several
hundred rows at a time to avoid problems with large result sets).

So the algorithm is basically:

Do
    Select the next 200 rows from PvA

    For each PvA row Do
        Write current PvA row as XML

        Do
            Select the next 200 rows from PvB

            For each PvB row Do
                Write current PvB row as XML
                    within the parent PvA XML Element
            End For
        While More Rows
    End For
While More Rows
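
In case it's clearer as code, here's roughly that loop in Python with
psycopg2 and server-side cursors - a sketch of the current approach,
not our production code, using the invented DSN, pva_id and column
names from above:

    import psycopg2
    from xml.sax.saxutils import escape

    BATCH = 200  # rows per fetch, as in the pseudocode

    conn = psycopg2.connect("dbname=mydb")  # hypothetical DSN
    out = open("extract.xml", "w")
    out.write("<extract>\n")

    parent = conn.cursor(name="pva_cur")  # server-side cursor streams PvA
    parent.execute("SELECT pva_id, payload FROM PvA")
    while True:
        parents = parent.fetchmany(BATCH)
        if not parents:
            break
        for pva_id, payload in parents:
            out.write('  <pva id="%s">%s\n' % (pva_id, escape(str(payload))))
            # A fresh server-side cursor per parent row; closing it below
            # frees the name "pvb_cur" for the next parent.
            child = conn.cursor(name="pvb_cur")
            child.execute("SELECT detail FROM PvB WHERE pva_id = %s",
                          (pva_id,))
            while True:
                children = child.fetchmany(BATCH)
                if not children:
                    break
                for (detail,) in children:
                    out.write("    <pvb>%s</pvb>\n" % escape(str(detail)))
            child.close()
            out.write("  </pva>\n")
    parent.close()

    out.write("</extract>\n")
    out.close()
    conn.close()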

However, this has a fairly significant performance impact, and I was
wondering if there was a faster way to do it (something like taking a
dump of the tables so they can be worked on offline - but a basic dump
means we have lost the 1:M relationships between PvA and PvB).
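
To show what I mean by a dump: something like the following, where each
view is streamed once to a flat file, sorted on the shared key (again,
pva_id and the columns are invented names). Since both files come out
ordered by pva_id, I'd guess the 1:M relationship could be rebuilt
offline in a single merge pass rather than lost:

    import psycopg2

    conn = psycopg2.connect("dbname=mydb")  # hypothetical DSN

    def dump_sorted(query, path):
        # Server-side cursor, so the full result set never sits in memory.
        cur = conn.cursor(name="dump_cur")
        cur.execute(query)
        with open(path, "w") as f:
            while True:
                rows = cur.fetchmany(10000)
                if not rows:
                    break
                for row in rows:
                    f.write("\t".join(str(col) for col in row) + "\n")
        cur.close()  # frees the cursor name for the next call

    # Rows for the same pva_id end up adjacent in both files, so they can
    # be zipped back together offline without any further DB lookups.
    dump_sorted("SELECT pva_id, payload FROM PvA ORDER BY pva_id", "pva.tsv")
    dump_sorted("SELECT pva_id, detail FROM PvB ORDER BY pva_id", "pvb.tsv")
    conn.close()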

Are there any tools/tricks/tips with regards to extracting large
volumes of data across related tables from Postgres? It doesn't have to
be exported as XML; we can do post-processing on the extracted data as
needed - the important thing is to keep the relationship between PvA
and PvB on a row-by-row basis.

Many thanks,

Damien
