Help with extracting large volumes of records across related tables

From: "Damien Dougan" <damien(dot)dougan(at)mobilecohesion(dot)com>
To: <pgsql-performance(at)postgresql(dot)org>
Subject: Help with extracting large volumes of records across related tables
Date: 2004-09-13 11:38:05
Message-ID: 004b01c49986$228c3d60$6e01a8c0@a31p005


Hi All,

I am having a performance problem extracting a large volume of data from
Postgres 7.4.2, and was wondering if there was a more cunning way to get
the data out of the DB...

This isn't a performance problem with any particular PgSQL operation;
it's more a question of strategy for getting large volumes of related
records out of the DB whilst preserving the relations between them.

Basically we have a number of tables, which are exposed as two public
views (say PvA and PvB). For each row in PvA there are a number of
related rows in PvB. This number is arbitrary, which is one of the
reasons it can't be expressed as additional columns in PvA - so we
really need two sets of tables, which leads to two sets of extract
calls, interwoven to associate PvA with PvB.
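
For the sketches further down, assume the two views share a key column
- I'll call it pva_id, though that name (and the payload/detail columns
below) are invented here, since the real view definitions don't matter
for the question:

    # Hypothetical shape of the two public views:
    #
    #   PvA(pva_id, payload)   -- one row per parent
    #   PvB(pva_id, detail)    -- zero or more rows per parent
    #
    PARENT_SQL = "SELECT pva_id, payload FROM PvA"
    CHILD_SQL = "SELECT detail FROM PvB WHERE pva_id = %s"  # psycopg2-style placeholder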

The problem is that for extraction, we ultimately want to grab a row
from PvA, and then all the related rows from PvB and store them together
offline (e.g. in XML).

However, the number of rows in the DB at any time is likely to be in
the millions, with perhaps 25% of them being suitable for extraction in
any given batch run (i.e. several hundred thousand to several million
rows per run).

Currently, the proposal is to grab several hundred rows from PvA (thus
avoiding issues with the result set being very large), and then process
each of them by selecting the related rows in PvB (again, several
hundred rows at a time to avoid problems with large result sets).

So the algorithm is basically:

Do
    Select the next 200 rows from PvA

    For each PvA row Do
        Write current PvA row as XML

        Do
            Select the next 200 rows from PvB

            For each PvB row Do
                Write current PvB row as XML
                    within the parent PvA XML Element
            End For
        While More Rows
    End For
While More Rows
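
In case it's clearer as code, here's roughly that loop in Python with
psycopg2 and server-side cursors - a sketch of the current approach,
not our production code, using the invented DSN, pva_id and column
names from above:

    import psycopg2
    from xml.sax.saxutils import escape

    BATCH = 200  # rows per fetch, as in the pseudocode

    conn = psycopg2.connect("dbname=mydb")  # hypothetical DSN
    out = open("extract.xml", "w")
    out.write("<extract>\n")

    parent = conn.cursor(name="pva_cur")  # server-side cursor streams PvA
    parent.execute("SELECT pva_id, payload FROM PvA")
    while True:
        parents = parent.fetchmany(BATCH)
        if not parents:
            break
        for pva_id, payload in parents:
            out.write('  <pva id="%s">%s\n' % (pva_id, escape(str(payload))))
            # A fresh server-side cursor per parent row; closing it below
            # frees the name "pvb_cur" for the next parent.
            child = conn.cursor(name="pvb_cur")
            child.execute("SELECT detail FROM PvB WHERE pva_id = %s",
                          (pva_id,))
            while True:
                children = child.fetchmany(BATCH)
                if not children:
                    break
                for (detail,) in children:
                    out.write("    <pvb>%s</pvb>\n" % escape(str(detail)))
            child.close()
            out.write("  </pva>\n")
    parent.close()

    out.write("</extract>\n")
    out.close()
    conn.close()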

However, this has a fairly significant performance impact, and I was
wondering if there was a faster way to do it (something like taking a
dump of the tables so they can be worked on offline - but a basic dump
means we have lost the 1:M relationships between PvA and PvB).
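
To show what I mean by a dump: something like the following, where each
view is streamed once to a flat file, sorted on the shared key (again,
pva_id and the columns are invented names). Since both files come out
ordered by pva_id, I'd guess the 1:M relationship could be rebuilt
offline in a single merge pass rather than lost:

    import psycopg2

    conn = psycopg2.connect("dbname=mydb")  # hypothetical DSN

    def dump_sorted(query, path):
        # Server-side cursor, so the full result set never sits in memory.
        cur = conn.cursor(name="dump_cur")
        cur.execute(query)
        with open(path, "w") as f:
            while True:
                rows = cur.fetchmany(10000)
                if not rows:
                    break
                for row in rows:
                    f.write("\t".join(str(col) for col in row) + "\n")
        cur.close()  # frees the cursor name for the next call

    # Rows for the same pva_id end up adjacent in both files, so they can
    # be zipped back together offline without any further DB lookups.
    dump_sorted("SELECT pva_id, payload FROM PvA ORDER BY pva_id", "pva.tsv")
    dump_sorted("SELECT pva_id, detail FROM PvB ORDER BY pva_id", "pvb.tsv")
    conn.close()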

Are there any tools/tricks/tips with regards to extracting large
volumes of data across related tables from Postgres? It doesn't have to
be exported as XML; we can do post-processing on the extracted data as
needed - the important thing is to keep the relationship between PvA
and PvB on a row-by-row basis.

Many thanks,

Damien
