Re: [HACKERS] sorting big tables :(

From: Michael Richards <miker(at)scifair(dot)acadiau(dot)ca>
To: Bruce Momjian <maillist(at)candle(dot)pha(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: [HACKERS] sorting big tables :(
Date: 1998-05-20 00:02:38
Message-ID: Pine.BSF.3.96.980519205445.20230A-100000@scifair.acadiau.ca
Lists: pgsql-hackers

On Sun, 17 May 1998, Bruce Momjian wrote:

> > > > I have a big table. 40M rows.
> > > > On the disk, its size is:
> > > > 2,090,369,024 bytes. So 2 gigs. On a 9 gig drive I can't sort this table.
> > > > How should one decide based on table size how much room is needed?
>
> Tape sort is a standard Knuth sorting. It basically sorts in pieces,
> and merges. If you don't do this, the accessing around gets very poor
> as you page fault all over the file, and the cache becomes useless.
Right. I wasn't reading the right chapter. Internal sorting is quite
different from external sorting; an internal sort suggests something like
the quicksort algorithm.
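
To make the distinction concrete, here is a toy sketch of the run-and-merge
idea: quicksort memory-sized runs internally, spill each run to its own
"tape", then merge them sequentially. This is not the real psort.c code;
the record type, RUN_LEN, and the naive linear-scan merge are placeholders
I made up:

#include <stdio.h>
#include <stdlib.h>

#define RUN_LEN 4               /* records per in-memory run (tiny, for demo) */
#define MAX_RUNS 64

static int
cmp_int(const void *a, const void *b)
{
    int x = *(const int *) a;
    int y = *(const int *) b;

    return (x > y) - (x < y);
}

int
main(void)
{
    int     data[] = {42, 7, 99, 3, 18, 5, 77, 1, 64, 12};
    int     n = sizeof(data) / sizeof(data[0]);
    FILE   *runs[MAX_RUNS];
    int     head[MAX_RUNS];
    int     live[MAX_RUNS];
    int     nruns = 0;
    int     i, r;

    /* Phase 1: cut the input into memory-sized runs, quicksort each
     * run internally, and spill it to its own temp file ("tape"). */
    for (i = 0; i < n; i += RUN_LEN)
    {
        int     len = (n - i < RUN_LEN) ? n - i : RUN_LEN;

        qsort(data + i, len, sizeof(int), cmp_int);
        runs[nruns] = tmpfile();
        fwrite(data + i, sizeof(int), len, runs[nruns]);
        rewind(runs[nruns]);
        nruns++;
    }

    /* Phase 2: merge.  Each run is read strictly sequentially, so we
     * never page-fault all over one big file and the cache stays useful. */
    for (r = 0; r < nruns; r++)
        live[r] = (fread(&head[r], sizeof(int), 1, runs[r]) == 1);

    for (;;)
    {
        int     best = -1;

        for (r = 0; r < nruns; r++)
            if (live[r] && (best < 0 || head[r] < head[best]))
                best = r;
        if (best < 0)
            break;              /* all runs exhausted */
        printf("%d\n", head[best]);
        live[best] = (fread(&head[best], sizeof(int), 1, runs[best]) == 1);
    }
    return 0;
}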
Marc and I discussed this over lunch. If I did a SELECT * INTO, would it
not make more sense to sort the results directly into the resulting table
rather than into pieces that are then copied into a table? From my limited
knowledge, I think this should save about 8/7 N of the space.
On this issue, I think there must be a lot more overhead than necessary.
The table consists of only
	int4, int4, int2
so I read 10 bytes per row of actual data. Instead, 40M rows in 2 gigs
works out to about 50 bytes per record. What is in there besides the oid
(4 bytes?)?
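
Doing the arithmetic on the numbers above (my guess is the difference goes
to the per-tuple header that carries the oid and visibility info, the line
pointer in each page, alignment padding, and page headers -- I haven't
checked the exact sizes):

#include <stdio.h>

int
main(void)
{
    double  table_bytes = 2090369024.0;     /* reported on-disk size */
    double  rows = 40000000.0;              /* 40M rows */
    double  payload = 4 + 4 + 2;            /* int4 + int4 + int2 */

    printf("bytes/row on disk:  %.1f\n", table_bytes / rows);  /* ~52.3 */
    printf("payload bytes/row:  %.1f\n", payload);              /* 10.0 */
    printf("overhead bytes/row: %.1f\n",
           table_bytes / rows - payload);                       /* ~42.3 */
    return 0;
}

So roughly 42 of the ~52 bytes per row would be overhead, if those numbers
are right.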

-Mike
