Skip site navigation (1) Skip section navigation (2)

Re: [PERFORM] A Better External Sort?

From: Ron Peacetree <rjpeace(at)earthlink(dot)net>
To: Zeugswetter Andreas DAZ SD <ZeugswetterA(at)spardat(dot)at>,pgsql-hackers(at)postgresql(dot)org, pgsql-performance(at)postgresql(dot)org
Subject: Re: [PERFORM] A Better External Sort?
Date: 2005-09-30 02:57:19
Message-ID: 6812451.1128049039096.JavaMail.root@elwamui-polski.atl.sa.earthlink.net (view raw or flat)
Thread:
Lists: pgsql-hackerspgsql-performance
>From: Zeugswetter Andreas DAZ SD <ZeugswetterA(at)spardat(dot)at>
>Sent: Sep 29, 2005 9:28 AM
>Subject: RE: [HACKERS] [PERFORM] A Better External Sort?
>
>>In my original example, a sequential scan of the 1TB of 2KB 
>>or 4KB records, => 250M or 500M records of data, being sorted 
>>on a binary value key will take ~1000x more time than reading 
>>in the ~1GB Btree I described that used a Key+RID (plus node 
>>pointers) representation of the data.
>
>Imho you seem to ignore the final step your algorithm needs of
>collecting the data rows. After you sorted the keys the collect
>step will effectively access the tuples in random order (given a 
>sufficiently large key range).
>
"Collecting" the data rows can be done for each RAM buffer full of
of data in one pass through RAM after we've built the Btree.  Then
if desired those data rows can be read out to HD in sorted order
in essentially one streaming burst.  This combination of index build
+ RAM buffer rearrangement + write results to HD can be repeat
as often as needed until we end up with an overall Btree index and
a set of sorted sublists on HD.  Overall HD IO for the process is only
two effectively sequential passes through the data.

Subsequent retrieval of the sorted information from HD can be
done at full HD streaming speed and whatever we've decided to
save to HD can be reused later if we desire.

Hope this helps,
Ron

pgsql-performance by date

Next:From: Luke LonerganDate: 2005-09-30 04:22:27
Subject: Re: [PERFORM] A Better External Sort?
Previous:From: Ron PeacetreeDate: 2005-09-30 02:03:26
Subject: Re: [PERFORM] A Better External Sort?

pgsql-hackers by date

Next:From: Luke LonerganDate: 2005-09-30 04:22:27
Subject: Re: [PERFORM] A Better External Sort?
Previous:From: Ron PeacetreeDate: 2005-09-30 02:03:26
Subject: Re: [PERFORM] A Better External Sort?

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group