Re: Merge algorithms for large numbers of "tapes"

From: "Zeugswetter Andreas DCP SD" <ZeugswetterA(at)spardat(dot)at>
To: "Dann Corbit" <DCorbit(at)connx(dot)com>, "Stephen Frost" <sfrost(at)snowman(dot)net>, "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Luke Lonergan" <llonergan(at)greenplum(dot)com>, "Jim C(dot) Nasby" <jnasby(at)pervasive(dot)com>, "Greg Stark" <gsstark(at)mit(dot)edu>, "Simon Riggs" <simon(at)2ndquadrant(dot)com>, <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Merge algorithms for large numbers of "tapes"
Date: 2006-03-10 08:57:28
Message-ID: E1539E0ED7043848906A8FF995BDA579D99381@m0143.s-mxs.net
Lists: pgsql-hackers


> Two pass will create the count of subfiles proportional to:
> Subfile_count = original_stream_size/sort_memory_buffer_size
>
> The merge pass requires (sizeof record * subfile_count) memory.

That is true from an algorithmic perspective. But to make the merge
efficient you need enough RAM to cache a reasonably large block per
subfile; otherwise you end up rereading the same page/block from a
subfile multiple times. (If you had one disk per subfile you could also
rely on the disk's own cache, but I think we can rule that out.)

> Example:
> You have a 7 gigabyte table to sort and you have 100 MB sort buffer.
> The number of subfiles will be:
> 7000000000 / 100000000 = 70 files

To be efficient you need (70 + 1) * max(record_size, 256 KB), i.e. 71 * 256 KB ≈ 18 MB.
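
As a quick sanity check, roughly like this (just a sketch; the 256 KB block
size, the helper name, and the 100-byte record size are assumptions of mine,
not anything fixed in the code):

#include <stdio.h>

#define MERGE_BLOCK_SIZE (256 * 1024)   /* assumed per-subfile read buffer */

/* Memory needed for a single merge pass: one read buffer per subfile
 * plus one output buffer. */
static size_t merge_pass_memory(size_t n_subfiles, size_t record_size)
{
    size_t per_run = record_size > MERGE_BLOCK_SIZE ? record_size : MERGE_BLOCK_SIZE;
    return (n_subfiles + 1) * per_run;
}

int main(void)
{
    /* 7 GB input, 100 MB sort buffer -> 70 subfiles; record size assumed 100 bytes */
    printf("%.1f MB\n", merge_pass_memory(70, 100) / (1024.0 * 1024.0));
    return 0;
}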

Plus you need a structure per subfile that points to the current record
in the buffer.
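
In code the merge loop then looks roughly like this (a minimal sketch with
in-memory arrays standing in for the subfiles, a tiny block size, and a
linear scan for the minimum; the names are mine and this is not how
tuplesort.c actually organizes its merge):

#include <stdio.h>
#include <string.h>

#define BLOCK_RECS 4                 /* records refilled per "block" read */

typedef struct RunCursor {
    const int *run;                  /* the sorted run (stands in for a subfile) */
    size_t     len;                  /* total records in the run */
    size_t     next;                 /* next record to load from the "subfile" */
    int        buf[BLOCK_RECS];      /* per-subfile read buffer */
    size_t     bufpos, buflen;       /* current record within the buffer */
} RunCursor;

/* Refill a run's buffer; stands in for one sequential block read. */
static void refill(RunCursor *c)
{
    c->buflen = c->len - c->next < BLOCK_RECS ? c->len - c->next : BLOCK_RECS;
    memcpy(c->buf, c->run + c->next, c->buflen * sizeof(int));
    c->next += c->buflen;
    c->bufpos = 0;
}

/* Merge nruns sorted runs into out[], picking the smallest current record
 * by a simple linear scan. */
static void merge_runs(RunCursor *cur, size_t nruns, int *out)
{
    size_t i, n = 0;

    for (i = 0; i < nruns; i++)
        refill(&cur[i]);

    for (;;)
    {
        int best = -1;

        for (i = 0; i < nruns; i++)
        {
            if (cur[i].bufpos == cur[i].buflen)
            {
                if (cur[i].next == cur[i].len)
                    continue;        /* run exhausted */
                refill(&cur[i]);
            }
            if (best < 0 || cur[i].buf[cur[i].bufpos] < cur[best].buf[cur[best].bufpos])
                best = (int) i;
        }
        if (best < 0)
            break;                   /* all runs exhausted */
        out[n++] = cur[best].buf[cur[best].bufpos++];
    }
}

int main(void)
{
    static const int r0[] = {1, 4, 9, 12}, r1[] = {2, 3, 10}, r2[] = {5, 6, 7, 8, 11};
    RunCursor cur[3] = {
        {r0, 4, 0, {0}, 0, 0},
        {r1, 3, 0, {0}, 0, 0},
        {r2, 5, 0, {0}, 0, 0},
    };
    int out[12];
    size_t i;

    merge_runs(cur, 3, out);
    for (i = 0; i < 12; i++)
        printf("%d ", out[i]);
    printf("\n");
    return 0;
}

With 70 subfiles you would of course keep the cursors in a heap rather than
scanning them linearly, but the memory picture (one block buffer plus one
small cursor structure per subfile) stays the same.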

Andreas
