Re: Merge algorithms for large numbers of "tapes"

From: "Zeugswetter Andreas DCP SD" <ZeugswetterA(at)spardat(dot)at>
To: "Martijn van Oosterhout" <kleptog(at)svana(dot)org>
Cc: "Dann Corbit" <DCorbit(at)connx(dot)com>, "Stephen Frost" <sfrost(at)snowman(dot)net>, "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "Luke Lonergan" <llonergan(at)greenplum(dot)com>, "Jim C(dot) Nasby" <jnasby(at)pervasive(dot)com>, "Greg Stark" <gsstark(at)mit(dot)edu>, "Simon Riggs" <simon(at)2ndquadrant(dot)com>, <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Merge algorithms for large numbers of "tapes"
Date: 2006-03-10 11:59:32
Message-ID: E1539E0ED7043848906A8FF995BDA579D9941F@m0143.s-mxs.net
Lists: pgsql-hackers


> > > Two pass will create the count of subfiles proportional to:
> > > Subfile_count = original_stream_size/sort_memory_buffer_size
> > >
> > > The merge pass requires (sizeof record * subfile_count) memory.
> >
> > That is true from an algorithmic perspective. But to make the merge
> > efficient you would need to have enough RAM to cache a reasonably
> > large block per subfile. Else you would need to reread the same
> > page/block from one subfile multiple times.
> > (If you had one disk per subfile you could also rely on the disk's own
> > cache, but I think we can rule that out)
>
> But what about the OS cache? Linux will read up to the next
> 128KB of a file if it's contiguous on disk, which is likely
> with modern filesystems. It's likely to be much "fairer" than
> any way we can come up with to share memory.

We were discussing how much RAM is needed, not how much the backend
allocates itself. So if the backend needs to duplicate some of the
OS cache, that only adds to the memory requirement. The most likely
scenario is that the backend additionally holds one page per subfile.

> Question is, do we want our algorithm to rely on that caching?

Currently we do, and I don't actually think that is so bad.
The only optimization I would consider is adding a sequential-access
hint to the "tape file :-)" open.

Andreas
