Re: Escaping the ARC patent

From: "Jonah H(dot) Harris" <jharris(at)tvi(dot)edu>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: Escaping the ARC patent
Date: 2005-02-03 22:37:51
Message-ID: 4202A7BF.5090505@tvi.edu
Lists: pgsql-hackers

I'm familiar with the 2Q algorithm. I also remember seeing what I believe
was a public-domain 2Q implementation floating around somewhere.

Tom Lane wrote:

>I've been doing a bit of research on $subj, and coming to the conclusion
>that the ARC patent is a lot narrower than it might appear. In fact
>most of the parts of the algorithm that we actually want have prior art.
>I looked in particular at Johnson and Shasha's well-known "2Q" paper,
>published in 1994 (http://citeseer.ist.psu.edu/63909.html). This paper
>describes the use of two lists, which they call A1 and Am (as opposed to
>ARC's T1 and T2) but the basic principle is the same: a page goes into
>A1 on first use, and doesn't get to Am unless used a second time before
>aging out of the cache. 2Q also includes a list of pages that have
>recently been in the cache but no longer are. So the actually
>patentable parts of ARC are just some rather narrow decisions about the
>management of these lists, in particular the use of a target T1len to
>dynamically adapt the sizes of the lists. The 2Q paper proposes using
>fixed fractions of the total available space for each list --- and it
>includes statistics showing that the algorithm isn't excessively
>sensitive to the exact values used, so ARC's claimed "self tuning"
>advantage isn't all that great after all.
>
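[As a minimal sketch of the fixed-fraction idea (all names here are invented
for illustration; this is not PostgreSQL's freelist.c): once the adaptive
target is dropped, the only replacement-policy decision left is which list's
tail to reclaim, and that reduces to comparing A1's length against a fixed
quota.]

```c
/* Hypothetical sketch of a fixed-fraction eviction decision for a
 * two-list (A1/Am) cache of c pages.  Names are invented; this is
 * not PostgreSQL code.  With each list capped at a fixed share of
 * the cache, no adaptive target is needed: evict from A1's tail
 * once A1 is at or over its quota, otherwise evict from Am's tail. */
typedef enum { EVICT_FROM_A1, EVICT_FROM_AM } victim_t;

static victim_t
choose_victim(int a1_len, int cache_pages, double a1_fraction)
{
    /* Per the 2Q paper, results are not very sensitive to the exact
     * fraction chosen; something like a quarter of the cache for A1
     * is in the range Johnson and Shasha studied. */
    int a1_quota = (int) (cache_pages * a1_fraction);

    return (a1_len >= a1_quota) ? EVICT_FROM_A1 : EVICT_FROM_AM;
}
```

[For example, with a 100-page cache and a 0.25 fraction, an A1 holding 30
pages would be trimmed first, while an A1 holding only 10 pages would leave
the victim to come from Am.]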
>These conclusions are borne out by a close reading of the patent
>application (which is at
>http://appft1.uspto.gov/netacgi/nph-Parser?Sect1=PTO1&Sect2=HITOFF&d=PG01&p=1&u=%2Fnetahtml%2FPTO%2Fsrchnum.html&r=1&f=G&l=50&s1=%2220040098541%22.PGNR.&OS=DN/20040098541&RS=DN/20040098541
>if you want to look for yourself). Claim 1 reads
>
>1. A method for adaptively managing pages in a cache memory with a
>variable workload, comprising: maintaining the cache memory into a first
>list L1 and a second list L2; wherein the cache memory has a capacity to
>store c pages; and adaptively distributing the workload between the
>first list L1 and the second list L2, to a total capacity of c pages.
>
>Given the prior art, the critical word in this sentence is "adaptively";
>take that out and you have nothing that wasn't published long before.
>If we remove the adaptivity --- ie, just use a fixed division of list
>sizes --- we escape claim 1 and all the other claims that depend on it.
>
>The only other claim that isn't dependent on claim 1 or a restatement of
>it is
>
>45. A method for adaptively managing pages in a memory, comprising:
>defining a cache memory; defining a cache directory; organizing the
>cache directory into four disjoint lists of pages: list T1, list T2,
>list B1, and list B2; and wherein the cache memory contains pages that
>are members of any of the list T1 or the list T2.
>
>So if we use non-variable sizes of T1/T2 and don't use the four-way
>list structure to manage remembrance of pages-formerly-in-cache,
>we escape the patent. But we still have scan resistance, which is the
>main thing that ARC was going to buy us. Pages that are scanned only
>once don't get out of A1 and so aren't able to swamp out pages
>referenced multiple times.
>
>After reading the 2Q paper my inclination is to use exactly Johnson and
>Shasha's "simplified 2Q" algorithm, which uses just A1 and Am with no
>remembrance of formerly cached pages. Their "full 2Q" algorithm strikes
>me as a tad bizarre because it will only promote a page into Am after it
>has fallen out of A1, ie, it takes two physical reads of the page to
>get into Am. That's just weird. I think that pages should advance from
>A1 into Am on second reference. Given that, you don't need any
>remembrance of pages that were formerly in A1, which basically halves
>the memory overhead of the ARC algorithm.
>
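[The promote-on-second-reference behavior proposed above might be sketched
like this (a toy illustration with tiny fixed list sizes and hypothetical
names; PostgreSQL's actual buffer lists live in freelist.c and are
structured quite differently). A1 is a FIFO of pages seen once; Am is an
LRU of pages seen at least twice; a page hit while still on A1 moves
straight to the front of Am, so no list of formerly cached pages is kept.]

```c
/* Toy simplified-2Q sketch with promote-on-second-reference.
 * All names hypothetical; not PostgreSQL code. */
#include <string.h>

#define A1_CAP 2                 /* fixed shares of the cache -- */
#define AM_CAP 2                 /* deliberately not adaptive    */

static int a1[A1_CAP], a1_len;   /* a1[0] is the newest entry  */
static int am[AM_CAP], am_len;   /* am[0] is most recently used */

static int
find(const int *q, int len, int page)
{
    for (int i = 0; i < len; i++)
        if (q[i] == page)
            return i;
    return -1;
}

static void
push_front(int *q, int *len, int cap, int page)
{
    if (*len == cap)
        (*len)--;                /* drop the tail: FIFO/LRU victim */
    memmove(q + 1, q, *len * sizeof(int));
    q[0] = page;
    (*len)++;
}

static void
remove_at(int *q, int *len, int i)
{
    memmove(q + i, q + i + 1, (*len - i - 1) * sizeof(int));
    (*len)--;
}

/* Returns 1 on cache hit, 0 on miss. */
static int
access_page(int page)
{
    int i = find(am, am_len, page);

    if (i >= 0)                  /* hit in Am: ordinary LRU touch */
    {
        remove_at(am, &am_len, i);
        push_front(am, &am_len, AM_CAP, page);
        return 1;
    }
    i = find(a1, a1_len, page);
    if (i >= 0)                  /* second reference: promote to Am */
    {
        remove_at(a1, &a1_len, i);
        push_front(am, &am_len, AM_CAP, page);
        return 1;
    }
    push_front(a1, &a1_len, A1_CAP, page);  /* first use goes to A1 */
    return 0;
}
```

[This also illustrates the scan resistance point: pages touched exactly once
march through A1 and fall off its tail, never displacing anything in Am, so
a page referenced twice survives an intervening sequential scan.]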
>An advantage of heading in this direction (as opposed to, say, LRU/k
>or other algorithms) is that this represents a direct simplification
>of the ARC code we have now. We can probably implement it almost
>entirely by deletions from freelist.c, with little newly written code.
>That gives me a whole lot more confidence that the result will be
>reliable enough to back-patch into 8.0.*.
>
>Comments?
>
> regards, tom lane
>
