
Re: caching query results

From: wieck(at)debis(dot)com (Jan Wieck)
To: Karel Zak <zakkr(at)zf(dot)jcu(dot)cz>
Cc: Jan Wieck <wieck(at)debis(dot)com>, Adriaan Joubert <a(dot)joubert(at)albourne(dot)com>, pghackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: caching query results
Date: 2000-04-04 14:27:37
Message-ID: m12cUIr-0003lKC@orion.SAPserv.Hamburg.dsh.de
Lists: pgsql-hackers
Karel Zak wrote:

> On Mon, 3 Apr 2000, Jan Wieck wrote:
>
> It is a good idea. What exactly is the key? If I understand correctly,
> this key is for query identification only. Right?

    Right.  Imagine  a querytree (after overhaul) that looks like
    this:

                       +------+
                       | SORT |
                       +------+
                          ^
                          |
           +-----------------------------+
           | JOIN                        |
           | atts: rel1.att1, rel2.att2  |
           | qual: rel1.att2 = rel2.att1 |
           +-----------------------------+
                ^                     ^
                |                     |
      +------------------+  +------------------+
      | SCAN             |  | SCAN             |
      | rel:  rel1       |  | rel:  rel2       |
      | atts: att1, att2 |  | atts: att1, att2 |
      +------------------+  +------------------+

    which is a node structure describing a query of:

    SELECT rel1.att1, rel2.att2 FROM rel1, rel2
        WHERE rel1.att2 = rel2.att1;

    The "key" identifying this querytree now could look like

    SORT(JOIN(1.1,2.2;SCAN(78991;1,2),SCAN(78995;1,2);))

    78991 and 78995 are the OIDs of rel1 and rel2. So the key is
    a very simplified description of what the query does, and
    maybe the qualification should be included too. But it's
    enough to pick, out of hundreds of cached plans, a few
    candidates to examine more closely at the node level.
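
    For illustration, here is a minimal standalone sketch of
    deriving such a key by walking the tree. The node layout
    (KeyNode with its kind/relid/atts fields) is invented for the
    example and is not a PostgreSQL structure; the emitted key
    differs from the one above only in punctuation.

    #include <stdio.h>
    #include <string.h>

    typedef enum { N_SORT, N_JOIN, N_SCAN } NodeKind;

    /* Hypothetical node layout, for illustration only */
    typedef struct KeyNode
    {
        NodeKind        kind;
        unsigned int    relid;      /* relation OID (SCAN nodes only) */
        const char     *atts;       /* simplified attribute list */
        struct KeyNode *child[2];   /* at most two inputs here */
    } KeyNode;

    static void
    append_key(const KeyNode *n, char *buf, size_t len)
    {
        static const char *name[] = { "SORT", "JOIN", "SCAN" };
        char tmp[64];

        if (n->kind == N_SCAN)
            snprintf(tmp, sizeof(tmp), "%s(%u;%s",
                     name[n->kind], n->relid, n->atts);
        else
            snprintf(tmp, sizeof(tmp), "%s(%s;", name[n->kind], n->atts);
        strncat(buf, tmp, len - strlen(buf) - 1);
        for (int i = 0; i < 2; i++)
            if (n->child[i] != NULL)
            {
                if (i > 0)
                    strncat(buf, ",", len - strlen(buf) - 1);
                append_key(n->child[i], buf, len);
            }
        strncat(buf, ")", len - strlen(buf) - 1);
    }

    int
    main(void)
    {
        KeyNode scan1 = { N_SCAN, 78991, "1,2", { NULL, NULL } };
        KeyNode scan2 = { N_SCAN, 78995, "1,2", { NULL, NULL } };
        KeyNode join  = { N_JOIN, 0, "1.1,2.2", { &scan1, &scan2 } };
        KeyNode sort  = { N_SORT, 0, "", { &join, NULL } };
        char    key[256] = "";

        append_key(&sort, key, sizeof(key));
        /* prints: SORT(;JOIN(1.1,2.2;SCAN(78991;1,2),SCAN(78995;1,2))) */
        printf("%s\n", key);
        return 0;
    }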

> >     These keys could be managed in a shared LRU table, and if the
>
> My current code is based on a hash table of such keys, and the query & plan
> are saved in a MemoryContext created specially for that plan (which is
> useful, for example, for SPI_freeplan()).

    IIRC our hash table code insists on using global, per-backend
    memory.  I thought about managing the entire query cache with
    a new type of memory context, using different routines for
    palloc()/pfree(), working in a shared memory area only and
    eventually freeing the longest-unused plans until the
    allocation fits.  Let's see whether using hash tables here
    would be easy or not.
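
    As a toy standalone sketch of that eviction loop: a plain
    byte counter stands in for the shared area and malloc() for
    the cache allocator; none of these names are existing
    PostgreSQL functions.

    #include <stddef.h>
    #include <stdio.h>
    #include <stdlib.h>

    /* Hypothetical cache entry on an LRU list; not a PG structure. */
    typedef struct CachedPlan
    {
        struct CachedPlan *next;    /* LRU order: head is oldest */
        size_t             size;    /* shared bytes this plan uses */
    } CachedPlan;

    static size_t      pool_free = 1024;  /* simulated shared area budget */
    static CachedPlan *lru_head = NULL;   /* least recently used */
    static CachedPlan *lru_tail = NULL;   /* most recently used */

    /* Drop the longest-unused plan, returning its bytes to the pool. */
    static int
    evict_oldest(void)
    {
        CachedPlan *victim = lru_head;

        if (victim == NULL)
            return 0;               /* cache empty, cannot evict */
        lru_head = victim->next;
        if (lru_head == NULL)
            lru_tail = NULL;
        pool_free += victim->size;
        free(victim);
        return 1;
    }

    /* Allocate cache space, evicting old plans until the request fits. */
    static CachedPlan *
    cache_alloc(size_t size)
    {
        CachedPlan *p;

        while (pool_free < size)
            if (!evict_oldest())
                return NULL;        /* larger than the whole pool */

        p = malloc(sizeof(CachedPlan));
        if (p == NULL)
            return NULL;
        p->next = NULL;
        p->size = size;
        pool_free -= size;
        if (lru_tail)
            lru_tail->next = p;
        else
            lru_head = p;
        lru_tail = p;
        return p;
    }

    int
    main(void)
    {
        for (int i = 0; i < 8; i++)   /* 8 * 300 > 1024: forces eviction */
            printf("plan %d: %s\n", i,
                   cache_alloc(300) ? "cached" : "rejected");
        printf("bytes free: %zu\n", pool_free);
        return 0;
    }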

> >     same  key  appears  a  number  of  times  (0-n),  it's entire
> >     querytree + plan (after planning)  will  be  saved  into  the
> >     shared  mem.
>
> Here I don't understand. Why is there any counting of occurrences?

    There's not that big of a win if you pay all the shared
    memory overhead for every query at its first occurrence. Such
    a generic query cache only makes sense for queries that occur
    often. So from a query's first to its n-th occurrence we only
    count by key, and once we know it's one of these
    again'n'again thingies, we pay the cache overhead.

    Also, I think keeping the number of exclusive cache locks
    (for writing) as small as possible would be a good idea WRT
    concurrency.
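
    A toy sketch of that policy follows. CACHE_AFTER and the flat
    counter table are assumptions; the point is only that the
    expensive, exclusively-locked path is entered no earlier than
    the n-th occurrence.

    #include <stdio.h>
    #include <string.h>

    #define MAX_KEYS    16
    #define CACHE_AFTER 3       /* assumed n: cache on the 3rd occurrence */

    /* Flat per-key occurrence counters; a toy stand-in for the
     * shared LRU key table, not PostgreSQL code. */
    static char keys[MAX_KEYS][128];
    static int  counts[MAX_KEYS];
    static int  nkeys = 0;

    /* Returns 1 once the query identified by 'key' deserves caching. */
    static int
    seen_often_enough(const char *key)
    {
        for (int i = 0; i < nkeys; i++)
            if (strcmp(keys[i], key) == 0)
                return ++counts[i] >= CACHE_AFTER;

        if (nkeys < MAX_KEYS)
        {
            snprintf(keys[nkeys], sizeof(keys[nkeys]), "%s", key);
            counts[nkeys++] = 1;
        }
        return 0;               /* first occurrence: count only, no lock */
    }

    int
    main(void)
    {
        const char *key =
            "SORT(JOIN(1.1,2.2;SCAN(78991;1,2),SCAN(78995;1,2);))";

        for (int run = 1; run <= 4; run++)
            printf("run %d: %s\n", run,
                   seen_often_enough(key)
                       ? "store plan (exclusive lock)"
                       : "count only (no exclusive lock)");
        return 0;
    }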

> IMHO users can use PREPARE / EXECUTE for the same query. The suggested idea
> is really good if this query cache is in shared memory and more backends
> can use it.

    That's exactly the idea. And since the postmaster will hold
    the shared memory as it does for the block and syscache, it
    will survive even times of no DB activity.

> Good. It is a solution for known queries and allows skipping steps in the
> query path. But we still don't have any idea for cached plan validity. What
> if a user changes the OID of an operator, drops a column, etc.?

    That's why the key is only good for finding "candidates". The
    caching has to look very closely at the nodes in the tree and
    maybe compare down to pg_attribute OIDs etc. to decide
    whether it's really the same query or not.
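
    Such an exact check could be a recursive, field-by-field tree
    comparison along these lines; QNode and its fields are
    hypothetical stand-ins for the real querytree nodes.

    #include <stdbool.h>
    #include <stddef.h>
    #include <stdio.h>

    /* Hypothetical query-tree node; the fields are invented. */
    typedef struct QNode
    {
        int           tag;          /* node type */
        unsigned int  relid;        /* relation OID, 0 for non-scan nodes */
        unsigned int  attoid[4];    /* attribute OIDs, 0-terminated */
        struct QNode *left, *right;
    } QNode;

    /* Exact comparison: every field, down to the attribute OIDs. */
    static bool
    trees_equal(const QNode *a, const QNode *b)
    {
        if (a == NULL || b == NULL)
            return a == b;          /* both subtrees must end together */
        if (a->tag != b->tag || a->relid != b->relid)
            return false;
        for (int i = 0; i < 4; i++)
        {
            if (a->attoid[i] != b->attoid[i])
                return false;
            if (a->attoid[i] == 0)  /* end of list on both sides */
                break;
        }
        return trees_equal(a->left, b->left) &&
               trees_equal(a->right, b->right);
    }

    int
    main(void)
    {
        QNode s1 = { 2, 78991, { 1, 2, 0, 0 }, NULL, NULL };
        QNode s2 = { 2, 78991, { 1, 2, 0, 0 }, NULL, NULL };
        QNode s3 = { 2, 78995, { 1, 2, 0, 0 }, NULL, NULL };

        printf("%d %d\n", trees_equal(&s1, &s2),    /* 1: same query */
                          trees_equal(&s1, &s3));   /* 0: other relation */
        return 0;
    }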

> Is it really certain that this will be faster? (It must create a key for
> the nodes, search for the same query in a table (the cache), copy the new
> query & plan to the cache, etc.)

    Only some timing code put into backends in various real-world
    databases can tell how much of the entire processing time is
    spent in the optimizer.

    And I'd not be surprised if most of the time is already spent
    during the parse step, which this technique cannot skip.
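
    Such instrumentation could be as simple as bracketing the
    stage of interest with gettimeofday() and accumulating the
    elapsed time; plan_query() below is just a placeholder for
    the real planner entry point, not an actual function.

    #include <stdio.h>
    #include <sys/time.h>

    static double optimizer_secs = 0.0;   /* accumulated planning time */

    /* Placeholder for the real planning work being measured. */
    static void
    plan_query(void)
    {
        for (volatile long i = 0; i < 1000000; i++)
            ;
    }

    /* Bracket one planner call with wall-clock timestamps. */
    static void
    timed_plan_query(void)
    {
        struct timeval t0, t1;

        gettimeofday(&t0, NULL);
        plan_query();
        gettimeofday(&t1, NULL);
        optimizer_secs += (double) (t1.tv_sec - t0.tv_sec)
                        + (double) (t1.tv_usec - t0.tv_usec) / 1e6;
    }

    int
    main(void)
    {
        for (int i = 0; i < 100; i++)
            timed_plan_query();
        printf("time spent planning: %.3f s\n", optimizer_secs);
        return 0;
    }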


Jan

--

#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#========================================= wieck(at)debis(dot)com (Jan Wieck) #


