Re: Does "correlation" mislead the optimizer on large

From: Ron Mayer <ron(at)intervideo(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Stephan Szabo <sszabo(at)megazone23(dot)bigpanda(dot)com>,<pgsql-performance(at)postgresql(dot)org>
Subject: Re: Does "correlation" mislead the optimizer on large
Date: 2003-01-24 23:09:19
Message-ID: Pine.LNX.4.44.0301241417140.4023-100000@localhost.localdomain
Lists: pgsql-performance
On Fri, 24 Jan 2003, Tom Lane wrote:
>
> Ron Mayer <ron(at)intervideo(dot)com> writes:
> > A proposal.... (yes, I'm volunteering if people point me in the right 
> > direction)...
> 
> I do not think ANALYZE is the problem here; at least, it's premature to
> worry about that end of things until you've defined (a) what's to be
> stored in pg_statistic, and (b) what computation the planner needs to
> make to derive a cost estimate given the stats.

Cool.  Thanks for a good starting point.  If I wanted to brainstorm
further, should I do so here, or should I encourage interested people
to take it offline with me (ron(at)intervideo(dot)com) and I can post
a summary of the conversation?

       Ron

For those who do want to brainstorm with me, my starting point is this:

 With my particular table, I think the main issue is still that I have a 
 lot of data that looks like:

  values:    aaaaaaaaaaabbbbbbbbccccccccddddddddddaaaabbbbbbbccccccccddddd...
  disk page: |page 1|page 2|page 3|page 4|page 5|page 6|page 7|page 8|page 9|

 The problem I'm trying to address is that the current planner guesses 
 that most of the pages will need to be read; however, the local clustering
 means that in fact only a small subset needs to be accessed.  My first
 guess is that modifying the definition of "correlation" to account for
 page size would be a good approach.

 I.e., instead of computing the correlation across the whole table, perform
 an auto-correlation over the rows
 (http://astronomy.swin.edu.au/~pbourke/analysis/correlate/)
 and keep only the values with a "delay" of less than one page size.
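
One possible reading of that idea, sketched in Python under stated
assumptions: ROWS_PER_PAGE is a hypothetical page size, the standard sample
autocorrelation is used, and averaging the short-lag values into a single
number is my own simplification (nothing above says how they should be
combined or stored).

```python
import random

ROWS_PER_PAGE = 10  # hypothetical: number of rows that fit on one disk page


def autocorrelation(ys, lag):
    """Sample autocorrelation of the sequence ys at the given lag ("delay")."""
    n = len(ys)
    mean = sum(ys) / n
    denom = sum((y - mean) ** 2 for y in ys)
    num = sum((ys[i] - mean) * (ys[i + lag] - mean) for i in range(n - lag))
    return num / denom


def local_correlation(ys, rows_per_page=ROWS_PER_PAGE):
    """Mean autocorrelation over delays 1 .. rows_per_page - 1 (an assumption)."""
    lags = range(1, rows_per_page)
    return sum(autocorrelation(ys, k) for k in lags) / len(lags)


# Values in physical row order: runs of 50 equal values, cycling a, b, c, d.
clustered = [ord(v) for cycle in range(10) for v in "abcd" for row in range(50)]
shuffled = clustered[:]
random.Random(0).shuffle(shuffled)  # same values, no local clustering

print(f"clustered: local correlation {local_correlation(clustered):.3f}")
print(f"shuffled:  local correlation {local_correlation(shuffled):.3f}")
```

On this toy data the short-delay measure comes out high for the clustered
layout and near zero for the shuffled one, so it separates the two cases
that the whole-table figure cannot tell apart. Whether it leads to a usable
cost estimate is exactly the open question.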

If you want to share thoughts offline (ron(at)intervideo(dot)com), I'll gladly
post a summary of responses here to save the bandwidth of the group.



