Skip site navigation (1) Skip section navigation (2)

Re: column without pg_stats entry?!

From: Bernd Heller <bdheller(at)users(dot)sourceforge(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-performance(at)postgresql(dot)org,Bruno Wolff III <bruno(at)wolff(dot)to>
Subject: Re: column without pg_stats entry?!
Date: 2005-01-21 09:55:00
Message-ID: 83971C76-6B92-11D9-A12E-000A957B8C6E@users.sourceforge.net (view raw or flat)
Thread:
Lists: pgsql-performance
Ah no, I think both of you have mistaken me. The problem here is not 
about partial indexes (not really anyway).
I do have a partial index with "WHERE purge_date IS NOT NULL", and my 
query does contain "WHERE purge_date IS NOT NULL" as well. The problem 
here is, that all rows (or almost all) have the column purge_date set 
to NULL. The planner expected the query to return 33% of all rows in 
the table. So it made the seq scan MUCH cheaper, which was right in the 
planner's way of thinking because it didn't know anything about the 
column from pg_stats.

I had a look at the source code of the analyze command meanwhile:
the compute_*_stats functions don't return valid statistics if they 
can't find any non-null values, and as a result no statistics tuple for 
that column is created in pg_stats. I think this is wrong. Not finding 
any non-null values IS a very useful information, it means a 
null-fraction of 100%. I have patched my postgres to return valid 
statistics even in that case (patch below).
The difference now is that the planner doesn't assume anymore it would 
get about 33% of rows back, instead it knows that the null-fraction of 
that column is approximately 1.0 and it chooses the index scan because 
that is now the by far cheapest plan.

--- analyze.c   Thu Jan 20 11:37:58 2005
+++ analyze.c.orig      Sun Nov 14 03:04:13 2004
@@ -1704,9 +1704,6 @@
                         stats->stavalues[0] = mcv_values;
                         stats->numvalues[0] = num_mcv;
                 }
-       } else {
-               stats->stats_valid = true;
-               stats->stanullfrac = 1.0;
         }

         /* We don't need to bother cleaning up any of our temporary 
palloc's */
@@ -2164,9 +2161,6 @@
                         stats->numnumbers[slot_idx] = 1;
                         slot_idx++;
                 }
-       } else {
-               stats->stats_valid = true;
-               stats->stanullfrac = 1.0;
         }

         /* We don't need to bother cleaning up any of our temporary 
palloc's */


On 21.01.2005, at 7:02 Uhr, Tom Lane wrote:

> Bruno Wolff III <bruno(at)wolff(dot)to> writes:
>>   Bernd Heller <bdheller(at)users(dot)sourceforge(dot)net> wrote:
>>> there is no entry in pg_stats for that column at all!! I can only
>>> suspect that this has to do with the column being all null.
>
>> Someone else reported this recently and I think it is going to be 
>> fixed.
>
> Yeah, this was griped of a little bit ago, but I felt it was too close
> to 8.0 release to risk fooling with for this cycle.
>
>> In the short run you could add an IS NOT NULL clause to your query.
>> The optimizer doesn't know that < being TRUE implies IS NOT NULL and
>> so the partial index won't be used unless you add that clause 
>> explicitly.
>
> Actually, as of 8.0 the optimizer *does* know that.  I'm a bit 
> surprised
> that it didn't pick the partial index, since even without any analyze
> stats, the small physical size of the partial index should have clued 
> it
> that there weren't many such tuples.  Could we see EXPLAIN output for
> both cases (both settings of enable_seqscan)?
>
> 			regards, tom lane
>
> ---------------------------(end of 
> broadcast)---------------------------
> TIP 4: Don't 'kill -9' the postmaster


In response to

pgsql-performance by date

Next:From: Yann MichelDate: 2005-01-21 12:30:08
Subject: Re:
Previous:From: Andrei BintintanDate: 2005-01-21 09:20:48
Subject: Re: [SQL] OFFSET impact on Performance???

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group