Re: Statistic Estimation in PostgreSQL

From: Jan Urbański <j(dot)urbanski(at)students(dot)mimuw(dot)edu(dot)pl>
To: rahulg <rahulgupta83(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Statistic Estimation in PostgreSQL
Date: 2008-10-15 19:56:04
Message-ID: 48F64AD4.8080401@students.mimuw.edu.pl
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

rahulg wrote:
> I am facing problem in tracing in what events the selectivity
> histogram in pg_statistic is stored/updated.
>
> I went through the code in src/backend/commands/analyze.c and got to
> see the code computing the histogram but when I tried to trace the
> caller of analyze_rel or compute_index_stats, I find out that only
> during Vaccum command, these functions are invoked. I am little
> surprised as I did get to see PostgreSQL giving good selectivity
> estimations despite me running Vaccuum command even once.
>
> I did try to trace the control flow during table operations
> (creation / insertion) but couldn't find any histogram updation
> functions being invoked though I did get to know that some functions
> related to catalog updations is being invoked.
>
> Can anybody tell me in what all events is the pg_statistic table
> exactly updated?

The flow is like this:
analyze_rel() determines the type-specific analyze function to call for
each column:
http://git.postgresql.org/?p=postgresql.git;a=blob;f=src/backend/commands/analyze.c;h=d96b7666f08d9ac74cccf05851fa8fd8fc2d4948;hb=HEAD#l270

the function OID gets pulled from pg_type, from the typanalyze column:
http://www.postgresql.org/docs/8.3/static/catalog-pg-type.html

currently there is only one datatype that has a custom analyze function
(tsvector). All others get analyzed using the standard one:
http://git.postgresql.org/?p=postgresql.git;a=blob;f=src/backend/commands/analyze.c;h=d96b7666f08d9ac74cccf05851fa8fd8fc2d4948;hb=HEAD#l1489

depending on whether the datatype has a < operator or not, the column is
analyzed using compute_scalar_stats() or compute_minimal_stats().
The actual call of the chosen function happens here:
http://git.postgresql.org/?p=postgresql.git;a=blob;f=src/backend/commands/analyze.c;h=d96b7666f08d9ac74cccf05851fa8fd8fc2d4948;hb=HEAD#l425

And the write to pg_statistic happens here:
http://git.postgresql.org/?p=postgresql.git;a=blob;f=src/backend/commands/analyze.c;h=d96b7666f08d9ac74cccf05851fa8fd8fc2d4948;hb=HEAD#l446

HTH,
Jan

--
Jan Urbanski
GPG key ID: E583D7D2

ouden estin

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Jeff Davis 2008-10-15 19:58:16 Re: Deriving Recovery Snapshots
Previous Message Alvaro Herrera 2008-10-15 19:25:33 Re: Statistic Estimation in PostgreSQL