New WIP patch for cross column statistics Re: TEXT vs PG_NODE_TREE in system columns (cross column and expression statistics patch)

From: Boszormenyi Zoltan <zb(at)cybertec(dot)at>
To: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Hans-Juergen Schoenig <hs(at)cybertec(dot)at>
Subject: New WIP patch for cross column statistics Re: TEXT vs PG_NODE_TREE in system columns (cross column and expression statistics patch)
Date: 2011-08-04 12:13:18
Message-ID: 4E3A8CDE.7020409@cybertec.at
Lists: pgsql-hackers

Hi,

On 2011-04-28 17:20, Alvaro Herrera wrote:
> Excerpts from Boszormenyi Zoltan's message of Thu Apr 28 11:03:56 -0300 2011:
>> Hi,
>>
>> attached is the WIP patch for cross-column statistics and
>> extra expression statistics.
>>
>> My question is: why is pg_node_tree unusable as a
>> syscache attribute? I attempted to alias it as text in the patch
>> but I get the following error if I try to use it by setting
>> USE_SYSCACHE_FOR_SEARCH to 1 in selfuncs.c.
>> Directly using the underlying pg_statistic3 doesn't cause an error.
> Two comments:
> 1. it seems that expression stats are mostly separate from cross-column
> stats; does it really make sense to submit the two in the same patch?
>
> 2. there are almost no code comments anywhere
>
> 3. (bonus) if you're going to copy/paste pg_attribute.h verbatim into
> the new files, please remove the bits you currently have in "#if 0".
> (Not to mention the fact that the new catalogs seem rather poorly
> named).

OK, we took a different route this time. Here is what we came
up with. Attached are two patches.

attnum-int2vector.patch implements:

- int2vector support routines and catalog entries for them
- pg_statistic is modified so that "staattnum int2" is converted to
"staattnums int2vector". RemoveStatistics() is modified to take
an array of AttrNumber and its length.
- pg_attribute.attstattarget is moved to pg_statistic.statarget,
and pg_statistic gains a new "stavalid" bool field. Two support routines
are added: AddStatistics() and InvalidateStatistics(). Entries
in pg_statistic for table columns are created upon table creation
and ALTER TABLE ADD COLUMN, and are maintained for the lifetime
of the column. System tables are the exception: calling AddStatistics()
for them during initdb is a Catch-22, because pg_statistic doesn't exist
yet at that point. For these, ANALYZE creates the pg_statistic record just
as before. ALTER TABLE ALTER COLUMN SET DATA TYPE
only invalidates the record by setting "stavalid" to false.
- Common code for getting the statistics tuple is factored out into a
new function called validate_statistics().
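To make the int2vector change concrete, here is a minimal toy model in C; it is not code from the patch. It assumes an in-memory array in place of the pg_statistic catalog, and stat_add(), stat_find(), stat_invalidate() and stat_remove() are hypothetical stand-ins for AddStatistics(), InvalidateStatistics() and RemoveStatistics(), whose real signatures are not shown here. What it illustrates is that a row is now identified by a relation plus a vector of attribute numbers and its length, and that invalidation keeps the row while clearing stavalid.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef int16_t AttrNumber;     /* 16-bit, like PostgreSQL's AttrNumber */

#define STAT_MAX_KEYS 8         /* hypothetical limits of this toy model */
#define STAT_MAX_ROWS 32

/* Toy stand-in for a pg_statistic row; NOT the real catalog layout. */
typedef struct StatRow
{
    bool         in_use;
    unsigned int starelid;                  /* owning table's OID */
    int          nattnums;                  /* length of staattnums */
    AttrNumber   staattnums[STAT_MAX_KEYS]; /* one or more column numbers */
    int          stattarget;                /* per-entry statistics target */
    bool         stavalid;                  /* cleared by invalidation */
} StatRow;

static StatRow stat_rows[STAT_MAX_ROWS];

/* Does row r belong to relid and have exactly the key attnums[0..n-1]? */
static bool
stat_key_matches(const StatRow *r, unsigned int relid,
                 const AttrNumber *attnums, int n)
{
    return r->in_use && r->starelid == relid && r->nattnums == n &&
        memcmp(r->staattnums, attnums, (size_t) n * sizeof(AttrNumber)) == 0;
}

/* Look up the entry with this exact key, or NULL. */
static StatRow *
stat_find(unsigned int relid, const AttrNumber *attnums, int n)
{
    for (int i = 0; i < STAT_MAX_ROWS; i++)
        if (stat_key_matches(&stat_rows[i], relid, attnums, n))
            return &stat_rows[i];
    return NULL;
}

/* Hypothetical stand-in for AddStatistics(): create the entry up front. */
static bool
stat_add(unsigned int relid, const AttrNumber *attnums, int n, int target)
{
    if (n < 1 || n > STAT_MAX_KEYS || stat_find(relid, attnums, n) != NULL)
        return false;
    for (int i = 0; i < STAT_MAX_ROWS; i++)
    {
        StatRow *r = &stat_rows[i];

        if (r->in_use)
            continue;
        r->in_use = true;
        r->starelid = relid;
        r->nattnums = n;
        memcpy(r->staattnums, attnums, (size_t) n * sizeof(AttrNumber));
        r->stattarget = target;
        r->stavalid = false;    /* assumption: no data until ANALYZE runs */
        return true;
    }
    return false;               /* toy table is full */
}

/* Hypothetical stand-in for InvalidateStatistics(): keep row, mark stale. */
static bool
stat_invalidate(unsigned int relid, const AttrNumber *attnums, int n)
{
    StatRow *r = stat_find(relid, attnums, n);

    if (r == NULL)
        return false;
    r->stavalid = false;
    return true;
}

/*
 * Hypothetical stand-in for RemoveStatistics(relid, attnums, n): delete the
 * entry with exactly this key; n == 0 deletes every entry of the relation.
 * Returns the number of entries removed.
 */
static int
stat_remove(unsigned int relid, const AttrNumber *attnums, int n)
{
    int removed = 0;

    for (int i = 0; i < STAT_MAX_ROWS; i++)
    {
        StatRow *r = &stat_rows[i];
        bool     hit = (n == 0) ? (r->in_use && r->starelid == relid)
                                : stat_key_matches(r, relid, attnums, n);

        if (hit)
        {
            r->in_use = false;
            removed++;
        }
    }
    return removed;
}
```

For example, adding an entry for columns {1, 2} and then calling stat_invalidate() on the same key leaves the row in place with stavalid cleared, while stat_remove() with n == 0 drops every entry of the relation. That n == 0 convention is an assumption of this sketch, chosen to mirror the attnum == 0 convention of the pre-patch RemoveStatistics().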

cross-col-syntax.patch builds on the first patch and implements:

CREATE CROSS COLUMN STATISTICS ON TABLE tabname (col, ...)
[ WITH ( statistics_target ) ] ;

DROP CROSS COLUMN STATISTICS ON TABLE tabname (col, ...) ;

CREATE CROSS COLUMN STATISTICS ON INDEX idxname
[ WITH ( statistics_target ) ] ;

DROP CROSS COLUMN STATISTICS ON INDEX idxname ;

and puts new records into pg_statistic with array_length(staattnums, 1) > 1.
Note: this patch should record dependencies on the respective table or
index and its columns, but doesn't yet.
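The CREATE CROSS COLUMN STATISTICS ON TABLE form has to turn a column list into a staattnums key whose length is greater than one. The sketch below shows one way to do that. The helper cross_col_key() is hypothetical, and sorting the key so that (a, b) and (b, a) share one entry, as well as rejecting duplicates and system columns, are design assumptions of this sketch rather than documented behavior of the patch.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef int16_t AttrNumber;     /* 16-bit, like PostgreSQL's AttrNumber */

/* qsort comparator for AttrNumber values (ascending). */
static int
attnum_cmp(const void *a, const void *b)
{
    AttrNumber x = *(const AttrNumber *) a;
    AttrNumber y = *(const AttrNumber *) b;

    return (x > y) - (x < y);
}

/*
 * Hypothetical helper: turn the column list of
 *   CREATE CROSS COLUMN STATISTICS ON TABLE t (col, ...)
 * into a canonical staattnums key. Copies cols[0..n-1] into key, sorts it,
 * and returns the key length, or -1 if the list cannot describe a
 * cross-column entry (fewer than two columns, a system column with
 * attnum <= 0, or a duplicate column).
 */
static int
cross_col_key(const AttrNumber *cols, int n, AttrNumber *key)
{
    if (n < 2)
        return -1;              /* single columns already have their own entry */
    memcpy(key, cols, (size_t) n * sizeof(AttrNumber));
    qsort(key, (size_t) n, sizeof(AttrNumber), attnum_cmp);
    for (int i = 0; i < n; i++)
    {
        if (key[i] <= 0)
            return -1;          /* user columns have attnum >= 1 */
        if (i > 0 && key[i] == key[i - 1])
            return -1;          /* (a, a) is not a cross-column pair */
    }
    return n;
}
```

For the ON INDEX form, the column list would presumably come from the index's key columns rather than from the statement; that step is not shown.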

The data structure for storing the N-dimensional histogram has not been decided yet.
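As an illustration only, one candidate among many for the still-undecided N-dimensional histogram is a grid: each dimension gets its own ascending list of bucket upper bounds, and the per-cell row counts are stored in one flat row-major array. The type NdHistogram and the functions below are hypothetical, and column values are assumed to be already mapped to doubles.

```c
#include <stddef.h>

#define ND_MAX_DIMS 4           /* hypothetical limit for this sketch */

/*
 * One possible layout (not the patch's): a grid histogram.
 * Dimension d is split into nbuckets[d] buckets whose upper bounds are
 * bounds[d][0..nbuckets[d]-1] (ascending); counts is a flat row-major array
 * with nbuckets[0] * ... * nbuckets[ndims-1] cells.
 */
typedef struct NdHistogram
{
    int           ndims;
    int           nbuckets[ND_MAX_DIMS];
    const double *bounds[ND_MAX_DIMS];
    const long   *counts;
    long          total;        /* sum of all counts */
} NdHistogram;

/*
 * Index of the first bucket whose upper bound is >= v (binary search);
 * values above the last bound fall into the last bucket.
 */
static int
bucket_of(const double *ub, int nb, double v)
{
    int lo = 0, hi = nb - 1;

    while (lo < hi)
    {
        int mid = lo + (hi - lo) / 2;

        if (ub[mid] >= v)
            hi = mid;
        else
            lo = mid + 1;
    }
    return lo;
}

/* Row-major offset of the cell holding the point v[0..ndims-1]. */
static size_t
cell_of(const NdHistogram *h, const double *v)
{
    size_t off = 0;

    for (int d = 0; d < h->ndims; d++)
        off = off * (size_t) h->nbuckets[d] +
              (size_t) bucket_of(h->bounds[d], h->nbuckets[d], v[d]);
    return off;
}

/* Fraction of all rows that fall into the cell containing v. */
static double
cell_fraction(const NdHistogram *h, const double *v)
{
    if (h->total <= 0)
        return 0.0;
    return (double) h->counts[cell_of(h, v)] / (double) h->total;
}
```

The drawback of this layout is that the number of cells is the product of the per-dimension bucket counts, so storage grows quickly with the number of columns; any choice of representation has to weigh that against estimation accuracy.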

Comments?

Best regards,
Zoltán Böszörményi

--
----------------------------------
Zoltán Böszörményi
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt, Austria
Web: http://www.postgresql-support.de
http://www.postgresql.at/

Attachment               Content-Type  Size
attnum-int2vector.patch  text/plain    73.2 KB
cross-col-syntax.patch   text/plain    15.7 KB
