Re: [INTERFACES] need information about storing methods

From: "Gene Selkov Jr(dot)" <selkovjr(at)mcs(dot)anl(dot)gov>
To: Christian Saldarini <cristian(at)gemini(dot)como(dot)polimi(dot)it>, pgsql-interfaces(at)postgreSQL(dot)org
Cc: pgsql-general(at)postgreSQL(dot)org
Subject: Re: [INTERFACES] need information about storing methods
Date: 1998-11-11 16:56:16
Message-ID: 199811111655.KAA26844@antares.mcs.anl.gov
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general pgsql-interfaces

(copying to GENERAL because that's where it seems to belong)

> Hi,
> I'm nowadays tryng to use postgresql to store geographical data to use
> them with a gis program (GRASS).
> The coordinates of each point are stored in a field that is a
> point-type.
> Anyone can tell me if postgress uses, for this kind of data, a storing
> method like HHCODE, poin-quad-trees (or anything better) , or it stores
> point-data like anything else?
> A query like this one:
> " SELECT coordinates FROM table WHERE coordinates @ '((1,1),1)'::circle
> "
> is executed like anything else, processing evry tuple in the table ?
> If so, anyone knows how to implement a more complex storing method?
>
> Thanks to all,
> Christian Saldarini

Let's clarify the terms first:

The only STORAGE METHOD used by postgres is a regular file or a set of
files. Objects are stored and retrieved via some sort of mapping
between oids and file pointers. So the answer to the first part of
your question is that all types of objects (except large objects) are
stored in the same way -- that is, as chunks of data within files.

ACCESS METHOD, in postgres lingo, is a way to speed up access to
individual objects by recording their placement within the physical
files (a.k.a. indexing) in a structured way. It generally takes
significantly less steps for a query to traverse an index structure
down to individual objects, than it would take to iterate through the
hole set. There is a class of access methods whose index structures
look like trees (so do the query execution paths). Those implemented
in postgres are B-tree and two-dimensional R-tree. Both methods rely
on a hierarchical grouping of the individual objects into broader
categories, akin to the indexing used by librarians to arrange books
in an orderly fashion. Categories in B-tree are numeric intervals and
in R-tree, they are box unions (majoritarian boxes). Each step of a
query path involves sequential comparison, or evaluation of fitness,
of the target object (criterion) against just a few broad
meta-objects. Once the decision of fitness is made at a certain level,
the query sinks one level below, but this time it compares the target
object to only those meta-objects which are confined within the
meta-object that satisfied the criterion one level above. That way, it
works all the way down to the original objects without making (too
many) futile comparisons.

Closer to the point,

> A query like this one:
> " SELECT coordinates FROM table WHERE coordinates @ '((1,1),1)'::circle

will be evaluated sequentially, provided 'coordinates' is a point and
there is an operator '@' for (point, circle) arguments. The reason is
that the built-in version of R-tree can only index a set of objects of
the same kind, and these objects are boxes. You can improve your query
like this:

SELECT * FROM t WHERE p @ '0,0,2,2' AND p::circle @ '((1,1),1)'::circle;

where p is a box type column that contains zero-area boxes
representing points, like so:

(0.603,0.378),(0.603,0.378)

This query uses an r-tree index:

CREATE INDEX pix ON t USING rtree (p box_ops);

to first reduce the set to the majoritarian box, '0,0,2,2', and then
scan that box for the points inside the circle:

explain SELECT * FROM t WHERE p @ '0,0,2,2' AND p::circle @ '((1,1),1)'::circle;
NOTICE: QUERY PLAN:

Index Scan using pix on t (cost=85.00 size=501 width=32)

For further improvement, read about GiST. It is a generalization of R-tree that allows you build indices for all objects that can be grouped based on the concept of containment. GiST allows you to build both exact and lossy indices. An exact index drives you right to the leaves of the search tree. For objects like polygons, which do not cope well with the concept of containment, majoritarian entities (e.g., boxes) can be used to build R-trees. Such is the lossy index: it leaves out some of the information about the original object (e.g., the number of polygon vertices and their arrangement), but it still remebers some kind of proximity. It will require additional sequential examination of polygons, but in the end, performance of even a lossy index will typically be better that that of a completely sequential search. I think the above query illustrates the concept of lossy indexing well enough.

You might want to see http://wit.mcs.anl.gov/~selkovjr/pggist-patched.tgz
(you will find more references to the original source inside)

For more sophisticated indexing methods, check out Joe Hellerstein's curriculum:

http://epoch.cs.berkeley.edu:8000/personal/jmh/

--Gene

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Dustin Sallings 1998-11-11 17:35:38 strange query plans, won't use index
Previous Message A James Lewis 1998-11-11 16:54:42 Re: [GENERAL] rollback question...

Browse pgsql-interfaces by date

  From Date Subject
Next Message Cary B. O'Brien 1998-11-11 17:21:09 ODBC interface problem W/ Applix Data (was Re: AWL: Another Data Editing Problem [was Re: Editing problem in Data])
Previous Message Herouth Maoz 1998-11-11 11:44:22 Re: [INTERFACES] how to obtain column info