Re: 8.2 features status

From: Rick Gigger <rick(at)alpinenetworking(dot)com>
To: David Fetter <david(at)fetter(dot)org>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Bruce Momjian <bruce(at)momjian(dot)us>, Gavin Sherry <swm(at)linuxworld(dot)com(dot)au>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: 8.2 features status
Date: 2006-08-05 07:15:09
Message-ID: D3009F10-74C3-4A9B-9C92-BB73759392A1@alpinenetworking.com
Lists: pgsql-hackers

If people are going to start listing features they want, here are some
things I think would be nice. I have no idea, though, whether they
would be useful to anyone else:

1) hierarchical / recursive queries. I realize it's just been
discussed at length, but since there was some question as to whether
there's demand for it, I am just weighing in to say that I think
there is. I have to deal with hierarchy tables all the time and I
simply have several standard methods of dealing with them depending
on the data set / format. But they all suck. I've just gotten used
to the workarounds since there is nothing else. If you are not
hearing the screams, I think it's because it has become a fact of
life for most people (unless you're using Oracle) that you've just
got to work around it. Everyone already has some code to do this
and has already done it everywhere it needs to be done. And as
long as you're a little bit clever you can always work around it
without taking a big performance hit. But it would sure be nice to
have next time I have to deal with a tree table.

2) PITR on a per database basis. I think this would be nice but I'm
guessing that the work involved is big and that few people really
care or need it, so it will probably never happen.

3) A further refinement of PITR where some sort of daemon ships small
log segments as they are created so that the hot standby doesn't have
to be updated in 16MB increments or have to wait for some timeout to
occur. It could always have up-to-the-minute data.

4) All the Greenplum Bizgres MPP goodness. In reality (and I don't
know if Bizgres MPP can actually do this) I'd like to have a cluster
of cheap boxes. I'd like to install postgres on all of them and
configure them in such a way that it automatically partitions and
mirrors each table so that each piece of data is always on two boxes
and large tables and indexes get divided up intelligently. Sort of
like a RAID 10 on the database level. This way any one box could die
and I would be fine. Enormous queries could be handled efficiently
and I could scale up by just dropping in new hardware.

Maybe Greenplum has done this. Maybe we will get their changes soon
enough, maybe not. Maybe this sort of functionality will never
happen. My guess is that all the little bits and pieces of this will
trickle in over the next several years and this sort of setup will be
slowly converged on over time as lots of little things come
together. Tablespaces and constraint exclusion come to mind here as
things that could eventually evolve to contribute to a larger solution.

5) Somehow make it so I NEVER HAVE TO THINK ABOUT OR DEAL WITH VACUUM
AGAIN. Once I get everything set up right everything works great, but
I'm sure that if there's one thing everyone would love, it would be
getting postgres to the point where you don't even need to ship
vacuumdb because there's no way the user could outsmart postgres's
attempts to do garbage collection on its own.

6) genuine updatable views, such that you just add an updatable
keyword when you create the view and it's automagically updatable.
I'm guessing that we'll get something like that, but its real magic
will be throwing an error to tell you when you try to make a view
updatable and it can't figure out how to make the rules properly.

7) allow some way to extract the data files from a single database
and insert them into another database cluster. In many cases it
would be a lot faster to copy the data files across the network than
it is to dump, copy the dump file, and reload.

8) some sort of standard "hooks" to be used for replication. I guess
when the replication people all get their heads together and tell the
core developers what they all need something like this could evolve.
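
For illustration, one of the application-side workarounds that item 1
alludes to can be sketched as follows. This is a minimal sketch, not
anyone's production code: it assumes the tree table is stored as an
adjacency list of (id, parent_id) rows that have already been fetched
from the database, and the ROWS data and the descendants() helper are
hypothetical names invented for the example.

```python
from collections import defaultdict, deque

# Hypothetical rows from a tree table: (id, parent_id).
# A parent_id of None marks a root node.
ROWS = [(1, None), (2, 1), (3, 1), (4, 2), (5, 2), (6, 3)]


def descendants(rows, root_id):
    """Return the ids of all nodes below root_id, in breadth-first order.

    Assumes the rows form a tree (no cycles); a cycle would make the
    loop run forever.
    """
    # Index the rows once so each lookup of a node's children is cheap.
    children = defaultdict(list)
    for node_id, parent_id in rows:
        children[parent_id].append(node_id)

    found = []
    queue = deque(children[root_id])
    while queue:
        node_id = queue.popleft()
        found.append(node_id)
        queue.extend(children[node_id])
    return found
```

Calling descendants(ROWS, 2) walks only the subtree under node 2. The
cost of this style of workaround is that the whole table (or one query
per level) has to be pulled into the application, which is the kind of
thing a native recursive query would avoid.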

Like I said, postgres more than satisfies my "needs". I am
especially happy when you factor in the cost of the software (free),
and the quality of the community support (excellent).

And you can definitely say that the "missing" list is shrinking. But
I think of it like this. There are tiers of database functionality
that different people need:
A) Correct me if I'm wrong but as great as postgres is there are
still people out there that MUST HAVE Oracle or DB2 to get done what
they need to get done. They just do things that the others can't.
They may be expensive. They may suck to use and administer but the
simple fact is that they have features that people need that are not
offered in less expensive databases.
B) Very, very powerful databases that lack the biggest, most
complicated "enterprise" features.
C) Lightweight databases for taking care of the basic need to store
data and query it with SQL. (some would call these "toy" databases)
D) databases which are experimental, unreliable or have other limits
that make them not practical compared with the other options

I would say that with version 7.0 postgres moved from D to C (please
don't get offended if this is way off base, I never used 6.x but I
heard it was prone to crashes, data corruption and of course there
was that pesky row size limit). It then proceeded to move up within
tier C to become the best of its class, pushing up into level B.
With 8.0 it was firmly in level B. It was fast, efficient, powerful
and began adding lots of really, really big features like PITR,
savepoints, tablespaces, etc. Add-ons like Slony also allowed it to
be used in places where it otherwise wouldn't have measured up.

Now there are only a few features left in the B range and so there
are tons of situations that can be taken care of by postgres now that
were out of its reach just a few years ago. Once those features are
all gone there will still be some very big, very difficult features
on the table that once completed will begin to remove any advantage
that the really big guys have. I'm thinking especially of #4 above
here. But they will definitely take a while.
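
The mirroring idea in item 4 ("each piece of data is always on two
boxes") can be made concrete with a toy placement rule. This is a
minimal sketch under stated assumptions, not a description of how any
existing product works: the placement() function, the node names and
the CRC32-modulo scheme are all hypothetical, chosen only because they
are short and deterministic.

```python
import zlib


def placement(key, nodes):
    """Return (primary, mirror) node names for a row key.

    The primary is picked by hashing the key; the mirror is simply the
    next node in the list, so the two copies never share a box.
    """
    if len(nodes) < 2:
        raise ValueError("need at least two nodes to mirror")
    index = zlib.crc32(key.encode("utf-8")) % len(nodes)
    return nodes[index], nodes[(index + 1) % len(nodes)]
```

With nodes such as ["box-a", "box-b", "box-c", "box-d"], every key maps
to two different boxes, so losing any single box still leaves one copy
of each row. Note that a plain modulo like this reshuffles most keys
when a box is added, so "scale up by just dropping in new hardware"
would need something smarter; that is part of why this is a hard
problem.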

I may have tons of details wrong here but my point is that I think
that postgres isn't just taking stuff off a big to-do list, but
rather is pushing itself upwards and is now in a position to start
working on some very hard problems that once completed will put it
into a very elite class of database systems. The "missing" list for
tier B type problems is shrinking down to almost nothing and items
from the tier A missing list are starting to come into view.

Maybe I'm way off base here but that's how I see it. Postgres has
come a long, long way, but the problems ahead are bigger and meaner
than the ones behind.

On Aug 4, 2006, at 12:02 AM, David Fetter wrote:

> On Fri, Aug 04, 2006 at 12:37:10AM -0400, Tom Lane wrote:
>> Bruce Momjian <bruce(at)momjian(dot)us> writes:
>>> To me new things are like PITR, Win32, savepoints, two-phase
>>> commit, partitioned tables, tablespaces. These are from 8.0 and
>>> 8.1. What is there in 8.2 like that?
>>
>> [ shrug... ] Five out of your six items have no basis in the SQL
>> spec. So it's not clear to me what your definition of "major
>> feature" is, unless maybe it's "anything except what we did for
>> 8.2". Can you enumerate ten things you would consider comparable to
>> the above features that aren't done yet?
>
> First, I'd like to say people are doing a fantastic job here. Kudos!
>
> One huge thing missing from the "done" list is that crucial bit of
> infrastructure and process that has shortened feedback loops--hence
> the beta period--by weeks if not months: the build farm. It's now
> smoothly integrated into the development process, and as a
> consequence, we can realistically have a release each year. :)
>
> As far as big missing features go, here's a short list:
>
> * Splitting queries among CPUs--possibly even among machines--for OLAP
> loads
>
> * In-place upgrades (pg_upgrade)
>
> * Several varieties of replication, which I believe we as a project
> will eventually endorse and ship
>
> * CALL
>
> * WITH RECURSIVE
>
> * MERGE
>
> * Windowing functions
>
> * On-the-fly in-line calls out to PL/your_choice without needing to
> issue DDL
>
> * Wild-eyed feral bits of the SQL standard like SQL/MED and SQL/XML
>
> But all that leaves out the oldest, most honored Postgres tradition:
>
> Breaking New Ground.
>
> We're definitely not done yet. :)
>
> Cheers,
> D
> --
> David Fetter <david(at)fetter(dot)org> http://fetter.org/
> phone: +1 415 235 3778 AIM: dfetter666
> Skype: davidfetter
>
> Remember to vote!
>
> ---------------------------(end of
> broadcast)---------------------------
> TIP 9: In versions below 8.0, the planner will ignore your desire to
> choose an index scan if your joining column's datatypes do not
> match
>
