Skip site navigation (1) Skip section navigation (2)

Re: Dreaming About Redesigning SQL

From: "Anthony W(dot) Youngman" <thewolery(at)nospam(dot)demon(dot)co(dot)uk>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Dreaming About Redesigning SQL
Date: 2003-11-04 23:15:21
Message-ID: TQt3dLAJMDq$ (view raw, whole thread or download thread mbox)
Lists: pgsql-hackers
In article <bnhk4n$i3t$1(at)nyytiset(dot)pp(dot)htv(dot)fi>, Lauri Pietarinen
<lauri(dot)pietarinen(at)atbusiness(dot)com> writes
>Anthony W. Youngman wrote:
>>In article <bn72o3$as$1(at)nyytiset(dot)pp(dot)htv(dot)fi>, Lauri Pietarinen <lauri.pie
>>tarinen(at)atbusiness(dot)com> writes
>>>Anthony W. Youngman wrote:
>>>>In article <bn4cca$dj0$1(at)nyytiset(dot)pp(dot)htv(dot)fi>, Lauri Pietarinen
>>>><lauri(dot)pietarinen(at)atbusiness(dot)com> writes

>>>>>Well, if it is normalised, how easy is it for you to change the 
>>>>>customer_id of an order?  Anyway,

>>>>Incredibly easy. Just update the "customer_id" field of the invoice
>>>>record. A single change to a single "row"
>>>And I presume the system will automatically move all related stuff 
>>>(order details etc.) into
>>>the same block as the new customer?  How long will that take? What if 
>>>there is no room for it there?
>>Well, I'd view an order as an entity. As such, I would give it its own
>>FILE, and your question doesn't make sense. 
>But then your formula for disk head movements does not make sense either!

Why not? The order is a real-world "thing", and as such I would have an
ORDERS file, in which each order is a single entry, with "customer_id"
as one of its attributes. "order detail" is an attribute of "order", so
if I change "customer_id" it's the relational equivalent of just
changing one cell in one row. The chances of me having to move the
record is pretty near nil, and if I do it won't change bucket so at most
it involves two frames (or disk blocks, if that's what you want to call
>>But if the system did move
>>the stuff, it would be four disk accesses - read/write to delete the old
>>entry, read/write to save the new. As for "enough room" - well - it'll
>>fall over if we have a "disk full" (or it might not).
>"Not enough room" here means not enought room in the block of the 
>customer (from which you
>were supposed to get all data in one read, or disk head movement).  That 
>would mean that your
>order information would be moved perhaps to another block and result in 
>an extra head movement,
>or am I right?

Which I've taken in to account - if there isn't enough room in the
original "bucket", I need to either overflow into the next bucket which
might exist, or to create it if it doesn't. Ie two head movements to
delete from the first bucket, and two head movements to add to the

And it will only fall over if I need to create a new bucket and there's
no space left on the disk (or if (and this is very unlikely in this
scenario) it triggers a "split" which again needs space and there's none
left on disk).

Or have you not sussed that we view "order detail" as an attribute of
"order" (which is therefore stored as part of the same thing), but
"customer" is separate from "order", is stored separately, and is linked
by a relationship. (Whereas "order detail" is NOT related to "order",
because they are part of the same thing :-)
>>Well, it does result in data being stored multiple times ;-)
>What on earth is wrong with that?  Do you know how much 160GB of disk 
>cost's today?
>I could ask:  does your system work in, say 4KB?  That's how much memory 
>the first
>computer  I used (a Wang 2000)  had.  Probably it would not work at 
>all.  In the 50's
>they did amazing things with hardly any compilers and very little 
>memory. I am referring
>to Whirlwind. See
>Could you have done that with MV?  My point?  Why are we discussing 
>to memory and CPU speed of the 70's and 80's?  If an SQL DBMS uses more 
>and disk, and it is available, why complain about *that*.  Im not 
>impying that you
>cannot complain about other matters, e.g. ease of development etc. and 
>you might
>even be right.  Be it as it is,  I am not trying to make you abandon 
>your MV database.

As always, you're relying on hardware to help :-) You know what I think
of that :-) And 160Gb of disk is only cheap if you're using IDE on a
desktop PC - it costs a hell of a lot more for a laptop or SCSI for a
server. And if it's embedded it maybe that the *room* is expensive, not
the capacity ...
>>>>>>>And: what if I was just reading customer-data.  Would the same formula
>>>>>>>apply (= (2+N)*ST*1.05)?
>>>>>>Nope. If I understand you correctly, you want attributes that belong to
>>>>>>the entity "customer", not the entity "invoice". T = ST * 1.05. (By the
>>>>>>way, billing and/or invoice address (for example) are invoice
>>>>>>attributes, not company attributes.)
>>>>>No,  I want you to give me a list of all your customers.  How many disk 
>>>>T = N * 1.05 where N is the number of customers. What do you want to
>>>>know about those customers? Address? Phone number*s*? Anything else?
>>>>That's *all* at no extra cost.
>>>Well, no thanks.  I just wanted their names this time.
>>>The relational alternative, with an index on customer_name, would be 
>>>again an order
>>>of magnitune less disk reads.
>>Well, if you let me use an index here, I'm sorry, GAME OVER! The best
>>you can do would be a photo finish.
>>Assuming an overhead of, say, 4 bytes per index entry, the entire index
>>would be
>>Size = 4 * N + sigma(name_length) + sigma(key_length)
>>Okay, I've probably got some padding there as well, but so will you. And
>>note I didn't say N * field_length, I said sigma(name_length). ;-)
>What is sigma(key_length)?

The pointers from the index to the actual row in the actual table ...

There's no point in having an index if you can't get from the index back
to original full record :-)
>>I notice that at no point have you asked where this strange 1.05 keeps
>>coming from. That's why I keep hammering performance ... okay, maybe
>>I've lost this "order detail" but it's why I keep hammering my
>>confidence in general. 
>Sorry, I lost you here?

I'm trying to push the fact that I can get the data I'm looking for at
almost no cost. Okay, I do expect the hardware to cache things for me,

If your data is relational, almost all accesses will be made with known
primary keys. Given that the key is known, I can get at that data, FIRST
time EVERY time (near enough). And hardware helps me as much as it helps
you. And statistics helps me *more* than you - it's a pretty safe bet
that my access is going to retrieve more data of possible future to me
than yours to you.
>>Given that your data is all ordered optimally for
>>answering this "detail" request, what's it going to cost you in time or
>>disk space to answer the request "please recreate invoice X for me"?
>Well, first of all, the fact that I optimized the "detail" request does 
>not cost me anything regarding
>the other queries.  It will impose a slight cost on updates,  however 
>that would be hardly noticable
>execpt for mass updates (update batch jobs) that change the column value.
Although again you're relying on hardware to bale you out ... adding
this index has a very definite cost in disk storage, and using it has a
cost in RAM while it's sitting in memory.
>>MV stores data efficently - look at how little space the index took :-)
>>It accesses data efficiently - that 1.05 is actually the "how many
>>places do I need to look" value for the database to respond to a
>>userland request, given a known primary key or index value. Okay - that
>>means we push back at the programmer some of the management of data
>>access, but why should that be solely the response of the dbms? If it
>>makes sense for the app to do it, then it should ... why should the dbms
>>have to guess at how to optimise a request if the app has all the
>>necessary information at its fingertips?
>1) Your database might change over time and say a table that originally 
>had only a few rows
>could suddenty grow considerably.  Now an optimiser would insulate you 
>from these changes
>or in the worst case all that would need to be done would be to create 
>an index (and, yes, check
>that the DBMS starts using it).

Except that an optimiser is *irrelevant* to MV. What do we need to be
insulated from? MV doesn't care whether a FILE is 4Kb or 40Gb, the cost
of accessing a single record, AT RANDOM, from within that FILE is almost
identical. Where would we gain from an optimiser? In practice, it would
get in the way and slow us down!
>2) You might have a product that runs in a number of sites: large ones 
>and small
>ones.  Now you would not have to reoptimise the programs for each type site.

*D*O*N*'*T* *N*E*E*D* ONE!!!
>3) Complex SQL-queries do quite a lot of things and it might not be very 
>obvious for
>the programmer how to optimise best.

But a large chunk of SQL's complexity is reassembling a view of an
entity. MV doesn't have that complexity. An MV program views the
database the same way as a programmer views the real world. So it's
pretty obvious to a MV programmer how to optimise things.
>4) depending on input from user (say, a search screen) the optimal 
>access path may be different. An optimiser
>could generate a different path depending on this input.

Again, MV views the entity as a whole, so probably we don't need to
generate a "different path" - it's just "get me this entity" regardless
of what we need to know about it.
>>But surely, your requirement for grabbing data across multiple invoices
>>is statistically unusual. 
>You mean the product department would not be interested in seeing how 
>their products have
>been selling, and to whom?

Of course. But do you really want to know how much you've sold of every
product? What if the stuff went out of production 10 years ago? Surely
you'd more likely want to select order detail by invoice date? Etc etc.

Yep, I bet you could create another index, but suddenly, you're sorting
on product_id and selecting on order_date. Yes, RAM is going to make a
hell of a difference, but surely an *efficient* database underneath is
important :-)
>>And I benefit just as much as you from any ram
>>being available to cache :-) although I wouldn't benefit so much from
>>prefetch. The probability is that consecutive requests for data are
>>either "can I know something else about the entity I've just looked at",
>>or "can I access another entity at random".
>However, if you think of all data relating to a customer, that could 
>amount to, say, 300KB, if
>he had a long history.  Do you think it is a good idea to pull all that 
>into memory just in case
>the user want's to see his history for all 10 recent years?  And there 
>is lot's of different kinds
>of information related to customers.  Would the user want to see 
>everything?  Isn't it more
>probable that a spesific user want's a certain *view* of the customer?

Yes. But why would I pull it *all* in? Bear in mind, the fundamental
element in MV is the entity or RECORD which I would equate to the "row"
in SQL (yes I know relational theory says "tuple").

A customer's history is not one entity. It's a collection of entities
(customer detail, invoices, whatever, multiple entities...) and I'd only
pull in the entities that I wanted.
>>In the former case, if you've stored it in another table, it's another
>>request from the app to the dbms. With MV, it all came in the first
>>request. In the latter case, this is where my 1.05 factor cuts in - bear
>>in mind even for a simple btree file, this factor is only 1 for a
>>1-level root only file - it goes up to 1.5 when the root bucket splits
>>and keeps rising from there :-)
>>So as an engineer, here I am appealing to stats :-) But this is the real
>>world, and no stats? no real world! Because we have no optimiser, it
>>encourages the programmer to optimise - I've heard various people say
>>that if you want a SQL-using app to run fast you mustn't use views -
>>forcing the programmer to interest themselves in the db in a manner that
>>relational says they shouldn't.
>You are correct.  But those result mostly from irregularities of SQL (so 
>I have been told).  They result
>from the fact that SQL does *not* follow the theory, but instead was 
>changed to be more "practical".
>The story of views is not quite so simple.  In some cases views are very 
>useful and can be used safely.
>In other situations you might get ugly surprises.

And we don't get ugly surprises :-)
>Take the view
>create view california_customers as
> select cust_id, cust_name
>    from customer
>    where state = 'CA'
>The query
> select cust_id, from cust_name
>    from california_customers
>    where cust_name like 'A%'
>Will be transformed (under the covers) into
> select cust_id, cust_name
>    from customer
>    where state = 'CA' and
>    cust_name like 'A%'
>But in other cases the view will first be materialised into
>a temporary table and the rest of the query would be
>evaluated on that temp table.
>Understanding when a DBMS knows how to do what
>is not simple, and, hence,  you are quite correct in
>your observation.
>> We're not interested in being able to
>>improve the speed at which the db can find data to respond to an app
>>request - with an access factor of 1.05 (actually, it's nearer 1.02 or
>>1.03) we consider any effort there to be a waste of time ...
>But isn't it better to have NO disk reads than one?  I thought disk I/O 
>was rather expensive?  With
>that mentality you will always be disk bound.

I'm assuming we don't have sufficient RAM to cache stuff ...

Our mentality is to leave disk caching to the OS. The app says "get me
X". The database knows *exactly* where to look and asks the OS to "get
me disk sector Y". Any OS worth its salt will have that cached if it's
been asked for previously recently. That way, we're only caching stuff
that's been accessed recently. But because for us the "atomic" chunk is
an entity, there's a good chance that stuff has been accessed and is in

SQL optimisation *seems* to be more "efficient" because it tries to
predict what you're going to want next. But whereas SQL *guesses* that
because you've accessed one order detail, you're likely to want other
order details from the same invoice (a sensible guess), you cannot
compare this to MV because it gives you those order details as a side
effect. In order for MV optimisation to be of any use, it would need to
guess which INVOICE I'm going to access next, and frankly a random
number generator is probably as good an optimiser as any!
>>Basically, the only way you can beat us in the real world is to throw
>>hardware at the problem - and like I said with linux and macro/micro
>>kernels, we can do the same :-)
>Well, please do!

We do. Which is why we can smoke any relational db for speed unless the
hardware is big enough to store the entire database in RAM (and even
then we'd beat it for speed :-) (just not that much in absolute terms,
although probably a fair bit in percentages :-)
>>Can you improve on what I've just done? Is any improvement POSSIBLE?
>The whole trick is to have as much stuff in memory as possible, and 
>preferably the *right* stuff.  Even
>if we have a small memory the most used rows will remain in memory and 
>hence minimise the need
>for disk I/O.  And writing to disk is nearly always asyncronous so it 
>will not affect response time.
>"The only good I/O is a dead I/O" ;-)
Yep. Which is why our attitude of viewing the world as entities means
we're probably going to smoke you. Statistics says the chances of us
being right "by accident" and avoiding the need for i/o is very high.
While you need artificial intelligence - which has a habit of getting
things wrong :-)

And anyway. Aren't you jumping to conclusions? You are *assuming* that
there is such a thing as the "most used rows". In other words, you are
*assuming* that normal db access only accesses a well-defined subset of
the database. What if there is no way of predicting what the user is
going to want next? Your "trick" is worthless ... the last time it was
accessed is likely to be before the latest reboot ...

And, because we view the world with our "atom" of an entity, we almost
certainly stand a better chance than you of related data "just
happening" to be in RAM when we ask for it ...
>>>>>>But with MV, if our database is too large for current technology, we
>>>>>>kick the shit out of relational for speed ...
>>>What is "too large"? 

Too large to *preload* *everything* into RAM :-)
>>>>>>Don't forget. You've already said that, if nothing is cached, my average
>>>>>>case exceeds your best. And my case is *already* assuming that the
>>>>>>system is seriously stressed and struggling ...
>>>It does?
>>Yes. I'll only be in trouble if I'm so short of ram that my working set
>>gets forced into swap ...
>What if your customer has accumulated, over the years, say 1000 orders?  
>Would you want to pollute
>your cache with all those orders?  Note that this is a problem that you 
>will first accounter after the
>system has been running for quite a long time.  In MV,  what would you 
>do in a situation like this?

Ignore it? Because it's not a problem? The only time it's likely to be a
problem is if the question is "please get all orders for company X". And
even then, provided the ORDERS file is indexed on "customer_id", it's
still just a SINGLE access to the index and we have a list of EVERY
>>>>>>>When we design databases we can decouple logical planning from 
>>>>>>>considerations, which, you must agree, are two separate issues.
>>>>Yes. BUT what's the point of having a database that is logically
>>>>perfect, and who's performance is slow to the point of being unusable?
>>>>Don't forget - in practice MultiValue ends up with a database that is
>>>>*inherently* optimised such that it almost invariably outperforms an
>>>>equivalent SQL database, AND we don't normally have DBAs to help us
>>>>achieve that nirvana ...
>>>Frankly, it may well be that PICK systems run faster and cheaper than 
>>>relational ones, but certainly
>>>not for the reasons you state.
>>Well, could you optimise that index any more?
>Which index?

I was thinking of that customer names index you were talking about.
Which basically consists solely of the names, pointers to the records
they come from, and a bit of empty space. And if I know the name, I can
find the master record in two goes - one hit to read the index (from
which I retrieve the record key), and a second hit on the main file to
retrieve the company record.
>>>>>>I can't find the post now :-( but is Christopher reading this? You know
>>>>>>I compared that relational system on a twin Xeon 800, to an MV system
>>>>>>running on a P90? Christopher made the (reasonable in the circumstances)
>>>>>>assumption that the relational consultants must be crap, and the MV guy
>>>>>>a guru. Actually, I'd come to exactly the OPPOSITE conclusion. My MV
>>>>>>experience tells me that MV query was probably thrown together, by an
>>>>>>average programmer, in 30 seconds. On the other hand, those SQL
>>>>>>consultants had an axe to grind and a point to prove. They couldn't
>>>>>>afford to let this "old fashioned" system beat them. That SQL query
>>>>>>would have been optimised to within an inch of its life over weeks.
>>>>>>Don't forget how proud they were to beat this MV system! Yet with
>>>>>>hardware that was so much more powerful and a query that was heavily
>>>>>>optimised, they had great difficulty beating a query that was thrown
>>>>>>together in seconds by an average MV guy (or even just a luser!).
>>>>>>Don't forget. I said I am a database *engineer*. Engineers believe in
>>>>>>elegance, they believe in beauty. And when I look at relational, all I
>>>>>>see is the theorists pleading "power", "hardware", "brute force", to get
>>>>>>them out of trouble. 
>>>>You said that logical planning and performance are separate issues. And
>>>>I wouldn't expect you to address the above example in a discussion of
>>>>relational, because performance is irrelevant to relational.
>>>I would have to know a lot more details to address it properly.  
>>>Performance is irrelevant to the model.
>>>It's like E=mc**2.  Nice theory and it actually works.  But to get 
>>>performance out of it
>>>(=exploding bomb) you have to solve lots of practical details.  However, 
>>>without the theory
>>>you could experiment for a milloin years without being able to build an 
>>>atom bomb.
>>>>But surely, the fact that I am SUPREMELY CONFIDENT that I can get
>>>>superior performance from inferior hardware should give you pause for
>>>>thought that maybe, just maybe, the relational model is flawed from an
>>>>engineer's or scientist's viewpoint?
>>>That's OK with me.  But the most you can claim is that todays 
>>>IMPLEMENTATIONS are flawed,
>>>and you would be 100% correct. How would you go and prove that the model 
>>>is flawed?
>>>You should prove that a relational DBMS could not POSSIBLY be efficient.
>>Well, if the relational people insist on divorcing theory from
>>implementation, it's hard to see how they can prove it is efficient.
>>While that is exactly what I'm trying to prove for MV. Whether
>>relational is efficient or not is irrelevant, if I can prove MV is
>>efficient and you can't prove the same for relational.
>Well, I know of a lot of large banks, insurance companies etc... that 
>are using SQL DBMS'es
>and I think they are running just fine.  Amazon uses Oracle. Would you 
>say that their performance
>is adequate?  And I have first hand witnessed and built lot's of fast 
>systems that use SQL DBMS'es.

Mebbe. Why did Temenos (a major supplier of banking software) buy jBASE
then? jBASE is an MV database.

Unfortunately (a) marketing budget counts, and (b) marketing budgets can
also set the agenda. Witness that relational theory completely ignores
performance, and look at the trouble I'm having trying to prove to you
that MV is close to the *THEORETICAL* limit of performance (not helped
by my poor grasp of stats :-)

It is a *mathematical* *proof* that you cannot beat Huffman compression.
It shouldn't be that hard to prove that you can't beat MV. It's just
that we're mostly USERS of databases, not database computer scientists.
And, like me, not skilled in the necessary maths.
>>If that results in running SQL over MV then we've won, I think :-) We
>>can do that already ...
>Not really, because it's the SQL that is the "relational" part (well, 
>it's not purely relational).  So
>the funny thing is that, what ever lies below the surface (=whatever 
>engine we are using) relational
>get's all the credit!! Unfair, isn't it? As long as it LOOKS like its 
>relational to the user, it does not really matter
>what happens under the hood.

Yup, it is unfair :-( And yup, it's happening. The more I read about new
advances in how the underlying relational engines work, the more I see
that they are just copying 30-year-old MV technology :-(
>>>>From the mathematician's (or logician's) viewpoint I agree it's
>>>>flawless. But that's true of plenty of broken scientific theories...
>>>Could you give me some other examples?
>>Euclidean Geometry - just look at the equatorial event horizon of a
>>black hole.
>>Newtons laws of motion - just look at Mercury's orbit.
>>Quantum mechanics - just look at a black hole.
>>Relativity - just look at quantum mechanics :-) or Schrodinger's cat.
>>Actually, it's probably true of pretty much all of theoretical physics
>>since the start of last century ... in each case the only thing wrong
>>with the theory is that reality just doesn't happen to agree ...
>Are you suggesting that Newtons theories are totally useless and irrelevant?

No. I'm just suggesting that they DON'T WORK! One only has to look at
the orbit of Mercury to know that's true.

All of those theories work within limits. But if you're stupid enough to
believe that they are accurate, then you deserve everything you get when
you get burnt to a crisp ... as the astronauts would have been had NASA
used them ... (actually, the astronauts would probably have frozen as
they missed the moon and couldn't get home).
>kindest regards,
>Lauri Pietarinen
Anthony W. Youngman - wol at thewolery dot demon dot co dot uk
Witches are curious by definition and inquisitive by nature. She moved in. "Let 
me through. I'm a nosey person.", she said, employing both elbows.
Maskerade : (c) 1995 Terry Pratchett

In response to


pgsql-hackers by date

Next:From: Andrew DunstanDate: 2003-11-05 00:52:33
Subject: Re: Open Sourcing pgManage
Previous:From: Andreas PflugDate: 2003-11-04 23:06:21
Subject: Re: Open Sourcing pgManage

Privacy Policy | About PostgreSQL
Copyright © 1996-2018 The PostgreSQL Global Development Group