Re: Dreaming About Redesigning SQL

From: "Bob Badour" <bbadour(at)golden(dot)net>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Dreaming About Redesigning SQL
Date: 2003-10-21 14:08:22
Message-ID: HS2dnWzBZ-KWowiiU-KYjA@golden.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

"Lauri Pietarinen" <lauri(dot)pietarinen(at)atbusiness(dot)com> wrote in message
news:3F94BCBB(dot)7030001(at)atbusiness(dot)com(dot)(dot)(dot)
> Anthony W. Youngman wrote:
>
> >In article <bmutga$jdk$1(at)nyytiset(dot)pp(dot)htv(dot)fi>, Lauri Pietarinen
> ><lauri(dot)pietarinen(at)atbusiness(dot)com> writes
> >
> >
> >>So in your opinion, is the problem
> >>
> >>1) SQL is so hard that the average programmer will not know how to use
it
> >>efficiently
> >>
> >>
> >
> >Nope
> >
> >>or
> >>2) Relational (or SQL-) DBMS'es are just too slow
> >>
> >Yes.
> >
> >>If 2) then why don't we get a bit more concrete. Could you give
> >>an example of a query that in your experience would be too slow using
> >>a standard SQL database (e.g. Oracle, or MySQL). We could then
> >>actually try it out on some machine and compare. I suggest using
> >>the customer-order-order_detail-product database
> >
> >Okay. Give me a FORMULA that returns a time in seconds for your query.
> >
> >Let's assume I want to print a statement of how many invoices were sent
> >to a customer, along with various details of those invoices. My invoice
> >file is indexed by company/month, and we can reasonably assume that the
> >time taken to produce the statement is infinitesimal compared to the
> >time taken to retrieve the invoice data from disk. For MV
> >
> >T = (2 + N) * ST * 1.05
> >
> >Where T is the time taken to produce the report, N is the number of
> >invoices, and ST is the hard disk seek time.
> >
> First of all it is important to note that an important component of all
> modern SQL-DBMS's is
> the buffer pool (or cache) meaning that in a reasonably well tuned
> database you get very few
> disk I/O's, even when *writing* data into tables.
>
> SQL-DBMS's also are very clever at using indexes, i.e. if they can find
> all necessary data
> from an index it will not even look at the table, so to speak.
>
> And, even when presuming conservatively that there is no data in cache,
> depending on how
> the data is clustered, you will get more than one row/disk read (= 8K in
> most(?) systems).
>
> So, assuming the (simplified) example
>
> Customer(cust_id, .....)
> Order(order_id, cust_id,...)
> OrderDetail(order_id, prod_id, ...
> Product(prod_id,....)
>
> If you created a clustering index on
> Customer(cust_id)
> Order(cust_id)
> OrderDetail(order_id)
>
> And presumed that the average length of
> customer = 1K
> order=500
> orderDetail=300
>
> You would get, with 3 I/O's
> - 8 customer rows
> - 16 order rows
> - 24 order detail rows (which would only apply to one order)
>
> so, granted, that would result in one I/O per order which is more than
> in your example.
>
> I could now denormalise OrderDetail so that it contains cust_id also
> and cluster by cust_id
> (might cause you trouble down the road, if you can change the customer
> of an order), in which case, with 3 I/O's I would get
> - 8 customer rows
> - 16 order rows
> - 24 order detail rows (which would all apply to one customer)

Depending on block size, by clustering the three tables together, one might
get all of those rows for a single read potentially improving on Wol's
numbers by a factor of eight or more for this one query. Of course, doing so
would increase the cost of a table scan on the customer table.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2003-10-21 14:13:00 Re: [HACKERS] obj_description problems?
Previous Message Rod Taylor 2003-10-21 13:48:49 Re: multi-backend psql