Re: Google Summer of Code: Full Disjunctions

From: "Tzahi Fadida" <tzahi_ml(at)myrealbox(dot)com>
To: <pgsql-general(at)postgresql(dot)org>
Cc: <ana_cata_hylo(at)yahoo(dot)com>
Subject: Re: Google Summer of Code: Full Disjunctions
Date: 2006-05-08 17:30:17
Message-ID: 015901c672c5$130b76c0$0b00a8c0@llord
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Hi Lee,
First, i have no knowledge of anyone that have implemented full disjunctions(ever) aside
from the theoretical works of my colleagues.
With the exception of a corner case of it, that I believe was a simulation in 96.
(A. Rajaman and J.D. Ullman Integrating information by outerjoins and full-disjunctions).
I'd love to hear about any implementation out there (aside from my colleagues work, which
is mine also: cohen,sagiv, kimelfeld,kanza)

It can never be a binary operation since at the heart of the matter is that you need to take
each subset of the relations and join them. i.e.:
A
B
C
A,B
A,C
B,C
A,B,C

and remove subsumptions, for example if you have {t_A,t_B} from the A,B join,
you will not include just {t_A} or just {t_B} in the result (subsumption).

Usually binary operations allow for a bottom up computation approach, but FD is a TOP down approach
(Galindo-Legaria, C. outerjoins as disjunctions).

The answer what to do is a bit complex and part of it is in a paper that is currently in review,
however:

You take the cyclic component, for example FD(A,B,C,D) and then join it with another relation E.
It's not a trouble to write (A fd B) fd C but it will still have to be computed as one computation -
FD(A,B,C).

Again, note that FD is only useful for cases you can't use natural outer joins.
So only if you have a cycle you use FD.

In my current implementation I do getfdr(A,B,C) to get the RECORD since with FD you cannot
know in advance what will be the eventual RECORD (unless you save the RECORD type for A,B,C
and A,B,C,D etc...)

then I run, for example, FD('A,B,C') as RECORD (a0 int, ....)
As you can see, you can now join other relations to the result, it's a standard recordset.
So (FD('A,B,C') as RECORD (a0 int, ....))

Here is an example of a RUN (ignore all the additional parameters):
select getfdr('rand_1_0,rand_1_1,rand_1_2');
getfdr
---------------------------
(a0 int4,a1 int4,a2 int4)
(1 row)
select * from odmbfd('rand_1_0,rand_1_1,rand_1_2',true,true,100,'random',10,10) AS RECORD (a0 int,
a1 int, a2 int) natural full outer join rand_4_2;
a2 | a0 | a1 | a3
------+------+------+------
1 | | 1813 |
1 | | 3091 |
1 | | 316 |
2 | 2113 | | 2738
3 | | 2924 | 823
3 | | 2924 | 2735
.....
.....

select count(*) from odmbfd('rand_1_0,rand_1_1,rand_1_2',true,true,100,'random',10,10) AS RECORD (a0
int, a1 int, a2 int) natural full outer join rand_4_2;
count
-------
3201
(1 row)

>Hi Tzahi
>
>Since you seem to know about full disjunctions, I was wondering if you
>could answer this question:
>
>Can full disjunction be implemented as a binary operator?
>
>The algorithms I've seen for computing it seem to require that all
>input relations be supplied at once. Even in your message above you
>specified your example as FD(A,B,C) rather than say (A fd B) fd C.
>So I was wondering if the latter is possible?
>
>The reason I ask is that I'm currently trying to solve a problem that
>could in principle be solved using FD but it would fit into my current
>framework much better if it were a binary operator like the other
>joins.
>
>Thanks in advance.
>Lee

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Joao Miguel Ferreira 2006-05-08 17:43:40 Re: database size grows (even after vacuum (full and analyze))....
Previous Message Bruno Wolff III 2006-05-08 17:07:24 Re: database size grows (even after vacuum (full and analyze))....