Re: optimizing a query over tree-like structure

From: "Oliveiros Cristina" <oliveiros(dot)cristina(at)marktest(dot)pt>
To: <az(at)svilendobrev(dot)com>, <pgsql-sql(at)postgresql(dot)org>
Subject: Re: optimizing a query over tree-like structure
Date: 2008-09-30 17:01:23
Message-ID: 083601c9231e$2b3b08d0$ec5a3d0a@marktestcr.marktest.pt
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-sql

Svil,

Please advice me,

You have values and one table for N,R,P,F and O and Z, right?

And you have ownership which is a "catch-all" associative table between
values and whatever other table, is this correct?

You want to retrieve the values for a certain N, and all to all the other
entities that belong to that N, is this correct?

Fine,
I have a few questions on your associative tables
mm_N2P -- An N can be associated to P in a many to many relationship? N
Can have one or none P, am I right? It can have many Ps ? And a P can belong
to many Ns ?
mm_P2O, -- I understand an O can have many Positions, but a Positon can be
in more than one O? is this a many to many relationship?
mm_O2O, -- if it is a hierarchical tree do you need an associative table? An
O can have many sub-departments but a sub-department (O) can just belong to
a (super-)department,
-- isn't it so?
mm_N2Z -- this one was mistake of yours, it doesn't exist, right?

You wrote that giant query by yourself or was it generated by some automated
tool? (sqlalchemy , I have no idea :p )

Your ultimate goal is to retrieve the values associated with a certain
Department( and all its sub-departments) which is associated with a certain
Position, which values
are also to obtain, such position being associated with a certain N, which
values are also to obtain, is my understandin correct?

Explain me the simplest case, where you have an N not associated with any R
or P, just with an F.
Which info is to be retrieved, exactly in this case ?

Best,
Oliveiros

----- Original Message -----
From: <az(at)svilendobrev(dot)com>
To: <pgsql-sql(at)postgresql(dot)org>
Sent: Tuesday, September 30, 2008 3:24 PM
Subject: Re: [SQL] optimizing a query over tree-like structure

> another idea i just got, to decrease the number of tables in one FROM,
> is to represent the first-level ORs with unions. but i'm not sure how
> exactly to do it: are these equivalent?
>
> FROM N,ownership,mm_N2R
> where (
> N.dbid = ownership.N
> OR
> N.dbid = mm_N2R.left AND mm_N2R.right = ownership.R
> OR
> ...
> ) AND N.obj = whatever-filter-by-N
>
> --------
>
> FROM (
> (ownership join N on ownership.N = N.dbid)
> union
> (ownership join mm_N2R on ownership.R = mm_N2R.right join N on
> mm_N2R.left = N.dbid )
> OR
> ) ...
>
> and should i bundle the filtering-by-N/Employment in every of above
> union-members?
>
> On Tuesday 30 September 2008 15:21:09 Oliveiros Cristina wrote:
>> Hi, Svil
>>
>> I 'd like to first fully understand the background of your problem
>> before figurin out if I can be of any help (or not).
>>
>> You have a tree, of which N is the root, is this correct?
>> Then, what are the next sublevel?
>> F, P and R? If so, why is R linked to a sibling (F) ?
>> And the next one?
>> O and Z?
>> Is O connected to itself?
>> And i am not understanding your concept of "shortcuts". Could you
>> please explain ?
>> What kind of tree do you have exactly? Binary? Trenary?
>> The mm_* tables keep relations between nodes, I guess.... If so ,
>> the mm_N2Z one is empty, in this example, right?As there is no edge
>> from N to Z (not direct).
>> But what is the Nazn table? What records does it keep?
>> And what is the ownership table? And the value?
>> Could you tell which columns these tables have, at least the
>> relevant ones for your problem ?
>>
>> Please kindly advice me on these issues.
>> I am not very specialized in optimizing queries, but I see you have
>> a lot of cartesian products on your FROM clause, which, from my own
>> experience, I guess it has tendency to be slow...
>>
>> Best,
>>
>> Oliveiros
>>
>> ----- Original Message -----
>> From: <az(at)svilendobrev(dot)com>
>> To: <pgsql-sql(at)postgresql(dot)org>
>> Sent: Tuesday, September 30, 2008 9:32 AM
>> Subject: [SQL] optimizing a query over tree-like structure
>>
>> > hi.
>> > sorry for the vague syntax used below, but the query is huge so
>> > i've tried to present it in simple terms. And sorry if i'm doing
>> > obviously stupid things, i have lots of years programming behind
>> > me but NO sql involved.
>> > i have a somewhat tree-like structure of objects that link to
>> > each other via many2many associations. it looks like:
>> > (N is "root")
>> > N links to R,P,F
>> > R links to F
>> > P links to O,F
>> > O links to O,F #recursively
>> > F links to Z
>> > All links to F but the one in O are "shortcuts", to avoid looking
>> > it up recursively.
>> > each of these objects has some associated values (again
>> > many2many, ownership).
>> >
>> > what i want is to get all the values related to a given N and its
>> > sublevels, in one query.
>> >
>> > one variant of what i've invented so far is (~pseudocode, no
>> > recursion on O):
>> >
>> > SELECT ownership.*, value.*
>> > FROM Nazn, mm_N2P, mm_P2O, mm_O2O, mm_O2O AS mm_O2O1, mm_N2Z,
>> > ownership JOIN value ON ownership.value = value.dbid
>> > WHERE (
>> > N.dbid = ownership.N
>> > OR
>> > N.dbid = mm_N2R.left AND mm_N2R.right = ownership.R
>> > OR
>> > N.dbid = mm_N2P.left AND (
>> > mm_N2P.right = ownership.P
>> > OR
>> > mm_N2P.right = mm_P2O.left AND (
>> > mm_P2O.right = ownership.O
>> > OR
>> > mm_P2O.right = mm_O2O.left AND (
>> > mm_O2O.right = ownership.O
>> > OR
>> > mm_O2O.right = mm_O2O1.left AND
>> > mm_O2O1.right = ownership.O
>> > )))
>> > OR
>> > Nazn.dbid = mm_N2F.left AND (
>> > mm_N2F.right = ownership.F
>> > OR
>> > mm_N2Z.right = ownership.Z
>> > )
>> > ) AND ownership.value = value.dbid AND N.obj =
>> > whatever-filter-by-N
>> >
>> > ----------------
>> > this scales very poor.
>> > it uses the shortcut to F present in N.
>> > for just 200 rows with related associations, it takes 4 seconds
>> > to get result.
>> > if i use the shortcut to F present in P, it takes 2 seconds - but
>> > thats still inacceptable.
>> > seems that the number or consequtive ORs on same level is killing
>> > it. EXPLAIN gives nested loops all over.
>> > What am i doing wrong here?
>> > should i expand the A-to-B links of the sort
>> > mm_N2P.right = mm_P2O.left
>> > into
>> > mm_N2P.right = P.dbid and P.dbid == mm_P2O.left ?
>> >
>> > the query is generated via sqlalchemy and a layer on top, so i
>> > can tweak it any way required (and it has many other
>> > sub/filterings which i've ommited for brevity - they dont make it
>> > better/worse).
>> >
>> > any pointers of how such queries should be written are
>> > appreciated - e.g. what is considered fine, what doable and what
>> > is a no-no.
>> >
>> > thanks ahead
>> > ciao
>> > svil
>> >
>> > www.svilendobrev.com
>> > dbcook.sf.net
>> >
>> > --
>> > Sent via pgsql-sql mailing list (pgsql-sql(at)postgresql(dot)org)
>> > To make changes to your subscription:
>> > http://www.postgresql.org/mailpref/pgsql-sql
>
>
>
> --
> Sent via pgsql-sql mailing list (pgsql-sql(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-sql

In response to

Browse pgsql-sql by date

  From Date Subject
Next Message Rafael Domiciano 2008-10-01 01:07:09 Re: Can COPY update or skip existing records?
Previous Message az 2008-09-30 14:24:21 Re: optimizing a query over tree-like structure