pov and tsort

From: Joel Jacobson <joel(at)gluefinance(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: pov and tsort
Date: 2011-01-07 01:52:54
Message-ID: AANLkTinxm99N5X6-am=Y4_5EkKfU+_EytNkLG89oOeoR@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Greetings hackers!

I've renamed the project fsnapshot to pov, PostgreSQL Object Version
control system. :)

I've also created a quite nifty SQL command line based utility to
perform graph/tree sorting algorithms on data in the database, without
the need to export the data to some external application.

The utility is named tsort and behaves exactly like the GNU coreutils
tool tsort (or the Perl Power Tools tool tcsort).

Since tree sorting is a quite common task in computer science, perhaps
some of you will find it useful.

Source code: https://github.com/gluefinance/pov/blob/master/sql/schema/pov/functions/tsort.pl

Also, if I have understood everything correctly, the toposort
algorithm in pg_dump_sort.c does not does its best ensuring the
objects are created in the same order, if the oids would change or if
objects would be dropped/added.

The algorithm is probably a simple DFS (depth-first sorting) or BFS
(breadth-first sorting) without any effort made to do sorting when
selecting the successor objects.

It's probably a quite challenging task to implement the extra
necessary sorting in C, on top of the standard DFS or BFS algorithm.

I put together a small quite nifty plperl utility to do the task, as a
simple proof-of-concept and also because I thought it makes sense to
provide such a method accessible from the SQL prompt,
since tree/graph sorting is probably one of the most commonly
performed tasks in many applications dealing.

It is quite powerful with all the basic tree sorting features I could think of.

I hope you find it useful and perhaps even a bit fun :)

Here is a small example on how to use it to analyze pg_depend:

create table a ( id int not null, primary key (id));
create table aa ( id int not null, primary key (id), foreign key (id)
references a(id));
create table ab ( id int not null, primary key (id), foreign key (id)
references a(id));
create table aaa ( id int not null, primary key (id), foreign key (id)
references aa(id));
create table aab ( id int not null, primary key (id), foreign key (id)
references aa(id));
create table aba ( id int not null, primary key (id), foreign key (id)
references ab(id));
create table abb ( id int not null, primary key (id), foreign key (id)
references ab(id));

Instead of using oid, one should use unique names for each object
(composed differently based on the object type).

If doing so, the example below would render exactly the same result
even in two databases with completely different oids for the same
schema.

glue=# select oid,relname from pg_class where relname ~ '^a.?.?$';
oid | relname
-------+---------
52354 | a
52359 | aa
52399 | aba
52369 | ab
52389 | aab
52379 | aaa
52409 | abb
(7 rows)

Find root objects (source objects, no predecessors), sort numerically:
glue=# SELECT tsort(array_to_string(array_agg(distinct objid || ',' ||
refobjid), ','), ',', 0, 'sub {$a <=> $b}', 'SOURCE') FROM pg_depend;

tsort

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-----------------------------------------------------------------------------------------------------------------------------------------
{0,10939,10940,10942,10943,10945,10946,10948,10949,10951,10952,10955,10956,10958,10959,10962,10963,10966,10967,10970,10971,10973,10974,10976,10977,10980,10981,10983,10984,10985,10986,10988,10989,10991,10992,10994,10995,10998,10999,11001,11002,11004,11005,11008,11009,11011,11012,11014,11015,11018,11019,11021,11022,11024,11025,11028,11029,11031,11032,11034,1103
5,11037,11038,11040,11041,11043,11044,11046,11047,11049,11050,11053,11054,11056,11057,11074,11076,11078,11080,11082,11084,11086,11088,11090,11092,11094,11096,11098,11100,11102,11104,11106,11108,11110,11112,11114,11116,11118,11120,11122,11124,11126,11128,11130,11132,11134,11136,11138,11140,11142,11144,11146,11148,11150,11152,11154,11156,11158,11160,11162,11164,
11166,11168,11170,11172,11174,11176,11178,11180,11182,11184,11186,11188,11190,11192,11193,11194,11195,11196,11197,11198,11199,11200,11201,11202,11203,11204,11205,11206,11207,11208,11209,11210,11211,11212,11214,11216,11218,11220,11222,11224,11226,11228,11230,11232,11234,11236,11238,11240,11241,11242,11243,11244,11245,11246,11247,11248,11249,11250,11251,11252,11
253,11254,11255,11256,11257,11258,11259,11260,11261,11262,11263,11264,11266,11268,11270,11272,11274,11276,11278,11280,11282,11284,11286,11288,11290,11292,11297,11299,11301,11303,11305,11307,11309,11311,11313,11315,11317,11319,11321,11323,11325,11340,11344,11345,11348,11349,11352,11353,11355,11356,11359,11360,11363,11364,11367,11368,11371,11372,11375,11376,1137
9,11380,11383,11384,11387,11388,11391,11392,11395,11396,11398,11399,11402,11403,11405,11406,11409,11410,11413,11414,11417,11418,11421,11422,11425,11426,11429,11430,11433,11434,11437,11438,11441,11442,11444,11445,11448,11450,11451,11453,11455,11456,11458,11460,11461,11463,11465,11466,11468,11470,11471,11473,11475,11476,11478,11480,11481,11483,11484,11487,11488,
11491,11492,11495,11496,11498,11499,11502,11503,11506,11507,11510,11511,11514,11515,11518,11519,11522,11523,11526,11527,11530,11531,11533,11534,11536,11537,11539,11540,11542,11543,11545,11546,11548,11549,11551,11552,11555,11556,52267,52342,52344,52345,52346,52347,52348,52349,52350,52351,52355,52360,52365,52366,52367,52368,52370,52375,52376,52377,52378,52380,52
382,52385,52386,52387,52388,52390,52392,52395,52396,52397,52398,52400,52402,52405,52406,52407,52408,52410,52412,52415,52416,52417,52418}
(1 row)

Find connected objects to the tables we created above and sort like
pg_dump_sort.c (DFS/BFS):
glue=# SELECT tsort(array_to_string(array_agg(distinct objid || ',' ||
refobjid), ','),',',0,'DFS','CONN_INCL','INCLUDE','52354,52359,52399,52369,52389,52379,52409','BOTH')
FROM pg_depend;

tsort

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
------------------------------------
{52398,52405,52386,52390,52391,52400,52401,52387,52380,52381,52406,52375,52396,52418,52407,52417,52397,52410,52411,52408,52404,52385,52395,52394,52365,52415,52378,52355,52356,52376,52402,52403,52399,52416,52414,52372,52373,52377,52374,52366,52370,52371,52369,52412,52413,52409,52392,52393,52389,52360,52361,52388,52384,52362,52363,52367,52382,52383,52379,52368,
52364,52359,52357,52358,52354,2200}
(1 row)

Find connected objects to the tables we created above and sort each
lexically ascending, preserving the order:
glue=# SELECT tsort(array_to_string(array_agg(distinct objid || ',' ||
refobjid), ','),',',0,'sub {$a cmp
$b}','CONN_INCL','INCLUDE','52354,52359,52399,52369,52389,52379,52409','BOTH')
FROM pg_depend;

tsort

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
------------------------------------
{52355,52356,52360,52361,52365,52366,52367,52368,52364,52370,52371,52375,52376,52377,52378,52374,52357,52358,52354,52380,52381,52382,52383,52385,52386,52387,52388,52384,52379,52390,52391,52392,52393,52395,52396,52397,52398,52394,52362,52363,52359,52389,52400,52401,52402,52403,52405,52406,52407,52408,52404,52399,52410,52411,52412,52413,52415,52416,52417,52418,
52414,52372,52373,52369,52409,2200}
(1 row)

Calling the utility with no arguments shows the help:

glue=# SELECT tsort();
tsort
----------------------------------------------------------------------------------

DESCRIPTION
tsort - return a tree's nodes in topological order

SYNOPSIS
tsort(edges text, delimiter text, debug integer, algorithm text,
selection text, operator text, nodes text, direction text);

OUTPUT
nodes text[] Array of nodes in topologically sorted order

INPUT PARAMETERS
Parameter Type Regex Description
============================================================================
edges text ^.+$ Node pairs, separated by [delimiter].

delimiter text ^.*$ Node separator in [edges],
default is ' ', i.e. single blank space.

debug integer Print debug information using RAISE DEBUG:
0 no debug (default)
1 some debug
2 verbose debug

algorithm text Sorting algorithm:
DFS depth-first (default)
explores as far as possible along each
branch before backtracking.
BFS breadth-first
explores all the neighboring nodes,
then for each of those nearest nodes,
it explores their unexplored neighbor
nodes, and so on.
^sub sort using perl subroutine.
examples:
# sort numerically ascending
sub {$a <=> $b}

# sort numerically descending
sub {$b <=> $a}

#sort lexically ascending
sub {$a cmp $b}

# sort lexically descending
sub {$b cmp $a}

# sort case-insensitively
sub {uc($a) cmp uc($b)}

For more examples, please goto:
http://perldoc.perl.org/functions/sort.html

The following options will not affect the order of the nodes in the result,
they only control which nodes are included in the result:

Parameter Type Regex Description
============================================================================
selection text Selection of nodes, used by [operator]:
ALL select all nodes (default)
ISOLATED select nodes without any
successors nor predecessors
SOURCE select nodes with successors
but no predecessors
SINK select nodes with predecessors
but no successors
CONN_INCL select nodes connected to [nodes],
including [nodes]
CONN_EXCL select nodes connected to [nodes],
excluding [nodes]
separated by [delimiter]

operator text Include or exclude nodes in [selection]:
INCLUDE include nodes (default)
EXCLUDE exclude nodes

The following options are only applicable if,
[selection] is CONN_INCL or CONN_EXCL

Parameter Type Regex Description
============================================================================

nodes text select nodes connected to [nodes]
NULL not applicable (default)
[nodes] the start nodes, separated by [delimiter]

direction text direction to look for connected nodes
BOTH traverse both successors and
predecessors (default)
UP only traverse predecessors
DOWN only traverse successors

--
Best regards,

Joel Jacobson
Glue Finance

Browse pgsql-hackers by date

  From Date Subject
Next Message Itagaki Takahiro 2011-01-07 01:57:17 Re: SQL/MED - file_fdw
Previous Message Simon Riggs 2011-01-07 01:15:15 Re: Streaming base backups