Re: GSoC 2012

From: Maxim Smyatkin <smyatkinmaxim(at)gmail(dot)com>
To: Thom Brown <thom(at)linux(dot)com>, pgsql-students(at)postgresql(dot)org
Subject: Re: GSoC 2012
Date: 2012-03-27 13:19:47
Message-ID: CAAo+-C1P_+2XoTAvrU_AXNe73npEVGzK-hx-eAz7o8DS60SCog@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-students

Hello all, again.

Finally, I have made a decision to work on Firebird FDW. I'm sorry for
length of the letter, here is a list of content (if you wont read whole the
message):
1. What I was doing during this time (from my first message to this one)
and Why I finally decided to implement Firebird FDW?
2. What i want from Firebird FDW?
3. Specific questions I have to solve before proposing quantifiable results
and schedule.

1. There is many interesting projects in TODO list, but I have not found
any with which I am at least on 50% familiar as I am with Firebird's
architecture or with Firebird's API, so I can't assure the completeness of
such projects. Furthermore, I still remember how I was digging into
internals of another projects and I am not as familiar with PostgreSQL that
I have to be for doing something in its Core. So, I started looking for
information about FDWs. I have done:
1) I got PostgreSQL source code and built it on Ubuntu. I opened project
using Eclipse and I tried to debug forked postgresql process with client
connected to it, to be sure that it works fine.
2) I used file_fdw as client, looked at its sources to get first impression
about FDW internals and also tried to debug it, to be sure that I
understand how to do it within Eclipse.
3) I have read client's and developer's documentation about FDWs; looked at
FDW list on wiki (to choose something more complex than file_fdw to study
and to estimate complexity of Firebird's one); Have read 2011 mail archive
with FDW questions and looked at 2 presentations.
4) Tried to use mysql_fdw, but as I understood it uses old FDW API, so I
downloaded Oracle_fdw to be familiar with modern API. Unfortunaly,
Oracle-xe was not working on my Ubuntu (Probably, it works well, but I have
to spend more time to set it up. May be I'll return to it, if it will be
necessary). Anyway I was studying source code, readme and changelog and I
found there several interesting optimizations, such as where push-down and
connection pool (in library cache). At this moment my interest in Firebird
FDW strongly grew up :) and I started to think about "quantifiable results".

2. Of course, I want Firebird FDW to be as good as Oracle's one is:) But, I
cant be sure, that I'll complete it all in 3-4 month. So I have to choose
most important of features to be implemented first. Next list contains
features I want to implement marked as "+" for most important features and
"?" for features I can implement later:
+ Types compatibility should be as full as it is possible to implement.
+ Translate most common Firebird SQL's and API's errors to PostgreSQL's
analogues.
+ Plan and Cost output [3.1].
? Connections pool optimization (as it is in Oracle).
? Predicates push-down (may be not only "where", to handle situations like
it was described here:
http://archives.postgresql.org/pgsql-students/2011-03/msg00036.php).
? Field compatibility (link fields not only on their position, but on type,
name, and some other properties [3.2]).

3. Here is several (2 at this moment) questions I have to solve before cost
estimation and schedule creation can be done:
1) Firebird shows very simple query plan without cardinality or selectivity
values. Furthermore, it even does not maintain it internally (as I know it
can be implemented in Firebird 3, but anyway I have to think about cost
estimation depending on lack of statistics). So. probably, I have to get
cardinality as Count(...) and then let PostgreSQL to estimate cost and
selectivity based on heuristic rules. I'm pretty sure in positive answer,
but "Can PostgreSQL do it?"
2) According to
http://people.planetpostgresql.org/andrew/uploads/fdw2.pdfFKs, PKs,
Constraints and defaults are not implemented because PostgreSQL
can't manage them on foreign tables. NOT NULL is maintained because it
can't be changed in foreign DBMS, so we can be sure in it (actually field
can be recreated, so we can't be sure. Btw is it taken into account in
Oracle FDW?), if I am right. But we can map all of these field properties
from Firebird's metadata to PostgreSQL's (FK only between foreign tables,
not between servers). To check it for every user's query is too high prise
for such mapping, but (I am not sure about it, because we have something
like read-only transactions) as Firebird implements multi-version
transaction system - we can be sure that in context of one transaction this
values cant be changed. And we can track it inside of connection pool. We
can get several advantages from it:
- It will be necessary if FDWs will be not read-only in future.
- I think, PostgreSQL optimizer can make better decisions based on
constraints sometimes?
- And, finally, it can be useful for database user/client to know a
character of data.

--
Thank you!
Smyatkin Maxim.

In response to

Responses

Browse pgsql-students by date

  From Date Subject
Next Message Josh Berkus 2012-03-30 19:17:15 Re: GSoC 2012
Previous Message Dave Page 2012-03-26 08:44:31 Re: Regarding GSoC 2012 project