Thread: GSoC 2012

GSoC 2012

From
Maxim Smyatkin
Date:
Hello all!

I am a first-year Master's student from Russia. My specialization is Information Technologies, and my Master's thesis will be related to DBMS internals. My experience in this area consists of the following (ordered by date):
- "The Relational DBMSs" (client side) and "The Modern DBMSs" courses at the University;
- implementing a Firebird engine for the GSQL project as the course project for the Relational DBMSs course;
- about 14 months of work at Red Soft Corp. as a core developer of Red Database (a Firebird fork). During this period I studied Firebird's internals (read papers, learned the code, attended a 6-day master class on the subject by Dmitry Emanov, Firebird's lead developer) and was involved in the Red Database 2.5 implementation (I ported features from Red Database 2.0, fixed some bugs, made several small improvements, and implemented a Group algorithm based on a B+-tree);
- choosing a Master's thesis subject within the DBMS internals area and starting to study it in more detail. I am also currently taking an "Architecture of DBMSs" course at the University, based on Hellerstein and Stonebraker's "Readings in Database Systems".

Based on my knowledge, I think I am able to implement an FDW (in particular, one for wrapping Firebird data). But, as I wrote above, I am also strongly interested in DBMS internals. So if there are any ideas related to PostgreSQL's core, I will probably like them even more.
One more question :) Many projects advise students to submit some small patches in order to be accepted, but I have not found anything about this on the PostgreSQL GSoC 2012 page. Did I just miss it, or do you have other criteria?

Thank you!
Smyatkin Maxim.

Re: GSoC 2012

From
Thom Brown
Date:
On 22 March 2012 13:10, Maxim Smyatkin <smyatkinmaxim@gmail.com> wrote:
> Hello all!
>
> I am Russian first year Master student. My specialization is Information
> Technologies and the subject of my Master's thesis will be related with
> DBMSs' internals. My experience in this direction consists of (ordered by
> date):
> - "The Relational DMBSs" (client side) and "The Modern DBMSs" courses at the
> University;
> - implementation of Firebird engine for GSQL project as the course project
> on the Relational DBMSs;
> - about 14 month worked at Red Soft Corp. as Red Database (Firebird's fork)
> core developer. During this period I was studying Firebird's Internals (read
> papers, learned the code, got 6-day master-class on this Subject by Dmitry
> Emanov - Firebird's lead developer), was involved in Red Database 2.5
> implementation (i have imported features from Red Database 2.0, fixed some
> bugs, done several little improvements and have implemented Group algorithm
> based on B+-Tree);
> - choosing the master degree subject within DBMS internals area and started
> studying it more detailed. Also I am getting right now "Architecture of
> DBMSs" course at the University based on Hellerstein and Stonebraker
> "Readings in Database Systems".
>
> Basing on my knowledge, I think I am able to implement FDW (particular for
> wrapping Firebird data). But, as I wrote before, I'm also strongly
> interested in DBMS internals. So, if there is any ideas, related with
> PostgreSQL's Core, I, probably, will like them even more.
> And one another question:) Many projects advise students to do some little
> patches to be accepted, but i have not found something about it in
> PostgreSQL GSoC 2012 page. I just missed it, or you have another criteria?
>
> Thank you!
> Smyatkin Maxim.

Hi Smyatkin,

We recommend to all students that project proposals aren't too
ambitious as it's often the case that the work involved is
underestimated, and there's a high risk of the project not reaching
completion.  Students don't necessarily need to have provided small
patches for PostgreSQL in the past, although naturally it would show
that there would be a level of familiarity with the code base.

If you're interested in PostgreSQL internals, a good starting point
would be to look at the current TODO list on the wiki.  The items
listed there link to previous discussions around the features and can
provide design proposals, technical issues and background information
for each feature: http://wiki.postgresql.org/wiki/Todo

There are also suggestions available on the GSoC 2012 page on the
PostgreSQL wiki:
http://wiki.postgresql.org/wiki/GSoC_2012#Project_Ideas

When you register for GSoC as a student, you may submit as many
proposals as you wish, and those will be reviewed by a committee to
determine which ones are worth implementing, have a good chance of
completion, and that a specific mentor is available for.

--
Thom

Re: GSoC 2012

From
Maxim Smyatkin
Date:
Hello all, again.

I have finally decided to work on a Firebird FDW. I'm sorry for the length of this letter; here is a table of contents (in case you won't read the whole message):
1. What I was doing between my first message and this one, and why I finally decided to implement a Firebird FDW.
2. What I want from the Firebird FDW.
3. Specific questions I have to resolve before proposing quantifiable results and a schedule.

1. There are many interesting projects in the TODO list, but I have not found any with which I am even 50% as familiar as I am with Firebird's architecture or Firebird's API, so I can't guarantee that I would complete such a project. Furthermore, I still remember what it was like digging into the internals of other projects, and I am not as familiar with PostgreSQL as I would have to be to do something in its core. So I started looking for information about FDWs. Here is what I have done:
1) I got the PostgreSQL source code and built it on Ubuntu. I opened the project in Eclipse and tried debugging a forked PostgreSQL process with a client connected to it, to be sure that it works fine.
2) I used file_fdw as a client, looked at its sources to get a first impression of FDW internals, and also tried to debug it, to be sure that I understand how to do that within Eclipse.
3) I read the user and developer documentation about FDWs; looked at the FDW list on the wiki (to choose something more complex than file_fdw to study and to estimate the complexity of a Firebird one); read the 2011 mail archive with FDW questions; and looked at 2 presentations.
4) I tried to use mysql_fdw, but as I understand it, it uses the old FDW API, so I downloaded Oracle_fdw to become familiar with the modern API. Unfortunately, Oracle-xe did not work on my Ubuntu (it probably works well, but I would have to spend more time setting it up; maybe I'll return to it if necessary). Anyway, I studied the source code, README and changelog, and found several interesting optimizations there, such as WHERE push-down and a connection pool (in the library cache). At that point my interest in a Firebird FDW grew strongly :) and I started to think about "quantifiable results".
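
To make the WHERE push-down idea from step 4) concrete, here is a minimal sketch in Python rather than in the C an actual FDW would use. It is not how Oracle_fdw does it: the function names, the `(column, operator, constant)` condition format and the set of "safe" operators are all assumptions made for illustration. Conditions that look safe are rendered into the remote query; everything else is returned to be checked locally.

```python
# Operators assumed (for this sketch only) to mean the same thing on both sides.
PUSHABLE_OPERATORS = {"=", "<>", "<", "<=", ">", ">="}


def quote_literal(value):
    """Render an int or str constant as an SQL literal."""
    if isinstance(value, str):
        # Standard SQL escaping: double any embedded single quote.
        return "'" + value.replace("'", "''") + "'"
    return str(value)


def is_pushable(op, value):
    """Only simple comparisons against int/str constants are sent remotely."""
    if isinstance(value, bool):
        return False
    return op in PUSHABLE_OPERATORS and isinstance(value, (int, str))


def build_remote_query(table, columns, conditions):
    """Return (remote_sql, local_conditions).

    conditions is a list of (column, operator, constant) tuples.
    Table and column names are assumed to be trusted, valid identifiers.
    """
    remote, local = [], []
    for column, op, value in conditions:
        if is_pushable(op, value):
            remote.append(f"{column} {op} {quote_literal(value)}")
        else:
            local.append((column, op, value))  # must be re-checked locally
    sql = f"SELECT {', '.join(columns)} FROM {table}"
    if remote:
        sql += " WHERE " + " AND ".join(remote)
    return sql, local
```

For example, `build_remote_query("EMPLOYEE", ["EMP_NO", "LAST_NAME"], [("DEPT_NO", "=", 600), ("LAST_NAME", "~", "^Sm")])` sends only the `DEPT_NO` comparison to the remote side and keeps the regular-expression match for local evaluation.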

2. Of course, I want the Firebird FDW to be as good as Oracle's :) But I can't be sure that I'll complete all of it in 3-4 months, so I have to choose the most important features to implement first. The following list contains the features I want to implement, marked "+" for the most important ones and "?" for those I can implement later:
+ Type compatibility should be as complete as it is possible to implement.
+ Translate the most common Firebird SQL and API errors to their PostgreSQL analogues.
+ Plan and Cost output [3.1].
? Connection pool optimization (as in Oracle's FDW).
? Predicate push-down (maybe not only "where", to handle situations like the one described here: http://archives.postgresql.org/pgsql-students/2011-03/msg00036.php).
? Field compatibility (link fields not only by position, but also by type, name, and some other properties [3.2]).
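
As an illustration of the first "+" item, type compatibility could start from a lookup table like the one sketched below. This is only a rough sketch in Python (a real FDW would do this in C), the function name and its parameters are hypothetical, and the mapping assumes a Firebird dialect 3 database, where DATE carries no time part.

```python
# Firebird types that map one-to-one onto a PostgreSQL type name.
SIMPLE_TYPES = {
    "SMALLINT": "smallint",
    "INTEGER": "integer",
    "BIGINT": "bigint",
    "FLOAT": "real",
    "DOUBLE PRECISION": "double precision",
    "DATE": "date",            # assumes dialect 3 (date only)
    "TIME": "time",
    "TIMESTAMP": "timestamp",
}


def map_firebird_type(fb_type, length=None, precision=None, scale=0,
                      blob_subtype=0):
    """Return a PostgreSQL type name for a Firebird column type.

    Hypothetical helper: the argument names are illustrative, not part of
    any Firebird or PostgreSQL API.
    """
    name = fb_type.strip().upper()
    if name in SIMPLE_TYPES:
        return SIMPLE_TYPES[name]
    if name in ("NUMERIC", "DECIMAL"):
        if precision is None:
            return "numeric"
        return f"numeric({precision},{scale})"
    if name in ("CHAR", "VARCHAR"):
        pg_name = "char" if name == "CHAR" else "varchar"
        return f"{pg_name}({length})" if length is not None else pg_name
    if name == "BLOB":
        # Sub-type 1 is a text BLOB; everything else is treated as binary.
        return "text" if blob_subtype == 1 else "bytea"
    # Refuse to guess: an unknown type should be reported, not mis-mapped.
    raise ValueError(f"no PostgreSQL mapping for Firebird type {fb_type!r}")
```

Raising an error for unmapped types, instead of falling back to `text`, keeps gaps in the "as complete as possible" coverage visible while the table is being filled in.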

3. Here are several (2 at the moment) questions I have to resolve before cost estimation and a schedule can be done:
1) Firebird shows a very simple query plan without cardinality or selectivity values. Furthermore, it does not even maintain them internally (as far as I know this may be implemented in Firebird 3, but either way I have to think about cost estimation given the lack of statistics). So I probably have to get the cardinality via Count(...) and then let PostgreSQL estimate cost and selectivity based on heuristic rules. I'm fairly sure the answer is yes, but: can PostgreSQL do it?
2) According to http://people.planetpostgresql.org/andrew/uploads/fdw2.pdf, FKs, PKs, constraints and defaults are not implemented because PostgreSQL can't manage them on foreign tables. NOT NULL is maintained because, if I am right, it can't be changed in the foreign DBMS, so we can rely on it (actually the field can be recreated, so we can't be sure. By the way, is this taken into account in the Oracle FDW?). But we can map all of these field properties from Firebird's metadata to PostgreSQL's (FKs only between foreign tables, not between servers). Checking this for every user query is too high a price for such a mapping, but since Firebird implements a multi-version transaction system, we can be sure that these values can't change within the context of one transaction (I am not sure about this, because we have something like read-only transactions). And we can track this inside the connection pool. We can get several advantages from it:
- It will be necessary if FDWs stop being read-only in the future.
- I think the PostgreSQL optimizer can sometimes make better decisions based on constraints?
- And, finally, it can be useful for a database user/client to know the nature of the data.
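
For question 1), the approach of taking a remote row count and applying heuristic selectivities can be sketched as follows. This is a rough illustration in Python, not PostgreSQL's planner code. The two selectivity values mirror the `DEFAULT_EQ_SEL` and `DEFAULT_INEQ_SEL` fallbacks PostgreSQL uses when a column has no statistics; the function names and the startup and per-row cost figures are assumptions chosen for the example.

```python
# Fallback selectivities, mirroring PostgreSQL's defaults for columns
# without statistics.
DEFAULT_EQ_SEL = 0.005
DEFAULT_INEQ_SEL = 1.0 / 3.0

# Hypothetical cost parameters for talking to the remote server.
REMOTE_STARTUP_COST = 100.0
REMOTE_COST_PER_ROW = 0.01


def estimate_rows(remote_count, operators):
    """Estimate result rows from a remote COUNT and the WHERE operators.

    Predicates are treated as independent, so selectivities multiply.
    Operators with no known default are assumed not to filter anything.
    """
    selectivity = 1.0
    for op in operators:
        if op == "=":
            selectivity *= DEFAULT_EQ_SEL
        elif op in ("<", "<=", ">", ">="):
            selectivity *= DEFAULT_INEQ_SEL
    # Never estimate fewer than one row.
    return max(1, round(remote_count * selectivity))


def estimate_cost(rows):
    """Return (startup_cost, total_cost) for fetching `rows` rows."""
    return REMOTE_STARTUP_COST, REMOTE_STARTUP_COST + rows * REMOTE_COST_PER_ROW
```

With a remote count of 100000 and a single equality predicate, `estimate_rows(100000, ["="])` gives 500 rows. The remote `COUNT` itself is not free, so a real wrapper would probably want to cache it rather than run it for every plan.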


--
Thank you!
Smyatkin Maxim.

Re: GSoC 2012

From
Josh Berkus
Date:
On 3/27/12 6:19 AM, Maxim Smyatkin wrote:
> Hello all, again.
>
> Finally, I have made a decision to work on Firebird FDW. I'm sorry for
> length of the letter, here is a list of content (if you wont read whole the
> message):

This looks like a very good proposal.  Please submit it if you haven't
already.


--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com