Dear hackers,
I have to port a lot of programs to PostgreSQL and don't want to wait
for completing them all before I put the first into production.
So after some thought I came to an enthrilling vision
If PostgreSQL had an interface for transparently accessing external
data, I could leave the needed tables on the old RDBMS and write a
wrapper which would provide access to the data on the old RDBMS. Both
old and new (pgsql) programs would see the same data. And I could start
using postgres' fine tools (e.g. psql) _now_ for all of my work.
Putting these short sighted things aside it would provide an opportunity
for
- distributed databases
- load balancing via putting a table on multiple servers, creating a
unioning-view
- you could use SQL to access other data sources (e.g. LDAP)
- one step further to world domination (no matter where your data
resides, you could use postgres for your queries)
- accessing multiple databases in _one_ SQL statement (e.g. join)
So I ask for your opinion on this strange idea.
I suggest marking a table as external uses an interface which should
provide the following methods
- query the structure of the table (\d table-name)
- sequentially scan the table (returning selected attributes of each
tuple) [with some conditions]
- update/delete either by cursor or by where-condition
- query for statistics (see below)
Since this functionality would require modifications all over the place
in postgres I would like to start discussion about it. I might overlook
something but the thought of having such a thing around opens up a lot
of opportunities.
Perhaps an ODBC wrapper could be the correct point to start (after
implementing and testing the [additional] virtual table access layer)
Christof
PS: I'd never dare to depend on this functionality, some kind of
mirroring program might cover the problem for me as well, but it looked
so cool.
PPS: Of course I would start to investigate it further _after_ TOAST is
finished.
------------------- example ------------------------------------
[a is an external table, b an internal table]
select tuple_a1,tuple_b1 from a,b where a.tuple_a2=b.tuple_b2 and
tuple_a3=42
would cause-> sequential scan (tuple_a1,tuple_a2) on a where tuple_a3=42-> for each entry index scan on b for
a.tuple_a2=b.tuple_b2->report result
or-> sequential scan (tuple_b2, tuple_b1) on b-> index_scan (tuple_a1) by (tuple_a3=42,tuple_a2=b.tuple_b2)-> report
result
clearly we need to collect statistics on external tables as well