Обсуждение: Database INNOVATION

Поиск
Список
Период
Сортировка

Database INNOVATION

От
Mauricio Chamati
Дата:
Folks,

Postgree is the most amazing DB, even more it will be the only one that will remain free (the good ones) as MySQL has been taken. In order to move on with this project, as an Java Senior Architect, I am recomming to you guys to have an special "Section" for Audit tables. As we separate sequence in another seq, we should do the same for Audit tables. They have their own behavior, are less used, and so on. I got in the mailing list and am sending the email just to give this idea for you guys, keep in mind and implement as you want.

Regards.

Re: Database INNOVATION

От
Scott Marlowe
Дата:
On Tue, Oct 19, 2010 at 8:12 AM, Mauricio Chamati <mchamati@gmail.com> wrote:
> Folks,
> Postgree is the most amazing DB, even more it will be the only one that will
> remain free (the good ones) as MySQL has been taken.

On a side note, MySQL is GPL licensed and cannot simply be taken.  The
commercially licensed version is owned by Oracle / Sun, but the GPL
code is quite free to be hacked and released and use according to the
GPL.  Due to the idiocy of GPL licensing the connection libs, there
are issues with integrating it with non-GPL software, but it is most
certainly not taken.

I'm no MySQL fan, just wanted to clear that point up.

Re: Database INNOVATION

От
Greg Smith
Дата:
Scott Marlowe wrote:
> On a side note, MySQL is GPL licensed and cannot simply be taken.  The
> commercially licensed version is owned by Oracle / Sun, but the GPL
> code is quite free to be hacked and released and use according to the
> GPL.

As the MySQL documentation is not GPL licensed, such hacks will have to
start over from scratch in that area.  Good luck with that.

--
Greg Smith, 2ndQuadrant US greg@2ndQuadrant.com Baltimore, MD
PostgreSQL Training, Services and Support  www.2ndQuadrant.us
Author, "PostgreSQL 9.0 High Performance"    Pre-ordering at:
https://www.packtpub.com/postgresql-9-0-high-performance/book


Re: Database INNOVATION

От
Chris Browne
Дата:
mchamati@gmail.com (Mauricio Chamati) writes:
> Postgree is the most amazing DB, even more it will be the only one that will
> remain free (the good ones) as MySQL has been taken. In order to move on with
> this project, as an Java Senior Architect, I am recomming to you guys to have
> an special "Section" for Audit tables. As we separate sequence in another seq,
> we should do the same for Audit tables. They have their own behavior, are less
> used, and so on. I got in the mailing list and am sending the email just to
> give this idea for you guys, keep in mind and implement as you want.

There is value to there being a tutorial (or such) available to explain
strategies for doing this.

I'm not so sure that this is something that properly belongs in the
Postgres reference documentation, as:

  - There are various strategies in the handling of auditing data, and
    no forcibly normative/universal solutions;

  - This isn't an explanation of how to use Postgres (ergo part of the
    "user guide" aspect of the documentation);

  - This isn't an aspect of the standard behaviour of Postgres (ergo
    part of the "reference manual" aspect of the documentation).

It might be sort of neat to have another "Part" to the documentation
that consists of stuff like:
  - Design patterns
  - Design antipatterns (to avoid!)
  - Cookbook
  - Implementation strategies

(and there is overlap between those 4 things, to be sure!)

But that's not a portion of the Official Documentation that presently
exists, so I'm not sure it properly fits.

People have written articles on this sort of thing in the past,
including:

 - Elein Mustain, with material at varlena.com
 - Postgres Online Journal <http://www.postgresonline.com/>
 - Articles blogged at Planet PostgreSQL
--
select 'cbbrowne' || '@' || 'gmail.com';
http://www3.sympatico.ca/cbbrowne/postgresql.html
"There are no threads in a.b.p.erotica, so there's no  gain in using a
threaded news reader."  -- (Unknown source)

Re: Database INNOVATION

От
Scott Marlowe
Дата:
On Tue, Oct 19, 2010 at 1:17 PM, Greg Smith <greg@2ndquadrant.com> wrote:
> Scott Marlowe wrote:
>>
>> On a side note, MySQL is GPL licensed and cannot simply be taken.  The
>> commercially licensed version is owned by Oracle / Sun, but the GPL
>> code is quite free to be hacked and released and use according to the
>> GPL.
>
> As the MySQL documentation is not GPL licensed, such hacks will have to
> start over from scratch in that area.  Good luck with that.

Actually, given the horrible docs for mysql (sorry but I can't find
anything in there half the time) that's not a wholly bad thing.  Plus
you can always get the docs from somewhere else, if you have to, I
guess.

Re: Database INNOVATION

От
Craig Ringer
Дата:
On 19/10/2010 10:12 PM, Mauricio Chamati wrote:

 > I am recomming
> to you guys to have an special "Section" for Audit tables. As we
> separate sequence in another seq, we should do the same for Audit
> tables. They have their own behavior, are less used, and so on.

As a senior software architect, you should know how to write clearer
descriptions of what you want :-P

Your suggestion is really broad. It's not clear exactly what you are
suggesting or what you actually want, or even what kind of audit tables
you're referring to.

Are you suggesting that there should also be a facility to put some
tables outside the normal transactional flow, so that writes to them are
retained when a transaction rolls back? (If so: this can be done, albeit
hackishly, with dblink).

Are you suggesting support for audit logging of SELECTs? ie "ON SELECT"
triggers or some similar mechanism?

Do you want separate storage and/or WAL logging of audit tables for
performance reasons? If Pg ever supports separate WAL logging for
different databases or different sets of tables, that'd be an attractive
option for audit tables. As it is, you already can use tablespaces to
put them on different storage.

Do you want built-in support for marking tables/columns as "audited" so
the database system automatically manages recording of audit data to
audit tables with no need to define triggers etc?

If not, what exactly is it that you *do* want?


Now, personally, if we're talking "database innovation" what I'd like to
see is a built-in way to get query results straight from the database as
graphs of tuples and their relationships. Tabular result sets are poorly
suited to some kinds of workloads, including a few increasingly common
ones like document-oriented storage and use via ORMs. In particular, the
way ORMs tend to do multiple LEFT OUTER JOINs and deduplicate the
results or do multiple queries and post-process to form a graph is
wasteful and slow. If Pg had a way to output an object graph (or at
least tree) natively as, say, JSON, that'd be a marvellous option for
some kinds of workloads, and might help the NoSQL folks from whining
quite so much as well ;-)



--
Craig Ringer

Tech-related writing at http://soapyfrogs.blogspot.com/

Re: Database INNOVATION

От
"Brent Wood"
Дата:
Have a look at PL/R.

You can embed a command to generate a graphic using R via a user defined SQL function,

This example from http://www.varlena.com/GeneralBits/Tidbits/bernier/art_66/graphingWithR.html

HTH

  Brent Wood

=====================================================================================

Graphs can be as easy as '123'. Here's an example where two columns in a table are plotted against each other.

Create and populate the table using the following commands:

CREATE TABLE temp (x int, y int);

INSERT INTO temp VALUES(4,6);
INSERT INTO temp VALUES(8,3);
INSERT INTO temp VALUES(4,7);
INSERT INTO temp VALUES(1,5);
INSERT INTO temp VALUES(7,8);
INSERT INTO temp VALUES(2,3);
INSERT INTO temp VALUES(5,1);
INSERT INTO temp VALUES(9,4);

The function f_graph() generates the graph as a pdf document:

CREATE OR REPLACE FUNCTION f_graph() RETURNS text AS
'
str <<- pg.spi.exec (''select x as "my a" ,y as "my b" from temp order by x,y'');
pdf(''/tmp/myplot.pdf'');
plot(str,type="l",main="Graphics Demonstration",sub="Line Graph");
dev.off();
print(''done'');
'
LANGUAGE plr;

Creating the graph by invoking this query:

SELECT f_graph();   databased plot example




++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

Craig Ringer said:

Now, personally, if we're talking "database innovation" what I'd like to
see is a built-in way to get query results straight from the database as
graphs of tuples and their relationships. Tabular result sets are poorly
suited to some kinds of workloads, including a few increasingly common
ones like document-oriented storage and use via ORMs. In particular, the
way ORMs tend to do multiple LEFT OUTER JOINs and deduplicate the
results or do multiple queries and post-process to form a graph is
wasteful and slow. If Pg had a way to output an object graph (or at
least tree) natively as, say, JSON, that'd be a marvellous option for
some kinds of workloads, and might help the NoSQL folks from whining
quite so much as well ;-)



--
Craig Ringer

Tech-related writing at http://soapyfrogs.blogspot.com/

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Brent Wood
DBA/GIS consultant
NIWA, Wellington
New Zealand


Please consider the environment before printing this email.

NIWA is the trading name of the National Institute of Water & Atmospheric Research Ltd.

Re: Database INNOVATION

От
Darren Duncan
Дата:
Craig Ringer wrote:
> Now, personally, if we're talking "database innovation" what I'd like to
> see is a built-in way to get query results straight from the database as
> graphs of tuples and their relationships. Tabular result sets are poorly
> suited to some kinds of workloads, including a few increasingly common
> ones like document-oriented storage and use via ORMs. In particular, the
> way ORMs tend to do multiple LEFT OUTER JOINs and deduplicate the
> results or do multiple queries and post-process to form a graph is
> wasteful and slow. If Pg had a way to output an object graph (or at
> least tree) natively as, say, JSON, that'd be a marvellous option for
> some kinds of workloads, and might help the NoSQL folks from whining
> quite so much as well ;-)

Never mind JSON.  You can fix the outer joins problem and other issues simply by
supporting relation-valued attributes, or in other words, row field values that
are rowsets.  And recursively.  Parent records in outer rowset, child records
inside.  And this is all perfectly normal for the relational model, and SQL's
differing from this is part of how SQL is crippled and not really the relational
model.  I demonstrate how it might be better done with my Muldis D language. --
Darren Duncan

Re: Database INNOVATION

От
Craig Ringer
Дата:
On 10/20/2010 12:35 PM, Brent Wood wrote:
> Have a look at PL/R.
>
> You can embed a command to generate a graphic using R via a user defined
> SQL function,

In this case, when I say "graph" or "tree" I'm referring to the concept
in the graph theory sense, not the "plot" sense. "object graph" not
"image representation of data".

http://en.wikipedia.org/wiki/Graph_(mathematics)
http://en.wikipedia.org/wiki/Graph_theory

Sorry, I didn't even think to clarify my usage.

What I'm talking about is a way to query the database and obtain a
representation of matching tuples where each tuple is represented
exactly once, and referential relationships between tuples are included
in an efficient way.

For a simple tree or forest (ie a directed graph with no cycles) this
could be a XML/JSON/YAML/whatever document that uses nesting to
represent relationships.

For more complex graphs, it'd have to be a list of
XML/JSON/YAML/whatever representations of each tuple or (if Pg supported
it) multiple tabular result sets, one for each tuple type. An edge list
could be included to speed mapping out the inter-object references after
deserialization.

To say this would be nice when dealing with document-in-database storage
and certain types of ORM workload is quite an understatement. Getting
rid of all that horrid "multiply left join, filter and de-duplicate" or
"n+1 select" crap would be quite lovely. Sure, it's often better to use
sane SQL directly, but there are tasks for which ORMs or
document-database mappings are a real time and pain saver - it'd just be
nice to be able to teach the database their language.

Plus, that'd help shut up the "NoSQL" crowd and separate "NoSQL" from
"relaxed or no ACID shareded databases", two different things people
currently confuse.

In any case, thanks for the tip. It's nice to know the PL/R can be used
for such in-database processing when I *do* want to plot data.

--
Craig Ringer

Re: Database INNOVATION

От
Craig Ringer
Дата:
On 20/10/10 13:12, Darren Duncan wrote:

> Never mind JSON.  You can fix the outer joins problem and other issues
> simply by supporting relation-valued attributes, or in other words, row
> field values that are rowsets.

You can for trees/forests yes. How would you handle more general graphs
with cycles or bidirectional relationships, where you still want to be
able to reconstruct them into a traversible graph client-side?

There are existing graph databases for this, of course, but I've
frequently wished to be able to use the power of SQL's query language
and reporting facilities with my data as well as being able to extract
it as (sub)graphs when needed. Using a graph database would usually cost
me ACID, full SQL support, and in most cases all those goodies like
triggers, constraints, etc as well.

I think there's a real use case for using regular relational storage
with a few SQL extensions to support returning graph-style rather than
row-set style results. Even the SQL extensions probably aren't necesary;
I suspect a (limited and somewhat slow) version could be done purely in
a PL under PostgreSQL, and I've been thinking about trying to prototype one.



> And recursively.  Parent records in
> outer rowset, child records inside.  And this is all perfectly normal
> for the relational model, and SQL's differing from this is part of how
> SQL is crippled and not really the relational model.  I demonstrate how
> it might be better done with my Muldis D language. -- Darren Duncan

You'd really wow people if you could bang together a working JPA 2.0
backend for that, or a dialect for an existing provider like Hibernate.

Personally, I'd love to give someting like your Muldis-D query interface
a go if it could live within PostgreSQL as a contrib module, using the
regular Pg storage and just providing an alternative query facility.

Right now, it looks like it is a perl-all-the-way system, with no
interfaces for other languages and its own database storage system. In
its current form a quick glance at the docs doesn't demonstrate any
obvious advantage of using it over any of the existing well-established
graph databases or object-relational database systems.

--
Craig Ringer

Tech-related writing: http://soapyfrogs.blogspot.com/

Re: Database INNOVATION

От
Darren Duncan
Дата:
Craig Ringer wrote:
> On 20/10/10 13:12, Darren Duncan wrote:
>> Never mind JSON.  You can fix the outer joins problem and other issues
>> simply by supporting relation-valued attributes, or in other words, row
>> field values that are rowsets.
>
> You can for trees/forests yes. How would you handle more general graphs
> with cycles or bidirectional relationships, where you still want to be
> able to reconstruct them into a traversible graph client-side?
>
> There are existing graph databases for this, of course, but I've
> frequently wished to be able to use the power of SQL's query language
> and reporting facilities with my data as well as being able to extract
> it as (sub)graphs when needed. Using a graph database would usually cost
> me ACID, full SQL support, and in most cases all those goodies like
> triggers, constraints, etc as well.
>
> I think there's a real use case for using regular relational storage
> with a few SQL extensions to support returning graph-style rather than
> row-set style results. Even the SQL extensions probably aren't necesary;
> I suspect a (limited and somewhat slow) version could be done purely in
> a PL under PostgreSQL, and I've been thinking about trying to prototype one.
>
>> And recursively.  Parent records in
>> outer rowset, child records inside.  And this is all perfectly normal
>> for the relational model, and SQL's differing from this is part of how
>> SQL is crippled and not really the relational model.  I demonstrate how
>> it might be better done with my Muldis D language. -- Darren Duncan
>
> You'd really wow people if you could bang together a working JPA 2.0
> backend for that, or a dialect for an existing provider like Hibernate.
>
> Personally, I'd love to give someting like your Muldis-D query interface
> a go if it could live within PostgreSQL as a contrib module, using the
> regular Pg storage and just providing an alternative query facility.
>
> Right now, it looks like it is a perl-all-the-way system, with no
> interfaces for other languages and its own database storage system. In
> its current form a quick glance at the docs doesn't demonstrate any
> obvious advantage of using it over any of the existing well-established
> graph databases or object-relational database systems.

Muldis D is a work-in-progress.  The system is specified in terms of a
programming language of which multiple implementations could exist, essentially
the same way as SQL is.  I fully intend for it to run on Postgres, using regular
Pg storage (which may gain features along the way), and be useable both in the
manner of a PL and for general DDL/DML/queries/etc.  I agree that it is best for
the initial all-in attempt to be done as a contrib module, and I am happy that
Pg already has so much of the infrastructure in place that is needed.  The
first, Perl-all-the-way, system is just the reference implementation, but the
Pg-contrib version would likely be the first one for serious use.

If you or anyone wants to work on making this a reality, I welcome it.

-- Darren Duncan


Re: Database INNOVATION

От
Joe Conway
Дата:
On 10/19/2010 10:12 PM, Craig Ringer wrote:
> On 10/20/2010 12:35 PM, Brent Wood wrote:
>> Have a look at PL/R.

> In this case, when I say "graph" or "tree" I'm referring to the concept
> in the graph theory sense, not the "plot" sense. "object graph" not
> "image representation of data".

Analysis of graphs is available via PL/R as well. See:
  http://cran.r-project.org/web/views/gR.html

Joe

--
Joe Conway
credativ LLC: http://www.credativ.us
Linux, PostgreSQL, and general Open Source
Training, Service, Consulting, & 24x7 Support


Вложения

Re: Database INNOVATION

От
"Brent Wood"
Дата:
Gotcha.

Apologies for the digression, off your exact topic but consistent with the subject :-)

I'm interested in both, PL/R & representational graphics from an analytical perspective, doing more than just retrieving raw or accumulated data with SQL. & also from the (mathemetical) graphic perspective to support biological taxonomic trees/heirarchies, which do not easily fit the SQL model, although a number of kludges to traverse such structures are around.

(I need to look at the Postgres recursive capability for this sometime)

Cheers,

  Brent


Brent Wood
DBA/GIS consultant
NIWA, Wellington
New Zealand

>>> Craig Ringer <craig@postnewspapers.com.au> 10/20/10 6:12 PM >>>
On 10/20/2010 12:35 PM, Brent Wood wrote:
> Have a look at PL/R.
>
> You can embed a command to generate a graphic using R via a user defined
> SQL function,

In this case, when I say "graph" or "tree" I'm referring to the concept
in the graph theory sense, not the "plot" sense. "object graph" not
"image representation of data".

http://en.wikipedia.org/wiki/Graph_(mathematics)
http://en.wikipedia.org/wiki/Graph_theory

Sorry, I didn't even think to clarify my usage.

What I'm talking about is a way to query the database and obtain a
representation of matching tuples where each tuple is represented
exactly once, and referential relationships between tuples are included
in an efficient way.

For a simple tree or forest (ie a directed graph with no cycles) this
could be a XML/JSON/YAML/whatever document that uses nesting to
represent relationships.

For more complex graphs, it'd have to be a list of
XML/JSON/YAML/whatever representations of each tuple or (if Pg supported
it) multiple tabular result sets, one for each tuple type. An edge list
could be included to speed mapping out the inter-object references after
deserialization.

To say this would be nice when dealing with document-in-database storage
and certain types of ORM workload is quite an understatement. Getting
rid of all that horrid "multiply left join, filter and de-duplicate" or
"n+1 select" crap would be quite lovely. Sure, it's often better to use
sane SQL directly, but there are tasks for which ORMs or
document-database mappings are a real time and pain saver - it'd just be
nice to be able to teach the database their language.

Plus, that'd help shut up the "NoSQL" crowd and separate "NoSQL" from
"relaxed or no ACID shareded databases", two different things people
currently confuse.

In any case, thanks for the tip. It's nice to know the PL/R can be used
for such in-database processing when I *do* want to plot data.

--
Craig Ringer

Please consider the environment before printing this email.

NIWA is the trading name of the National Institute of Water & Atmospheric Research Ltd.