Discussion: SQL Property Graph Queries (SQL/PGQ)
Here is a prototype implementation of SQL property graph queries (SQL/PGQ), following SQL:2023. This was talked about briefly at the FOSDEM developer meeting, and a few people were interested, so I wrapped up what I had in progress into a presentable form.

There is some documentation to get started in doc/src/sgml/ddl.sgml and doc/src/sgml/queries.sgml.

To learn more about this facility, here are some external resources:

* An article about a competing product: https://oracle-base.com/articles/23c/sql-property-graphs-and-sql-pgq-23c
  (All the queries in the article work, except the ones using vertex_id() and edge_id(), which are non-standard, and the JSON examples at the end, which require some of the in-progress JSON functionality for PostgreSQL.)

* An academic paper related to another competing product: https://www.cidrdb.org/cidr2023/papers/p66-wolde.pdf
  (The main part of this paper discusses advanced functionality that my patch doesn't have.)

* A 2019 presentation about graph databases: https://www.pgcon.org/2019/schedule/events/1300.en.html
  (There is also a video.)

* (Vik has a recent presentation "Property Graphs: When the Relational Model Is Not Enough", but I haven't found the content posted online.)

The patch is quite fragile, and treading outside the tested paths will likely lead to grave misbehavior. Use with caution. But I feel that the general structure is ok, and we just need to fill in the proverbial few thousand lines of code in the designated areas.
Hi,

On 2024-02-16 15:53:11 +0100, Peter Eisentraut wrote:
> The patch is quite fragile, and treading outside the tested paths will
> likely lead to grave misbehavior. Use with caution. But I feel that
> the general structure is ok, and we just need to fill in the
> proverbial few thousand lines of code in the designated areas.

One aspect that I'm concerned with structurally is that the transformation, from property graph queries to something postgres understands, is done via the rewrite system. I doubt that that is a good idea. For one, it bars the planner from making plans that benefit from the graph query formulation. But more importantly, we IMO should reduce usage of the rewrite system, not increase it.

Greetings,

Andres Freund
On 16.02.24 20:23, Andres Freund wrote:
> One aspect that I'm concerned with structurally is that the transformation,
> from property graph queries to something postgres understands, is done via the
> rewrite system. I doubt that that is a good idea. For one it bars the planner
> from making plans that benefit from the graph query formulation. But more
> importantly, we IMO should reduce usage of the rewrite system, not increase
> it.

PGQ is meant to be implemented like that, like views expanding to joins and unions. This is what I have gathered during the specification process, and from other implementations, and from academics. There are certainly other ways to combine relational and graph database stuff, like with native graph storage and specialized execution support, but this is not that, and to some extent PGQ was created to supplant those other approaches.

Many people will agree that the rewriter is sort of weird and archaic at this point. But I'm not aware of any plans or proposals to do anything about it. As long as the view expansion takes place there, it makes sense to align with that. For example, all the view security stuff (privileges, security barriers, etc.) will eventually need to be considered, and it would make sense to do that in a consistent way. So for now, I'm working with what we have, but let's see where it goes.

(Note to self: Check that graph inside view inside graph inside view ... works.)
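To make the "views expanding to joins" idea concrete, here is a minimal sketch. It assumes a hypothetical `person` vertex table and a hypothetical `knows` edge table, and it runs against SQLite from Python purely so that it is self-contained. It is not the output of the patch's rewriter, only the kind of join that a one-hop pattern conceptually corresponds to; the pattern in the comment follows the general SQL:2023 `GRAPH_TABLE ... MATCH ... COLUMNS` shape, and the graph name `social` is made up.

```python
import sqlite3

# Hypothetical schema: one vertex table (person) and one edge table (knows).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE knows  (src INTEGER REFERENCES person(id),
                         dst INTEGER REFERENCES person(id));
    INSERT INTO person VALUES (1, 'alice'), (2, 'bob'), (3, 'carol');
    INSERT INTO knows  VALUES (1, 2), (2, 3);
""")

# A one-hop graph pattern such as
#   SELECT * FROM GRAPH_TABLE (social
#       MATCH (a IS person)-[IS knows]->(b IS person)
#       COLUMNS (a.name AS a_name, b.name AS b_name))
# can be read as: vertex table JOIN edge table JOIN vertex table.
EXPANDED_QUERY = """
    SELECT a.name AS a_name, b.name AS b_name
    FROM person AS a
    JOIN knows  AS k ON k.src = a.id
    JOIN person AS b ON b.id  = k.dst
    ORDER BY a.name
"""

def one_hop_pairs(connection):
    """Return (source name, destination name) for every 'knows' edge."""
    return connection.execute(EXPANDED_QUERY).fetchall()

pairs = one_hop_pairs(conn)
print(pairs)
```

If a label were backed by more than one table, the same reading would presumably produce one such join per table combination, combined with UNION ALL, which is the "unions" half of the description above.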
On 2/23/24 17:15, Peter Eisentraut wrote:
> On 16.02.24 20:23, Andres Freund wrote:
>> One aspect that I'm concerned with structurally is that the
>> transformation, from property graph queries to something postgres
>> understands, is done via the rewrite system. I doubt that that is a
>> good idea. For one it bars the planner from making plans that benefit
>> from the graph query formulation. But more importantly, we IMO should
>> reduce usage of the rewrite system, not increase it.
>
> PGQ is meant to be implemented like that, like views expanding to joins
> and unions. This is what I have gathered during the specification
> process, and from other implementations, and from academics. There are
> certainly other ways to combine relational and graph database stuff,
> like with native graph storage and specialized execution support, but
> this is not that, and to some extent PGQ was created to supplant those
> other approaches.
>

I understand PGQ was meant to be implemented as a bit of a "syntactic sugar" on top of relations, instead of inventing some completely new ways to store/query graph data.

But does that really mean it needs to be translated to relations this early / in rewriter? I haven't thought about it very deeply, but won't that discard useful information about semantics of the query, which might be useful when planning/executing the query?

I've somehow imagined we'd be able to invent some new index types, or utilize some other type of auxiliary structure, maybe some special executor node, but it seems harder without this extra info ...

> Many people will agree that the rewriter is sort of weird and archaic at
> this point. But I'm not aware of any plans or proposals to do anything
> about it. As long as the view expansion takes place there, it makes
> sense to align with that. For example, all the view security stuff
> (privileges, security barriers, etc.) will eventually need to be
> considered, and it would make sense to do that in a consistent way. So
> for now, I'm working with what we have, but let's see where it goes.
>
> (Note to self: Check that graph inside view inside graph inside view ...
> works.)
>

AFAIK the "policy" regarding rewriter was that we don't want to use it for user stuff (e.g. people using it for partitioning), but I'm not sure about internal stuff.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Fri, Feb 23, 2024 at 11:08 PM Tomas Vondra <tomas.vondra@enterprisedb.com> wrote:
>
> On 2/23/24 17:15, Peter Eisentraut wrote:
> > On 16.02.24 20:23, Andres Freund wrote:
> >> One aspect that I'm concerned with structurally is that the
> >> transformation, from property graph queries to something postgres
> >> understands, is done via the rewrite system. I doubt that that is a
> >> good idea. For one it bars the planner from making plans that benefit
> >> from the graph query formulation. But more importantly, we IMO should
> >> reduce usage of the rewrite system, not increase it.
> >
> > PGQ is meant to be implemented like that, like views expanding to joins
> > and unions. This is what I have gathered during the specification
> > process, and from other implementations, and from academics. There are
> > certainly other ways to combine relational and graph database stuff,
> > like with native graph storage and specialized execution support, but
> > this is not that, and to some extent PGQ was created to supplant those
> > other approaches.
> >
>
> I understand PGQ was meant to be implemented as a bit of a "syntactic
> sugar" on top of relations, instead of inventing some completely new
> ways to store/query graph data.
>
> But does that really mean it needs to be translated to relations this
> early / in rewriter? I haven't thought about it very deeply, but won't
> that discard useful information about semantics of the query, which
> might be useful when planning/executing the query?
>
> I've somehow imagined we'd be able to invent some new index types, or
> utilize some other type of auxiliary structure, maybe some special
> executor node, but it seems harder without this extra info ...

I have yet to look at the implementation but ...

1. If there are optimizations that improve performance of some path patterns, they are likely to improve the performance of the joins used to implement those. In such cases, losing some information might be ok.

2. Explicit graph annotation might help to automate some things, like creating indexes automatically on columns that appear in specific patterns, or creating extended statistics automatically on the columns participating in specific patterns, or interpreting statistics/costing differently than in normal query execution. Those kinds of things will require retaining annotations in views, planner/execution trees etc.

3. There are some things like aggregates/operations on paths which might require stuff like new execution nodes. But I am not sure we have reached that stage yet.

There might be things we may not see right now in the standard, e.g. indexes on graph properties. For those, mapping the graph objects onto database objects might prove useful. That goes back to Peter's comment:

--- quote
As long as the view expansion takes place there, it makes sense to align with that. For example, all the view security stuff (privileges, security barriers, etc.) will eventually need to be considered, and it would make sense to do that in a consistent way.
--- unquote

--
Best Wishes,
Ashutosh Bapat
Patch conflicted with changes in ef5e2e90859a39efdd3a78e528c544b585295a78. Attached patch with the conflict resolved.
--
Best Wishes,
Ashutosh Bapat