Re: Do we want a hashset type?
От | Tomas Vondra |
---|---|
Тема | Re: Do we want a hashset type? |
Дата | |
Msg-id | 501f66fc-c113-a64c-3b98-7e1844403c65@enterprisedb.com обсуждение исходный текст |
Ответ на | Do we want a hashset type? ("Joel Jacobson" <joel@compiler.org>) |
Ответы |
Re: Do we want a hashset type?
("Joel Jacobson" <joel@compiler.org>)
|
Список | pgsql-hackers |
On 5/31/23 16:09, Joel Jacobson wrote: > Hi, > > I've been working with a social network start-up that uses PostgreSQL as > their > only database. Recently, they became interested in graph databases, largely > because of an article [1] suggesting that a SQL database "just chokes" > when it > encounters a depth-five friends-of-friends query (for a million users having > 50 friends each). > > The article didn't provide the complete SQL queries, so I had to buy the > referenced book to get the full picture. It turns out, the query was a > simple > self-join, which, of course, isn't very efficient. When we rewrote the query > using a modern RECURSIVE CTE, it worked but still took quite some time. > > Of course, there will always be a need for specific databases, and some > queries > will run faster on them. But, if PostgreSQL could handle graph queries > with a > Big-O runtime similar to graph databases, scalability wouldn't be such a big > worry. > > Just like the addition of the JSON type to PostgreSQL helped reduce the hype > around NoSQL, maybe there's something simple that's missing in > PostgreSQL that > could help us achieve the same Big-O class performance as graph > databases for > some of these type of graph queries? > > Looking into the key differences between PostgreSQL and graph databases, > it seems that one is how they store adjacent nodes. In SQL, a graph can be > represented as one table for the Nodes and another table for the Edges. > For a friends-of-friends query, we would need to query Edges to find > adjacent > nodes, recursively. > > Graph databases, on the other hand, keep adjacent nodes immediately > accessible > by storing them with the node itself. This looks like a major difference in > terms of how the data is stored. > > Could a hashset type help bridge this gap? > > The idea would be to store adjacent nodes as a hashset column in a Nodes > table. > I think this needs a better explanation - what exactly is a hashset in this context? Something like an array with a hash for faster lookup of unique elements, or what? Presumably it'd store whole adjacent nodes, not just some sort of node ID. So what if a node is adjacent to many other nodes? What if a node is added/deleted/modified? AFAICS the main problem is the lookups of adjacent nodes, generating lot of random I/O etc. Presumably it's not that hard to keep the "relational" schema with table for vertices/edges, and then an auxiliary table with adjacent nodes grouped by node, possibly maintained by a couple triggers. A bit like an "aggregated index" except the queries would have to use it explicitly. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
В списке pgsql-hackers по дате отправления:
Предыдущее
От: Daniel GustafssonДата:
Сообщение: Re: Refactor ssl tests to avoid using internal PostgreSQL::Test::Cluster methods