Re: Do we want a hashset type?

Поиск
Список
Период
Сортировка
От Tomas Vondra
Тема Re: Do we want a hashset type?
Дата
Msg-id 501f66fc-c113-a64c-3b98-7e1844403c65@enterprisedb.com
обсуждение исходный текст
Ответ на Do we want a hashset type?  ("Joel Jacobson" <joel@compiler.org>)
Ответы Re: Do we want a hashset type?  ("Joel Jacobson" <joel@compiler.org>)
Список pgsql-hackers

On 5/31/23 16:09, Joel Jacobson wrote:
> Hi,
> 
> I've been working with a social network start-up that uses PostgreSQL as
> their
> only database. Recently, they became interested in graph databases, largely
> because of an article [1] suggesting that a SQL database "just chokes"
> when it
> encounters a depth-five friends-of-friends query (for a million users having
> 50 friends each).
> 
> The article didn't provide the complete SQL queries, so I had to buy the
> referenced book to get the full picture. It turns out, the query was a
> simple
> self-join, which, of course, isn't very efficient. When we rewrote the query
> using a modern RECURSIVE CTE, it worked but still took quite some time.
> 
> Of course, there will always be a need for specific databases, and some
> queries
> will run faster on them. But, if PostgreSQL could handle graph queries
> with a
> Big-O runtime similar to graph databases, scalability wouldn't be such a big
> worry.
> 
> Just like the addition of the JSON type to PostgreSQL helped reduce the hype
> around NoSQL, maybe there's something simple that's missing in
> PostgreSQL that
> could help us achieve the same Big-O class performance as graph
> databases for
> some of these type of graph queries?
> 
> Looking into the key differences between PostgreSQL and graph databases,
> it seems that one is how they store adjacent nodes. In SQL, a graph can be
> represented as one table for the Nodes and another table for the Edges.
> For a friends-of-friends query, we would need to query Edges to find
> adjacent
> nodes, recursively.
> 
> Graph databases, on the other hand, keep adjacent nodes immediately
> accessible
> by storing them with the node itself. This looks like a major difference in
> terms of how the data is stored.
> 
> Could a hashset type help bridge this gap?
> 
> The idea would be to store adjacent nodes as a hashset column in a Nodes
> table.
> 

I think this needs a better explanation - what exactly is a hashset in
this context? Something like an array with a hash for faster lookup of
unique elements, or what?

Presumably it'd store whole adjacent nodes, not just some sort of node
ID. So what if a node is adjacent to many other nodes? What if a node is
added/deleted/modified?

AFAICS the main problem is the lookups of adjacent nodes, generating
lot of random I/O etc. Presumably it's not that hard to keep the
"relational" schema with table for vertices/edges, and then an auxiliary
table with adjacent nodes grouped by node, possibly maintained by a
couple triggers. A bit like an "aggregated index" except the queries
would have to use it explicitly.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Daniel Gustafsson
Дата:
Сообщение: Re: Refactor ssl tests to avoid using internal PostgreSQL::Test::Cluster methods
Следующее
От: "Joel Jacobson"
Дата:
Сообщение: Re: Do we want a hashset type?