Discussion: PostgreSQL as a triple store
Hello,
I'm looking into gathering information about geographical locations:
- Coordinates
- Place name
- Pictures
- etc.
Since a place can be anything, a specific place may need any kind of data type to be described.
I'm therefore looking into using the same method as the semantic Web and trying to describe a place with triples.
1. The "triples" table will reference a subject, attribute and value.
2. The "attributes" table will have an attribute name, a type and maybe a table name.
3. For each attribute type, a new table is created to store the values.
4. A "resources" table lists all the available resources (location, picture, ...).
Using this, I can easily add a new resource and describe it by adding new triples. Every item is identified using UUIDs as primary key.
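As a rough illustration of the layout described in points 1–4, here is a minimal sketch using Python's sqlite3. Every table and column name below (and the "values_text" per-type value table) is an illustrative assumption, not a finished schema:

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

cur.executescript("""
-- One row per resource; UUIDs stored as text primary keys.
CREATE TABLE resources (id TEXT PRIMARY KEY, kind TEXT NOT NULL);
-- Attribute name, a type, and (maybe) the table holding its values.
CREATE TABLE attributes (id TEXT PRIMARY KEY, name TEXT NOT NULL,
                         type TEXT NOT NULL, value_table TEXT NOT NULL);
-- One value table per attribute type.
CREATE TABLE values_text (id TEXT PRIMARY KEY, value TEXT NOT NULL);
-- Each triple links a subject to an attribute and a typed value row.
CREATE TABLE triples (subject TEXT REFERENCES resources(id),
                      attribute TEXT REFERENCES attributes(id),
                      value TEXT NOT NULL);
""")

# Describe one place with a single triple.
place, attr, val = (str(uuid.uuid4()) for _ in range(3))
cur.execute("INSERT INTO resources VALUES (?, 'location')", (place,))
cur.execute("INSERT INTO attributes VALUES (?, 'place_name', 'text', 'values_text')",
            (attr,))
cur.execute("INSERT INTO values_text VALUES (?, 'Galway')", (val,))
cur.execute("INSERT INTO triples VALUES (?, ?, ?)", (place, attr, val))
conn.commit()
```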
This seems too simple to be true, so I'm turning to you in case you see any pitfalls here.
The "triples" table will grow very fast in this setup and may become a bottleneck. However, I don't see any other way to store information about something as wide as places around the world.
Regards,
Jimmy
On 12.08.2014 15:19, Jimmy Thrasibule wrote:

We are currently working on in-database semantic reasoning. However, if you do not need any T-Box or A-Box reasoning, maybe JSON is a better choice in your case.
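To make the JSON suggestion concrete, here is a minimal sketch of the alternative: all the variable attributes of a place live in one JSON document per row. The table layout and field names are illustrative assumptions; on PostgreSQL the `doc` column would be `jsonb` and could carry a GIN index, while SQLite just stores the serialized text:

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# One row per place; the variable attributes live in a single JSON document.
cur.execute("CREATE TABLE places (id INTEGER PRIMARY KEY, doc TEXT NOT NULL)")

doc = {"name": "Galway",
       "coords": {"lat": 53.27, "lon": -9.05},
       "pictures": ["galway_bay.jpg"]}
cur.execute("INSERT INTO places (doc) VALUES (?)", (json.dumps(doc),))

# Decode on read; no joins and no per-type value tables are needed.
row = cur.execute("SELECT doc FROM places WHERE id = 1").fetchone()
place = json.loads(row[0])
```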
> Is there a reason why hstore or json is not an option? That may work a lot
> better than this approach.

I don't want to move away from common SQL features, so I can have a test environment using SQLite and deploy on PostgreSQL. This approach looks elegant and simple to me. Using a new table per attribute type, or even per attribute, for the values, I can easily index them for quick lookups.
On 12/08/2014 15:57, Jimmy Thrasibule wrote:

What is your test environment? Postgres works quite happily on my rather low-powered Windows 7 laptop, and so I have all the bells and whistles available for development.

Ray.

--
Raymond O'Donnell :: Galway :: Ireland
rod@iol.ie
On Tue, 12 Aug 2014 16:57:32 +0200
Jimmy Thrasibule <thrasibule.jimmy@gmail.com> wrote:

A couple of things to keep in mind:

* SQLite and PostgreSQL are not 100% compatible. It's cheap and easy to set up PostgreSQL for testing/development, so it seems silly (to me) to test on something that might behave differently than the production environment.

* Any setup where the application can execute DDL is a potential security concern. If the code can manipulate tables, then a bug in the code can allow an accidental or malicious user to quickly and easily destroy data:

  '; DROP TABLE things; --

--
Bill Moran
I need your help to succeed:
http://gamesbybill.com
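The DDL concern comes down to this: query placeholders only work for values, never for table or column names, so a design that creates a table per attribute must splice names into statement text. One common mitigation (a sketch; the regex and the `safe_identifier` helper are illustrative, not a library API) is to whitelist-validate every identifier before it reaches DDL:

```python
import re
import sqlite3

def safe_identifier(name: str) -> str:
    # Placeholders (?) cannot stand in for table or column names,
    # so any identifier built from outside input must be validated
    # before being spliced into a CREATE/DROP/ALTER statement.
    if not re.fullmatch(r"[a-z_][a-z0-9_]*", name):
        raise ValueError(f"unsafe identifier: {name!r}")
    return name

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Creating one value table per attribute type, as in the proposed design:
table = safe_identifier("values_text")  # illustrative table name
cur.execute(f"CREATE TABLE {table} (id TEXT PRIMARY KEY, value TEXT)")
```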
On 8/12/2014 6:19 AM, Jimmy Thrasibule wrote:

That is a variation of an anti-pattern known as EAV (Entity-Attribute-Value). It's impossible to implement well-performing queries with this, as you'll need to make several sequential queries each time, since table names can't be 'dynamic' in a SQL query by design (query triples joined with attributes, case on the attribute type, then query the appropriate attribute table and possibly resources).

http://mikesmithers.wordpress.com/2013/12/22/the-anti-pattern-eavil-database-design/

--
john r pierce  37N 122W
somewhere on the middle of the left coast
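The "several sequential queries" problem can be sketched like this (table and column names are illustrative assumptions matching the proposal, not a real schema): the first round trip only discovers *which* table holds the value, and only then can a second, dynamically built statement fetch it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE attributes  (id TEXT PRIMARY KEY, name TEXT, value_table TEXT);
CREATE TABLE values_text (id TEXT PRIMARY KEY, value TEXT);
CREATE TABLE triples     (subject TEXT, attribute TEXT, value TEXT);
INSERT INTO attributes  VALUES ('a1', 'place_name', 'values_text');
INSERT INTO values_text VALUES ('v1', 'Galway');
INSERT INTO triples     VALUES ('s1', 'a1', 'v1');
""")

# Query 1: learn which table holds the value for this attribute.
table, value_id = cur.execute(
    "SELECT a.value_table, t.value FROM triples t "
    "JOIN attributes a ON a.id = t.attribute "
    "WHERE t.subject = 's1' AND a.name = 'place_name'").fetchone()

# Query 2: the table name must be spliced into a second statement,
# because SQL cannot join against a table chosen per row.
row = cur.execute(
    f"SELECT value FROM {table} WHERE id = ?", (value_id,)).fetchone()
```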
I've developed solutions where I've used EAV along with a more relational design; they were delivered on time and performed at an expected/acceptable level, so no problems. Then again, I tried to use EAV bluntly in a healthcare-related project and it died on me quite quickly when the data grew.
Seref
Hi Jimmy,

I think you're going to need to perform experiments and decide if the flexibility you get is worth the performance you're going to lose. As you can see from John R Pierce's response, it is only a matter of time before someone (correctly) warns you about the performance issues with EAV derivatives, but then the conversation becomes too abstract, at least in my opinion. Why? Because we don't know how much data you're going to process with this design, with what kind of hardware.

Also, your requirements sound related to some projects that use PostGIS; have you taken a look at that?
> I think you're going to need to perform experiments and decide if the flexibility you get is worth the performance you're going to lose.
> As you can see from John R Pierce's response, it is only a matter of time before someone (correctly) warns you about the performance issues with EAV derivatives, but then the conversation becomes too abstract, at least in my opinion.
> Why? Because we don't know how much data you're going to process with this design, with what kind of hardware.
My data model is not completely ready yet and is still being worked on. But yes, I think I first need to think more about how I will be using the data before settling on a storage model.
I will indeed run some experiments first, but with this model I was planning to put some of the complexity in the code.
> Also, your requirements sound related to some projects that use PostGIS; have you taken a look at that?
Never heard of it, I'll give it a look.
---
Jimmy