Обсуждение: Need advise for database structure for non linear data.

Поиск
Список
Период
Сортировка

Need advise for database structure for non linear data.

От
Andre Lopes
Дата:
Hi,

I need advise about a database structure. I need to capture data from the web about one specific subject on few specific websites and insert that data to a database. I have done this question here before, but I think I have not explained very well.

The problem with this task is that the information is not linear, if I try to design tables with fields for all possible data I will end up with many row fields with NULL values. There are any problem with this(end up with many row fields with NULL values)? Or should I user other kind of structure? For example store the data in one field and that field containing an associative array with data.

What I mean with non linear data is the following:

    array(
          'name' => 'Don',
          'age'  => '31'
         );
   
   
    array(
          'name' => 'Peter',
          'age'  => '28',
          'car'  => 'ford',
          'km'   => '2000'
         );

In a specific website search I will store only "name" and "age", and in other website I will store "name", "age", "car" and "km".

I don't know If I explain weel my problem. My english is not very good.

Best Regards.

Re: Need advise for database structure for non linear data.

От
Thomas Schmidt
Дата:
  Hello,

Am 03.01.11 12:11, schrieb Andre Lopes:
> Hi,
>
> I need advise about a database structure. I need to capture data from the
> web about one specific subject on few specific websites and insert that data
> to a database. I have done this question here before, but I think I have not
> explained very well.
>
> What I mean with non linear data is the following:
>
>      array(
>            'name' =>  'Don',
>            'age'  =>  '31'
>           );
>
>
>      array(
>            'name' =>  'Peter',
>            'age'  =>  '28',
>            'car'  =>  'ford',
>            'km'   =>  '2000'
>           );
>
> In a specific website search I will store only "name" and "age", and in
> other website I will store "name", "age", "car" and "km".
>
> I don't know If I explain weel my problem. My english is not very good.
>
In theory, using a single table having three columns
(array-id,key,value) will suit your needs.
However, providing a simple key/value store is not the idea behind DBMS
like postgres ...
See:
http://en.wikipedia.org/wiki/NoSQL
http://en.wikipedia.org/wiki/Relational_database_management_system

Thomas



Re: Need advise for database structure for non linear data.

От
Thomas Kellerer
Дата:
Andre Lopes wrote on 03.01.2011 12:11:

>      array(
> 'name' => 'Don',
> 'age'  => '31'
>           );
>
>
>      array(
> 'name' => 'Peter',
> 'age'  => '28',
> 'car'  => 'ford',
> 'km'   => '2000'
>           );
>
> In a specific website search I will store only "name" and "age", and
> in other website I will store "name", "age", "car" and "km".
>
> I don't know If I explain weel my problem. My english is not very
> good.

That's exactly what the hstore data type supports:

http://www.postgresql.org/docs/current/static/hstore.html

Regards
Thomas

Re: Need advise for database structure for non linear data.

От
Radosław Smogura
Дата:
 I can propose you something like this:

 website(id int, url varchar);
 attr_def (id int, name varchar);
 attr_val (id int, def_id reference attr_def.id, website_id int
 references website.id, value varchar);
 If all of your attributes in website are single valued then you can
 remove id from attr_val and use PK from website_id, def_id.

 Depending on your needs one or many from following indexes:
 attr_val(value) - search for attributes with value;
 attr_val(def_id, value) - search for given attributes with values;
 attr_val(website_id, def_id, value) - checks, if given site has
 attribue, search by values for given site and attribute;
 attr_val(def_id, website_id) - like above, without value searching;
 attr_val(website_id, value) - search for attributes on given site with
 value.
 Probably you will use 2nd or 3rd index.

 Example of search on website
 select d.name, v.value from attre_def d join attr_val v on (v.def_id =
 d.id) join website w on (v.website_id = w.id)
 where d.name = 'xxxx' and w.url='http://somtehing'

 This is common map structure.

 Kind regards,
 Radosław Smogura

 On Mon, 03 Jan 2011 12:26:45 +0100, Thomas Schmidt
 <postgres@stephan.homeunix.net> wrote:
> Hello,
>
> Am 03.01.11 12:11, schrieb Andre Lopes:
>> Hi,
>>
>> I need advise about a database structure. I need to capture data
>> from the
>> web about one specific subject on few specific websites and insert
>> that data
>> to a database. I have done this question here before, but I think I
>> have not
>> explained very well.
>>
>> What I mean with non linear data is the following:
>>
>>      array(
>>            'name' =>  'Don',
>>            'age'  =>  '31'
>>           );
>>
>>
>>      array(
>>            'name' =>  'Peter',
>>            'age'  =>  '28',
>>            'car'  =>  'ford',
>>            'km'   =>  '2000'
>>           );
>>
>> In a specific website search I will store only "name" and "age", and
>> in
>> other website I will store "name", "age", "car" and "km".
>>
>> I don't know If I explain weel my problem. My english is not very
>> good.
>>
> In theory, using a single table having three columns
> (array-id,key,value) will suit your needs.
> However, providing a simple key/value store is not the idea behind
> DBMS like postgres ...
> See:
> http://en.wikipedia.org/wiki/NoSQL
> http://en.wikipedia.org/wiki/Relational_database_management_system
>
> Thomas


Re: Need advise for database structure for non linear data.

От
Thomas Schmidt
Дата:
  Hello,

Am 03.01.11 12:46, schrieb Radosław Smogura:
> I can propose you something like this:
>
> website(id int, url varchar);
> attr_def (id int, name varchar);
> attr_val (id int, def_id reference attr_def.id, website_id int
> references website.id, value varchar);
> If all of your attributes in website are single valued then you can
> remove id from attr_val and use PK from website_id, def_id.
>
> Depending on your needs one or many from following indexes:
> attr_val(value) - search for attributes with value;
(...)
> Probably you will use 2nd or 3rd index.
>
> Example of search on website
> select d.name, v.value from attre_def d join attr_val v on (v.def_id =
> d.id) join website w on (v.website_id = w.id)
> where d.name = 'xxxx' and w.url='http://somtehing'

Imho its hard - (if not impossible) to recommand a specific database
scheme (incl indexes) without knowing the applications taking plance
behind it.
Your schema is nice for specific querying, but might blow up if lots of
data is stored in the database (joins, index-building might be time
consuming).
On the other hand, google put some effort into their "BigTable"
http://en.wikipedia.org/wiki/BigTable for storing tons of data...

Thus - it all depends on the usage :-)

Thomas


Re: Need advise for database structure for non linear data.

От
Andre Lopes
Дата:
Hi,

Thanks for the reply's. I was tempted to accept the Rodoslaw Smogura proposal. There will be about 100 websites to capture data on daily basis. Each website adds per day(average) 2 articles.

Thomas talked about the noSQL possibility. What do you think would be better? I have no experience in noSQL and that could be a weakness.

Best Regards,
André




On Mon, Jan 3, 2011 at 11:58 AM, Thomas Schmidt <postgres@stephan.homeunix.net> wrote:
 Hello,

Am 03.01.11 12:46, schrieb Radosław Smogura:

I can propose you something like this:

website(id int, url varchar);
attr_def (id int, name varchar);
attr_val (id int, def_id reference attr_def.id, website_id int references website.id, value varchar);
If all of your attributes in website are single valued then you can remove id from attr_val and use PK from website_id, def_id.

Depending on your needs one or many from following indexes:
attr_val(value) - search for attributes with value;
(...)

Probably you will use 2nd or 3rd index.

Example of search on website
select d.name, v.value from attre_def d join attr_val v on (v.def_id = d.id) join website w on (v.website_id = w.id)
where d.name = 'xxxx' and w.url='http://somtehing'

Imho its hard - (if not impossible) to recommand a specific database scheme (incl indexes) without knowing the applications taking plance behind it.
Your schema is nice for specific querying, but might blow up if lots of data is stored in the database (joins, index-building might be time consuming).
On the other hand, google put some effort into their "BigTable"  http://en.wikipedia.org/wiki/BigTable for storing tons of data...

Thus - it all depends on the usage :-)


Thomas


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Re: Need advise for database structure for non linear data.

От
Thomas Schmidt
Дата:
  Hello,

Am 03.01.11 14:14, schrieb Andre Lopes:
> Hi,
>
> Thanks for the reply's. I was tempted to accept the Rodoslaw Smogura
> proposal. There will be about 100 websites to capture data on daily basis.
> Each website adds per day(average) 2 articles.
>
> Thomas talked about the noSQL possibility. What do you think would be
> better? I have no experience in noSQL and that could be a weakness.
>
Imho RDBMS do a very good job in managing data on a relational basis.
However - there are alternatives and use cases for 'em and there is no
holy grail...
Not having any experience is a good point for not using it in production
:-).
However, if you've time to spare, looking into database design
(plain-sql and not-only-sql) will help.

I don't think that you get in trouble with a few hundered rows per day,
but keep in mind, what queries are used.

Thomas


Re: Need advise for database structure for non linear data.

От
Fredric Fredricson
Дата:
On 01/03/2011 12:11 PM, Andre Lopes wrote:
> [snip]
> The problem with this task is that the information is not linear, if I
> try to design tables with fields for all possible data I will end up
> with many row fields with NULL values. There are any problem with
> this(end up with many row fields with NULL values)? Or should I user
> other kind of structure? For example store the data in one field and
> that field containing an associative array with data.
As far as I understand NULL values are not really stored and a column
with many NULLs is not a problem as such, but if it is part of an index
the index might not be very useful.

At least that's my understanding of how SQL databases work. If I got
this wrong I hope someone will correct me.
> [snip]
/Fredric

Вложения

Re: Need advise for database structure for non linear data.

От
Stefan Keller
Дата:
Andre,

From a distant view of your problem I would like to vote for Thomas
Kellerer's proposal:
Maintain only the data you need (to enhance import/sync performance)
and use the hstore data type (as long as query performance is ok).

Yours, S.

2011/1/3 Fredric Fredricson <Fredric.Fredricson@bonetmail.com>:
>
> On 01/03/2011 12:11 PM, Andre Lopes wrote:
>>
>> [snip]
>> The problem with this task is that the information is not linear, if I try
>> to design tables with fields for all possible data I will end up with many
>> row fields with NULL values. There are any problem with this(end up with
>> many row fields with NULL values)? Or should I user other kind of structure?
>> For example store the data in one field and that field containing an
>> associative array with data.
>
> As far as I understand NULL values are not really stored and a column with
> many NULLs is not a problem as such, but if it is part of an index the index
> might not be very useful.
>
> At least that's my understanding of how SQL databases work. If I got this
> wrong I hope someone will correct me.
>>
>> [snip]
>
> /Fredric
>
>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general
>
>