Обсуждение: Using GUIDs

Поиск
Список
Период
Сортировка

Using GUIDs

От
Dennis Wagner
Дата:
Hi list!

I'm currently thinking about using GUIDs as primary keys. To be more
precise: On *some* of the tables in my structure I *will* need GUIDs (or
UUIDs or whatever format may proove usefull). Now, this implies that
using a bigserial-field for readability and the uids is redundand, so
I'm thinking of dropping my current primary-key-columns in favor of an
uuid-column.
In the postgres-docs I can't find any built-in data-types directly
supporting guids. I found the project pguuid on gborg but it somehow
lacks documentation and doesn't seem very promising (since it doesn't
seem to be touched since three years ago...).
I have found some older discussion on the archives but in the end noone
ever came up with THE solution nor did anyone talk about the
performance-impact using a guid-row as primary key would have over
using, let's say bigint.
Has anyone made some test in this field and can share some experiences?
Would this question be better placed in the hackers list?

Best regards



Dennis


Re: Using GUIDs

От
"Aaron Bono"
Дата:
By its definition, GUID ( http://en.wikipedia.org/wiki/GUID ) is not guaranteed to be unique.  As a result, it could cause problems in tables that have a large number of rows - actually this could result in hard to reproduce, seemingly random errors in your systems that use the database.  How does a GUID help you over a BIGSERIAL?

Unless you are generating the PK in multiple databases, a BIGSERIAL should act as a UUID.

Maybe if you provide a little more detail on why you want to use GUID or UUID fields we can help you come up with more options.

Thanks,
Aaron

On 6/26/06, Dennis Wagner <wagner@spheron.com> wrote:
Hi list!

I'm currently thinking about using GUIDs as primary keys. To be more
precise: On *some* of the tables in my structure I *will* need GUIDs (or
UUIDs or whatever format may proove usefull). Now, this implies that
using a bigserial-field for readability and the uids is redundand, so
I'm thinking of dropping my current primary-key-columns in favor of an
uuid-column.
In the postgres-docs I can't find any built-in data-types directly
supporting guids. I found the project pguuid on gborg but it somehow
lacks documentation and doesn't seem very promising (since it doesn't
seem to be touched since three years ago...).
I have found some older discussion on the archives but in the end noone
ever came up with THE solution nor did anyone talk about the
performance-impact using a guid-row as primary key would have over
using, let's say bigint.
Has anyone made some test in this field and can share some experiences?
Would this question be better placed in the hackers list?

Best regards



Dennis

Re: Using GUIDs

От
Dennis Wagner
Дата:
> By its definition, GUID ( http://en.wikipedia.org/wiki/GUID ) is not
> guaranteed to be unique.  As a result, it could cause problems in
> tables that have a large number of rows - actually this could result
> in hard to reproduce, seemingly random errors in your systems that use
> the database.  How does a GUID help you over a BIGSERIAL?
It would help me in restoring archived projects. There is a quite
frequent scenario where a once archived (e.g. in a file) subset of the
db-contents will be restored. In that occasion I cannot be sure wether
the bigserial-values the entries has before are free. One can easily
imagine a system holding one project v(aka. subset of the content). Then
one archives that projects, deletes the db, resets it's schema and then
enteres two new projects. Afetr that, the once archived project is
restored from the archive. That would most likely fail as the sequences
will have given the same IDs (starting with 1) to elements of the other
two projects. I would need a handmade merge which would really be painfull.
Using GUIDs that use the MAC-adresses as well as the creation-time (as
in version 4 of uuids e.g.) would most likely (I see, there IS the
probability but I doubt it happing inside a common LAN, at least in the
life-time of the product) create distinct GUIDS that would reside
side-by-side without any trouble.

>
> Unless you are generating the PK in multiple databases, a BIGSERIAL
> should act as a UUID.
>
> Maybe if you provide a little more detail on why you want to use GUID
> or UUID fields we can help you come up with more options.
>
We're talking about a Client-Server-Application that would normally be
run on about 10 to 50 clients simultanously using the same backend.
Project-archivment will be frequent as single projects are open for
approximatly one to three month.
I'm well aware that as GUIDS are not really guaranteed to be unique
there could be scenarios where double ID's would occur. But I could
easily sense that before entering the project and then do this
praticular one by hand. I don't want to do this on every project that is
restored.
As a second subject I will need to extract subsets of the content for
external use in form of lists. Using the GUIDs of the entries would make
them be more distinct and thus no addidtional numbering would be
necessary for those lists. Currently I'm storing entries of different
type in different tables. To have them distinct on a combined list I
would need both the type and the id of a single row which would make up
16 Bytes if the type is bigserial as well (which it is in my scenario).
> Thanks,
> Aaron
Thanks for your quick answer.

Best regards


Dennis



Re: Using GUIDs

От
"Aaron Bono"
Дата:

> By its definition, GUID ( http://en.wikipedia.org/wiki/GUID ) is not
> guaranteed to be unique.  As a result, it could cause problems in
> tables that have a large number of rows - actually this could result
> in hard to reproduce, seemingly random errors in your systems that use
> the database.  How does a GUID help you over a BIGSERIAL?
It would help me in restoring archived projects. There is a quite
frequent scenario where a once archived (e.g. in a file) subset of the
db-contents will be restored. In that occasion I cannot be sure wether
the bigserial-values the entries has before are free. One can easily
imagine a system holding one project v(aka. subset of the content). Then
one archives that projects, deletes the db, resets it's schema and then
enteres two new projects.

Why do you need to delete the db and reset it?  Why not just truncate the tables and leave the database and sequence numbers as they are?  And even if you do delete and reset the db, all you need to do is reseed the sequences to start where they left off so the ID remains unique even across archives.

Afetr that, the once archived project is
restored from the archive. That would most likely fail as the sequences
will have given the same IDs (starting with 1) to elements of the other
two projects. I would need a handmade merge which would really be painfull.
Using GUIDs that use the MAC-adresses as well as the creation-time (as
in version 4 of uuids e.g.) would most likely (I see, there IS the
probability but I doubt it happing inside a common LAN, at least in the
life-time of the product) create distinct GUIDS that would reside
side-by-side without any trouble.

You could also make your MAC address/timestamp separate columns which combined are the primary key.  Then you don't need any GUID.  Add an insert trigger to the tables to detect possible duplicates and tweak the timestamp a little or throw in a third smallint column into the primary key that the trigger can use to avoid the conflicts.  I do something like this with my log files - I name them with a timestamp, check to see if that name exists and then add a -1, -2, -3, etc. to the end of the file name if the file already exists.

>
> Unless you are generating the PK in multiple databases, a BIGSERIAL
> should act as a UUID.
>
> Maybe if you provide a little more detail on why you want to use GUID
> or UUID fields we can help you come up with more options.
>
We're talking about a Client-Server-Application that would normally be
run on about 10 to 50 clients simultanously using the same backend.
Project-archivment will be frequent as single projects are open for
approximatly one to three month.
I'm well aware that as GUIDS are not really guaranteed to be unique
there could be scenarios where double ID's would occur. But I could
easily sense that before entering the project and then do this
praticular one by hand. I don't want to do this on every project that is
restored.

You could also use a separate schema for each project.  Then restoring a project would not step on or conflict with other projects in any way.  And you can do unions across schemas to do detailed reporting on all projects.  Then in the public or some master schema, keep a list of the project schemas so you could even create dynamic views (via a function or stored procedure) that union all current projects/schemas together when needed.  Just an idea...

As a second subject I will need to extract subsets of the content for
external use in form of lists. Using the GUIDs of the entries would make
them be more distinct and thus no addidtional numbering would be
necessary for those lists. Currently I'm storing entries of different
type in different tables. To have them distinct on a combined list I
would need both the type and the id of a single row which would make up
16 Bytes if the type is bigserial as well (which it is in my scenario).

So there is a problem of space?  How many records are you talking about that makes adding a type such a concern?

-Aaron

Re: Using GUIDs

От
Jim Nasby
Дата:
On Jun 26, 2006, at 9:05 AM, Dennis Wagner wrote:
> I'm currently thinking about using GUIDs as primary keys.

Look on pgFoundry.org; I'm pretty sure there's some kind of UUID type
there.
--
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461



Re: Using GUIDs

От
"Paul S"
Дата:
I have implemented GUID's in an application that needed them.  We needed globally unique PK's and although Postgres did not offer them as a datatype, we used C# to perform that function.  The Postgres database attribute ended up being a varchar(39).  Not as performant as having Postgres natively do it but it works. :)  The good thing is that the GUID is calculated before the insert into the database so it's actually using the middle tier's resources to compute it. (ok, it's a small amount but it's nice that the DB didn't have to pay that price) 
 
Paul
 
 


 
On 6/27/06, Jim Nasby <jnasby@pervasive.com> wrote:
On Jun 26, 2006, at 9:05 AM, Dennis Wagner wrote:
> I'm currently thinking about using GUIDs as primary keys.

Look on pgFoundry.org; I'm pretty sure there's some kind of UUID type
there.
--
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software       http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461



---------------------------(end of broadcast)---------------------------
TIP 1: if posting/reading through Usenet, please send an appropriate
      subscribe-nomail command to majordomo@postgresql.org so that your
      message can get through to the mailing list cleanly