Обсуждение: How to generate unique hash-type id?

Поиск
Список
Период
Сортировка

How to generate unique hash-type id?

От
Joe Kramer
Дата:
Hello,

I need to generate unique id which is not guessable unlike
serial(integer) type. I need an id in format like md5 hash of random
number.
On top of that I need this id to be unique across multiple tables.

Anyone had to solve this problem before? Can you post any recipes or
best practices please?

My questions:

1. Avoiding collisions.
If I make an UNIQUE constraint and do generation of id triggered on
INSERT. What if collision happens? DO I nee d to check if unique hash
already exists and if not- regenerate.
This looks too primitive. Is there a readily available function or
methodology to do that?

2. Generating global unique id across multiple tables.
How to do that? My only idea is to have separate table to keep all
hashes and compare for collision against that table.
Is there a better way? Maybe by creating some special serial type that
is not integer but varchar?

3. what function to use to generate 64-bit random hash without much
overhead to CPU?

Thanks.

R: How to generate unique hash-type id?

От
Vincenzo Romano
Дата:

Uuid?

Il giorno 29 gen, 2010 9:20 m., "Joe Kramer" <cckramer@gmail.com> ha scritto:

Hello,

I need to generate unique id which is not guessable unlike
serial(integer) type. I need an id in format like md5 hash of random
number.
On top of that I need this id to be unique across multiple tables.

Anyone had to solve this problem before? Can you post any recipes or
best practices please?

My questions:

1. Avoiding collisions.
If I make an UNIQUE constraint and do generation of id triggered on
INSERT. What if collision happens? DO I nee d to check if unique hash
already exists and if not- regenerate.
This looks too primitive. Is there a readily available function or
methodology to do that?

2. Generating global unique id across multiple tables.
How to do that? My only idea is to have separate table to keep all
hashes and compare for collision against that table.
Is there a better way? Maybe by creating some special serial type that
is not integer but varchar?

3. what function to use to generate 64-bit random hash without much
overhead to CPU?

Thanks.

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Re: How to generate unique hash-type id?

От
Adrian von Bidder
Дата:
Hi,

On Friday 29 January 2010 09.20:33 Joe Kramer wrote:
> I need to generate unique id which is not guessable unlike
> serial(integer) type. I need an id in format like md5 hash of random
> number.
> On top of that I need this id to be unique across multiple tables.

Have a look at http://www.postgresql.org/docs/8.3/static/uuid-ossp.html

The usual approach is that (given a sensible random generator[1]) uuid are
assumed to be unique[2].  So you don't need to check because the probability
of collisions is so small that for practical purposes you can just ignore
it.

(If your engineer's mind balks at this, consider that you're trusting this
already when you use digital cryptography / signatures, for example https
certificates.)

I haven't looked at this module myself, but from the experience with
generating gpg keys on an appliance:  if you need lots of randomness, the
geneation of random numbers might be your bottleneck.  OTOH, our platform
didn't have disks and usually there was no network traffic while your
average db server has both, and on many systems there is a hardware random
generator, so this might not be an issue.

cheers
-- vbi


[1] like, for example: http://www.dilbert.com/strips/comic/2001-10-25/
[2] you'll want v4 uuids
--
Linus has opinions, I have opinions, everybody else has opinions, and
the only consistency here is that most of us are wrong most of the time.
        -- Andrew Morton, OLS 2004

Вложения

Re: How to generate unique hash-type id?

От
Magnus Hagander
Дата:
On Fri, Jan 29, 2010 at 10:31, Adrian von Bidder <avbidder@fortytwo.ch> wrote:
>
> Hi,
>
> On Friday 29 January 2010 09.20:33 Joe Kramer wrote:
>> I need to generate unique id which is not guessable unlike
>> serial(integer) type. I need an id in format like md5 hash of random
>> number.
>> On top of that I need this id to be unique across multiple tables.
>
> Have a look at http://www.postgresql.org/docs/8.3/static/uuid-ossp.html
>
> The usual approach is that (given a sensible random generator[1]) uuid are
> assumed to be unique[2].  So you don't need to check because the probability
> of collisions is so small that for practical purposes you can just ignore
> it.
>
> (If your engineer's mind balks at this, consider that you're trusting this
> already when you use digital cryptography / signatures, for example https
> certificates.)

uuid:s are, AFAIK, not cryptographically strong. They are predictable
- a lot less predictable than a sequence, but still. If you want
secure random numbers, you need to look at pgcrypto -
http://www.postgresql.org/docs/current/static/pgcrypto.html


--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

Re: How to generate unique hash-type id?

От
Joe Kramer
Дата:
We have bunch of servers running the app and rebuilding postgres with
support for ossp_uuid on all servers is time consuming.
Is there a way of doing it without third party dependency like
ossp_uuid? Should I just run md5(random number), will itbe the same ?.
According to description it seems that all uuid_generate_v4() does is
simply generate md5 of random number.

Thanks.

On Fri, Jan 29, 2010 at 8:31 PM, Adrian von Bidder <avbidder@fortytwo.ch> wrote:
>
> Hi,
>
> On Friday 29 January 2010 09.20:33 Joe Kramer wrote:
>> I need to generate unique id which is not guessable unlike
>> serial(integer) type. I need an id in format like md5 hash of random
>> number.
>> On top of that I need this id to be unique across multiple tables.
>
> Have a look at http://www.postgresql.org/docs/8.3/static/uuid-ossp.html
>
> The usual approach is that (given a sensible random generator[1]) uuid are
> assumed to be unique[2].  So you don't need to check because the probability
> of collisions is so small that for practical purposes you can just ignore
> it.
>
> (If your engineer's mind balks at this, consider that you're trusting this
> already when you use digital cryptography / signatures, for example https
> certificates.)
>
> I haven't looked at this module myself, but from the experience with
> generating gpg keys on an appliance:  if you need lots of randomness, the
> geneation of random numbers might be your bottleneck.  OTOH, our platform
> didn't have disks and usually there was no network traffic while your
> average db server has both, and on many systems there is a hardware random
> generator, so this might not be an issue.
>
> cheers
> -- vbi
>
>
> [1] like, for example: http://www.dilbert.com/strips/comic/2001-10-25/
> [2] you'll want v4 uuids
> --
> Linus has opinions, I have opinions, everybody else has opinions, and
> the only consistency here is that most of us are wrong most of the time.
>        -- Andrew Morton, OLS 2004
>

Re: How to generate unique hash-type id?

От
Adrian von Bidder
Дата:
On Friday 29 January 2010 11.21:00 Joe Kramer wrote:
> We have bunch of servers running the app and rebuilding postgres with
> support for ossp_uuid on all servers is time consuming.
> Is there a way of doing it without third party dependency like
> ossp_uuid? Should I just run md5(random number), will itbe the same ?

If you're building your own: at least use sha1 instead of md5.

(Even md5 *should* be safe in the absence of malicious attacks, but md5 is
generally  not recommended anymore.)

Everything depends on the quality of your random numbers.  I don't know how
much randomness pg's random() delivers, and as I've said I haven't looked
what the uuid module does.

(To give you an example: if random() only delivers a random 32 bit float
value, the 160 bits of SHA-1 will not be used.  You'll only use 4 billion
different values and you *will* soon get collisions.)

If I were to roll my own, I'd just use 256 bit of /dev/random (or, depending
on the application, possibly /dev/urandom and take the risk that my values
aren't that random.)  Since it's random anyway, there's no need to use a
hash.  (Not sure: can a SQL function read arbitrary binary files or will a C
module be necessary?)

Speed: just did a quick test on one machine.  reading 1kB from /dev/random
takes about 1s.  (constant 5MB/s disk activity with lots of seeking going
on, no hw random device.)  So you'd get ca. 32 id values per second.  Don't
know if that's a lot or not for your application.

Magnus: can you elaborate on uuid not being secure?  AFAICT v4 uuid are
supposed to be essentially a random number formatted in a certain way.

cheers
-- vbi


--
featured product: GNU Privacy Guard - http://gnupg.org

Вложения

Re: How to generate unique hash-type id?

От
hubert depesz lubaczewski
Дата:
On Fri, Jan 29, 2010 at 07:20:33PM +1100, Joe Kramer wrote:
> I need to generate unique id which is not guessable unlike
> serial(integer) type. I need an id in format like md5 hash of random
> number.

check this blogpost:
http://www.depesz.com/index.php/2007/06/25/random-text-record-identifiers/

> On top of that I need this id to be unique across multiple tables.

just add table id to the generated id.

for example: id "xxx" in table users, is globally unique (for your
database) when you write it: "users:xxx"

if, for some weird reason, you don't want to put table name on its own
in your key (why?) then just use some dictionary table, that will link
those keys with your table.

Best regards,

depesz

--
Linkedin: http://www.linkedin.com/in/depesz  /  blog: http://www.depesz.com/
jid/gtalk: depesz@depesz.com / aim:depeszhdl / skype:depesz_hdl / gg:6749007

Re: How to generate unique hash-type id?

От
Joe Kramer
Дата:
Thanks for the answer,

I am unable to use ossp_uuid due to package install and/or server
rebuild requirement.

So I am trying to roll my own, and
digest(quote_literal(random()+random()), 'sha256'), 'hex') doesn't
work:

I have created this table and inserted 200000 rows (two million).
This is more or less now my application looks now. It uses bigserial.
And I need to add some unique hash:
CREATE TABLE item
(
  item_id bigserial NOT NULL,
  title character varying,
  CONSTRAINT pk PRIMARY KEY (item_id)
)
WITH (
  OIDS=FALSE
);


Now I add the hash column:
ALTER TABLE item ADD COLUMN hash1 character varying NOT NULL DEFAULT
encode(digest(quote_literal(random()+random()), 'sha256'), 'hex');
ALTER TABLE item ADD UNIQUE (hash1);

When I executed this two statements, ALTER TABLE ADD COUMN, ADD
UNIQUE, after  20 seconds I got this message:
NOTICE:  ALTER TABLE / ADD UNIQUE will create implicit index
"item_hash1_key" for table "item"
ERROR:  could not create unique index "item_hash1_key"
DETAIL:  Table contains duplicated values.
********* Error **********
ERROR: could not create unique index "item_hash1_key"
SQL state: 23505
Detail: Table contains duplicated values.

So this means random()+random() is not random even within 2,000,000 iterations!

If you suggest accessing /dev/urandom directly- I cannot do that
because my application runs on mac,windows and linux. It would be
maintenance nightmare.

Any suggestions?

Thanks.

On Fri, Jan 29, 2010 at 10:20 PM, Adrian von Bidder
<avbidder@fortytwo.ch> wrote:
> On Friday 29 January 2010 11.21:00 Joe Kramer wrote:
>> We have bunch of servers running the app and rebuilding postgres with
>> support for ossp_uuid on all servers is time consuming.
>> Is there a way of doing it without third party dependency like
>> ossp_uuid? Should I just run md5(random number), will itbe the same ?
>
> If you're building your own: at least use sha1 instead of md5.
>
> (Even md5 *should* be safe in the absence of malicious attacks, but md5 is
> generally  not recommended anymore.)
>
> Everything depends on the quality of your random numbers.  I don't know how
> much randomness pg's random() delivers, and as I've said I haven't looked
> what the uuid module does.
>
> (To give you an example: if random() only delivers a random 32 bit float
> value, the 160 bits of SHA-1 will not be used.  You'll only use 4 billion
> different values and you *will* soon get collisions.)
>
> If I were to roll my own, I'd just use 256 bit of /dev/random (or, depending
> on the application, possibly /dev/urandom and take the risk that my values
> aren't that random.)  Since it's random anyway, there's no need to use a
> hash.  (Not sure: can a SQL function read arbitrary binary files or will a C
> module be necessary?)
>
> Speed: just did a quick test on one machine.  reading 1kB from /dev/random
> takes about 1s.  (constant 5MB/s disk activity with lots of seeking going
> on, no hw random device.)  So you'd get ca. 32 id values per second.  Don't
> know if that's a lot or not for your application.
>
> Magnus: can you elaborate on uuid not being secure?  AFAICT v4 uuid are
> supposed to be essentially a random number formatted in a certain way.
>
> cheers
> -- vbi
>
>
> --
> featured product: GNU Privacy Guard - http://gnupg.org
>

Re: How to generate unique hash-type id?

От
Adrian von Bidder
Дата:
On Friday 29 January 2010 12.51:20 Joe Kramer wrote:
> So this means random()+random() is not random even within 2,000,000
>  iterations!
>

Exactly the issue I wrote about: random() apparently doesn't deliver enough
randomness.

Even if it did: quote_literal(random() + random()) is ca. 14 to 16 decimals,
so this gets you ca. 40 to 50 bits of randomness (again: assuming random()
actually can produce it!  You'll really have to look into the implementation
of random() for all your supported versions of pg and platforms!)

So, of the 160 bit that are possible for a sha1 hash, your're only using 25%
(and that's 25% of the bits.  That means you're only using 1e12 of the
1.4e48 possible values. which is about
0.000000000000000000000000000000000001% if my math isn't totally bogus.

(Read up about the birthday paradoxon to get an idea on how likely a
collision is depending on how long your random number is.)

cheers
-- vbi

--
featured link: http://www.pool.ntp.org

Вложения

Re: How to generate unique hash-type id?

От
"Wappler, Robert"
Дата:
On 2010-01-29, Joe Kramer wrote:

> Thanks for the answer,
>
> I am unable to use ossp_uuid due to package install and/or server
> rebuild requirement.
>
> So I am trying to roll my own, and
> digest(quote_literal(random()+random()), 'sha256'), 'hex') doesn't
work:
>

Your input value is a random number, those aren't supposed to be unique
within any predictable number of iterations. Moreover I'd suspect, that
random() + random() significantly increases the probability to create
duplicated values due to the properties of the add-operation. In the
simplest example, the first invocation may return random number 3, the
second one random number 5 creating an input of 8 for the
digest-function. Another outcome of the random number generator may
yield 7 and 1 giving the same input.

I'd suggest to use some kind of sequence or something constructed from
the primary keys. But you may still see hash collisions although the
input is different.

--
Robert...



Re: How to generate unique hash-type id?

От
Ivan Sergio Borgonovo
Дата:
On Fri, 29 Jan 2010 13:13:17 +0100
"Wappler, Robert" <rwappler@ophardt.com> wrote:

> I'd suggest to use some kind of sequence or something constructed
> from the primary keys. But you may still see hash collisions
> although the input is different.

Concatenate /* ::text */ random() with something like:

http://www.webthatworks.it/d1/node/page/pseudo_random_sequences_postgresql


--
Ivan Sergio Borgonovo
http://www.webthatworks.it


Re: How to generate unique hash-type id?

От
Craig Ringer
Дата:
On 29/01/2010 4:20 PM, Joe Kramer wrote:
> Hello,
>
> I need to generate unique id which is not guessable unlike
> serial(integer) type. I need an id in format like md5 hash of random
> number.
> On top of that I need this id to be unique across multiple tables.
>
> Anyone had to solve this problem before? Can you post any recipes or
> best practices please?
>
> My questions:
>
> 1. Avoiding collisions.
> If I make an UNIQUE constraint and do generation of id triggered on
> INSERT. What if collision happens? DO I nee d to check if unique hash
> already exists and if not- regenerate.
> This looks too primitive. Is there a readily available function or
> methodology to do that?
>
> 2. Generating global unique id across multiple tables.
> How to do that? My only idea is to have separate table to keep all
> hashes and compare for collision against that table.
> Is there a better way? Maybe by creating some special serial type that
> is not integer but varchar?
>
> 3. what function to use to generate 64-bit random hash without much
> overhead to CPU?

When I ran into something somewhat akin to this I asked the list about a
non-repeating pseudo-random mapping function. Daniel Verite (on this
list) enlightened me about Feistel networks/cyphers, and even posted a
PL/PgSQL implementation!

It's documented here:

http://wiki.postgresql.org/wiki/Pseudo_encrypt

and has been extremely handy. It should fit you needs - just define a
sequence that you pull new input values from, and use that same sequence
across all tables that need unique values.

It's not trivial to assert, across multiple tables, that a value is
unique. You could do it with a trigger that checks each table that uses
such values (slow-ish but effective) or maintains a side table of values
in use. In either case it'd have to be added to every table that used
the pseudo_encrypt values.

By the way, if you intend to expose these to users you might also want
some kind of data entry error checking so that a typo can't accidentally
transform id `n' to id `m' with transposition of a single digit or the
like. The luhn algorithm provides a good way to do that with a simple
check digit - it's not cryptographically strong in that if you know the
values are checked with the luhn algorithm it's trivial to re-generate
the check digit, but it helps a log against casual scanning of the
number space and against accidental user data entry error. Check out:

http://wiki.postgresql.org/wiki/Luhn_algorithm

--
Craig Ringer

Re: How to generate unique hash-type id?

От
Joe Kramer
Дата:
Thanks for that link Depesz!
It worked, I've run ALTER TABLE with your function and didn't have collisions.
I guess it's more bulletproof because random() is called not once, but
for every character therefore reducing possibility of collision by
multitude of number of bytes in hash.

CREATE OR REPLACE FUNCTION make_random_string(string_length INT4)
RETURNS TEXT
LANGUAGE 'plpgsql'
AS $BODY$
DECLARE
possible_chars TEXT =
'0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz';
output TEXT = '';
i INT4;
pos INT4;
BEGIN
FOR i IN 1..string_length LOOP
pos := 1 + cast( random() * ( length(possible_chars) - 1) as INT4 );
output := output || substr(possible_chars, pos, 1);
END LOOP;
RETURN output;
END;
$BODY$;

CREATE TABLE item
(
  item_id bigserial NOT NULL,
  title character varying,
  CONSTRAINT pk PRIMARY KEY (item_id)
)
WITH (
  OIDS=FALSE
);

....
     LOOP
          INSERT INTO item(
             title)
    VALUES ('title1');
    count = count+1;
EXIT WHEN count > 10000000;
    END LOOP;
....

ALTER TABLE item ADD COLUMN hash1 character varying NOT NULL DEFAULT
make_random_string(64);
ALTER TABLE item ADD UNIQUE (hash1);

Query returned successfully with no result in 2120670 ms.

It worked! No collisions on 10 million records.

Now a question. Is it okay to add calculated column this way by
specifying DEFAULT. Or I'm better using INSERT trigger? is DEFAULT
basically an internal insert trigger?

Thanks.

On Fri, Jan 29, 2010 at 10:50 PM, hubert depesz lubaczewski
<depesz@depesz.com> wrote:
> On Fri, Jan 29, 2010 at 07:20:33PM +1100, Joe Kramer wrote:
>> I need to generate unique id which is not guessable unlike
>> serial(integer) type. I need an id in format like md5 hash of random
>> number.
>
> check this blogpost:
> http://www.depesz.com/index.php/2007/06/25/random-text-record-identifiers/
>
>> On top of that I need this id to be unique across multiple tables.
>
> just add table id to the generated id.
>
> for example: id "xxx" in table users, is globally unique (for your
> database) when you write it: "users:xxx"
>
> if, for some weird reason, you don't want to put table name on its own
> in your key (why?) then just use some dictionary table, that will link
> those keys with your table.
>
> Best regards,
>
> depesz
>
> --
> Linkedin: http://www.linkedin.com/in/depesz  /  blog: http://www.depesz.com/
> jid/gtalk: depesz@depesz.com / aim:depeszhdl / skype:depesz_hdl / gg:6749007
>