Thread: Replace anonymized data in string

Replace anonymized data in string

From
Patrick FICHE
Date:

Hi Team,

I have some data that has been anonymized, and I would like to generate test data from it. In a way, I would like to de-anonymize this data with random data.

For example, phone numbers have been anonymized by replacing the 5 rightmost digits with the digit 8 (preserving length).

Applying this, the number 390694802756 was changed to 3906948088888.

I would like to get random digits at the end of the phone number, knowing that the anonymized part can have a variable length.

So, I would like to replace every sequence of at least two 8s with a random value of the same length (I don’t mind if a phone number contains 88 in the middle and that sequence is also changed to random data)…

I tried to do this with the replace / regexp_replace functions but could not achieve what I wanted.

I don’t want these digits to be replaced with a single repeated digit (88888 by 11111 or 99999) but with something like 42384…

Ideally, the replacement would differ between sequences when several sequences of 8 appear in one string, and would differ from one record to the next when applied to a full table…

Is there any way to do this?

Thanks,

Patrick

Re: Replace anonymized data in string

From
Daniel Gustafsson
Date:
> On 12 Nov 2021, at 15:12, Patrick FICHE <Patrick.Fiche@aqsacom.com> wrote:

> Is there any way to do this ?

There was a presentation on the subject of anonymization and data masking at
Fosdem PGDay 2019, maybe the slides from there can give any insights?

https://www.postgresql.eu/events/fosdem2019/schedule/session/2287-anonymization-and-data-masking-with-postgresql/

--
Daniel Gustafsson        https://vmware.com/




Re: Replace anonymized data in string

From
Rob Sargent
Date:
On 11/12/21 7:12 AM, Patrick FICHE wrote:
> Is there any way to do this ?

The usual trick is to select floor(random()*100000);
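
As a rough illustration of this trick (written in Python rather than SQL, only so that it can be run outside a database): floor(random() * 10^n) gives an integer below 10^n, which may have fewer than n digits, so it has to be left-padded with zeros if the length must be preserved. The function name `random_digits` is hypothetical and not part of the thread.

```python
import math
import random


def random_digits(n: int) -> str:
    """Return a string of exactly n random decimal digits (n >= 1).

    Mirrors floor(random() * 10^n). The integer can have fewer than
    n digits (for example 423 when n = 5), so it is zero-padded on the left.
    Intended for short runs; for very long runs floating-point precision
    would make a digit-by-digit draw a safer choice.
    """
    value = math.floor(random.random() * 10 ** n)
    return str(value).zfill(n)


print(random_digits(5))  # five digits, different on each run
```

Without the padding, a draw such as 423 would shorten the phone number by two digits.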

RE: Replace anonymized data in string

From
Patrick FICHE
Date:
> On 12 Nov 2021, at 15:12, Patrick FICHE <Patrick.Fiche@aqsacom.com> wrote:

> Is there any way to do this ?

There was a presentation on the subject of anonymization and data masking at Fosdem PGDay 2019, maybe the slides from
there can give any insights?

https://www.postgresql.eu/events/fosdem2019/schedule/session/2287-anonymization-and-data-masking-with-postgresql/

--

Thanks a lot for your answer.
This is a very good presentation on anonymization techniques.
Unfortunately, my data has already been anonymized and I'm trying to randomize the anonymized part, which is a bit
different from what I could find here 😊
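
One way to sketch the requirement described in this thread (every run of two or more 8s replaced by independent random digits of the same length) is a regular-expression substitution with a callback. The example below uses Python's `re.sub` only to illustrate the logic; it is not a PostgreSQL solution, and the function name `randomize_eights` is hypothetical.

```python
import random
import re

# A run of at least two consecutive 8s, as described in the original question.
EIGHT_RUN = re.compile(r"8{2,}")


def randomize_eights(value: str) -> str:
    """Replace each run of two or more 8s with random digits of equal length."""

    def replace_run(match):
        # One digit is drawn per masked position, so the length is preserved
        # and every run (and every record) gets its own independent draw.
        run_length = len(match.group())
        return "".join(random.choice("0123456789") for _ in range(run_length))

    return EIGHT_RUN.sub(replace_run, value)


print(randomize_eights("3906948088888"))  # keeps "39069480", randomizes the last 5 digits
```

Because the callback runs once per match, two runs in the same string get different values. The random digits can themselves include 8s, so the output may still contain a short run of 8s by chance.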