Discussion: [PoC] run SQL over ciphertext
Hi all,
We have developed an extension that allows PostgreSQL to run queries over encrypted data. This is achieved with user-defined functions that extend PostgreSQL with encrypted data types and support commonly used expression operations over them. Our tests validated its effectiveness with the TPC-C and TPC-H benchmarks. You can find the code here: https://github.com/SJTU-IPADS/HEDB.
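To make the mechanism concrete, here is a minimal Python sketch of the idea described above. It rests on several assumptions. The class `ToyEnclave` and its methods are hypothetical and do not reflect HEDB's actual API. The "enclave" is simulated in-process. The cipher is a toy (an HMAC-SHA256 keystream plus an HMAC tag) standing in for a real AEAD such as AES-GCM. The server side only stores and forwards opaque blobs. The operator functions, which in the extension would be user-defined functions, hand those blobs to the trusted component.

```python
import hashlib
import hmac
import os


class ToyEnclave:
    """Simulates the trusted component (e.g. an SGX/TDX enclave) that holds the key.

    Hypothetical sketch: the cipher is a toy (HMAC-SHA256 keystream plus HMAC tag),
    not what HEDB uses and not meant for production.
    """

    def __init__(self, key: bytes) -> None:
        self._enc_key = hmac.new(key, b"enc", hashlib.sha256).digest()
        self._mac_key = hmac.new(key, b"mac", hashlib.sha256).digest()

    def _keystream(self, nonce: bytes, n: int) -> bytes:
        out = b""
        counter = 0
        while len(out) < n:
            block = nonce + counter.to_bytes(4, "big")
            out += hmac.new(self._enc_key, block, hashlib.sha256).digest()
            counter += 1
        return out[:n]

    def encrypt_int(self, value: int) -> bytes:
        # Randomized: a fresh nonce per call; values must fit in a signed 64-bit int.
        nonce = os.urandom(16)
        pt = value.to_bytes(8, "big", signed=True)
        ct = bytes(a ^ b for a, b in zip(pt, self._keystream(nonce, len(pt))))
        tag = hmac.new(self._mac_key, nonce + ct, hashlib.sha256).digest()
        return nonce + ct + tag

    def decrypt_int(self, blob: bytes) -> int:
        nonce, ct, tag = blob[:16], blob[16:24], blob[24:]
        expected = hmac.new(self._mac_key, nonce + ct, hashlib.sha256).digest()
        if not hmac.compare_digest(tag, expected):
            raise ValueError("ciphertext was tampered with")
        pt = bytes(a ^ b for a, b in zip(ct, self._keystream(nonce, len(ct))))
        return int.from_bytes(pt, "big", signed=True)

    # Operators a UDF would forward to the enclave. Arithmetic results stay
    # encrypted; in this sketch comparison results are returned as plain booleans.
    def add(self, a: bytes, b: bytes) -> bytes:
        return self.encrypt_int(self.decrypt_int(a) + self.decrypt_int(b))

    def less_than(self, a: bytes, b: bytes) -> bool:
        return self.decrypt_int(a) < self.decrypt_int(b)


enclave = ToyEnclave(os.urandom(32))
rows = [enclave.encrypt_int(v) for v in (120, 45, 300)]  # what the server stores
threshold = enclave.encrypt_int(100)                      # encrypted query literal

# WHERE amount > 100: the server learns only one boolean per row.
matching = [r for r in rows if enclave.less_than(threshold, r)]

# SUM(amount) over the matching rows, without the server seeing plaintext.
total = matching[0]
for r in matching[1:]:
    total = enclave.add(total, r)

print(enclave.decrypt_int(total))  # 420 (decrypted on the client side)
```

Returning comparison results in plaintext lets the executor filter rows. How much of that leakage a real design accepts or hides is a separate design question.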
This PoC is a reimplemented fork developed in collaboration with a cloud database company. The aim is to let their DBAs manage databases without the risk of data leaks, meeting the requirements of laws such as GDPR.
I am wondering whether anyone thinks this would be a useful feature. If so, I would like to know the steps needed to mature it further and potentially have it incorporated into PostgreSQL contrib.
Best regards,
Mingyu Li
Hello,
I think this is a very interesting topic, especially for European companies where data sovereignty in the cloud has become critical.
If I understand correctly, the idea is to split users into 'client users' who can see data unencrypted, and 'server users', who are administrators unable to decrypt data.
A few questions:
- how are secrets managed? Do you use a sort of vault to keep encryption keys? Is there a master key to encrypt session keys?
- what about performance? Is it possible to use indexes on encrypted columns?
best regards
Giampaolo Capelli
On 10.10.23 08:42, Mingyu Li wrote:
> We have developed an extension, allowing PostgreSQL to run queries over
> encrypted data. This functionality is achieved via user-defined
> functions that extend encrypted data types and support commonly used
> expression operations. Our tests validated its effectiveness with TPC-C
> and TPC-H benchmarks. You may find the code here:
> https://github.com/SJTU-IPADS/HEDB <https://github.com/SJTU-IPADS/HEDB>.
>
> This PoC is a reimplementation fork while collaborating with a cloud
> database company; the aim is to enable their DBAs to manage databases
> without the risk of data leaks, /meeting the requirements of laws such
> as GDPR./
>
> I am wondering if anyone thinks this is a nice feature. If so, I am
> curious about the steps to further it mature and potentially have it
> incorporated as a part of PostgreSQL contrib.
FYI, see also
<https://www.postgresql.org/message-id/flat/89157929-c2b6-817b-6025-8e4b2d89d88f@enterprisedb.com>
for a similar project.
Hi,
> the idea is to split users into 'client users' who can see data unencrypted, and 'server users', who are administrators unable to decrypt data.
Exactly!
> how are secrets managed? Do you use a sort of vault to keep encryption keys?
Good question. The client holds the key and uses a proxy for transparent encryption. The implementation also assumes that encryption keys are stored securely in hardware-protected memory called "enclaves". Only client users and server-side enclaves have access to the plaintext. Please have a look at page 5 of the slides: www.usenix.org/system/files/osdi23_slides_li_mingyu_v2.pdf. Modern clouds such as OVH and Azure now offer hardware enclaves. If enclaves are not available, a rich client-side proxy can be used instead, at the cost of extra round trips.
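Without an enclave, the server cannot evaluate predicates on ciphertext, so that work moves to the client. The following is a minimal sketch of such a client-side proxy. It assumes `sqlite3` as a stand-in for the untrusted server and a hypothetical `customer` table. It uses a toy HMAC keystream cipher with no integrity tag, for brevity. `ClientProxy` is illustrative only and is not the HEDB proxy.

```python
import hashlib
import hmac
import os
import sqlite3


def _stream(key: bytes, nonce: bytes, n: int) -> bytes:
    # Toy keystream: HMAC-SHA256(key, nonce || counter). NOT production crypto.
    blocks = (hmac.new(key, nonce + i.to_bytes(4, "big"), hashlib.sha256).digest()
              for i in range((n + 31) // 32))
    return b"".join(blocks)[:n]


class ClientProxy:
    """Hypothetical proxy: holds the key, encrypts on write, decrypts on read."""

    def __init__(self, conn: sqlite3.Connection, key: bytes) -> None:
        self.conn = conn
        self.key = key

    def _enc(self, text: str) -> bytes:
        nonce = os.urandom(16)
        pt = text.encode("utf-8")
        return nonce + bytes(a ^ b for a, b in zip(pt, _stream(self.key, nonce, len(pt))))

    def _dec(self, blob: bytes) -> str:
        nonce, ct = blob[:16], blob[16:]
        return bytes(a ^ b for a, b in zip(ct, _stream(self.key, nonce, len(ct)))).decode("utf-8")

    def insert(self, name: str, city: str) -> None:
        # The server only ever receives ciphertext blobs.
        self.conn.execute("INSERT INTO customer(name, city) VALUES (?, ?)",
                          (self._enc(name), self._enc(city)))

    def names_in_city(self, city: str) -> list[str]:
        # The server cannot evaluate "city = ?" on ciphertext, so the proxy
        # fetches every row and filters locally: this is the extra round trip.
        rows = self.conn.execute("SELECT name, city FROM customer ORDER BY rowid").fetchall()
        return [self._dec(n) for n, c in rows if self._dec(c) == city]


conn = sqlite3.connect(":memory:")  # stands in for the untrusted server
conn.execute("CREATE TABLE customer(name BLOB, city BLOB)")
proxy = ClientProxy(conn, os.urandom(32))
proxy.insert("Alice", "Lyon")
proxy.insert("Bob", "Paris")
proxy.insert("Carol", "Lyon")
print(proxy.names_in_city("Lyon"))  # ['Alice', 'Carol']
```

The full-table fetch in `names_in_city` is where the extra cost comes from. An enclave on the server side would let the filter run next to the data instead.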
> Is there a master key to encrypt session keys?
There should be.
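The thread leaves the key hierarchy open. A common pattern for the "master key encrypts session keys" question is envelope encryption. The sketch below assumes that pattern. Its `wrap_key`, `unwrap_key` and `rotate` helpers are hypothetical, and its HMAC-based wrapping is a toy stand-in for a standard key-wrap algorithm such as AES-KW.

```python
import hashlib
import hmac
import os


def wrap_key(master: bytes, session_key: bytes) -> bytes:
    """Toy key wrap (XOR pad + HMAC tag, both keyed by the master key)."""
    if len(session_key) != 32:
        raise ValueError("this sketch only wraps 32-byte keys")
    nonce = os.urandom(16)
    pad = hmac.new(master, b"wrap" + nonce, hashlib.sha256).digest()
    wrapped = bytes(a ^ b for a, b in zip(session_key, pad))
    tag = hmac.new(master, b"tag" + nonce + wrapped, hashlib.sha256).digest()
    return nonce + wrapped + tag


def unwrap_key(master: bytes, blob: bytes) -> bytes:
    nonce, wrapped, tag = blob[:16], blob[16:48], blob[48:]
    expected = hmac.new(master, b"tag" + nonce + wrapped, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("wrong master key or corrupted blob")
    pad = hmac.new(master, b"wrap" + nonce, hashlib.sha256).digest()
    return bytes(a ^ b for a, b in zip(wrapped, pad))


def rotate(old_master: bytes, new_master: bytes, blob: bytes) -> bytes:
    # Master-key rotation only re-wraps the small session-key blob; data encrypted
    # under the session key itself does not have to be re-encrypted.
    return wrap_key(new_master, unwrap_key(old_master, blob))


master = os.urandom(32)        # assumption: held by the client, a KMS, or sealed in the enclave
session_key = os.urandom(32)   # per-session data key
blob = wrap_key(master, session_key)  # only the wrapped form is stored or transmitted
new_master = os.urandom(32)
blob2 = rotate(master, new_master, blob)
print(unwrap_key(new_master, blob2) == session_key)  # True
```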
> what about performance?
The TPC-C overhead is below 50%. The TPC-H overhead ranges from 5 to 20 times the baseline. There is room for improvement on TPC-H, and we are working on it.
> Is it possible to use indexes on encrypted columns?
Yes. The extension allows client users to intentionally reveal the ordering of encrypted columns for indexing purposes.
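This thread does not spell out how the extension reveals ordering. One well-known technique is order-preserving encoding, whose output a B-tree can index directly. The sketch below swaps in that technique for illustration. The `ToyOPE` class is hypothetical, works only over a small integer domain, and is not HEDB's mechanism. A comparison function evaluated inside an enclave is another way to achieve the same effect.

```python
import bisect
import hashlib
import hmac


class ToyOPE:
    """Toy order-preserving encoding for integers in [0, domain).

    Each plaintext x maps to the sum of key-dependent random gaps for 0..x,
    so x < y  <=>  encode(x) < encode(y). The server can sort and range-scan
    encodings, which is exactly the ordering the client chose to reveal.
    """

    def __init__(self, key: bytes, domain: int) -> None:
        self.table: list[int] = []
        total = 0
        for x in range(domain):
            digest = hmac.new(key, x.to_bytes(8, "big"), hashlib.sha256).digest()
            total += 1 + int.from_bytes(digest[:4], "big") % 1000  # gap >= 1: strictly increasing
            self.table.append(total)

    def encode(self, x: int) -> int:
        return self.table[x]

    def decode(self, c: int) -> int:
        i = bisect.bisect_left(self.table, c)
        if i == len(self.table) or self.table[i] != c:
            raise ValueError("not a valid encoding")
        return i


ope = ToyOPE(b"hypothetical-client-key", domain=10_000)
ages = [42, 7, 99, 7, 63]
index = sorted(ope.encode(a) for a in ages)  # what a server-side B-tree would hold

# Range query 10 <= age < 70, answered by the server on encodings only.
lo, hi = ope.encode(10), ope.encode(70)
hits = index[bisect.bisect_left(index, lo):bisect.bisect_left(index, hi)]
print(sorted(ope.decode(c) for c in hits))  # [42, 63]
```

This toy encoding is deterministic, so it reveals which values are equal as well as their order.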
--
Best,
Mingyu
Giampaolo Capelli <giampow@gmail.com> wrote on Tue, Oct 10, 2023, at 16:18:
> Hello,
> I think this is a very interesting topic, especially for European companies where data sovereignty in the cloud has become critical.
> If I understand correctly, the idea is to split users into 'client users' who can see data unencrypted, and 'server users', who are administrators unable to decrypt data.
> A few questions:
> - how are secrets managed? Do you use a sort of vault to keep encryption keys? Is there a master key to encrypt session keys?
> - what about performance? Is it possible to use indexes on encrypted columns?
Hello Peter,
> https://www.postgresql.org/message-id/flat/89157929-c2b6-817b-6025-8e4b2d89d88f@enterprisedb.com
Thanks for referring me to your TCE project, nice work! It will take me some time to go through the long discussion thread and the patch.
A quick question: what operations do pg_encrypted_* support? Are (in)equality checks sufficient for real-world queries?
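The question turns on which operators can be evaluated on ciphertext alone. The following minimal sketch assumes a deterministic keyed tag (a "blind index", HMAC-SHA256 here) stored alongside the encrypted value. It illustrates the general idea only and does not show how pg_encrypted_* is implemented. `eq_tag` and the key are hypothetical.

```python
import hashlib
import hmac
from collections import Counter


def eq_tag(key: bytes, value: str) -> bytes:
    """Deterministic keyed tag (a 'blind index'): equal inputs give equal tags."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).digest()


key = b"hypothetical-column-key"
cities = ["Lyon", "Paris", "Lyon", "Turin", "Paris", "Lyon"]
tags = [eq_tag(key, c) for c in cities]  # what the server stores next to the ciphertext

# WHERE city = 'Lyon': the client sends the tag of the literal; the server compares bytes.
probe = eq_tag(key, "Lyon")
print(sum(t == probe for t in tags))  # 3

# GROUP BY city and equi-joins also work, since they only need tag equality.
print(sorted(Counter(tags).values()))  # [1, 2, 3]

# Tag order says nothing about plaintext order, so '<', ORDER BY and range
# predicates cannot be answered from these tags alone.
```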
--
Best,
Mingyu
Peter Eisentraut <peter@eisentraut.org> wrote on Wed, Oct 11, 2023, at 14:43:
> FYI, see also
> <https://www.postgresql.org/message-id/flat/89157929-c2b6-817b-6025-8e4b2d89d88f@enterprisedb.com>
> for a similar project.