Обсуждение: journaling / time travel
Hi,
In a joint effort with some parties, we have created a large database (1.1 TB) of records that change only occasionally. Mainly, more of them are added to the database in a slow rate (1.1 TB built up in 10 years).The use case of legal disputes being fought with our data as evidence and digging up the exact data from a certain point of time never occurred in those 10 years, and it is unlikely that it ever will.
But it might, if anyone could reasonably expect this to be possible.
Also, it has been argued that this journal is a good thing, because it could save us in case of a unforeseen creeping corruption of data or some other catastrophe that isn't covered by our backup system. Someone described this as a good practice, resulting in the fact that we still do this.
Now, i don't like this blown up journal that we don't ever use, so i would love to lose it.
But indeed we should do anything we reasonably can to protect that data.
My question to you all is:
* Is the legal thing actualy something one could expect of us?
* Is the legal thing actualy something one could expect of us?
* Is the security thing really a good practice?
* Is this a common use case that is normally solved with standard components?
One thing: PITR would be a candidate for a standard component, but it takes up much more space. We could double the retention time if the journal were to be axed, but that would probably not be enough.
Cheers,
--
Willy-Bas Loos
On Mon, Sep 19, 2016 at 5:48 PM, Willy-Bas Loos <willybas@gmail.com> wrote:
The use case of legal disputes being fought with our data as evidence and digging up the exact data from a certain point of time never occurred in those 10 years, and it is unlikely that it ever will.But it might, if anyone could reasonably expect this to be possible.
Just to clarify, this is not a specific feature that we are bound to in an SLA, or anything like that.
It is just argued that we should keep the feature because a judge may ask for it.
I'm not asking for official legal advice on this list, just for common sense.
We are based in the Netherlands and any legal action of relevance to this question would take place here.
We are based in the Netherlands and any legal action of relevance to this question would take place here.
Cheers,
--
Willy-Bas Loos
On 09/19/2016 08:48 AM, Willy-Bas Loos wrote: > Since records can > be changed afterwards, it has been argued that we should have > "journaling", meaning that every change to the data is saved in a > separate schema that holds a "journaling" copy of each table I don't think this is especially unusual. I've worked on similar projects that maintain history for regulatory reasons. I can't speak to your legal situation, but I don't think keeping history is a strange choice. Re the security argument: it seems to me it depends on whether you can restrict access to the journal while providing access to the normal tables. I guess journaling would help for some threats but not others, right? If regular users can't touch the journal, then I agree keeping one could make sense. Are you saying the journaling portion is 624GB on top of 1.1TB? Or that of the 1.1TB, 624GB of it is from journaling? Either way it doesn't seem like a massive cost to me. I don't think PITR is an appropriate solution to keeping a 10-year history of changes. It sounds like you're not looking for a green-field solution, but just trying to get perspective on what others are doing. Some resources for this that might help you: https://www.youtube.com/watch?v=TRgni5q0YM8 https://github.com/arkhipov/temporal_tables (aka http://pgxn.org/dist/temporal_tables/) http://www.cs.arizona.edu/~rts/tdbbook.pdf (also available in print) https://www.amazon.com/Bitemporal-Data-Practice-Tom-Johnston/dp/0124080677 The two books contain multiple approaches to storing history, each with tradeoffs for ease-of-use, disk space, etc. Reading them might be overkill for you. If you want to choose one, I'd read Snodgrass. The first few chapters are tedious, but it gets better. Temporal databases are in a funny in-between zone where there is lots of research, but standard tools are fairly underdeveloped. Postgres recently added range types and exclusion constraints, which are important primitives for building a temporal system, but it is still a long way from SQL:2011, and SQL:2011 is itself a long way from everything you might want. One thing that seems lacking to me, even in the research, is how to handle DDL changes. You should be glad that you only care about audit history and not subject history too, because going bi-temporal is where you really cross over into lack of available tools, outside of a few commercial offerings. (Teradata has temporal support, using a Snodgrass-like approach that pre-dates the standard.) Paul
> On 09/19/2016 10:56 AM, Willy-Bas Loos wrote: > > On Mon, Sep 19, 2016 at 6:26 PM, Paul Jungwirth > > <pj@illuminatedcomputing.com <mailto:pj@illuminatedcomputing.com>> wrote: > > I've worked on similar > > projects that maintain history for regulatory reasons. > Can you explain "regulatory reasons" please? I mean government regulations, like HIPAA and Sarbanes-Oxley. I'm in the U.S. Since you are in the Netherlands you'll have different rules to deal with. Paul
You may also have a look on the E-Maj extension. It is available for instance here on pgxn.On 09/19/2016 08:48 AM, Willy-Bas Loos wrote: > Since records can > be changed afterwards, it has been argued that we should have > "journaling", meaning that every change to the data is saved in a > separate schema that holds a "journaling" copy of each table I don't think this is especially unusual. I've worked on similar projects that maintain history for regulatory reasons. I can't speak to your legal situation, but I don't think keeping history is a strange choice. Re the security argument: it seems to me it depends on whether you can restrict access to the journal while providing access to the normal tables. I guess journaling would help for some threats but not others, right? If regular users can't touch the journal, then I agree keeping one could make sense. Are you saying the journaling portion is 624GB on top of 1.1TB? Or that of the 1.1TB, 624GB of it is from journaling? Either way it doesn't seem like a massive cost to me. I don't think PITR is an appropriate solution to keeping a 10-year history of changes. It sounds like you're not looking for a green-field solution, but just trying to get perspective on what others are doing. Some resources for this that might help you: https://www.youtube.com/watch?v=TRgni5q0YM8 https://github.com/arkhipov/temporal_tables (aka http://pgxn.org/dist/temporal_tables/) http://www.cs.arizona.edu/~rts/tdbbook.pdf (also available in print) https://www.amazon.com/Bitemporal-Data-Practice-Tom-Johnston/dp/0124080677 The two books contain multiple approaches to storing history, each with tradeoffs for ease-of-use, disk space, etc. Reading them might be overkill for you. If you want to choose one, I'd read Snodgrass. The first few chapters are tedious, but it gets better. Temporal databases are in a funny in-between zone where there is lots of research, but standard tools are fairly underdeveloped. Postgres recently added range types and exclusion constraints, which are important primitives for building a temporal system, but it is still a long way from SQL:2011, and SQL:2011 is itself a long way from everything you might want. One thing that seems lacking to me, even in the research, is how to handle DDL changes. You should be glad that you only care about audit history and not subject history too, because going bi-temporal is where you really cross over into lack of available tools, outside of a few commercial offerings. (Teradata has temporal support, using a Snodgrass-like approach that pre-dates the standard.) Paul
Its purpose is to record all updates on tables either to examine them or to "rollback" them later. But it should be also usable for auditing.
The documentation in the package includes a detailed reference guide, but also a presentation that allows to quickly have a view on the extention.
Even if you think E-Maj doesn't fit your need, feel free to give me a feedback. It may give me thoughts for enhancements ;-)
Regards. Philippe.
Coming late to this, but ... On Mon, 19 Sep 2016 17:48:20 +0200, Willy-Bas Loos <willybas@gmail.com> wrote: >The use case of legal disputes being fought with our data as evidence and >digging up the exact data from a certain point of time never occurred in >those 10 years, and it is unlikely that it ever will. >But it might, if anyone could reasonably expect this to be possible. > > : > >My question to you all is: >* Is the legal thing actualy something one could expect of us? >* Is the security thing really a good practice? >* Is this a common use case that is normally solved with standard >components? I am not a lawyer. You don't say where you are specifically, but in the US, there is a legal notion that changes/deletions done in the "normal course of business" generally are permitted. That is, e.g., if once a month you routinely purge records older than 3 years, then you can't be expected to produce records from 4 years ago. But you have to prove to the court that this is normal for your business: e.g., show documentation of your record keeping procedures. The problem comes when you do get notice of a legal action. From that moment forward you must preserve any data that might be relevent to the case ... including any new data that is created ... in the event that it ever is subpoenaed by the court. This can become a major issue when you realize that a court case may drag on for many years, and you may not know exactly what data has to be preserved. Lawyers often go on "fishing expeditions", asking for data in many different ways [by different keywords, etc.], hoping to find something by comparing the results. Journaling solves the retention problem and may provide other nice features like an audit trail of who made the changes. Of course, journaling may take a lot of extra space unless it stores only deltas. Many locales have similar requirements for data preservation in the face of a legal action. You need to find out what is expected where you are. I'd have to advise that you talk to your lawyers rather than ask here. At least in the US, the "normal course of business" applies to archive data as well as to live data, so you may be able to limit how long you need to keep the journals. Hope this ... doesn't further confuse. George