Discussion: Data Warehousing
Hi,
I've got a postgres database collecting logged data. This data I have to keep for at least 3 years. The data is initially recorded in a postgres cluster and then needs to be moved to a reports database server for analysis. Therefore I'd like a job to dump data on the cluster, say every hour, and record it in the reports database. The clustered database could then be purged of data more than, say, a week old.
So basically I need a dump/restore that only appends new data to the reports server database.
I've googled but can't find anything, can anyone help?
Thanks
Rob
On 9/3/07, Rob Kirkbride <rob.kirkbride@gmail.com> wrote:
> Hi,
>
> I've got a postgres database collecting logged data. This data I have to
> keep for at least 3 years. The data is initially recorded in a postgres
> cluster and then needs to be moved to a reports database server for
> analysis. Therefore I'd like a job to dump data on the cluster, say every
> hour, and record it in the reports database. The clustered database could
> then be purged of data more than, say, a week old.
>
> So basically I need a dump/restore that only appends new data to the
> reports server database.
>
> I've googled but can't find anything, can anyone help?

You might find an answer in partitioning your data. There's a section
in the docs on it. Then you can just dump the old data from the
newest couple of partitions if you're partitioning by week, and purge
anything older with a simple delete where date < now() - interval '1
week' or something like that.
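A minimal sketch of the "simple delete" purge that could run from cron -- the database name (livedb), table (log), timestamp column (logged_time), and partition name below are assumptions, not taken from Rob's schema:

    #!/bin/sh
    # purge_old_logs.sh -- assumed names: database "livedb", table "log",
    # timestamp column "logged_time".
    # Remove rows more than a week old from the live cluster.
    psql livedb -c "DELETE FROM log WHERE logged_time < now() - interval '1 week'"
    # With week-based partitions (inherited child tables) the same effect is a
    # cheap DROP TABLE of last week's child table instead of a bulk DELETE, e.g.:
    #   psql livedb -c "DROP TABLE log_2007w35"
    # e.g. run from cron once a day:
    #   0 3 * * *  /usr/local/bin/purge_old_logs.sh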
On 9/3/07, Rob Kirkbride <rob.kirkbride@gmail.com> wrote:
> So basically I need a dump/restore that only appends new
> data to the reports server database.

I guess that will all depend on whether or not your data has a record
of the time it got stuck in the cluster ... if there's no concept of a
time-stamp attached to the records as they get entered, I don't think
it can be done.

Cheers,
Andrej
On 03/09/07, Scott Marlowe <scott.marlowe@gmail.com> wrote:
> You might find an answer in partitioning your data. There's a section
> in the docs on it. Then you can just dump the old data from the
> newest couple of partitions if you're partitioning by week, and purge
> anything older with a simple delete where date < now() - interval '1
> week' or something like that.

We're using hibernate to write to the database. Partitioning looks like it
will be too much of a re-architecture. In reply to Andrej: we do have a
logged_time entity in the required tables. That being the case, how does that
help me with the tools provided?

Might I have to write a custom JDBC application to do the data migration?

Rob
On 9/3/07, Rob Kirkbride <rob.kirkbride@gmail.com> wrote:
> We're using hibernate to write to the database. Partitioning looks like it
> will be too much of a re-architecture. In reply to Andrej: we do have a
> logged_time entity in the required tables. That being the case, how does
> that help me with the tools provided?
>
> Might I have to write a custom JDBC application to do the data migration?

That would be one option :}

If the server is on a Unix/Linux-platform you should be able
to achieve the result with a reasonably simple shell-script
and cron, I'd say.

Cheers,
Andrej
--
Please don't top post, and don't use HTML e-Mail :}
Make your quotes concise.
http://www.american.edu/econ/notes/htmlmail.htm
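For instance, a cron-driven sketch along those lines, piping an hour's worth of rows straight from the live cluster into the reports server (COPY from a query needs PostgreSQL 8.2 or later; every host, database, table and column name below is an assumption):

    #!/bin/sh
    # hourly_ship.sh -- assumed names: live database "livedb", reports host
    # "reports-server", reports database "reportsdb", table "log",
    # timestamp column "logged_time".
    # Copy the last hour of rows out of the live cluster and append them
    # to the same table on the reports server.
    psql livedb -c "COPY (SELECT * FROM log
                          WHERE logged_time >= now() - interval '1 hour') TO STDOUT" \
      | psql -h reports-server reportsdb -c "COPY log FROM STDIN"

    # crontab entry to run it at the top of every hour:
    #   0 * * * *  /usr/local/bin/hourly_ship.sh

A real job would probably key off the newest logged_time already present on the reports side rather than a fixed one-hour window, so a delayed run can't skip rows.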
Andrej Ricnik-Bay wrote:
> On 9/3/07, Rob Kirkbride <rob.kirkbride@gmail.com> wrote:
>> Might I have to write a custom JDBC application to do the data migration?
>
> That would be one option :}
>
> If the server is on a Unix/Linux-platform you should be able
> to achieve the result with a reasonably simple shell-script
> and cron, I'd say.

I am on a Linux platform but I'm going to need some pointers regarding
the cron job. Are you suggesting that I parse the dump file? I assume I
would need to switch to using inserts and then parse the dump looking
for where I need to start from?
On 9/3/07, Rob Kirkbride <rob.kirkbride@gmail.com> wrote:
> I am on a Linux platform but I'm going to need some pointers regarding
> the cron job. Are you suggesting that I parse the dump file? I assume I
> would need to switch to using inserts and then parse the dump looking
> for where I need to start from?

The question is: how complex is the data you need to extract? I guess
where I was heading was to run a select with the interval Scott
described from psql into a file, and then copy-from that into the
analysis database. However, if the structure is more complex, if you
needed to join tables, then parsing a dump file may be an option, even
though always retaining a week's worth might make that quite a lot of
overhead.

Cheers,
Andrej
--
Please don't top post, and don't use HTML e-Mail :}
Make your quotes concise.
http://www.american.edu/econ/notes/htmlmail.htm
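A rough sketch of that file-based variant -- again, every name (livedb, reportsdb, reports-server, log, logged_time) is an assumption, and \copy with a query also needs 8.2 or later:

    #!/bin/sh
    # On the machine running the live cluster: dump the last hour of rows to a file.
    psql livedb -c "\copy (SELECT * FROM log WHERE logged_time >= now() - interval '1 hour') to '/tmp/log_last_hour.tsv'"

    # Append that file to the existing table on the reports server.
    psql -h reports-server reportsdb -c "\copy log from '/tmp/log_last_hour.tsv'"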
> I am on a Linux platform but I'm going to need some pointers regarding
> the cron job. Are you suggesting that I parse the dump file? I assume I
> would need to switch to using inserts and then parse the dump looking
> for where I need to start from?

Something that you may want to consider is dblink from contrib. We have
a similar situation for the archiving of collected data and have been
able to implement a fairly easy solution that does not require parsing
dump files, just a simple(ish) query based on the time inserted.

-Ken
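A sketch of that dblink approach, run on the reports server so it pulls rows across from the live cluster (dblink from contrib has to be installed in the reports database; the column list, table names and connection string below are assumptions):

    #!/bin/sh
    # Pull the last hour of rows from the live cluster into the reports database.
    # All names and the connection string are assumptions.
    psql reportsdb -c "
      INSERT INTO log (id, message, logged_time)
      SELECT id, message, logged_time
        FROM dblink('host=live-cluster dbname=livedb user=report',
                    'SELECT id, message, logged_time FROM log
                      WHERE logged_time >= now() - interval ''1 hour''')
             AS t(id integer, message text, logged_time timestamptz);"

The nice part is that everything stays inside one INSERT ... SELECT, so there is no intermediate file to clean up.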