Discussion: Read data from Postgres table pages

Read data from Postgres table pages

From: Sushrut Shivaswamy
Date:
Hey,

I'm trying to build a Postgres export tool that reads data from table pages and exports it to an S3 bucket. I'd like to avoid manual commands like pg_dump; I need access to the raw data.

Can you please point me to the Postgres source header / .c files that encapsulate this functionality?
- List all pages for a table
- Read a given page for a table

Any pointers to the relevant source code would be appreciated.

Thanks,
Sushrut
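For orientation: a table's pages live in the file reported by `pg_relation_filepath('tablename')` (relative to the data directory), split into 1 GB segment files named `<relfilenode>`, `<relfilenode>.1`, and so on. Each page starts with a 24-byte `PageHeaderData`, declared in `src/include/storage/bufpage.h`. The Python sketch below shows how "list all pages" and "read a given page" map onto those files. It assumes the default compile-time `BLCKSZ` of 8192 and `RELSEG_SIZE` of 131072 blocks, and a little-endian server; the helper names are hypothetical. As the replies below point out, reading these files while the server is running does not give consistent data.

```python
import struct
from typing import NamedTuple

BLCKSZ = 8192          # default block size (compile-time setting; assumption)
RELSEG_SIZE = 131072   # blocks per segment file: 1 GB / 8 kB (default; assumption)


class PageHeader(NamedTuple):
    lsn: int             # pd_lsn: WAL position of the last change to this page
    checksum: int        # pd_checksum
    flags: int           # pd_flags
    lower: int           # pd_lower: end of the line pointer array
    upper: int           # pd_upper: start of tuple data
    special: int         # pd_special: start of the special space
    page_size: int       # high byte of pd_pagesize_version
    layout_version: int  # low byte of pd_pagesize_version
    prune_xid: int       # pd_prune_xid


def block_location(blkno: int) -> tuple[int, int]:
    """Map a block number to (segment number, byte offset within that segment file)."""
    return blkno // RELSEG_SIZE, (blkno % RELSEG_SIZE) * BLCKSZ


def count_blocks(segment_sizes: list[int]) -> int:
    """'List all pages': total block count, given the sizes in bytes of all segment files."""
    return sum(size // BLCKSZ for size in segment_sizes)


def read_page(relation_path: str, blkno: int) -> bytes:
    """'Read a given page': relation_path = data directory + pg_relation_filepath()."""
    segno, offset = block_location(blkno)
    path = relation_path if segno == 0 else f"{relation_path}.{segno}"
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(BLCKSZ)


def parse_page_header(page: bytes) -> PageHeader:
    if len(page) != BLCKSZ:
        raise ValueError(f"expected {BLCKSZ} bytes, got {len(page)}")
    # '<' assumes a little-endian server: pages are stored in native byte order.
    (xlogid, xrecoff, checksum, flags, lower, upper, special,
     size_version, prune_xid) = struct.unpack_from("<IIHHHHHHI", page, 0)
    return PageHeader(
        lsn=(xlogid << 32) | xrecoff,
        checksum=checksum, flags=flags, lower=lower, upper=upper, special=special,
        page_size=size_version & 0xFF00,
        layout_version=size_version & 0x00FF,
        prune_xid=prune_xid,
    )
```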

Re: Read data from Postgres table pages

From: Alexander Korotkov
Date:
Hi

On Tue, Mar 19, 2024 at 4:23 PM Sushrut Shivaswamy
<sushrut.shivaswamy@gmail.com> wrote:
> I'm trying to build a Postgres export tool that reads data from table pages and exports it to an S3 bucket. I'd like to avoid manual commands like pg_dump; I need access to the raw data.
>
> Can you please point me to the Postgres source header / .c files that encapsulate this functionality?
> - List all pages for a table
> - Read a given page for a table
>
> Any pointers to the relevant source code would be appreciated.

Why do you need to work at the source code level?
Please check this page about making a binary copy of the database at the
filesystem level.
https://www.postgresql.org/docs/current/backup-file.html

------
Regards,
Alexander Korotkov



Re: Read data from Postgres table pages

From: Sushrut Shivaswamy
Date:
I'd like to read individual rows from the pages as they are updated and stream them to a server to create a copy of the data.
The data will be rewritten to columnar format for analytics queries.
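Reading rows out of a page means walking its line-pointer array: `pd_lower` marks where the array ends, and each 4-byte `ItemIdData` (declared in `src/include/storage/itemid.h`) packs a 15-bit offset, 2-bit flags and a 15-bit length. Below is a minimal Python sketch, assuming a little-endian build where the compiler allocates those bitfields starting from the least significant bit (the usual GCC behaviour on x86-64, but an assumption here); `line_pointers` and `normal_tuples` are hypothetical helpers. What comes back are raw heap tuples, not rows: each still needs its tuple header interpreted and its columns decoded using the table's definition in `pg_attribute`, and an `LP_NORMAL` item can still be a dead or uncommitted row version.

```python
import struct
from typing import NamedTuple

PAGE_HEADER_SIZE = 24                                # sizeof(PageHeaderData)
LP_UNUSED, LP_NORMAL, LP_REDIRECT, LP_DEAD = 0, 1, 2, 3


class ItemId(NamedTuple):
    offnum: int  # 1-based line pointer number (the second half of a ctid)
    off: int     # lp_off: byte offset of the tuple within the page
    flags: int   # lp_flags: one of the LP_* values
    length: int  # lp_len: tuple length in bytes


def line_pointers(page: bytes) -> list[ItemId]:
    (pd_lower,) = struct.unpack_from("<H", page, 12)  # pd_lower sits at byte 12 of the header
    count = (pd_lower - PAGE_HEADER_SIZE) // 4
    items = []
    for i in range(count):
        (word,) = struct.unpack_from("<I", page, PAGE_HEADER_SIZE + 4 * i)
        # Assumed layout: lp_off = bits 0-14, lp_flags = bits 15-16, lp_len = bits 17-31.
        items.append(ItemId(i + 1, word & 0x7FFF, (word >> 15) & 0x3, word >> 17))
    return items


def normal_tuples(page: bytes) -> list[bytes]:
    """Raw bytes of every LP_NORMAL item; visibility is NOT checked."""
    return [page[it.off:it.off + it.length]
            for it in line_pointers(page) if it.flags == LP_NORMAL]
```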

On Tue, Mar 19, 2024 at 7:58 PM Alexander Korotkov <aekorotkov@gmail.com> wrote:
> [...]
> https://www.postgresql.org/docs/current/backup-file.html

Re: Read data from Postgres table pages

From: Sushrut Shivaswamy
Date:
The binary I'm trying to create should automatically be able to read data from a postgres instance without users having to
run commands for backup / pg_dump etc.
Having access to the appropriate source headers would allow me to read the data.

On Tue, Mar 19, 2024 at 8:03 PM Sushrut Shivaswamy <sushrut.shivaswamy@gmail.com> wrote:
> I'd like to read individual rows from the pages as they are updated and stream them to a server to create a copy of the data.
> [...]

Re: Read data from Postgres table pages

From: Alexander Korotkov
Date:
On Tue, Mar 19, 2024 at 4:35 PM Sushrut Shivaswamy
<sushrut.shivaswamy@gmail.com> wrote:
> The binary I'm trying to create should automatically be able to read data from a postgres instance without users having to
> run commands for backup / pg_dump etc.
> Having access to the appropriate source headers would allow me to read the data.

Please avoid top-posting.
https://en.wikipedia.org/wiki/Posting_style#Top-posting

If you're looking to have a separate binary, why can't your binary
just *connect* to the postgres database and query the data?  This is
what pg_dump does; you can do the same directly.  pg_dump doesn't
access the raw data files.
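A minimal sketch of that approach, assuming the export binary talks to the server through an ordinary client library (for example libpq's COPY-out support, or a driver built on it) and streams rows with `COPY ... TO STDOUT`, which is also how pg_dump dumps table data by default. The Python below only builds the SQL to send; `quote_ident`, `copy_out_sql` and `export_statements` are hypothetical helpers. Running the COPY inside a `REPEATABLE READ, READ ONLY` transaction is what gives the export a single consistent snapshot.

```python
def quote_ident(name: str) -> str:
    """Always double-quote an identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def copy_out_sql(schema: str, table: str, columns: list[str]) -> str:
    """COPY statement that streams the table's rows to the client as CSV."""
    cols = ", ".join(quote_ident(c) for c in columns)
    return (f"COPY {quote_ident(schema)}.{quote_ident(table)} ({cols}) "
            "TO STDOUT WITH (FORMAT csv)")


def export_statements(schema: str, table: str, columns: list[str]) -> list[str]:
    """Statements to send in order: open a read-only snapshot, COPY, then COMMIT."""
    return [
        "BEGIN ISOLATION LEVEL REPEATABLE READ, READ ONLY",
        copy_out_sql(schema, table, columns),
        "COMMIT",
    ]
```

Quoting every identifier unconditionally is noisier than PostgreSQL's own `quote_ident()`, which quotes only when needed, but it is safe as long as the names come straight from the catalogs (e.g. `pg_attribute.attname`).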

Trying to read raw postgres data from a separate binary looks flat-out
wrong for your purposes.  First, you would have to replicate pretty
much all of the postgres internals inside it.  Second, you can read
consistent data only when postgres is stopped or hasn't made any
modifications since the last checkpoint.
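To illustrate the first point: every row version on a heap page begins with a 23-byte `HeapTupleHeaderData` (`src/include/access/htup_details.h`) carrying the inserting (`xmin`) and deleting (`xmax`) transaction IDs. Whether that version is visible depends on whether those transactions committed, and the page records this only through hint bits that are set lazily; when they are missing, the answer lives in `pg_xact`. Below is a minimal Python sketch of reading those fields, assuming a little-endian server; it deliberately ignores multixacts, lock-only `xmax` values and snapshots, all of which a real reader would also have to handle.

```python
import struct
from typing import NamedTuple

HEAP_XMIN_COMMITTED = 0x0100
HEAP_XMIN_INVALID = 0x0200      # both XMIN bits set together means "frozen"
HEAP_XMAX_COMMITTED = 0x0400
HEAP_XMAX_INVALID = 0x0800
HEAP_NATTS_MASK = 0x07FF        # low 11 bits of t_infomask2


class TupleHeader(NamedTuple):
    xmin: int
    xmax: int
    ctid: tuple[int, int]  # (block number, line pointer number)
    natts: int
    infomask: int
    hoff: int              # offset of the user data (after the null bitmap)


def parse_tuple_header(tup: bytes) -> TupleHeader:
    # t_xmin, t_xmax, t_cid/t_xvac, t_ctid (bi_hi, bi_lo, posid), t_infomask2, t_infomask, t_hoff
    (xmin, xmax, _cid, bi_hi, bi_lo, posid,
     infomask2, infomask, hoff) = struct.unpack_from("<IIIHHHHHB", tup, 0)
    return TupleHeader(xmin, xmax, ((bi_hi << 16) | bi_lo, posid),
                       infomask2 & HEAP_NATTS_MASK, infomask, hoff)


def hint_status(h: TupleHeader) -> tuple[str, str]:
    """What the hint bits alone say about xmin and xmax; 'unknown' means ask pg_xact."""
    both = HEAP_XMIN_COMMITTED | HEAP_XMIN_INVALID
    if (h.infomask & both) == both:
        xmin_state = "frozen"
    elif h.infomask & HEAP_XMIN_COMMITTED:
        xmin_state = "committed"
    elif h.infomask & HEAP_XMIN_INVALID:
        xmin_state = "aborted"
    else:
        xmin_state = "unknown"
    if h.infomask & HEAP_XMAX_INVALID:
        xmax_state = "invalid"
    elif h.infomask & HEAP_XMAX_COMMITTED:
        xmax_state = "committed"
    else:
        xmax_state = "unknown"
    return xmin_state, xmax_state
```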

------
Regards,
Alexander Korotkov



Re: Read data from Postgres table pages

From: Sushrut Shivaswamy
Date:
If we query the DB directly, is it possible to know which new rows have been added since the last query?
Is there a change pump that can be latched onto?

I’m assuming the page data structs are encapsulated in specific headers which can be used to list / read pages.
Why would Postgres need to be stopped to read the data? The read / query path in Postgres would also be reading these pages when the instance is running?


Re: Read data from Postgres table pages

From: Alexander Korotkov
Date:
On Tue, Mar 19, 2024 at 4:48 PM Sushrut Shivaswamy
<sushrut.shivaswamy@gmail.com> wrote:
>
> If we query the DB directly, is it possible to know which new rows have been added since the last query?
> Is there a change pump that can be latched onto?

Please, check this.
https://www.postgresql.org/docs/current/logicaldecoding.html
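As a rough illustration of that change stream: with `wal_level = logical`, a slot created by `SELECT pg_create_logical_replication_slot('export_slot', 'test_decoding');` can be polled with `SELECT data FROM pg_logical_slot_get_changes('export_slot', NULL, NULL);`, and the contrib `test_decoding` plugin returns one text line per change, bracketed by `BEGIN <xid>` / `COMMIT <xid>`. The slot name is hypothetical, and `test_decoding` is an example plugin; a production exporter would more likely use the built-in `pgoutput` plugin over the streaming replication protocol. The Python sketch below groups those lines into transactions, assuming the default `include-xids` behaviour; `group_transactions` is a hypothetical helper, and the column data is left unparsed.

```python
import re
from typing import NamedTuple, Optional

# One change line, e.g.: table public.data: INSERT: id[integer]:1 data[text]:'1'
CHANGE_RE = re.compile(
    r"^table (?P<rel>.+?): (?P<op>INSERT|UPDATE|DELETE|TRUNCATE):\s?(?P<data>.*)$")


class Change(NamedTuple):
    relation: str
    op: str
    data: str  # column values, still in test_decoding's text format


def group_transactions(lines: list[str]) -> dict[int, list[Change]]:
    """Group test_decoding output lines into {xid: [changes]}, keyed by the COMMIT xid."""
    txns: dict[int, list[Change]] = {}
    current: Optional[list[Change]] = None
    for line in lines:
        if line.startswith("BEGIN "):
            current = []
        elif line.startswith("COMMIT ") and current is not None:
            txns[int(line.split()[1])] = current
            current = None
        elif current is not None:
            m = CHANGE_RE.match(line)
            if m:
                current.append(Change(m["rel"], m["op"], m["data"]))
    return txns
```

Note that `pg_logical_slot_get_changes` consumes the changes it returns; `pg_logical_slot_peek_changes` followed by `pg_replication_slot_advance` lets an exporter confirm progress only after its upload has succeeded.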

> I’m assuming the page data structs are encapsulated in specific headers which can be used to list / read pages.
> Why would Postgres need to be stopped to read the data? The read / query path in Postgres would also be reading these pages when the instance is running?

I think this would be a good place to start studying.
https://www.interdb.jp/
The information there should be more than enough to forget this idea forever :)

------
Regards,
Alexander Korotkov



Re: Read data from Postgres table pages

From: Sushrut Shivaswamy
Date:

lol, thanks for the input, Alexander :)!