Обсуждение: tracking scripts...

Поиск
Список
Период
Сортировка

tracking scripts...

От
Joey Quinn
Дата:
I have a fairly large table (4.3 billion rows) that I am running an update script on (a bit over 127 thousand individual update queries). I am using the gui. It has been running for about 24 hours now. Is there any good way to gauge progress (as in, how many of the individual update queries have finished)?


Re: tracking scripts...

От
Rémi Cura
Дата:
Now it's too late,
but maybe you could allow to not use a single transaction ( but instead 127k transactions).4

Then at the end of every transaction you could print something in gui (print for pgscript, raise for plpgsql) or execute a command to write in a file (copy for instance).
It would also be in the log, but not so clear.

Cheers,

Rémi-C


2013/11/26 Joey Quinn <bjquinniii@gmail.com>
I have a fairly large table (4.3 billion rows) that I am running an update script on (a bit over 127 thousand individual update queries). I am using the gui. It has been running for about 24 hours now. Is there any good way to gauge progress (as in, how many of the individual update queries have finished)?



Re: tracking scripts...

От
Vick Khera
Дата:
Connect to the DB and run "select * from pg_stat_activity" to see what  specific query your other connection is running. Then find that in your file to see how far it has progressed.

I hope you profiled your queries to make sure they run fast before you started. :)


On Tue, Nov 26, 2013 at 10:28 AM, Joey Quinn <bjquinniii@gmail.com> wrote:
I have a fairly large table (4.3 billion rows) that I am running an update script on (a bit over 127 thousand individual update queries). I am using the gui. It has been running for about 24 hours now. Is there any good way to gauge progress (as in, how many of the individual update queries have finished)?



Re: tracking scripts...

От
Joey Quinn
Дата:
The queries themselves are written like so:

update ipv4_table set country='xx' where ipv4 between 'xxx.xxx.xxx.xxx' and 'xxx.xxx.xxx.xxx';

There are 127k lines like that (each with a different range and the appropriate country code). Each is terminated with a semi-colon. Does that make them individual transactions in postgres or not? (postgres newbie here). Is there something else I need to do for them to be treated like separate transactions?


On Tue, Nov 26, 2013 at 11:16 AM, Rémi Cura <remi.cura@gmail.com> wrote:
Now it's too late,
but maybe you could allow to not use a single transaction ( but instead 127k transactions).4

Then at the end of every transaction you could print something in gui (print for pgscript, raise for plpgsql) or execute a command to write in a file (copy for instance).
It would also be in the log, but not so clear.

Cheers,

Rémi-C


2013/11/26 Joey Quinn <bjquinniii@gmail.com>
I have a fairly large table (4.3 billion rows) that I am running an update script on (a bit over 127 thousand individual update queries). I am using the gui. It has been running for about 24 hours now. Is there any good way to gauge progress (as in, how many of the individual update queries have finished)?




Re: tracking scripts...

От
Vick Khera
Дата:

On Tue, Nov 26, 2013 at 12:11 PM, Joey Quinn <bjquinniii@gmail.com> wrote:
update ipv4_table set country='xx' where ipv4 between 'xxx.xxx.xxx.xxx' and 'xxx.xxx.xxx.xxx';

There are 127k lines like that (each with a different range and the appropriate country code). Each is terminated with a semi-colon. Does that make them individual transactions in postgres or not? (postgres newbie here). Is there something else I need to do for them to be treated like separate transactions?

If you did not wrap the whole thing with begin/commit then each is its own transaction. I certainly hope you have an appopriate index on that ipv4 column and it is appropriately typed.

Re: tracking scripts...

От
Joey Quinn
Дата:
When I ran that command (select * from pg_stat_activity"), it returned the first six lines of the scripts. I'm fairly sure it has gotten a bit beyond that (been running over 24 hours now, and the size has increased about 300 GB). Am I missing something for it to tell me what the last line processed was?

I didn't do any profiling (postgres newbie here). All of the updates are fairly straightforward and only hit a single table. They are updating a single column based upon a "where between" clause which hits an index. I did run a single one initially, and then a group of about 10k to make sure they were behaving properly before launching the rest of the pile...

This is my first postgres project. It's a table of the complete IPV4 address space. Trying out postgres because the MySQL (actually MariaDB) attempt was not scaling well.


On Tue, Nov 26, 2013 at 11:20 AM, Vick Khera <vivek@khera.org> wrote:
Connect to the DB and run "select * from pg_stat_activity" to see what  specific query your other connection is running. Then find that in your file to see how far it has progressed.

I hope you profiled your queries to make sure they run fast before you started. :)


On Tue, Nov 26, 2013 at 10:28 AM, Joey Quinn <bjquinniii@gmail.com> wrote:
I have a fairly large table (4.3 billion rows) that I am running an update script on (a bit over 127 thousand individual update queries). I am using the gui. It has been running for about 24 hours now. Is there any good way to gauge progress (as in, how many of the individual update queries have finished)?




Re: tracking scripts...

От
Joey Quinn
Дата:
The ipv4 column is of type inet. It is the primary key (btree access) and access times for queries on individual ip addresses have been around 10-15 ms.



On Tue, Nov 26, 2013 at 12:13 PM, Vick Khera <vivek@khera.org> wrote:

On Tue, Nov 26, 2013 at 12:11 PM, Joey Quinn <bjquinniii@gmail.com> wrote:
update ipv4_table set country='xx' where ipv4 between 'xxx.xxx.xxx.xxx' and 'xxx.xxx.xxx.xxx';

There are 127k lines like that (each with a different range and the appropriate country code). Each is terminated with a semi-colon. Does that make them individual transactions in postgres or not? (postgres newbie here). Is there something else I need to do for them to be treated like separate transactions?

If you did not wrap the whole thing with begin/commit then each is its own transaction. I certainly hope you have an appopriate index on that ipv4 column and it is appropriately typed.

Re: tracking scripts...

От
John R Pierce
Дата:
On 11/26/2013 9:24 AM, Joey Quinn wrote:
> When I ran that command (select * from pg_stat_activity"), it returned
> the first six lines of the scripts. I'm fairly sure it has gotten a
> bit beyond that (been running over 24 hours now, and the size has
> increased about 300 GB). Am I missing something for it to tell me what
> the last line processed was?

that means your GUI lobbed the entire file at postgres in a single
PQexec call, so its all being executed as a single statement.

psql -f "filename.sql" dbname   would have processed the queries one at
a time.


--
john r pierce                                      37N 122W
somewhere on the middle of the left coast



Re: tracking scripts...

От
Vick Khera
Дата:

On Tue, Nov 26, 2013 at 12:24 PM, Joey Quinn <bjquinniii@gmail.com> wrote:
When I ran that command (select * from pg_stat_activity"), it returned the first six lines of the scripts. I'm fairly sure it has gotten a bit beyond that (been running over 24 hours now, and the size has increased about 300 GB). Am I missing something for it to tell me what the last line processed was?

I agree with what John R Pierce says about your GUI lumping all of it into one statement. What you can do is get indirect evidence by looking to see which rows are set. I'm assuming that your IP ranges are non-overlapping, so just do a binary search until you narrow it down to see how far along you are.

Re: tracking scripts...

От
Joey Quinn
Дата:
Sounds like I will have to get comfortable with the command line version of things... sigh... hate that.

Would that command be from within the psql SQL Shell that came as part of the install? (I'm living in Windows land).

(and thank-you for the command)



On Tue, Nov 26, 2013 at 1:24 PM, John R Pierce <pierce@hogranch.com> wrote:
On 11/26/2013 9:24 AM, Joey Quinn wrote:
When I ran that command (select * from pg_stat_activity"), it returned the first six lines of the scripts. I'm fairly sure it has gotten a bit beyond that (been running over 24 hours now, and the size has increased about 300 GB). Am I missing something for it to tell me what the last line processed was?

that means your GUI lobbed the entire file at postgres in a single PQexec call, so its all being executed as a single statement.

psql -f "filename.sql" dbname   would have processed the queries one at a time.


--
john r pierce                                      37N 122W
somewhere on the middle of the left coast



--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Re: tracking scripts...

От
Joey Quinn
Дата:
The ranges are indeed overlapping, though the update statements were generated alphabetically rather than in IP order... If the command line will let me query the table directly without being blocked by the ongoing updates, then I could get a rough order of magnitude of progress by doing a null count on the county field... hate to throw queries at it while it's busy updating though...



On Tue, Nov 26, 2013 at 1:43 PM, Vick Khera <vivek@khera.org> wrote:

On Tue, Nov 26, 2013 at 12:24 PM, Joey Quinn <bjquinniii@gmail.com> wrote:
When I ran that command (select * from pg_stat_activity"), it returned the first six lines of the scripts. I'm fairly sure it has gotten a bit beyond that (been running over 24 hours now, and the size has increased about 300 GB). Am I missing something for it to tell me what the last line processed was?

I agree with what John R Pierce says about your GUI lumping all of it into one statement. What you can do is get indirect evidence by looking to see which rows are set. I'm assuming that your IP ranges are non-overlapping, so just do a binary search until you narrow it down to see how far along you are.


Re: tracking scripts...

От
Vick Khera
Дата:

On Tue, Nov 26, 2013 at 2:48 PM, Joey Quinn <bjquinniii@gmail.com> wrote:
The ranges are indeed overlapping, though the update statements were generated alphabetically rather than in IP order... If the command line will let me query the table directly without being blocked by the ongoing updates, then I could get a rough order of magnitude of progress by doing a null count on the county field... hate to throw queries at it while it's busy updating though...

Try a SELECT ... LIMIT 1; it will find the first row that matches and exit. So you can see if any particular country code has been set. Again, binary search on the codes.

Re: tracking scripts...

От
John R Pierce
Дата:
On 11/26/2013 11:45 AM, Joey Quinn wrote:
> Would that command be from within the psql SQL Shell that came as part
> of the install? (I'm living in Windows land).

if you're already in psql, logged onto your database, it would be \i
filename.sql

psql -f filename.sql  dbname...    would be at the system shell prompt,
but that assumes the postgresql binary directory is in your path, which
it may not be on default Windows installs.



--
john r pierce                                      37N 122W
somewhere on the middle of the left coast



Re: tracking scripts...

От
Joey Quinn
Дата:
nope, that appears to be being blocked by the updates...

tried "select * from ipv4_table where country='gb' limit 1;"

it just sat there...


On Tue, Nov 26, 2013 at 3:00 PM, Vick Khera <vivek@khera.org> wrote:

On Tue, Nov 26, 2013 at 2:48 PM, Joey Quinn <bjquinniii@gmail.com> wrote:
The ranges are indeed overlapping, though the update statements were generated alphabetically rather than in IP order... If the command line will let me query the table directly without being blocked by the ongoing updates, then I could get a rough order of magnitude of progress by doing a null count on the county field... hate to throw queries at it while it's busy updating though...

Try a SELECT ... LIMIT 1; it will find the first row that matches and exit. So you can see if any particular country code has been set. Again, binary search on the codes.

Re: tracking scripts...

От
Joey Quinn
Дата:
yeah, unlikely that it is already in the path (I certainly didn't add it yet).

Thanks for the command (new version).



On Tue, Nov 26, 2013 at 3:13 PM, John R Pierce <pierce@hogranch.com> wrote:
On 11/26/2013 11:45 AM, Joey Quinn wrote:
Would that command be from within the psql SQL Shell that came as part of the install? (I'm living in Windows land).

if you're already in psql, logged onto your database, it would be \i filename.sql

psql -f filename.sql  dbname...    would be at the system shell prompt, but that assumes the postgresql binary directory is in your path, which it may not be on default Windows installs.




--
john r pierce                                      37N 122W
somewhere on the middle of the left coast



--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Re: tracking scripts...

От
Merlin Moncure
Дата:
On Tue, Nov 26, 2013 at 9:28 AM, Joey Quinn <bjquinniii@gmail.com> wrote:
> I have a fairly large table (4.3 billion rows) that I am running an update
> script on (a bit over 127 thousand individual update queries). I am using
> the gui. It has been running for about 24 hours now. Is there any good way
> to gauge progress (as in, how many of the individual update queries have
> finished)?


There are not many ways to Hand off information outside of the
database while a transaction Is running. one way Is to write a Simple
trigger in plpgsql that 'raise'es A notice every 'n' times trigger
condition fires.  that'S Essentially the only Clean way to do it in
such a way that the information is Returned to the Executing console.
Thanks!

merlin


Re: tracking scripts...

От
John R Pierce
Дата:
On 11/26/2013 12:30 PM, Merlin Moncure wrote:
> There are not many ways to Hand off information outside of the
> database while a transaction Is running. one way Is to write a Simple
> trigger in plpgsql that 'raise'es A notice every 'n' times trigger
> condition fires.  that'S Essentially the only Clean way to do it in
> such a way that the information is Returned to the Executing console.
> Thanks!

how would that trigger track N?

--
john r pierce                                      37N 122W
somewhere on the middle of the left coast



Re: tracking scripts...

От
Merlin Moncure
Дата:
On Tue, Nov 26, 2013 at 2:38 PM, John R Pierce <pierce@hogranch.com> wrote:
> On 11/26/2013 12:30 PM, Merlin Moncure wrote:
>>
>> There are not many ways to Hand off information outside of the
>> database while a transaction Is running. one way Is to write a Simple
>> trigger in plpgsql that 'raise'es A notice every 'n' times trigger
>> condition fires.  that'S Essentially the only Clean way to do it in
>> such a way that the information is Returned to the Executing console.
>> Thanks!
>
>
> how would that trigger track N?

A couple of different ways.  Easiest would just be to manage a sequence.

merlin


Re: tracking scripts...

От
Raymond O'Donnell
Дата:
On 26/11/2013 20:30, Merlin Moncure wrote:
> There are not many ways to Hand off information outside of the
> database while a transaction Is running. one way Is to write a Simple
> trigger in plpgsql that 'raise'es A notice every 'n' times trigger
> condition fires.  that'S Essentially the only Clean way to do it in
> such a way that the information is Returned to the Executing console.
> Thanks!

Totally unrelated to the thread.... I noticed that the capitalised
letters in the email above spell out this:

 THIISASECRET

.. which (almost) spells "This is a secret". Was this intentional, or am
I just working too hard? :-)

Ray.


--
Raymond O'Donnell :: Galway :: Ireland
rod@iol.ie


Re: tracking scripts...

От
Merlin Moncure
Дата:
On Tue, Nov 26, 2013 at 3:20 PM, Raymond O'Donnell <rod@iol.ie> wrote:
> On 26/11/2013 20:30, Merlin Moncure wrote:
>> There are not many ways to Hand off information outside of the
>> database while a transaction Is running. one way Is to write a Simple
>> trigger in plpgsql that 'raise'es A notice every 'n' times trigger
>> condition fires.  that'S Essentially the only Clean way to do it in
>> such a way that the information is Returned to the Executing console.
>> Thanks!
>
> Totally unrelated to the thread.... I noticed that the capitalised
> letters in the email above spell out this:
>
>  THIISASECRET
>
> .. which (almost) spells "This is a secret". Was this intentional, or am
> I just working too hard? :-)

Well, bad spelling on my part.  To get the joke, you have to be A.
observant, B. be using a gmail account, and C. be a comic book geek
that grew up in the 80's.

merlin


Re: tracking scripts...

От
Rémi Cura
Дата:
First serious answer :
you don't have to use command line,
you can use the pgadmin gui, loading your file with all the command, and then hit F6 (or select run as pgscript).
This will wrapp each command in a transaction , and will print messages all along.
Please test this on a few line before trying on everything.

Second :
lol for the secret message.
There is a very famous one like this in french, in private letters by George Sand, a famous writter.

The text is very high level and nice french, but if you read one line then skip the next ..,
the message is very dirty !

It is of course also very hard to translate ...

Cheers,
Rémi-C

Cher ami,
Je suis toute émue de vous dire que j'ai
bien compris l'autre jour que vous aviez
toujours une envie folle de me faire
danser. Je garde le souvenir de votre
baiser et je voudrais bien que ce soit
une preuve que je puisse être aimée
par vous. Je suis prête à montrer mon
affection toute désintéressée et sans cal-
cul, et si vous voulez me voir ainsi
vous dévoiler, sans artifice, mon âme
toute nue, daignez me faire visite,
nous causerons et en amis franchement
je vous prouverai que je suis la femme
sincère, capable de vous offrir l'affection
la plus profonde, comme la plus étroite
amitié, en un mot : la meilleure épouse
dont vous puissiez rêver. Puisque votre>
âme est libre, pensez que l'abandon ou je
vis est bien long, bien dur et souvent bien>
insupportable. Mon chagrin est trop
gros. Accourrez bien vite et venez me le
faire oublier. À vous je veux me sou-
mettre entièrement.
Votre poupée 


2013/11/26 Merlin Moncure <mmoncure@gmail.com>
On Tue, Nov 26, 2013 at 3:20 PM, Raymond O'Donnell <rod@iol.ie> wrote:
> On 26/11/2013 20:30, Merlin Moncure wrote:
>> There are not many ways to Hand off information outside of the
>> database while a transaction Is running. one way Is to write a Simple
>> trigger in plpgsql that 'raise'es A notice every 'n' times trigger
>> condition fires.  that'S Essentially the only Clean way to do it in
>> such a way that the information is Returned to the Executing console.
>> Thanks!
>
> Totally unrelated to the thread.... I noticed that the capitalised
> letters in the email above spell out this:
>
>  THIISASECRET
>
> .. which (almost) spells "This is a secret". Was this intentional, or am
> I just working too hard? :-)

Well, bad spelling on my part.  To get the joke, you have to be A.
observant, B. be using a gmail account, and C. be a comic book geek
that grew up in the 80's.

merlin


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Re: tracking scripts...

От
Albe Laurenz
Дата:
John R Pierce wrote:
> On 11/26/2013 9:24 AM, Joey Quinn wrote:
>> When I ran that command (select * from pg_stat_activity"), it returned
>> the first six lines of the scripts. I'm fairly sure it has gotten a
>> bit beyond that (been running over 24 hours now, and the size has
>> increased about 300 GB). Am I missing something for it to tell me what
>> the last line processed was?
> 
> that means your GUI lobbed the entire file at postgres in a single
> PQexec call, so its all being executed as a single statement.
> 
> psql -f "filename.sql" dbname   would have processed the queries one at
> a time.

Yes, but that would slow down processing considerably, which would
not help in this case.

I'd opt for
psql -1 -f "filename.sql" dbname
so it all runs in a single transaction.

Yours,
Laurenz Albe

Re: tracking scripts...

От
Rémi Cura
Дата:
I'm not an expert,
 I would think if you can spare using only one transaction , it would be way way faster to do it !

the system simply could skip keeping log to be ready to roll back for a 1 billion row update !

Of course it would be preferable to use psql to execute statement by statement as separate transactions , and do it with X several parallel psql (splitting the big text file into X parts), yet Joey seemed reluctant to use console =)


Cheers,
Rémi-C


2013/11/27 Albe Laurenz <laurenz.albe@wien.gv.at>
John R Pierce wrote:
> On 11/26/2013 9:24 AM, Joey Quinn wrote:
>> When I ran that command (select * from pg_stat_activity"), it returned
>> the first six lines of the scripts. I'm fairly sure it has gotten a
>> bit beyond that (been running over 24 hours now, and the size has
>> increased about 300 GB). Am I missing something for it to tell me what
>> the last line processed was?
>
> that means your GUI lobbed the entire file at postgres in a single
> PQexec call, so its all being executed as a single statement.
>
> psql -f "filename.sql" dbname   would have processed the queries one at
> a time.

Yes, but that would slow down processing considerably, which would
not help in this case.

I'd opt for
psql -1 -f "filename.sql" dbname
so it all runs in a single transaction.

Yours,
Laurenz Albe

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Re: tracking scripts...

От
John R Pierce
Дата:
On 11/27/2013 1:39 AM, Rémi Cura wrote:
>
> the system simply could skip keeping log to be ready to roll back for
> a 1 billion row update

thats not how postgres does rollbacks



--
john r pierce                                      37N 122W
somewhere on the middle of the left coast



Re: tracking scripts...

От
Raymond O'Donnell
Дата:
On 27/11/2013 08:20, Rémi Cura wrote:
> First serious answer :
> you don't have to use command line,
> you can use the pgadmin gui, loading your file with all the command, and
> then hit F6 (or select run as pgscript).
> This will wrapp each command in a transaction , and will print messages
> all along.
> Please test this on a few line before trying on everything.
>
> Second :
> lol for the secret message.
> There is a very famous one like this in french, in private letters by
> George Sand, a famous writter.
>
> The text is very high level and nice french, but if you read one line
> then skip the next ..,
> the message is very dirty !
>
> It is of course also very hard to translate ...

A guy I know finished finished his doctorate some years ago, more in
spite of his supervisor than with his assistance - the supervisor was
very little help, and at times actively discouraging. Anyway, when my
friend's thesis was printed, he included a short preface which thanked
the supervisor in glowing terms.... but the first letter of each line
spelt out the words "F*** you Mike". :-)

Ray.


--
Raymond O'Donnell :: Galway :: Ireland
rod@iol.ie


Re: tracking scripts...

От
Joey Quinn
Дата:
Wow, thank-you (sometimes the answer is right there in front of you... very new to Postgres, had wondered what the difference was between the run query and run as PGS script, but hadn't looked into it yet).

So, here's the critical question(s) right now (for me)...

With the way I launched it, using the "Execute query" button, if I now hit the "Cancel query" button, what happens? Have the last two days of updates already been committed? Or will they get rolled back? I would love to switch to the other method, so that I can gauge progress, but would hate to lose two days worth of run time...

 If I do run the same script (a bit over 100k lines) in PGS mode, will that affect the speed? If so, how much? 1%? 5%? More?




On Wed, Nov 27, 2013 at 3:20 AM, Rémi Cura <remi.cura@gmail.com> wrote:
First serious answer :
you don't have to use command line,
you can use the pgadmin gui, loading your file with all the command, and then hit F6 (or select run as pgscript).
This will wrapp each command in a transaction , and will print messages all along.
Please test this on a few line before trying on everything.

Second :
lol for the secret message.
There is a very famous one like this in french, in private letters by George Sand, a famous writter.

The text is very high level and nice french, but if you read one line then skip the next ..,
the message is very dirty !

It is of course also very hard to translate ...

Cheers,
Rémi-C

Cher ami,
Je suis toute émue de vous dire que j'ai
bien compris l'autre jour que vous aviez
toujours une envie folle de me faire
danser. Je garde le souvenir de votre
baiser et je voudrais bien que ce soit
une preuve que je puisse être aimée
par vous. Je suis prête à montrer mon
affection toute désintéressée et sans cal-
cul, et si vous voulez me voir ainsi
vous dévoiler, sans artifice, mon âme
toute nue, daignez me faire visite,
nous causerons et en amis franchement
je vous prouverai que je suis la femme
sincère, capable de vous offrir l'affection
la plus profonde, comme la plus étroite
amitié, en un mot : la meilleure épouse
dont vous puissiez rêver. Puisque votre>
âme est libre, pensez que l'abandon ou je
vis est bien long, bien dur et souvent bien>
insupportable. Mon chagrin est trop
gros. Accourrez bien vite et venez me le
faire oublier. À vous je veux me sou-
mettre entièrement.
Votre poupée 


2013/11/26 Merlin Moncure <mmoncure@gmail.com>
On Tue, Nov 26, 2013 at 3:20 PM, Raymond O'Donnell <rod@iol.ie> wrote:
> On 26/11/2013 20:30, Merlin Moncure wrote:
>> There are not many ways to Hand off information outside of the
>> database while a transaction Is running. one way Is to write a Simple
>> trigger in plpgsql that 'raise'es A notice every 'n' times trigger
>> condition fires.  that'S Essentially the only Clean way to do it in
>> such a way that the information is Returned to the Executing console.
>> Thanks!
>
> Totally unrelated to the thread.... I noticed that the capitalised
> letters in the email above spell out this:
>
>  THIISASECRET
>
> .. which (almost) spells "This is a secret". Was this intentional, or am
> I just working too hard? :-)

Well, bad spelling on my part.  To get the joke, you have to be A.
observant, B. be using a gmail account, and C. be a comic book geek
that grew up in the 80's.

merlin


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: tracking scripts...

От
Rémi Cura
Дата:
Sorry,
if you cancel everything will be rolled back
(it is actually what makes DB so powerfull).

Unless it finishes I don't know of a way to keep changes.

At least on my computer (I don't know if you can generalize this), it is way faster to split into many transaction, so you would gain time.
Using pgscript will make you loose some time, but it won't be much if each querry is long enough (some seconds at least).
If you intend to do it often, you may want to consider mutliple parallel psql.

Cheers,

Rémi-C



2013/11/27 Joey Quinn <bjquinniii@gmail.com>
Wow, thank-you (sometimes the answer is right there in front of you... very new to Postgres, had wondered what the difference was between the run query and run as PGS script, but hadn't looked into it yet).

So, here's the critical question(s) right now (for me)...

With the way I launched it, using the "Execute query" button, if I now hit the "Cancel query" button, what happens? Have the last two days of updates already been committed? Or will they get rolled back? I would love to switch to the other method, so that I can gauge progress, but would hate to lose two days worth of run time...

 If I do run the same script (a bit over 100k lines) in PGS mode, will that affect the speed? If so, how much? 1%? 5%? More?




On Wed, Nov 27, 2013 at 3:20 AM, Rémi Cura <remi.cura@gmail.com> wrote:
First serious answer :
you don't have to use command line,
you can use the pgadmin gui, loading your file with all the command, and then hit F6 (or select run as pgscript).
This will wrapp each command in a transaction , and will print messages all along.
Please test this on a few line before trying on everything.

Second :
lol for the secret message.
There is a very famous one like this in french, in private letters by George Sand, a famous writter.

The text is very high level and nice french, but if you read one line then skip the next ..,
the message is very dirty !

It is of course also very hard to translate ...

Cheers,
Rémi-C

Cher ami,
Je suis toute émue de vous dire que j'ai
bien compris l'autre jour que vous aviez
toujours une envie folle de me faire
danser. Je garde le souvenir de votre
baiser et je voudrais bien que ce soit
une preuve que je puisse être aimée
par vous. Je suis prête à montrer mon
affection toute désintéressée et sans cal-
cul, et si vous voulez me voir ainsi
vous dévoiler, sans artifice, mon âme
toute nue, daignez me faire visite,
nous causerons et en amis franchement
je vous prouverai que je suis la femme
sincère, capable de vous offrir l'affection
la plus profonde, comme la plus étroite
amitié, en un mot : la meilleure épouse
dont vous puissiez rêver. Puisque votre>
âme est libre, pensez que l'abandon ou je
vis est bien long, bien dur et souvent bien>
insupportable. Mon chagrin est trop
gros. Accourrez bien vite et venez me le
faire oublier. À vous je veux me sou-
mettre entièrement.
Votre poupée 


2013/11/26 Merlin Moncure <mmoncure@gmail.com>
On Tue, Nov 26, 2013 at 3:20 PM, Raymond O'Donnell <rod@iol.ie> wrote:
> On 26/11/2013 20:30, Merlin Moncure wrote:
>> There are not many ways to Hand off information outside of the
>> database while a transaction Is running. one way Is to write a Simple
>> trigger in plpgsql that 'raise'es A notice every 'n' times trigger
>> condition fires.  that'S Essentially the only Clean way to do it in
>> such a way that the information is Returned to the Executing console.
>> Thanks!
>
> Totally unrelated to the thread.... I noticed that the capitalised
> letters in the email above spell out this:
>
>  THIISASECRET
>
> .. which (almost) spells "This is a secret". Was this intentional, or am
> I just working too hard? :-)

Well, bad spelling on my part.  To get the joke, you have to be A.
observant, B. be using a gmail account, and C. be a comic book geek
that grew up in the 80's.

merlin


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general



Re: tracking scripts...

От
Joey Quinn
Дата:
A little reluctant, yes, but not 100%. I'm new to Postgres, but if I end up using it enough, then I will also end up learning some command line stuff. If it continues to look like a good/robust solution for this particular project (think ERIPP plus Shodan and whatever else I can come up with) then I'll be here awhile...

Joey



On Wed, Nov 27, 2013 at 4:39 AM, Rémi Cura <remi.cura@gmail.com> wrote:
I'm not an expert,
 I would think if you can spare using only one transaction , it would be way way faster to do it !

the system simply could skip keeping log to be ready to roll back for a 1 billion row update !

Of course it would be preferable to use psql to execute statement by statement as separate transactions , and do it with X several parallel psql (splitting the big text file into X parts), yet Joey seemed reluctant to use console =)


Cheers,
Rémi-C


2013/11/27 Albe Laurenz <laurenz.albe@wien.gv.at>
John R Pierce wrote:
> On 11/26/2013 9:24 AM, Joey Quinn wrote:
>> When I ran that command (select * from pg_stat_activity"), it returned
>> the first six lines of the scripts. I'm fairly sure it has gotten a
>> bit beyond that (been running over 24 hours now, and the size has
>> increased about 300 GB). Am I missing something for it to tell me what
>> the last line processed was?
>
> that means your GUI lobbed the entire file at postgres in a single
> PQexec call, so its all being executed as a single statement.
>
> psql -f "filename.sql" dbname   would have processed the queries one at
> a time.

Yes, but that would slow down processing considerably, which would
not help in this case.

I'd opt for
psql -1 -f "filename.sql" dbname
so it all runs in a single transaction.

Yours,
Laurenz Albe

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: tracking scripts...

От
Joey Quinn
Дата:
So, no cancel... keeping my fingers crossed (rolling up on 170,000,000 ms - a bit over 47 hours - data folder size now at 1.11 TB).

Fortunately, I'm pretty sure this will be my largest batch update (since the info is static, and available all at once, I was able to generate the complete script. In most cases, I will be retrieving data and that takes a bit longer).

When I end up having to rebuild the entire thing on a separate system, can I divide the script up and launch it piecemeal in differ PGAdmin query windows for the parallel action? Or is there a particular command I have to give it?



On Wed, Nov 27, 2013 at 9:23 AM, Rémi Cura <remi.cura@gmail.com> wrote:
Sorry,
if you cancel everything will be rolled back
(it is actually what makes DB so powerfull).

Unless it finishes I don't know of a way to keep changes.

At least on my computer (I don't know if you can generalize this), it is way faster to split into many transaction, so you would gain time.
Using pgscript will make you loose some time, but it won't be much if each querry is long enough (some seconds at least).
If you intend to do it often, you may want to consider mutliple parallel psql.

Cheers,

Rémi-C



2013/11/27 Joey Quinn <bjquinniii@gmail.com>
Wow, thank-you (sometimes the answer is right there in front of you... very new to Postgres, had wondered what the difference was between the run query and run as PGS script, but hadn't looked into it yet).

So, here's the critical question(s) right now (for me)...

With the way I launched it, using the "Execute query" button, if I now hit the "Cancel query" button, what happens? Have the last two days of updates already been committed? Or will they get rolled back? I would love to switch to the other method, so that I can gauge progress, but would hate to lose two days worth of run time...

 If I do run the same script (a bit over 100k lines) in PGS mode, will that affect the speed? If so, how much? 1%? 5%? More?




On Wed, Nov 27, 2013 at 3:20 AM, Rémi Cura <remi.cura@gmail.com> wrote:
First serious answer :
you don't have to use command line,
you can use the pgadmin gui, loading your file with all the command, and then hit F6 (or select run as pgscript).
This will wrapp each command in a transaction , and will print messages all along.
Please test this on a few line before trying on everything.

Second :
lol for the secret message.
There is a very famous one like this in french, in private letters by George Sand, a famous writter.

The text is very high level and nice french, but if you read one line then skip the next ..,
the message is very dirty !

It is of course also very hard to translate ...

Cheers,
Rémi-C

Cher ami,
Je suis toute émue de vous dire que j'ai
bien compris l'autre jour que vous aviez
toujours une envie folle de me faire
danser. Je garde le souvenir de votre
baiser et je voudrais bien que ce soit
une preuve que je puisse être aimée
par vous. Je suis prête à montrer mon
affection toute désintéressée et sans cal-
cul, et si vous voulez me voir ainsi
vous dévoiler, sans artifice, mon âme
toute nue, daignez me faire visite,
nous causerons et en amis franchement
je vous prouverai que je suis la femme
sincère, capable de vous offrir l'affection
la plus profonde, comme la plus étroite
amitié, en un mot : la meilleure épouse
dont vous puissiez rêver. Puisque votre>
âme est libre, pensez que l'abandon ou je
vis est bien long, bien dur et souvent bien>
insupportable. Mon chagrin est trop
gros. Accourrez bien vite et venez me le
faire oublier. À vous je veux me sou-
mettre entièrement.
Votre poupée 


2013/11/26 Merlin Moncure <mmoncure@gmail.com>
On Tue, Nov 26, 2013 at 3:20 PM, Raymond O'Donnell <rod@iol.ie> wrote:
> On 26/11/2013 20:30, Merlin Moncure wrote:
>> There are not many ways to Hand off information outside of the
>> database while a transaction Is running. one way Is to write a Simple
>> trigger in plpgsql that 'raise'es A notice every 'n' times trigger
>> condition fires.  that'S Essentially the only Clean way to do it in
>> such a way that the information is Returned to the Executing console.
>> Thanks!
>
> Totally unrelated to the thread.... I noticed that the capitalised
> letters in the email above spell out this:
>
>  THIISASECRET
>
> .. which (almost) spells "This is a secret". Was this intentional, or am
> I just working too hard? :-)

Well, bad spelling on my part.  To get the joke, you have to be A.
observant, B. be using a gmail account, and C. be a comic book geek
that grew up in the 80's.

merlin


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general




Re: tracking scripts...

От
Merlin Moncure
Дата:
On Wed, Nov 27, 2013 at 8:35 AM, Joey Quinn <bjquinniii@gmail.com> wrote:
> So, no cancel... keeping my fingers crossed (rolling up on 170,000,000 ms -
> a bit over 47 hours - data folder size now at 1.11 TB).
>
> Fortunately, I'm pretty sure this will be my largest batch update (since the
> info is static, and available all at once, I was able to generate the
> complete script. In most cases, I will be retrieving data and that takes a
> bit longer).
>
> When I end up having to rebuild the entire thing on a separate system, can I
> divide the script up and launch it piecemeal in differ PGAdmin query windows
> for the parallel action? Or is there a particular command I have to give it?

For very large updates on mostly static data it may be better to
SELECT the data into a new table then swap it in when done.   MY rule
of thumb is that updates are 10x more expensive than inserts,
particularly in terms of large operations.

merlin


Re: tracking scripts...

От
Joey Quinn
Дата:
In this case, I'm updating one column. Wouldn't the "swap" part of that still have to be an update?


On Wed, Nov 27, 2013 at 9:50 AM, Merlin Moncure <mmoncure@gmail.com> wrote:
On Wed, Nov 27, 2013 at 8:35 AM, Joey Quinn <bjquinniii@gmail.com> wrote:
> So, no cancel... keeping my fingers crossed (rolling up on 170,000,000 ms -
> a bit over 47 hours - data folder size now at 1.11 TB).
>
> Fortunately, I'm pretty sure this will be my largest batch update (since the
> info is static, and available all at once, I was able to generate the
> complete script. In most cases, I will be retrieving data and that takes a
> bit longer).
>
> When I end up having to rebuild the entire thing on a separate system, can I
> divide the script up and launch it piecemeal in differ PGAdmin query windows
> for the parallel action? Or is there a particular command I have to give it?

For very large updates on mostly static data it may be better to
SELECT the data into a new table then swap it in when done.   MY rule
of thumb is that updates are 10x more expensive than inserts,
particularly in terms of large operations.

merlin

Re: tracking scripts...

От
Merlin Moncure
Дата:
On Wed, Nov 27, 2013 at 9:00 AM, Joey Quinn <bjquinniii@gmail.com> wrote:
> On Wed, Nov 27, 2013 at 9:50 AM, Merlin Moncure <mmoncure@gmail.com> wrote:
>> For very large updates on mostly static data it may be better to
>> SELECT the data into a new table then swap it in when done.   MY rule
>> of thumb is that updates are 10x more expensive than inserts,
>> particularly in terms of large operations.
>>
> In this case, I'm updating one column. Wouldn't the "swap" part of that
> still have to be an update?


nope.  the basic mechanism is to:

BEGIN;
CREATE TABLE scratch (LIKE foo INCLUDING ALL);
INSERT INTO scratch SELECT ... FROM foo ...;
ALTER TABLE foo RENAME TO backup;
ALTER TABLE scratch RENAME TO foo;
COMMIT;

The main pain point is that you will have to recreate and table
dependent structures: views, triggers, etc.  this is generally trivial
if you properly keep your schema definitions in scripts and a big
headache otherwise.

You will probably try to avoid updates to 'foo' while the swap is
happening to keep things simple.

merlin