Обсуждение: "Tuple not found error" during Index creation

Поиск
Список
Период
Сортировка

"Tuple not found error" during Index creation

От
RUSHI KAW
Дата:
Hi All,

I have been running Postgresxc 1.1 on a 40 core, 1 TB RAM, Intel(R) Xeon(R) machine.
The data tables size with which the database is populated is approx 14GB.
The issue is that during the index creation time on the big table of approx 13GB size, the following error occurs:

ERROR: could not find tuple for relation 16413.
Statement: Create Index mdx on Network(head)

I read on the forum that this could be because the Autovacuum process may have locked the tuple that the indexing process was trying to access. So, I turned off the Autovacuum and the error didn't recur after that. 

So, I wanted to ask if this is the only workaround for this error, turning off the Autovacuum during index creation and then restarting the process once Indexes have been created or is there something else one can do?

Thank you.
Regards,
Rushi Kaw 

Re: "Tuple not found error" during Index creation

От
John R Pierce
Дата:
On 12/9/2013 10:07 AM, RUSHI KAW wrote:
I have been running Postgresxc 1.1

you'd probably be best off finding the postgresql-xc list, as that is really a rather different system, even if it is forked from community postgresql.



-- 
john r pierce                                      37N 122W
somewhere on the middle of the left coast

Re: "Tuple not found error" during Index creation

От
Sandeep Gupta
Дата:
Agreed. However, the "tuple not found" error problem seems to happen with postgres as well, if not in the particular case index creation over large datasets. It would helpful to know in that scenario what are the fixes and how to avoid it in the first place.

The solution/fixes for postgres will carry over to pgxc because the low level stuff (blocks, files, indexes) etc. postgres-xc is really the same as in postgres.

-Sandeep



On Mon, Dec 9, 2013 at 1:23 PM, John R Pierce <pierce@hogranch.com> wrote:
On 12/9/2013 10:07 AM, RUSHI KAW wrote:
I have been running Postgresxc 1.1

you'd probably be best off finding the postgresql-xc list, as that is really a rather different system, even if it is forked from community postgresql.



-- 
john r pierce                                      37N 122W
somewhere on the middle of the left coast

Re: "Tuple not found error" during Index creation

От
Sandeep Gupta
Дата:
Hello All,
 
 We are trying to trace cause and potential solution of "tuple not found" error with postgres-xc. The problem happens when indexing a large file. It seems the autovaccum locks certain cache pages that the indexer tries to read. The indexing fails with  "tuple not found" error.

I am sure if it qualifies as postgres or postgres-xc error. However, I was just wondering what is the recommended way to go about fixing this. Turning off the autovaccumer  is really not the best of solution because then the system runs into memory usage error.

Would greatly appreciate any pointers on this.

-Sandeep



On Mon, Dec 9, 2013 at 1:43 PM, Sandeep Gupta <gupta.sandeep@gmail.com> wrote:
Agreed. However, the "tuple not found" error problem seems to happen with postgres as well, if not in the particular case index creation over large datasets. It would helpful to know in that scenario what are the fixes and how to avoid it in the first place.

The solution/fixes for postgres will carry over to pgxc because the low level stuff (blocks, files, indexes) etc. postgres-xc is really the same as in postgres.

-Sandeep



On Mon, Dec 9, 2013 at 1:23 PM, John R Pierce <pierce@hogranch.com> wrote:
On 12/9/2013 10:07 AM, RUSHI KAW wrote:
I have been running Postgresxc 1.1

you'd probably be best off finding the postgresql-xc list, as that is really a rather different system, even if it is forked from community postgresql.



-- 
john r pierce                                      37N 122W
somewhere on the middle of the left coast


Re: [Postgres-xc-general] "Tuple not found error" during Index creation

От
Michael Paquier
Дата:
On Tue, Dec 10, 2013 at 7:17 AM, Sandeep Gupta <gupta.sandeep@gmail.com> wrote:
>  We are trying to trace cause and potential solution of "tuple not found"
> error with postgres-xc. The problem happens when indexing a large file. It
> seems the autovaccum locks certain cache pages that the indexer tries to
> read. The indexing fails with  "tuple not found" error.
>
> I am sure if it qualifies as postgres or postgres-xc error. However, I was
> just wondering what is the recommended way to go about fixing this. Turning
> off the autovaccumer  is really not the best of solution because then the
> system runs into memory usage error.
>
> Would greatly appreciate any pointers on this.
This smells like a concurrency issue with autovacuum on XC side. I
recall fixing in the past issues with autovacuum not taking a correct
snapshot from GTM in certain code paths, putting in danger data
consistency in the cluster as autovacuum might clean more tuples than
it should. Another possibility to explain this bug would be the way
RecentGlobalXmin is computed for autovacuum using the GTM snapshots,
which would explain why autovacuum has cleaned away some tuples it
should not have, making the possibility of a failure higher for
long-running transactions.

Those are assumptions though. It would be great if you could provide a
self-contained test case, with let's imagine a table that has its data
generated with for example generate_series. Just by seeing the spec of
the machine you are using, I am sure that i wouldn't be able to
reproduce that on my laptop though. The core team has access to more
powerful machines.

Also: Postgres-XC 1.1.0 is based on Postgres 9.2.4.
--
Michael


Re: [Postgres-xc-general] "Tuple not found error" during Index creation

От
Mason Sharp
Дата:


On Mon, Dec 9, 2013 at 8:49 PM, Michael Paquier <michael.paquier@gmail.com> wrote:
On Tue, Dec 10, 2013 at 7:17 AM, Sandeep Gupta <gupta.sandeep@gmail.com> wrote:
>  We are trying to trace cause and potential solution of "tuple not found"
> error with postgres-xc. The problem happens when indexing a large file. It
> seems the autovaccum locks certain cache pages that the indexer tries to
> read. The indexing fails with  "tuple not found" error.
>
> I am sure if it qualifies as postgres or postgres-xc error. However, I was
> just wondering what is the recommended way to go about fixing this. Turning
> off the autovaccumer  is really not the best of solution because then the
> system runs into memory usage error.
>
> Would greatly appreciate any pointers on this.
This smells like a concurrency issue with autovacuum on XC side. I
recall fixing in the past issues with autovacuum not taking a correct
snapshot from GTM in certain code paths, putting in danger data
consistency in the cluster as autovacuum might clean more tuples than
it should. Another possibility to explain this bug would be the way
RecentGlobalXmin is computed for autovacuum using the GTM snapshots,
which would explain why autovacuum has cleaned away some tuples it
should not have, making the possibility of a failure higher for
long-running transactions.

In our StormDB fork (now TransLattice Storm) I made some changes to address some issues that were uncovered with XC. I am not sure if it will address this specific issue above, but in most cases we make it an error instead of falling back to a local XID like XC does (imagine if a node cannot reach GTM and autovacuum starts cleaning up data with local XIDs and snapshots) . Also, we use GTM for getting XIDs for authentication and for autovacuum launcher because in concurrency testing not doing so results in the same XID being consumed by other sessions and causing hanging and other transaction problems. The bottom line is falling back to local XIDs and snapshots should almost always be avoided (initdb is ok).

Our code is a bit different from vanilla XC, but I can try to put together a similar patch soon.

As a community I feel we should prioritize more on testing and bug fixing like the reported issue and replicated table handling than on new features like the merged coordinator and datanode project.

 

Those are assumptions though. It would be great if you could provide a
self-contained test case, with let's imagine a table that has its data
generated with for example generate_series. Just by seeing the spec of
the machine you are using, I am sure that i wouldn't be able to
reproduce that on my laptop though. The core team has access to more
powerful machines.

Also: Postgres-XC 1.1.0 is based on Postgres 9.2.4.
--
Michael


 
Mason Sharp

TransLattice - http://www.translattice.com
Distributed and Clustered Database Solutions


Re: [Postgres-xc-general] "Tuple not found error" during Index creation

От
Michael Paquier
Дата:
On Tue, Dec 10, 2013 at 11:00 PM, Mason Sharp <msharp@translattice.com> wrote:
> In our StormDB fork (now TransLattice Storm) I made some changes to address
> some issues that were uncovered with XC. I am not sure if it will address
> this specific issue above, but in most cases we make it an error instead of
> falling back to a local XID like XC does (imagine if a node cannot reach GTM
> and autovacuum starts cleaning up data with local XIDs and snapshots) .
Yep, falling back to a local xid when GTM is not reachable has been
done since the beginning of the project. Considering that as a bug
using the argument that it endangers data visibility, such a patch
should be back-patched as well. Some insight on those remarks from the
core team would be welcome though.

> Also, we use GTM for getting XIDs for authentication and for autovacuum
> launcher because in concurrency testing not doing so results in the same XID
> being consumed by other sessions and causing hanging and other transaction
> problems. The bottom line is falling back to local XIDs and snapshots should
> almost always be avoided (initdb is ok).
Check.

> Our code is a bit different from vanilla XC, but I can try to put together a
> similar patch soon.
This would be welcome.

> As a community I feel we should prioritize more on testing and bug fixing
> like the reported issue and replicated table handling than on new features
> like the merged coordinator and datanode project.
Definitely, *normal* developers cannot afford spending so much time on
projects as big as that. One of the big things that I see missing is a
public instance of an XC buildfarm, by using for example the buildfarm
code of Postgres that simply fetches the code from git, and kicks
in-core tests. For XC this should be restricted though to regressions,
and compilation. pg_upgrade or isolation tests are not really
working...

Regards,
--
Michael


Re: [Postgres-xc-general] "Tuple not found error" during Index creation

От
Sandeep Gupta
Дата:
Hi Michael,

 I can provide the table schema and the data over which indexing almost always fails with tuple not found error. 
Would this be of help.  The other issue is that file is 3.2GB  so we would have work some logistics to transfer this across. 

Let me know.

-Sandeep



On Mon, Dec 9, 2013 at 5:49 PM, Michael Paquier <michael.paquier@gmail.com> wrote:
On Tue, Dec 10, 2013 at 7:17 AM, Sandeep Gupta <gupta.sandeep@gmail.com> wrote:
>  We are trying to trace cause and potential solution of "tuple not found"
> error with postgres-xc. The problem happens when indexing a large file. It
> seems the autovaccum locks certain cache pages that the indexer tries to
> read. The indexing fails with  "tuple not found" error.
>
> I am sure if it qualifies as postgres or postgres-xc error. However, I was
> just wondering what is the recommended way to go about fixing this. Turning
> off the autovaccumer  is really not the best of solution because then the
> system runs into memory usage error.
>
> Would greatly appreciate any pointers on this.
This smells like a concurrency issue with autovacuum on XC side. I
recall fixing in the past issues with autovacuum not taking a correct
snapshot from GTM in certain code paths, putting in danger data
consistency in the cluster as autovacuum might clean more tuples than
it should. Another possibility to explain this bug would be the way
RecentGlobalXmin is computed for autovacuum using the GTM snapshots,
which would explain why autovacuum has cleaned away some tuples it
should not have, making the possibility of a failure higher for
long-running transactions.

Those are assumptions though. It would be great if you could provide a
self-contained test case, with let's imagine a table that has its data
generated with for example generate_series. Just by seeing the spec of
the machine you are using, I am sure that i wouldn't be able to
reproduce that on my laptop though. The core team has access to more
powerful machines.

Also: Postgres-XC 1.1.0 is based on Postgres 9.2.4.
--
Michael

Re: [Postgres-xc-general] "Tuple not found error" during Index creation

От
Michael Paquier
Дата:
On Wed, Dec 11, 2013 at 1:17 PM, Sandeep Gupta <gupta.sandeep@gmail.com> wrote:
> Hi Michael,
>
>  I can provide the table schema and the data over which indexing almost
> always fails with tuple not found error.
> Would this be of help.  The other issue is that file is 3.2GB  so we would
> have work some logistics to transfer this across.
Transferring a data file of a couple of gigs is out of question. My
point was to know if you are able to create a self-contained test case
using automatically generated data of this type:
create table foo as select generate_series(1,100000000) as a, 'bbbb'::text as b;
create index fooi on aa(a);
Or smth like that.
This way you wouldn't need to 1) publish your schema, 2) transfer huge
files of data. And this would make tracking of this error somewhat
easier.
Regards,
--
Michael


Re: [Postgres-xc-general] "Tuple not found error" during Index creation

От
Sandeep Gupta
Дата:
Hi Mason,

 Thank you so much for taking the time. We are using pgxc 1.1. This was the stable release.  Let me give it a try with commits from previous versions. May take some time. I will get back to you with an update. 

-Sandeep



On Wed, Dec 11, 2013 at 8:55 AM, Mason Sharp <msharp@translattice.com> wrote:



On Tue, Dec 10, 2013 at 11:23 PM, Michael Paquier <michael.paquier@gmail.com> wrote:
On Wed, Dec 11, 2013 at 1:17 PM, Sandeep Gupta <gupta.sandeep@gmail.com> wrote:
> Hi Michael,
>
>  I can provide the table schema and the data over which indexing almost
> always fails with tuple not found error.
> Would this be of help.  The other issue is that file is 3.2GB  so we would
> have work some logistics to transfer this across.
Transferring a data file of a couple of gigs is out of question. My
point was to know if you are able to create a self-contained test case
using automatically generated data of this type:
create table foo as select generate_series(1,100000000) as a, 'bbbb'::text as b;
create index fooi on aa(a);
Or smth like that.
This way you wouldn't need to 1) publish your schema, 2) transfer huge
files of data. And this would make tracking of this error somewhat
easier.

Perhaps I could help Sandeep. Sandeep, before working out transferring such a large file though, please instead try to pinpoint the particular commit that introduced the issue.

I could recently reproduce a problem with VACUUM FULL in XC 1.1 (stop cluster, restart cluster, execute VACUUM FULL).  I could not reproduce the problem however with a build from the commit 11339220012a9e73cb82039b0ad41afd71bafca2 on Aug 22, 2012.  I suspect that a commit after that one to procarray.c may have something to do with the problem. It may be a similar issue for you. If you have the time, perhaps you can try different commits based on the git log of procarray.c to see if any particular one introduced the issue. 

I would use the commit not just on procarray, but on the entire source tree. If you find a suspect one based on procarray.c, try the one immediately preceding it for the whole source tree to confirm that it is procarray-related.  

Again, this all may be unrelated, but I think it is a good place to start, so I apologize in advance if this chews up some of your time unnecessarily.  
 
Regards,
--
Michael

------------------------------------------------------------------------------
Rapidly troubleshoot problems before they affect your business. Most IT
organizations don't have a clear picture of how application performance
affects their revenue. With AppDynamics, you get 100% visibility into your
Java,.NET, & PHP application. Start your 15-day FREE TRIAL of AppDynamics Pro!
http://pubads.g.doubleclick.net/gampad/clk?id=84349831&iu=/4140/ostg.clktrk
_______________________________________________
Postgres-xc-general mailing list
Postgres-xc-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/postgres-xc-general



--
Mason Sharp

TransLattice - http://www.translattice.com
Distributed and Clustered Database Solutions



Re: [Postgres-xc-general] "Tuple not found error" during Index creation

От
鈴木 幸市
Дата:
We made changes in internal snapshot handling to solve another problem around last December to January.   It will be very helpful if you try commits before and after this period.  This period’s change could be most suspect.

Best;
---
Koichi Suzuki

2013/12/12 9:19、Sandeep Gupta <gupta.sandeep@gmail.com> のメール:

Hi Mason,

 Thank you so much for taking the time. We are using pgxc 1.1. This was the stable release.  Let me give it a try with commits from previous versions. May take some time. I will get back to you with an update. 

-Sandeep



On Wed, Dec 11, 2013 at 8:55 AM, Mason Sharp <msharp@translattice.com> wrote:



On Tue, Dec 10, 2013 at 11:23 PM, Michael Paquier <michael.paquier@gmail.com> wrote:
On Wed, Dec 11, 2013 at 1:17 PM, Sandeep Gupta <gupta.sandeep@gmail.com> wrote:
> Hi Michael,
>
>  I can provide the table schema and the data over which indexing almost
> always fails with tuple not found error.
> Would this be of help.  The other issue is that file is 3.2GB  so we would
> have work some logistics to transfer this across.
Transferring a data file of a couple of gigs is out of question. My
point was to know if you are able to create a self-contained test case
using automatically generated data of this type:
create table foo as select generate_series(1,100000000) as a, 'bbbb'::text as b;
create index fooi on aa(a);
Or smth like that.
This way you wouldn't need to 1) publish your schema, 2) transfer huge
files of data. And this would make tracking of this error somewhat
easier.

Perhaps I could help Sandeep. Sandeep, before working out transferring such a large file though, please instead try to pinpoint the particular commit that introduced the issue.

I could recently reproduce a problem with VACUUM FULL in XC 1.1 (stop cluster, restart cluster, execute VACUUM FULL).  I could not reproduce the problem however with a build from the commit 11339220012a9e73cb82039b0ad41afd71bafca2 on Aug 22, 2012.  I suspect that a commit after that one to procarray.c may have something to do with the problem. It may be a similar issue for you. If you have the time, perhaps you can try different commits based on the git log of procarray.c to see if any particular one introduced the issue. 

I would use the commit not just on procarray, but on the entire source tree. If you find a suspect one based on procarray.c, try the one immediately preceding it for the whole source tree to confirm that it is procarray-related.  

Again, this all may be unrelated, but I think it is a good place to start, so I apologize in advance if this chews up some of your time unnecessarily.  
 
Regards,
--
Michael

------------------------------------------------------------------------------
Rapidly troubleshoot problems before they affect your business. Most IT
organizations don't have a clear picture of how application performance
affects their revenue. With AppDynamics, you get 100% visibility into your
Java,.NET, & PHP application. Start your 15-day FREE TRIAL of AppDynamics Pro!
http://pubads.g.doubleclick.net/gampad/clk?id=84349831&iu=/4140/ostg.clktrk
_______________________________________________
Postgres-xc-general mailing list
Postgres-xc-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/postgres-xc-general



--
Mason Sharp

TransLattice - http://www.translattice.com
Distributed and Clustered Database Solutions



------------------------------------------------------------------------------
Rapidly troubleshoot problems before they affect your business. Most IT
organizations don't have a clear picture of how application performance
affects their revenue. With AppDynamics, you get 100% visibility into your
Java,.NET, & PHP application. Start your 15-day FREE TRIAL of AppDynamics Pro!
http://pubads.g.doubleclick.net/gampad/clk?id=84349831&iu=/4140/ostg.clktrk_______________________________________________
Postgres-xc-general mailing list
Postgres-xc-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/postgres-xc-general

Re: [Postgres-xc-general] "Tuple not found error" during Index creation

От
Koichi Suzuki
Дата:
In 1.0, we added new APIs to GTM so that vacuum can run with global
XID and snapshot.   We may need more improvement to use this.

It is wonderful if Mason provides a patch to fix this.

Regards;
---
Koichi Suzuki


2013/12/11 Michael Paquier <michael.paquier@gmail.com>:
> On Tue, Dec 10, 2013 at 11:00 PM, Mason Sharp <msharp@translattice.com> wrote:
>> In our StormDB fork (now TransLattice Storm) I made some changes to address
>> some issues that were uncovered with XC. I am not sure if it will address
>> this specific issue above, but in most cases we make it an error instead of
>> falling back to a local XID like XC does (imagine if a node cannot reach GTM
>> and autovacuum starts cleaning up data with local XIDs and snapshots) .
> Yep, falling back to a local xid when GTM is not reachable has been
> done since the beginning of the project. Considering that as a bug
> using the argument that it endangers data visibility, such a patch
> should be back-patched as well. Some insight on those remarks from the
> core team would be welcome though.
>
>> Also, we use GTM for getting XIDs for authentication and for autovacuum
>> launcher because in concurrency testing not doing so results in the same XID
>> being consumed by other sessions and causing hanging and other transaction
>> problems. The bottom line is falling back to local XIDs and snapshots should
>> almost always be avoided (initdb is ok).
> Check.
>
>> Our code is a bit different from vanilla XC, but I can try to put together a
>> similar patch soon.
> This would be welcome.
>
>> As a community I feel we should prioritize more on testing and bug fixing
>> like the reported issue and replicated table handling than on new features
>> like the merged coordinator and datanode project.
> Definitely, *normal* developers cannot afford spending so much time on
> projects as big as that. One of the big things that I see missing is a
> public instance of an XC buildfarm, by using for example the buildfarm
> code of Postgres that simply fetches the code from git, and kicks
> in-core tests. For XC this should be restricted though to regressions,
> and compilation. pg_upgrade or isolation tests are not really
> working...
>
> Regards,
> --
> Michael
>
> ------------------------------------------------------------------------------
> Rapidly troubleshoot problems before they affect your business. Most IT
> organizations don't have a clear picture of how application performance
> affects their revenue. With AppDynamics, you get 100% visibility into your
> Java,.NET, & PHP application. Start your 15-day FREE TRIAL of AppDynamics Pro!
> http://pubads.g.doubleclick.net/gampad/clk?id=84349831&iu=/4140/ostg.clktrk
> _______________________________________________
> Postgres-xc-general mailing list
> Postgres-xc-general@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/postgres-xc-general


Re: [Postgres-xc-general] "Tuple not found error" during Index creation

От
Mason Sharp
Дата:



On Tue, Dec 10, 2013 at 11:23 PM, Michael Paquier <michael.paquier@gmail.com> wrote:
On Wed, Dec 11, 2013 at 1:17 PM, Sandeep Gupta <gupta.sandeep@gmail.com> wrote:
> Hi Michael,
>
>  I can provide the table schema and the data over which indexing almost
> always fails with tuple not found error.
> Would this be of help.  The other issue is that file is 3.2GB  so we would
> have work some logistics to transfer this across.
Transferring a data file of a couple of gigs is out of question. My
point was to know if you are able to create a self-contained test case
using automatically generated data of this type:
create table foo as select generate_series(1,100000000) as a, 'bbbb'::text as b;
create index fooi on aa(a);
Or smth like that.
This way you wouldn't need to 1) publish your schema, 2) transfer huge
files of data. And this would make tracking of this error somewhat
easier.

Perhaps I could help Sandeep. Sandeep, before working out transferring such a large file though, please instead try to pinpoint the particular commit that introduced the issue.

I could recently reproduce a problem with VACUUM FULL in XC 1.1 (stop cluster, restart cluster, execute VACUUM FULL).  I could not reproduce the problem however with a build from the commit 11339220012a9e73cb82039b0ad41afd71bafca2 on Aug 22, 2012.  I suspect that a commit after that one to procarray.c may have something to do with the problem. It may be a similar issue for you. If you have the time, perhaps you can try different commits based on the git log of procarray.c to see if any particular one introduced the issue. 

I would use the commit not just on procarray, but on the entire source tree. If you find a suspect one based on procarray.c, try the one immediately preceding it for the whole source tree to confirm that it is procarray-related.  

Again, this all may be unrelated, but I think it is a good place to start, so I apologize in advance if this chews up some of your time unnecessarily.  
 
Regards,
--
Michael

------------------------------------------------------------------------------
Rapidly troubleshoot problems before they affect your business. Most IT
organizations don't have a clear picture of how application performance
affects their revenue. With AppDynamics, you get 100% visibility into your
Java,.NET, & PHP application. Start your 15-day FREE TRIAL of AppDynamics Pro!
http://pubads.g.doubleclick.net/gampad/clk?id=84349831&iu=/4140/ostg.clktrk
_______________________________________________
Postgres-xc-general mailing list
Postgres-xc-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/postgres-xc-general



--
Mason Sharp

TransLattice - http://www.translattice.com
Distributed and Clustered Database Solutions