Discussion: settings for multi-language unicode DB


settings for multi-language unicode DB

From:
Janet Bagg
Date:
Please could somebody give me advice on settings for PGSQL with a database
with UTF-8 strings in a large number of languages? I've had no problems so
far in storing/retrieving UTF-8 strings but can't find clear answers to
other issues.

What locale would be best for sorting this dataset? Does it have to be the
same as the one set for the OS (these seem very limited for Solaris)?

As well as European accented characters, I need to handle some others used
for transcription. I have to produce simple-to-use sorted lists by initial
letter for users and staff.

The database is for a library which takes journals from all over the world
and has to store titles in all European languages (broadly defined) plus
western transcriptions of others. We are in the process of moving to
Unicode. The DBMS currently runs on Solaris 9 with a backup system on
Fedora 3 Linux and user/editor access is via servlets (Java).

Thanks for any help.

Janet Bagg, CSAC, University of Kent, UK



Re: settings for multi-language unicode DB

From:
Bruce Momjian
Date:
Janet Bagg wrote:
> Please could somebody give me advice on settings for PGSQL with a database
> with UTF-8 strings in a large number of languages? I've had no problems so
> far in storing/retrieving UTF-8 strings but can't find clear answers to
> other issues.
>
> What locale would be best for sorting this dataset? Does it have to be the
> same as the one set for the OS (these seem very limited for Solaris)?

Yes, right now PostgreSQL uses the operating system to do locale
ordering.  That might change in a future release.
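
For illustration (the data directory and locale name below are placeholders,
not recommendations): in the 7.x/8.0 era the collation locale is fixed for
the whole cluster when it is initialized, so it has to be chosen at initdb
time:

  # Sketch only -- paths and locale are examples.
  initdb -D /usr/local/pgsql/data --locale=en_US.UTF-8 --encoding=UNICODE

  # On 7.4 and later, to see what an existing cluster was initialized with:
  psql -c "SHOW lc_collate"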

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

Deadlock Detected (revisited)

From:
Wes
Date:
I haven't seen anything to indicate that 8.x improves foreign key
reference locking and fixes the foreign key 'deadlock detected' issue.
Has that been addressed in 8.1?

I sort my records before loading, but am still getting 'deadlock detected'
(still running 7.3.4).

I have

  B references C
  B references A
  A references C

Any new records for C are loaded first (both A and B depend on C).

A is loaded next (B depends on A).

B is loaded last.

Both A and B are loaded in 'Table C' sorted order.  However, a deadlock can
occur with one process's A records and another's B records, due to the fact
that both have a foreign key reference to C.

The only way I know of to eliminate the deadlocks without serializing the
processes is to remove the foreign key reference from either A or B.
However, this jeopardizes referential integrity if something goes wrong with the
load process.

I've never quite understood why a READ of a record with a foreign key
reference results in the referenced record being locked with more than a
shared lock.

Wes



Re: Deadlock Detected (revisited)

From:
Stephan Szabo
Date:
On Fri, 25 Nov 2005, Wes wrote:

> I haven't seen anything to indicate that 8.x improves foreign key
> reference locking and fixes the foreign key 'deadlock detected' issue.
> Has that been addressed in 8.1?

8.1 should be using the new shared row locks for doing the checks.  This
should fix the case mentioned.
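
(A sketch of the mechanism, not the actual trigger source: conceptually the
RI check runs a locking SELECT against the referenced row, and what changed
in 8.1 is the strength of that lock.)

  -- Before 8.1: the check effectively takes an exclusive row lock on the
  -- referenced row in C, so two sessions whose new A and B rows reference
  -- the same C row can deadlock:
  SELECT 1 FROM c WHERE c.id = $1 FOR UPDATE;

  -- 8.1: a shared row lock, which doesn't conflict with other shared
  -- locks, so that particular deadlock goes away:
  SELECT 1 FROM c WHERE c.id = $1 FOR SHARE;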

> I've never quite understood why a READ of a record with a foreign key
> reference results in the referenced record being locked with more than a
> shared lock.

Up until now, we didn't have one to get on a per-record basis.

Re: Deadlock Detected (revisited)

From:
Wes
Date:
On 11/25/05 2:40 AM, "Stephan Szabo" <sszabo@megazone.bigpanda.com> wrote:

> 8.1 should be using the new shared row locks for doing the checks.  This
> should fix the case mentioned.

Hmm.  I'm looking in the "What's new" for 8.1 and see that now.  I could
have sworn it wasn't there the last time I looked.  Sorry.

> Up until now, we didn't have one to get on a per-record basis.

Ok, I guess I'm confused...  From the 7.3.2 manual (6.5 updates):

"Multiversion concurrency control(MVCC) This removes our old table-level
locking, and replaces it with a locking system that is superior to most
commercial database systems. In a traditional system, each row that is
modified is locked until committed, preventing reads by other users. MVCC
uses the natural multiversion nature of PostgreSQL to allow readers to
continue reading consistent data during writer activity. Writers continue to
use the compact pg_log transaction system. This is all performed without
having to allocate a lock for every row like traditional database systems.
So, basically, we no longer are restricted by simple table-level locking; we
have something better than row-level locking."

So, until 8.1 PostgreSQL had "something better than row-level locking" for
some things, but no row locking when needed?  Or was row locking there,
but just no shared row locking?

Wes



Re: Deadlock Detected (revisited)

From:
Stephan Szabo
Date:
On Fri, 25 Nov 2005, Wes wrote:

> On 11/25/05 2:40 AM, "Stephan Szabo" <sszabo@megazone.bigpanda.com> wrote:
>
> > 8.1 should be using the new shared row locks for doing the checks.  This
> > should fix the case mentioned.
>
> Hmm.  I'm looking in the "What's new" for 8.1 and see that now.  I could
> have sworn it wasn't there the last time I looked.  Sorry.
>
> > Up until now, we didn't have one to get on a per-record basis.
>
> Ok, I guess I'm confused...  From the 7.3.2 manual (6.5 updates):
>
> "Multiversion concurrency control(MVCC) This removes our old table-level
> locking, and replaces it with a locking system that is superior to most
> commercial database systems. In a traditional system, each row that is
> modified is locked until committed, preventing reads by other users. MVCC
> uses the natural multiversion nature of PostgreSQL to allow readers to
> continue reading consistent data during writer activity. Writers continue to
> use the compact pg_log transaction system. This is all performed without
> having to allocate a lock for every row like traditional database systems.
> So, basically, we no longer are restricted by simple table-level locking; we
> have something better than row-level locking."
>
> So, until 8.1 PostgreSQL had "something better than row-level locking" for
> some things, but no row locking when needed?  Or was row locking there,
> but just no shared row locking?

The latter: the row locks before were single-owner, and were such that a
second lock request for the same row would wait for the first to be
released. Now effectively you have two levels of locks at the row level,
the weaker of which conflicts with the stronger but not with itself. The
thing about MVCC is that readers do not have to get either lock if they
aren't trying to prevent modifications.
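
For example (table and key are hypothetical):

  -- Session 1:
  BEGIN;
  SELECT * FROM c WHERE id = 1 FOR SHARE;   -- takes the weaker, shared lock

  -- Session 2, concurrently:
  BEGIN;
  SELECT * FROM c WHERE id = 1 FOR SHARE;   -- fine: shared doesn't conflict
                                            -- with shared
  SELECT * FROM c WHERE id = 1 FOR UPDATE;  -- blocks until session 1 ends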

Re: Deadlock Detected (revisited)

From:
Wes
Date:
On 11/25/05 9:12 PM, "Stephan Szabo" <sszabo@megazone.bigpanda.com> wrote:

> The latter: the row locks before were single-owner, and were such that a
> second lock request for the same row would wait for the first to be
> released. Now effectively you have two levels of locks at the row level,
> the weaker of which conflicts with the stronger but not with itself.

Now I understand.  Thanks for clarifying.

It sounds like there may be some performance improvements that will be
beneficial to us also.  Hopefully I can convince them to agree to the 8.1
upgrade.  Besides verifying our application, it takes a full weekend to do
the dump/load - the database is just under a billion rows.
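
(For reference, the dump/load is the usual major-version upgrade path --
the ports and paths below are placeholders only:)

  # Dump the old cluster and pipe it straight into the new 8.1 server
  # running on another port, so no intermediate dump file is needed.
  pg_dumpall -p 5432 | /usr/local/pgsql-8.1/bin/psql -p 5433 -d postgres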

Wes



Excessive vacuum times

From:
Wes
Date:
Some time ago I reported a problem where, right after an index rebuild, a
vacuum might take a couple of hours, but it gets slower and slower, at a
seemingly increasing rate, as time goes on.  What takes a couple of hours
after a full index rebuild now takes 12 hours or more.

The problem was determined to be due to the fact that indexes are vacuumed
in index order, not in disk storage order.  I don't see anything about this
in the "What's new" for 8.1.  Has anything been done to resolve this?

Scheduled index rebuilds are not feasible.  I really don't even want to take
a chance on starting one.  The database is just a hair under one billion
rows, and could take the entire weekend or more to rebuild.

Wes



Re: Excessive vacuum times

From:
Tom Lane
Date:
Wes <wespvp@syntegra.com> writes:
> The problem was determined to be due to the fact that indexes are vacuumed
> in index order, not in disk storage order.  I don't see anything about this
> in the "What's new" for 8.1.  Has anything been done to resolve this?

No.  Avoiding that would require a new approach to
vacuum-vs-ordinary-indexscan interlocking, so it won't happen until
someone has a Bright Idea (tm).

            regards, tom lane

Re: Excessive vacuum times

From:
"Jim C. Nasby"
Date:
On Mon, Dec 12, 2005 at 06:26:37PM -0500, Tom Lane wrote:
> Wes <wespvp@syntegra.com> writes:
> > The problem was determined to be due to the fact that indexes are vacuumed
> > in index order, not in disk storage order.  I don't see anything about this
> > in the "What's new" for 8.1.  Has anything been done to resolve this?
>
> No.  Avoiding that would require a new approach to
> vacuum-vs-ordinary-indexscan interlocking, so it won't happen until
> someone has a Bright Idea (tm).

Plus there is a TODO to only vacuum pages that are known to have dead
tuples, which should hopefully mean no more index-scans during vacuum as
well. Hopefully this makes it into 8.2...
--
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461

Re: Excessive vacuum times

From:
Tom Lane
Date:
"Jim C. Nasby" <jnasby@pervasive.com> writes:
> On Mon, Dec 12, 2005 at 06:26:37PM -0500, Tom Lane wrote:
>> No.  Avoiding that would require a new approach to
>> vacuum-vs-ordinary-indexscan interlocking, so it won't happen until
>> someone has a Bright Idea (tm).

> Plus there is a TODO to only vacuum pages that are known to have dead
> tuples, which should hopefully mean no more index-scans during vacuum as
> well.

No such luck.  You delete any tuples, you need to scan the indexes.

            regards, tom lane

Re: Excessive vacuum times

From:
"Jim C. Nasby"
Date:
On Mon, Dec 12, 2005 at 11:09:01PM -0500, Tom Lane wrote:
> "Jim C. Nasby" <jnasby@pervasive.com> writes:
> > On Mon, Dec 12, 2005 at 06:26:37PM -0500, Tom Lane wrote:
> >> No.  Avoiding that would require a new approach to
> >> vacuum-vs-ordinary-indexscan interlocking, so it won't happen until
> >> someone has a Bright Idea (tm).
>
> > Plus there is a TODO to only vacuum pages that are known to have dead
> > tuples, which should hopefully mean no more index-scans during vacuum as
> > well.
>
> No such luck.  You delete any tuples, you need to scan the indexes.

Even though you can see what the index values were for the now-dead
tuple?
--
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461

Re: Excessive vacuum times

From:
Wes
Date:
On 12/12/05 5:26 PM, "Tom Lane" <tgl@sss.pgh.pa.us> wrote:

>> The problem was determined to be due to the fact that indexes are vacuumed
>> in index order, not in disk storage order.  I don't see anything about this
>> in the "What's new" for 8.1.  Has anything been done to resolve this?
>
> No.  Avoiding that would require a new approach to
> vacuum-vs-ordinary-indexscan interlocking, so it won't happen until
> someone has a Bright Idea (tm).

Any ideas on how I might reconfigure to mitigate the issue?  Separating
the most offending indexes to separate drives probably isn't an option.

Wes



Re: Excessive vacuum times

From:
Tom Lane
Date:
Wes <wespvp@syntegra.com> writes:
> Any ideas on how I might reconfigure to mitigate the issue?  Separating
> the most offending indexes to separate drives probably isn't an option.

What are you using for vacuum_mem?  A larger value should reduce the
number of times we have to scan the indexes of a large table.
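
For example (the value is a placeholder, not a recommendation; vacuum_mem is
in kilobytes, and was renamed maintenance_work_mem in 8.0):

  -- Raise it for this session only, before a manual vacuum
  -- ('bigtable' is a hypothetical table name):
  SET vacuum_mem = 131072;
  VACUUM ANALYZE bigtable;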

            regards, tom lane

./configure --with-openssl=path fails

From:
Wes
Date:
Because of our build environment, we need to build PostgreSQL specifying the
location of the OpenSSL libraries to ensure everyone is using the same
libraries, regardless of the system the software is built and executed on.
At 7.4.5, we used:

 ./configure --with-openssl=path

This no longer works with 8.x. See:

 <http://archives.postgresql.org/pgsql-committers/2003-11/msg00278.php>

What is the correct way to work around this restriction?  Set
LD_LIBRARY_PATH?

This is on a variety of unix platforms.

Wes



Re: ./configure --with-openssl=path fails

From:
Tom Lane
Date:
Wes <wespvp@syntegra.com> writes:
>  ./configure --with-openssl=path
> This no longer works with 8.x. See:
>  <http://archives.postgresql.org/pgsql-committers/2003-11/msg00278.php>
> What is the correct way to work around this restriction?

Use --with-includes and --with-libraries as needed.
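
For example (the paths are placeholders):

  ./configure --with-openssl \
      --with-includes=/opt/openssl/include \
      --with-libraries=/opt/openssl/lib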

            regards, tom lane

Re: ./configure --with-openssl=path fails

From:
Wes
Date:
On 1/6/06 2:47 PM, "Tom Lane" <tgl@sss.pgh.pa.us> wrote:

>>  ./configure --with-openssl=path
>> This no longer works with 8.x. See:
>>  <http://archives.postgresql.org/pgsql-committers/2003-11/msg00278.php>
>> What is the correct way to work around this restriction?
>
> Use --with-includes and --with-libraries as needed.

That allows me to build, but doesn't set the library search path in the
binaries ("-Wl,-rpath /blah/blah" on Linux or "-R /blah/blah" on Solaris).
This prevents programs from loading, since they can't find the libraries if
they are not in the default library search path of the user.
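
I can presumably work around it by injecting the runtime path myself at
configure time (paths below are placeholders, and this assumes configure
honors LDFLAGS from the environment):

  LDFLAGS="-Wl,-rpath,/opt/openssl/lib" ./configure --with-openssl \
      --with-includes=/opt/openssl/include \
      --with-libraries=/opt/openssl/lib

But that seems like something configure ought to handle for me.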

Wes



Re: ./configure --with-openssl=path fails

From:
Wes
Date:
On 1/6/06 2:47 PM, "Tom Lane" <tgl@sss.pgh.pa.us> wrote:

>>  ./configure --with-openssl=path
>> This no longer works with 8.x. See:
>>  <http://archives.postgresql.org/pgsql-committers/2003-11/msg00278.php>
>> What is the correct way to work around this restriction?
>
> Use --with-includes and --with-libraries as needed.

That doesn't get the library paths into the binary.  If the libraries are
not in the default system search path, the user is screwed.  Is there a way
to solve this?  I think the previous --with-openssl=path set the search path
so the libraries would always be found.

Wes



Re: ./configure --with-openssl=path fails

From:
Tom Lane
Date:
Wes <wespvp@syntegra.com> writes:
> On 1/6/06 2:47 PM, "Tom Lane" <tgl@sss.pgh.pa.us> wrote:
>> Use --with-includes and --with-libraries as needed.

> That doesn't get the library paths into the binary.  If the libraries are
> not in the default system search path, the user is screwed.  Is there a way
> to solve this?  I think the previous --with-openssl=path set the search path
> so the libraries would always be found.

No, it didn't.  Really the best solution to this is to not have commonly
used libraries that aren't in the system search path ...

            regards, tom lane