Обсуждение: [Q] UTF-8 testing with Windows/ODBC 8.3.0400

Поиск
Список
Период
Сортировка

[Q] UTF-8 testing with Windows/ODBC 8.3.0400

От
"V S P"
Дата:
Hello,
I am trying to confirm that my program can read UTF-8 characters and
cannot seem to figure out what's wrong.  I get question marks

I am using ODBC PSQLODBC35W.dll

My C++ program relies on OTL C++ library to do DB access, and in the
Visual Studio debugger I see only question marks '?' for the strings.
When I convert the question marks to utf-16 (in memory) just to see if
may displaying is wrong but I get '63'.
I am using std::string to store the bytestream from varchar column an I
think it is ok
because I do not need to 'manipulate' the content.

To eliminate my program from the consideration I downloaded and
installed
ODBC Query Tool 0.72 Beta build 23
(from sourcerorge)

I get the same question marks on the columns with unicode text.
I know that the database has stored it correctly, because I can see
everything 'Ok' with  PGAdminIII (and things work ok with my PHP PDO
access).

I do not have PGCLIENTENCODING=UTF8  set in the 'System', instead I just
set it per
command prompt, however I do not think it matters.

I have set ODBC driver options to have max var char length to 75000 and
the long var char to 75,000.

I cannot figure out what else I might be doing wrong.... as I said, all
I need for now it is just to test out that a C++ program via ODBC can
get the data.


thanks in advance for any help,
Vlad



--
  V S P
  toreason@fastmail.fm

--
http://www.fastmail.fm - Same, same, but different...


Re: [Q] UTF-8 testing with Windows/ODBC 8.3.0400

От
Craig Ringer
Дата:
V S P wrote:

> My C++ program relies on OTL C++ library to do DB access, and in the
> Visual Studio debugger I see only question marks '?' for the strings.

How would Visual Studio know that the std::string instances in question
contain UTF-8 data? std::string is a byte string, not a character string
- it could contain text in any encoding (or non-text data) and VC++ has
no way of knowing how to interpret it.

What it probably does is display anything within the ASCII range, and
otherwise display ?s .

If you expect to be able to work with those strings as real text, you
probably want to use std::wstring instead, and USE APPROPRIATE ENCODING
CONVERSION ROUTINES. Note that the width of wchar_t varies from platform
to platform, so you'll need to convert to/from UTF-16 for a 2 byte
wchar_t, or to/from UTF-32 for a 4-byte wchar_t.

(I hate working with unicode and encodings in standard C++ *SO* much -
argh! One of the only areas where I really wish I was using Java. If
only the QString class from Qt was part of standard C++ ... ).

> I am using std::string to store the bytestream from varchar column an I
> think it is ok
> because I do not need to 'manipulate' the content.

True - but VC++ won't be able to understand what's in it, either.

> I cannot figure out what else I might be doing wrong.... as I said, all
> I need for now it is just to test out that a C++ program via ODBC can
> get the data.

Your description really isn't adequate to say. It's highly likely that
you're retrieving the data from the database fine, but your tools don't
know it's UTF-8 and aren't able to work with it correctly. That's mostly
a guess with the amount of information you've provided, though.

Perhaps you could post a small, self-contained test program and a SQL
script to populate a test database? Then post the results of running the
program against the database, including the hex values of the bytes
returned by the ODBC interface.

--
Craig Ringer

Re: [Q] UTF-8 testing with Windows/ODBC 8.3.0400

От
"V S P"
Дата:
Hi thank you for the follow up.
Just had a break through...

I believe I was able to resolve most of the problems

I finally found a post on the net that says:
if you want UTF8 in your ODBC-based client program and Postgres is in
UTF8
then use the ASCII driver  not the Unicode

So as soon as switched that things worked
(I also kept:  set  client_encoding='UTF' on my ODBC connections at
startup)

I used :

wchar_t wData[1000];
::MultiByteToWideChar(CP_UTF8, 0, my_normal_std_string.c_str(), -1,
wData, 1000);

to convert the read data and display in the debugger

Before the switch to the Ascii odbc driver, the above was just showing
question marks.


So reading UTF8 into ODBC programming using the Ascii driver works
perfect.

Since I now I understood what was going on, converted most of my strings
to wstrings, and then enabled Unicode Version of the PG ODBC driver --
and that works too  !

:-) so now I have a #define where I switch between wstrings and strings
and of course a few other things, and then I flip the drivers in ODBC
datasource
and things work  (I have tested selects sofar).


Three things that I am not still sure about, and may be you can help:

a) does Posgtres driver on unixODBC do the same as Windows (that is
there is a unicode and non unicode versions of the drivers ?
(I am interested in 64 bit linux and 64 bit freebsd ones)

b) I noticed that when using the Unicode version (first) and
Ascii version (second) the value of SWORD right before SQLVLEN is
different
(it is 12 on the ascii and -9 on unicode version) -- what does this
mean?

disp_otrq_x86d  8a4-b90 EXIT  SQLDescribeColW  with return code 0
(SQL_SUCCESS)
        HSTMT               013F1BA8
        UWORD                       11
        WCHAR *             0x01A28974 [       9] "cntr_data"
        SWORD                      512
        SWORD *             0x01A28BC4 (9)
        SWORD *             0x01A28BB8 (-9)
        SQLULEN *           0x01A28B94 (4096)
        SWORD *             0x01A28BA0 (0)
        SWORD *             0x01A28B88 (1)


disp_otrq_x86d  ab8-498 EXIT  SQLDescribeColW  with return code 0
(SQL_SUCCESS)
        HSTMT               013F1C38
        UWORD                       11
        WCHAR *             0x01A28974 [       9] "cntr_data"
        SWORD                      512
        SWORD *             0x01A28BC4 (9)
        SWORD *             0x01A28BB8 (12)
        SQLULEN *           0x01A28B94 (4096)
        SWORD *             0x01A28BA0 (0)
        SWORD *             0x01A28B88 (1)




another  question:  I have about 6 tables where about 20 fields in each
table,
2 fields are 65K long (they are declared as varchar(65000) is this is OK
for ODBC drivers, and what if anything I should be setting on them?



Thank you again for your follow up,
Vlad





On Wed, 18 Mar 2009 15:46 +0900, "Craig Ringer"
<craig@postnewspapers.com.au> wrote:
> V S P wrote:
>
> > My C++ program relies on OTL C++ library to do DB access, and in the
> > Visual Studio debugger I see only question marks '?' for the strings.
>
> How would Visual Studio know that the std::string instances in question
> contain UTF-8 data? std::string is a byte string, not a character string
> - it could contain text in any encoding (or non-text data) and VC++ has
> no way of knowing how to interpret it.
>
> What it probably does is display anything within the ASCII range, and
> otherwise display ?s .
>
> If you expect to be able to work with those strings as real text, you
> probably want to use std::wstring instead, and USE APPROPRIATE ENCODING
> CONVERSION ROUTINES. Note that the width of wchar_t varies from platform
> to platform, so you'll need to convert to/from UTF-16 for a 2 byte
> wchar_t, or to/from UTF-32 for a 4-byte wchar_t.
>
> (I hate working with unicode and encodings in standard C++ *SO* much -
> argh! One of the only areas where I really wish I was using Java. If
> only the QString class from Qt was part of standard C++ ... ).
>
> > I am using std::string to store the bytestream from varchar column an I
> > think it is ok
> > because I do not need to 'manipulate' the content.
>
> True - but VC++ won't be able to understand what's in it, either.
>
> > I cannot figure out what else I might be doing wrong.... as I said, all
> > I need for now it is just to test out that a C++ program via ODBC can
> > get the data.
>
> Your description really isn't adequate to say. It's highly likely that
> you're retrieving the data from the database fine, but your tools don't
> know it's UTF-8 and aren't able to work with it correctly. That's mostly
> a guess with the amount of information you've provided, though.
>
> Perhaps you could post a small, self-contained test program and a SQL
> script to populate a test database? Then post the results of running the
> program against the database, including the hex values of the bytes
> returned by the ODBC interface.
>
> --
> Craig Ringer
--
  V S P
  toreason@fastmail.fm

--
http://www.fastmail.fm - A no graphics, no pop-ups email service


Re: [Q] UTF-8 testing with Windows/ODBC 8.3.0400

От
Hiroshi Inoue
Дата:
V S P wrote:
> Hi thank you for the follow up.
> Just had a break through...

<snip>

> Three things that I am not still sure about, and may be you can help:
>
> a) does Posgtres driver on unixODBC do the same as Windows (that is
> there is a unicode and non unicode versions of the drivers ?
> (I am interested in 64 bit linux and 64 bit freebsd ones)
>
> b) I noticed that when using the Unicode version (first) and
> Ascii version (second) the value of SWORD right before SQLVLEN is
> different
> (it is 12 on the ascii and -9 on unicode version) -- what does this
> mean?

It's the SQL data type. 12 means SQL_VARCHAR and -9 means SQL_WVARCHAR.

> disp_otrq_x86d  8a4-b90 EXIT  SQLDescribeColW  with return code 0
> (SQL_SUCCESS)
>         HSTMT               013F1BA8
>         UWORD                       11
>         WCHAR *             0x01A28974 [       9] "cntr_data"
>         SWORD                      512
>         SWORD *             0x01A28BC4 (9)
>         SWORD *             0x01A28BB8 (-9)
>         SQLULEN *           0x01A28B94 (4096)
>         SWORD *             0x01A28BA0 (0)
>         SWORD *             0x01A28B88 (1)
>
>
> disp_otrq_x86d  ab8-498 EXIT  SQLDescribeColW  with return code 0
> (SQL_SUCCESS)
>         HSTMT               013F1C38
>         UWORD                       11
>         WCHAR *             0x01A28974 [       9] "cntr_data"
>         SWORD                      512
>         SWORD *             0x01A28BC4 (9)
>         SWORD *             0x01A28BB8 (12)
>         SQLULEN *           0x01A28B94 (4096)
>         SWORD *             0x01A28BA0 (0)
>         SWORD *             0x01A28B88 (1)

> another  question:  I have about 6 tables where about 20 fields in each
> table,
> 2 fields are 65K long (they are declared as varchar(65000) is this is OK
> for ODBC drivers, and what if anything I should be setting on them?

65K may be OK if you set the max long varchar size > 65000 but please
note that most applications/middlewares have their own limitation.
For example the limitation of MS Access is around 65000 AFAIR.

regards,
Hiroshi Inoue

Re: [Q] UTF-8 testing with Windows/ODBC 8.3.0400

От
"V S P"
Дата:
thank you,
I have a follow up:

I if I declare a field as varchar(30)
in the UTF-8 Postgres, I assume it can possibly be more than 30 bytes
storage.


so when I set the ODBCs (unicode version)  'max long varchar'
am I setting it to max 'byte length' of the UTF-8 (as in the dabase)
or to '60' (max number of bytes for 30 character string in UTF-16)
or to '30' (max number of valid characters) ?


And would your answer be the same for the unix Odbc version ?


Thank you again for the response,
Vlad



On Thu, 19 Mar 2009 21:28 +0900, "Hiroshi Inoue" <inoue@tpf.co.jp>
wrote:
> V S P wrote:
> > Hi thank you for the follow up.
> > Just had a break through...
>
> <snip>
>
> > Three things that I am not still sure about, and may be you can help:
> >
> > a) does Posgtres driver on unixODBC do the same as Windows (that is
> > there is a unicode and non unicode versions of the drivers ?
> > (I am interested in 64 bit linux and 64 bit freebsd ones)
> >
> > b) I noticed that when using the Unicode version (first) and
> > Ascii version (second) the value of SWORD right before SQLVLEN is
> > different
> > (it is 12 on the ascii and -9 on unicode version) -- what does this
> > mean?
>
> It's the SQL data type. 12 means SQL_VARCHAR and -9 means SQL_WVARCHAR.
>
> > disp_otrq_x86d  8a4-b90 EXIT  SQLDescribeColW  with return code 0
> > (SQL_SUCCESS)
> >         HSTMT               013F1BA8
> >         UWORD                       11
> >         WCHAR *             0x01A28974 [       9] "cntr_data"
> >         SWORD                      512
> >         SWORD *             0x01A28BC4 (9)
> >         SWORD *             0x01A28BB8 (-9)
> >         SQLULEN *           0x01A28B94 (4096)
> >         SWORD *             0x01A28BA0 (0)
> >         SWORD *             0x01A28B88 (1)
> >
> >
> > disp_otrq_x86d  ab8-498 EXIT  SQLDescribeColW  with return code 0
> > (SQL_SUCCESS)
> >         HSTMT               013F1C38
> >         UWORD                       11
> >         WCHAR *             0x01A28974 [       9] "cntr_data"
> >         SWORD                      512
> >         SWORD *             0x01A28BC4 (9)
> >         SWORD *             0x01A28BB8 (12)
> >         SQLULEN *           0x01A28B94 (4096)
> >         SWORD *             0x01A28BA0 (0)
> >         SWORD *             0x01A28B88 (1)
>
> > another  question:  I have about 6 tables where about 20 fields in each
> > table,
> > 2 fields are 65K long (they are declared as varchar(65000) is this is OK
> > for ODBC drivers, and what if anything I should be setting on them?
>
> 65K may be OK if you set the max long varchar size > 65000 but please
> note that most applications/middlewares have their own limitation.
> For example the limitation of MS Access is around 65000 AFAIR.
>
> regards,
> Hiroshi Inoue
--
  V S P
  toreason@fastmail.fm

--
http://www.fastmail.fm - A fast, anti-spam email service.


[Q] unixODBC and Unicode (2byte vs 4 byte)

От
"V S P"
Дата:
I am designing now my application and testing on Windows sofar
I would like to make sure that from a 'get go' I am doing the
'right thing' as far as

a) freebsd/windows/linux (32 and 64 bit support)
b) unicode support

my DB in encoded in UTF-8

with the help of this list and others I am now able to
use OTL (the C++ library I use for ODBC) and
connect to Postgres Ascii ODBC and use std::string (even for utf-8)
and Unicode ODBC and use wstring for varchar

all on windows/32 bit sofar.

My question is:
what is the 'accepted' way of accessing UTF-8 encoded db on Linux and
Freebsd
is it via ODBC unicode (if such thing exists)
or ASCII unicode

and if it is via 'Unicode' what do I use as a C++ type to store
string results?
wstring or something else?  (because wstring is 4 bytes on unix and 2 on
windows)

Is pg ODBC Unicode driver on Linux/FreeBSD converting utf-8 into two
byte or
4 byte character strings?
(and what do most people do in this situation)


thank you in advance,
Vlad


--
  V S P
  toreason@fastmail.fm

--
http://www.fastmail.fm - mmm... Fastmail...