Discussion: EUC_JP and SJIS conversion improvement
The character-code conversion from EUC_JP to SJIS is currently performed in two stages. The first stage converts EUC_JP to MIC; the second converts MIC to SJIS. (Conversion from SJIS to EUC_JP works similarly.)

This is not very efficient, because a buffer must be allocated for the MIC data and the conversion calculation is executed twice.

The attached patch enables direct conversion between EUC_JP and SJIS. Additionally, it reduces the number of calls to pg_mic_mblen.

The effect of the patch, as I measured it, is as follows:

o The test data was created by 'pgbench -i'.

o Test SQL:
    set client_encoding to 'SJIS';
    select * from accounts;

o Test results: Linux (CPU: Pentium III, compiler option: -O2)
    - original: 2.920s
    - patched : 2.278s

regards,

---
Atsushi Ogawa
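For readers unfamiliar with the two encodings, the idea of a direct, single-pass EUC_JP to SJIS conversion can be sketched as follows. This is an illustrative sketch only, not the code from the patch: the function name `euc_jp_to_sjis` is hypothetical, it covers only ASCII, half-width katakana, and two-byte JIS X 0208 characters, and it rejects three-byte JIS X 0212 sequences instead of mapping them.

```python
def euc_jp_to_sjis(data: bytes) -> bytes:
    """Convert EUC_JP bytes to SJIS in one pass (hypothetical helper).

    Assumption: only ASCII, half-width katakana (SS2) and two-byte
    JIS X 0208 characters are handled; anything else raises ValueError.
    """
    out = bytearray()
    i = 0
    n = len(data)
    while i < n:
        c1 = data[i]
        if c1 < 0x80:
            # ASCII bytes are identical in both encodings.
            out.append(c1)
            i += 1
            continue
        if i + 1 >= n:
            raise ValueError("truncated multibyte sequence")
        c2 = data[i + 1]
        if c1 == 0x8E:
            # SS2 prefix: half-width katakana is a single byte in SJIS.
            if not 0xA1 <= c2 <= 0xDF:
                raise ValueError("invalid half-width katakana byte")
            out.append(c2)
        elif 0xA1 <= c1 <= 0xFE and 0xA1 <= c2 <= 0xFE:
            # Strip the high bits to get the JIS X 0208 row/cell bytes.
            j1 = c1 - 0x80
            j2 = c2 - 0x80
            # Two JIS rows share one SJIS lead byte.
            s1 = ((j1 - 0x21) >> 1) + 0x81
            if s1 > 0x9F:
                s1 += 0x40  # skip the single-byte katakana range
            if j1 & 1:
                # Odd row: lower half of the trail-byte range, skipping 0x7F.
                s2 = j2 + 0x1F
                if s2 >= 0x7F:
                    s2 += 1
            else:
                # Even row: upper half of the trail-byte range.
                s2 = j2 + 0x7E
            out.append(s1)
            out.append(s2)
        else:
            # Includes the 0x8F (JIS X 0212) prefix, which this sketch skips.
            raise ValueError("sequence not handled in this sketch")
        i += 2
    return bytes(out)


if __name__ == "__main__":
    sample = "日本語 ABC"
    print(euc_jp_to_sjis(sample.encode("euc_jp")).hex())
```

Because each input character is mapped arithmetically to its output bytes, no intermediate MIC buffer is needed. One way to sanity-check such a sketch is to compare its output with Python's built-in `euc_jp` and `shift_jis` codecs on text that stays within the character sets listed above.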
> The character-code conversion from EUC_JP to SJIS is executed by
> converting two stages. The first stage is conversion from EUC_JP to MIC.
> The next stage is conversion from MIC to SJIS. (Conversion from SJIS to
> EUC_JP is also similar.)
>
> It is not so efficient, because it is necessary to allocate the
> buffer for MIC, and to execute the calculation for conversion twice.
>
> In the attached patch, it enables the direct conversion of EUC_JP and
> SJIS. Additionally, there is an improvement that reduce the call of
> pg_mic_mblen.
>
> The effect of the patch that I measured is as follows:
>
> o The Data for test was created by 'pgbench -i'.
>
> o Test SQL:
> set client_encoding to 'SJIS';
> select * from accounts;
>
> o Test results: Linux(CPU: Pentium III, Compiler option: -O2)
> - original: 2.920s
> - patched : 2.278s
>
> regards,
>
> ---
> Atsushi Ogawa

I have tested Atsushi's patches with PostgreSQL 8.0.3 on my notebook PC running Linux 2.4 and got the following results (database encoding is EUC_JP):

1) without patches

$ time psql -c 'set client_encoding to 'SJIS';select * from accounts;' test >/dev/null

real    0m4.926s
user    0m1.680s
sys     0m0.090s

2) with patches

$ time psql -c 'set client_encoding to 'SJIS';select * from accounts;' test >/dev/null

real    0m3.816s
user    0m1.560s
sys     0m0.070s

3) no encoding conversions

$ time psql -c 'set client_encoding to 'EUC_JP';select * from accounts;' test >/dev/null

real    0m3.220s
user    0m1.760s
sys     0m0.070s

With the patches, the conversion overhead decreases from 52% to 18%. This is a huge improvement!

I will commit to current if there's no objection.
--
Tatsuo Ishii
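The overhead percentages quoted in this message can be reproduced from the three "real" timings, assuming overhead is defined as the extra wall-clock time relative to the run with no encoding conversion. That definition, and the helper name `overhead_percent`, are assumptions made for illustration.

```python
def overhead_percent(with_conversion: float, baseline: float) -> float:
    """Extra time relative to the no-conversion baseline, in percent."""
    return (with_conversion - baseline) / baseline * 100.0


# "real" timings in seconds, taken from the measurements in this message.
baseline = 3.220    # 3) no encoding conversions
unpatched = 4.926   # 1) without patches
patched = 3.816     # 2) with patches

print(f"without patches: {overhead_percent(unpatched, baseline):.1f}% overhead")
print(f"with patches:    {overhead_percent(patched, baseline):.1f}% overhead")
```

Under that definition the figures come out at roughly 53% and 18.5%, which agrees with the quoted 52% and 18% to within rounding.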