I’ve been reading about differences between NVARCHAR2 and VARCHAR2 and generally people say that NVARCHAR2 should be used when someone wants to have multilingual data in a column. I played a little bit with it and I had some doubts still.
Doubt no. 1
Imagine I crated simple table to play with:
create table hr.test1
col1 VARCHAR2(3 CHAR)
Then I inserted a few rows with texts using different languages (german, japanese, russian):
insert into hr.test1 values ('こんに');
insert into hr.test1 values ('Здр');
insert into hr.test1 values ('aßö');
select * from hr.test1;
And I see that I have correct multilingual result without using NVARCHAR2:
So when I really should use NVARCHAR2, if this example works well with VARCHAR2?
Answer: There is no difference whether you use NVARCHAR2 or VARCHAR2 when default character set of the database is set to multi-byte character set (i.e. UTF-8). The only one difference can be the length in bytes for the same strings. Continue reading, please…
Doubt no. 2:
Imagine I’ve crated simple table to play with:
create table hr.test2
col1 VARCHAR2(3 CHAR),
insert into hr.test2 values ('asf', 'asf');
insert into hr.test2 values ('Здр', 'Здр');
insert into hr.test2 values ('こんに', 'こんに');
select LENGTHB(col1), LENGTHB(col2) from hr.test2;
Why there are such differences in byte-length? Why for japanese chars NVARCHAR2 took less bytes, but for english chars VARCHAR2 took less?
It is due to the fact that default character set is ‘AL32UTF8’ (multi-byte one) and NVARCHAR2 uses fixed-length bytes for every character. In this case always any character stored in NVARCHAR2 takes 2 bytes and it do not depend on language used. This is opposite to VARCHAR2.
To check your character set execute:
SELECT value$ FROM sys.props$ WHERE name = 'NLS_CHARACTERSET';