Why does MySQL, in a pure UTF-8 environment, still treat the result of CONCAT() as being in some other charset (probably Latin-1) when one of the columns in the expression is, for example, an int or a date?
MySQL environment as seen from the client (\s):
Server characterset: utf8
Db characterset: utf8
Client characterset: utf8
Conn. characterset: utf8
Test dataset:
CREATE TABLE `utf8_test` (
`id` int(10) unsigned NOT NULL auto_increment,
`title` varchar(50) collate utf8_estonian_ci default NULL,
`year` smallint(4) unsigned NOT NULL default '0',
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_estonian_ci;
INSERT INTO utf8_test VALUES (1, '???ü??', 2011);
This query is good:
SELECT id, title FROM utf8_test;
This one turns the UTF-8 flag off (already inside MySQL, AFAIU):
SELECT CONCAT(id, title) FROM utf8_test;
In the mysql client everything looks fine, because it is set to display characters as UTF-8. But when the same queries are run through Perl DBI, none of the results of queries containing CONCAT() have the UTF-8 flag set. Example code:
#!/usr/bin/perl
use strict;
use utf8::all;
use Encode qw(is_utf8);
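# Placeholder: any routine that returns a connected DBI handle.
my $dbh = your_db_connect_routine('test');
# Result of CONCAT(): comes back without the UTF-8 flag.
my $str = $dbh->selectrow_array('SELECT CONCAT(id, title) FROM utf8_test');
print "CONCAT: False\n" unless ( is_utf8($str) );
# Plain column: comes back with the UTF-8 flag set.
$str = $dbh->selectrow_array('SELECT title FROM utf8_test');
print "NO CONCAT: False\n" unless ( is_utf8($str) );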
There are at least two workarounds I know of:
- querying with CAST()
SELECT CONCAT( CAST(id AS CHAR CHARACTER SET utf8), title) FROM utf8_test
- using
Encode::_utf8_on($str)
(which modifies $str in place; is this considered bad practice?)
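For comparison with the second workaround, here is a minimal sketch of decoding the value explicitly instead of flipping the internal flag. It does not touch the database: the byte string below is a hypothetical stand-in for what the driver returns for the CONCAT() query, and the variable names are only illustrative. Unlike _utf8_on, decode() checks that the input really is well-formed UTF-8.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode is_utf8);

# Assumption: stand-in for the undecoded result of the CONCAT() query,
# i.e. the raw UTF-8 bytes of "1ü" (0x31, then the two bytes C3 BC).
my $bytes = "1\xC3\xBC";

# decode() validates the bytes and returns a character string.
# FB_CROAK makes it die on malformed input; LEAVE_SRC keeps $bytes intact.
my $str = decode('UTF-8', $bytes, Encode::FB_CROAK | Encode::LEAVE_SRC);

# Compare the undecoded and decoded values.
printf "bytes: length=%d utf8_flag=%s\n",
    length($bytes), is_utf8($bytes) ? 'on' : 'off';
printf "chars: length=%d utf8_flag=%s\n",
    length($str), is_utf8($str) ? 'on' : 'off';
```

The undecoded value is three bytes long, while the decoded one is two characters long with the flag set. Whether decoding on the Perl side is preferable to CAST() in the query probably depends on how many queries are affected.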
But my actual question is: why does MySQL behave this way? Should I consider it a bug or a feature?