It sounds like your Java source file is encoded as UTF-8 so when the cityName
string contains LEóN
it is encoded as
L E ó N
-- -- ----- --
4C 45 C3 93 4E
That is not how Access stores the value. Access does store characters as Unicode, but it does not use UTF-8 encoding. It uses a variation of UTF-16LE encoding where characters with code points U+00FF and below are stored in a single byte, and characters with code points above U+00FF are stored as a Null (0x0) value followed by their UTF-16LE byte pair(s). In this case ó
is U+00D3, which is below U+00FF, so Access stores all four characters of the string as single bytes:
L E ó N
-- -- -- --
4C 45 D3 4E
The net effect is that the encoding of the string in the Access database is the same as it would be for the ISO 8859-1 character set.
This can be confirmed with the following Java code which uses the JDBC-ODBC Bridge. It fails to find the desired record when the Java source file is encoded as UTF-8
, but it works when the Java source file is encoded as cp1252
in Eclipse:
import java.sql.*;
public class accentTestMain {
public static void main(String[] args) {
String connectionString =
"jdbc:odbc:Driver={Microsoft Access Driver (*.mdb, *.accdb)};" +
"DBQ=C:\__tmp\test\accented.accdb;";
try {
Connection con = DriverManager.getConnection(connectionString);
PreparedStatement stmt = con.prepareStatement("SELECT * FROM localities WHERE locName=?");
String cityName = "LEóN";
stmt.setString(1, cityName);
stmt.execute();
ResultSet rs = stmt.getResultSet();
if (rs.next()) {
System.out.println(String.format("Record found, ID=%d", rs.getInt("ID")));
}
else {
System.out.print("Record not found.");
}
} catch (SQLException e) {
e.printStackTrace();
}
}
}
If you can make do with supporting only the accented characters represented in the cp1252
character set then you should be able to simply use cp1252
as the encoding setting for your Java source file(s).
On the other hand, if you really need full Unicode character support with an Access database then the JDBC-ODBC Bridge is not going to get the job done for you. This is a long-standing interoperability issue between the JDBC-ODBC Bridge and the Access ODBC driver, and it is not going to be fixed. (More details here.)
In that case you might want to consider using UCanAccess which is a pure-Java JDBC driver for Access. The corresponding code using UCanAccess with a UTF-8 encoded source file would be
// assumes...
// import java.sql.*;
Connection conn=DriverManager.getConnection(
"jdbc:ucanaccess://C:/__tmp/test/accented.accdb");
PreparedStatement ps = conn.prepareStatement(
"SELECT ID FROM localities WHERE locName=?");
ps.setString(1, "LEóN");
ResultSet rs = ps.executeQuery();
if (rs.next()) {
System.out.println(String.format(
"Record found, ID=%d",
rs.getInt("ID")));
}
else {
System.out.println("Record not found.");
}
For more information on using UCanAccess, see the related question here.
Another solution would be to use Jackcess to manipulate the Access database like so (again, the Java source file is encoded as UTF-8):
import java.io.File;
import java.io.IOException;
import com.healthmarketscience.jackcess.*;
public class accentTestMain {
public static void main(String[] args) {
Database db;
try {
db = DatabaseBuilder.open(new File("C:\__tmp\test\accented.accdb"));
try {
Table tbl = db.getTable("localities");
Cursor crsr = CursorBuilder.createCursor(tbl.getIndex("locName"));
if (crsr.findFirstRow(tbl.getColumn("locName"), "LEóN")) {
System.out.println(String.format("Record found, ID=%d", crsr.getCurrentRowValue(tbl.getColumn("ID"))));
}
else {
System.out.println("Record not found.");
}
} catch (Exception e) {
e.printStackTrace();
} finally {
db.close();
}
} catch (IOException e) {
e.printStackTrace();
}
}
}