java - JDBC-ODBC Bridge queries to Access fail when they have accented characters

Question

Welcome To Ask or Share your Answers For Others

java - JDBC-ODBC Bridge queries to Access fail when they have accented characters

asked Oct 24, 2021 in Technique[技术] by 深蓝 (71.8m points)

java - JDBC-ODBC Bridge queries to Access fail when they have accented characters

I'm sending a query through the JDBC-ODBC Bridge to an Access database from Java, like this:

"SELECT * FROM localities WHERE locName='" + cityName + "'"

When cityName is a normal string with no accented characters, the resultset is correct. But when cityName happens to be something like LEóN, SAHAGúN, that is with accented characters in them, then I get no results. It seems like the query fails in these cases. The same queries when run in MS Access work all right, I also tried with Ms Data Acces SKD and these queries work perfectly.

They only fail when passing through the JDBC-ODBC Bridge. As I understand, Java uses UTF-8 for strings and so does Access. And they both use Unicode. Does anyone know any solution to this problem?

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙；凝视深渊过久,深渊将回以凝视…

1 Answer

深蓝 · Answer 1 · 2021-10-23T17:57:45+0000

It sounds like your Java source file is encoded as UTF-8 so when the cityName string contains LEóN it is encoded as

L  E  ó     N
-- -- ----- --
4C 45 C3 93 4E

That is not how Access stores the value. Access does store characters as Unicode, but it does not use UTF-8 encoding. It uses a variation of UTF-16LE encoding where characters with code points U+00FF and below are stored in a single byte, and characters with code points above U+00FF are stored as a Null (0x0) value followed by their UTF-16LE byte pair(s). In this case ó is U+00D3, which is below U+00FF, so Access stores all four characters of the string as single bytes:

L  E  ó  N
-- -- -- --
4C 45 D3 4E

The net effect is that the encoding of the string in the Access database is the same as it would be for the ISO 8859-1 character set.

This can be confirmed with the following Java code which uses the JDBC-ODBC Bridge. It fails to find the desired record when the Java source file is encoded as UTF-8, but it works when the Java source file is encoded as cp1252 in Eclipse:

import java.sql.*;

public class accentTestMain {

    public static void main(String[] args) {
        String connectionString = 
                "jdbc:odbc:Driver={Microsoft Access Driver (*.mdb, *.accdb)};" + 
                "DBQ=C:\__tmp\test\accented.accdb;";
        try {
            Connection con = DriverManager.getConnection(connectionString);
            PreparedStatement stmt = con.prepareStatement("SELECT * FROM localities WHERE locName=?");
            String cityName = "LEóN";
            stmt.setString(1, cityName);
            stmt.execute();
            ResultSet rs = stmt.getResultSet();
            if (rs.next()) {
                System.out.println(String.format("Record found, ID=%d", rs.getInt("ID")));
            }
            else {
                System.out.print("Record not found.");
            }
        } catch (SQLException e) {
            e.printStackTrace();
        }
    }

}

If you can make do with supporting only the accented characters represented in the cp1252 character set then you should be able to simply use cp1252 as the encoding setting for your Java source file(s).

On the other hand, if you really need full Unicode character support with an Access database then the JDBC-ODBC Bridge is not going to get the job done for you. This is a long-standing interoperability issue between the JDBC-ODBC Bridge and the Access ODBC driver, and it is not going to be fixed. (More details here.)

In that case you might want to consider using UCanAccess which is a pure-Java JDBC driver for Access. The corresponding code using UCanAccess with a UTF-8 encoded source file would be

// assumes...
//     import java.sql.*;
Connection conn=DriverManager.getConnection(
        "jdbc:ucanaccess://C:/__tmp/test/accented.accdb");
PreparedStatement ps = conn.prepareStatement(
        "SELECT ID FROM localities WHERE locName=?");
ps.setString(1, "LEóN");
ResultSet rs = ps.executeQuery();
if (rs.next()) {
    System.out.println(String.format(
            "Record found, ID=%d", 
            rs.getInt("ID")));
}
else {
    System.out.println("Record not found.");
}

For more information on using UCanAccess, see the related question here.

Another solution would be to use Jackcess to manipulate the Access database like so (again, the Java source file is encoded as UTF-8):

import java.io.File;
import java.io.IOException;
import com.healthmarketscience.jackcess.*;

public class accentTestMain {

    public static void main(String[] args) {
        Database db;
        try {
            db = DatabaseBuilder.open(new File("C:\__tmp\test\accented.accdb"));
            try {
                Table tbl = db.getTable("localities");
                Cursor crsr = CursorBuilder.createCursor(tbl.getIndex("locName"));
                if (crsr.findFirstRow(tbl.getColumn("locName"), "LEóN")) {
                    System.out.println(String.format("Record found, ID=%d", crsr.getCurrentRowValue(tbl.getColumn("ID"))));
                }
                else {
                    System.out.println("Record not found.");
                }
            } catch (Exception e) {
                e.printStackTrace();
            } finally {
                db.close();
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

}

Categories

java - JDBC-ODBC Bridge queries to Access fail when they have accented characters

java - JDBC-ODBC Bridge queries to Access fail when they have accented characters

Please log in or register to add a comment.

Please log in or register to answer this question.

1 Answer

Please log in or register to add a comment.

Just Browsing Browsing

Most popular tags