Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


java - How to detect whether String.substring copies the character data

I know that in Oracle Java 1.7 update 6 and newer, String.substring copies the internal character array of the String, while in older versions the array is shared. However, I found no official API that reports the current behavior.

Use Case

My use case is a parser, in which I would like to detect whether String.substring copies or shares the underlying character array. If the array is shared, the parser needs to explicitly "un-share" it using new String(s) to avoid memory problems. If String.substring copies the data anyway, this is unnecessary, and the extra copy in the parser could be avoided. For example:

// possibly the query is very very large
String query = "select * from test ...";
// the identifier is used outside of the parser
String identifier = query.substring(14, 18);

// avoid if possible for speed,
// but needed if identifier internally 
// references the large query char array
identifier = new String(identifier);

What I Need

Basically, I would like a static method boolean isSubstringCopyingForSure() that detects whether new String(..) is unnecessary. It is acceptable if detection does not work when a SecurityManager is installed. The detection should be conservative: to avoid memory problems, I would rather call new String(..) even when it is not necessary.

Options

I have a few options, but I'm not sure they are reliable, especially on non-Oracle JVMs:

Checking for the String.offset field

import java.lang.reflect.Field;

/**
 * @return true if substring is copying, false if not or if it is not clear
 */
static boolean isSubstringCopyingForSure() {
    if (System.getSecurityManager() != null) {
        // we cannot reliably check it
        return false;
    }
    try {
        // pre-7u6 HotSpot strings share a char[] and store an offset into it
        for (Field f : String.class.getDeclaredFields()) {
            if ("offset".equals(f.getName())) {
                return false;
            }
        }
        return true;
    } catch (SecurityException e) {
        // getDeclaredFields was denied after all; stay conservative
    }
    return false;
}
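Because the field layout of String cannot change while the JVM is running, a check like the one above only needs to run once. A minimal sketch, assuming the offset-field heuristic from the question, that caches the result in a static constant and wraps the conditional copy in a helper. The class name `SubstringCopy` and the method `detach` are hypothetical, and the explicit SecurityManager test is dropped here because a denied `getDeclaredFields` call already ends in the conservative `false`:

```java
import java.lang.reflect.Field;

public class SubstringCopy {

    // Evaluated once at class initialization: the field layout of
    // java.lang.String cannot change while the JVM is running.
    private static final boolean COPYING = isSubstringCopyingForSure();

    /** Offset-field heuristic from the question; false means "not sure". */
    static boolean isSubstringCopyingForSure() {
        try {
            for (Field f : String.class.getDeclaredFields()) {
                if ("offset".equals(f.getName())) {
                    return false; // old layout: substrings may share the char[]
                }
            }
            return true;
        } catch (SecurityException e) {
            return false; // reflection denied: stay conservative
        }
    }

    /** Hypothetical helper for substring results: avoid pinning the parent's array. */
    static String detach(String s) {
        return COPYING ? s : new String(s);
    }

    public static void main(String[] args) {
        String query = "select * from test ...";
        String identifier = detach(query.substring(14, 18));
        System.out.println(identifier + " (substring copies: " + COPYING + ")");
    }
}
```

The parser would then call `detach(...)` on every identifier it hands out, paying for `new String(..)` only when the heuristic cannot rule out sharing.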

Checking the JVM version

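static boolean isSubstringCopyingForSure() {
    // but what about non-Oracle JREs?
    String version = System.getProperty("java.version", "");
    if (!System.getProperty("java.vendor", "").startsWith("Oracle")) return false;
    if (!version.startsWith("1.")) return true;  // Java 9+: "9", "11.0.2", ...
    if (!version.startsWith("1.7.0_")) return version.compareTo("1.8") >= 0;
    // compare the update number numerically: as strings, "1.7.0_5" > "1.7.0_45"
    String update = version.substring(6).replaceAll("\\D.*", "");
    return !update.isEmpty() && Integer.parseInt(update) >= 45;
}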

Checking the behavior

There are two options, both rather complicated. The first is to create a string using a custom charset, create a second string b from it using substring, then modify the original string and check whether b changes as well. The second is to create a huge string, take a few substrings, and check the memory usage.
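A simpler behavioral variant, which is not one of the two options above, is to read the private `value` field of a string and of one of its substrings via reflection and compare the two arrays by identity: the same array means sharing, different arrays mean copying. This is a sketch under stated assumptions: the implementation stores its characters in a field named `value` (true for OpenJDK/HotSpot, not guaranteed elsewhere), and on Java 16 and later `setAccessible` fails by default unless `java.lang` is opened to the caller (for example with `--add-opens java.base/java.lang=ALL-UNNAMED`), in which case the method falls back to the conservative `false`. The class name `SubstringProbe` is hypothetical.

```java
import java.lang.reflect.Field;

public class SubstringProbe {

    /**
     * @return true only if a substring was observed to use a different
     *         backing array than its parent; false if shared or unknown
     */
    static boolean isSubstringCopyingForSure() {
        String original = "abcdefgh";
        // A proper substring: substring(0, length()) may return the string
        // itself, which would prove nothing.
        String part = original.substring(2, 5);
        try {
            // Assumption: characters are stored in a private field named
            // "value" (char[] before Java 9, byte[] from Java 9 on).
            Field value = String.class.getDeclaredField("value");
            value.setAccessible(true); // denied by default on Java 16+
            return value.get(original) != value.get(part);
        } catch (ReflectiveOperationException | RuntimeException e) {
            // No such field, access denied, or module encapsulation.
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("substring copies for sure: " + isSubstringCopyingForSure());
    }
}
```

The probe never needs a SecurityManager check of its own: any denial surfaces as an exception and yields the conservative answer, which matches the requirement that detection may fail safely.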


1 Answer


This is not a detail you need to care about. No, really! Just call identifier = new String(identifier) in both cases (JDK 6 and JDK 7). Under JDK 6 it creates a copy, as desired. Under JDK 7 (update 6 and later), the substring already owns its own character array, so the constructor simply reuses that array and no copy is performed (read the source of the String(String) constructor). There is a slight overhead from creating one extra object, but short-lived objects are cheap to allocate and collect in the young generation, so I challenge you to quantify a performance difference.
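Following the answer's advice, the parser can detach every identifier unconditionally and skip detection altogether. A minimal sketch; `QueryParser` and `identifier` are hypothetical names. On JDK 6 the `String(String)` constructor trims an oversized backing array into a fresh one; on JDK 7u6 and later it reuses the argument's array, which a substring already owns exclusively, so the only cost is one small object.

```java
public class QueryParser {

    /** Hypothetical helper: extract an identifier that outlives the query. */
    static String identifier(String query, int begin, int end) {
        // Always detach: trims the shared array on JDK 6, and only
        // allocates a small String wrapper on JDK 7u6+.
        return new String(query.substring(begin, end));
    }

    public static void main(String[] args) {
        String query = "select * from test ...";
        System.out.println(identifier(query, 14, 18)); // prints "test"
    }
}
```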


...