Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
93 views
in Technique[技术] by (71.8m points)

java - Why my string manipulation is slow using lambda expression?

A method takes comma separated words as a String and returns a String of comma separated words with the words in natural sort order, not containing any 4 letter words, contain all words in UPPER case and no duplicates. The 1st approach is quite slow in comparison to the 2nd approach. Can you help me understand why and how can I improve my approach?

Approach 1:

public String stringProcessing(String s){
      Stream<String> tokens = Arrays.stream(s.split(","));
      return tokens.filter(t -> t.length() != 4) .distinct()
                   .sorted() 
                   .collect(Collectors.joining(",")).toUpperCase();
}

Approach 2:

public String processing(String s) {
    String[] tokens = s.split(",");
    Set<String> resultSet = new TreeSet<>();
    for(String t:tokens){
        if(t.length() !=  4)
            resultSet.add(t.toUpperCase());
    }        
    StringBuilder result = new StringBuilder();
    resultSet.forEach(key -> {
        result.append(key).append(","); 
    });
    result.deleteCharAt(result.length()-1);
    return result.toString();
}
See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Answer

0 votes
by (71.8m points)

A performance comparison without documenting the used JRE version, input data sets nor benchmark methodology is not suitable to draw any conclusions.

Further, there are fundamental differences between your variants. You first variant processes the original strings when using distinct(), potentially keeping much more elements than the second variant, joins all of them to a string, before transforming the complete result string to upper case. In contrast, your second variant transforms individual elements before adding to the set, so only strings with a distinct upper case representation are processed further. So the second variant may need significantly less memory and process less elements when joining.

So when doing entirely different things, there is not much sense in comparing the performance of these operations. A better comparison would be between these two variants:

public String variant1(String s){
    Stream<String> tokens = Arrays.stream(s.split(","));
    return tokens.filter(t -> t.length() != 4)
                 .map(String::toUpperCase)
                 .sorted().distinct()
                 .collect(Collectors.joining(","));
}

public String variant2(String s) {
    String[] tokens = s.split(",");
    Set<String> resultSet = new TreeSet<>();
    for(String t:tokens){
        if(t.length() !=  4)
            resultSet.add(t.toUpperCase());
    }
    return String.join(",", resultSet);
}

Note that I changed the order of sorted() and distinct(); as discussed in this answer, applying distinct() directly after sorted() allows to exploit the sorted nature of the stream within the distinct operation.

You may also consider not creating a temporary array holding all substrings before streaming over them:

public String variant1(String s){
    return Pattern.compile(",").splitAsStream(s)
            .filter(t -> t.length() != 4)
            .map(String::toUpperCase)
            .sorted().distinct()
            .collect(Collectors.joining(","));
}

You may also add a third variant,

public String variant3(String s) {
    Set<String> resultSet = new TreeSet<>();
    int o = 0, p;
    for(p = s.indexOf(','); p>=0; p = s.indexOf(',', o=p+1)) {
        if(p-o == 4) continue;
        resultSet.add(s.substring(o, p).toUpperCase());
    }
    if(s.length()-o != 4) resultSet.add(s.substring(o).toUpperCase());
    return String.join(",", resultSet);
}

which doesn’t create an array of substrings and even skips the substring creation for those not matching the filter criteria. This isn’t meant to suggest to go such low level in production code, but that there always might be a faster variant, so it doesn’t matter whether the variant you’re using is the fastest, but rather whether it runs at reasonable speed while being maintainable.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...