I have come across a problem statement to find the all the common sub-strings between the given two sub-strings such a way that in every case you have to print the longest sub-string. The problem statement is as follows:
Write a program to find the common substrings between the two given strings. However, do not include substrings that are contained within longer common substrings.
For example, given the input strings eatsleepnightxyz
and eatsleepabcxyz
, the results should be:
eatsleep
(due to eatsleepnightxyz
eatsleepabcxyz
)
xyz
(due to eatsleepnightxyz
eatsleepabcxyz
)
a
(due to eatsleepnightxyz
eatsleepabcxyz
)
t
(due to eatsleepnightxyz
eatsleepabcxyz
)
However, the result set should not include e
from
eatsleepnightxyz
eatsleepabcxyz
, because both e
s are already contained in the eatsleep
mentioned above. Nor should you include ea
, eat
, ats
, etc., as those are also all covered by eatsleep
.
In this, you don't have to make use of String utility methods like: contains, indexOf, StringTokenizer, split and replace.
My Algorithm is as follows: I am starting with brute force and will switch to more optimized solution when I improve my basic understanding.
For String S1:
Find all the substrings of S1 of all the lengths
While doing so: Check if it is also a substring of
S2.
Attempt to figure out the time complexity of my approach.
Let the two given strings be n1-String and n2-String
- The number of substrings of S1 is clearly n1(n1+1)/2.
- But we have got to find the average length a substring of S1.
- Let’s say it is m. We’ll find m separately.
- Time Complexity to check whether an m-String is a substring of an
n-String is O(n*m).
- Now, we are checking for each m-String is a substring of S2,
which is an n2-String.
- This, as we have seen above, is an O(n2 m) algorithm.
- The time required by the overall algorithm then is
- Tn=(Number of substrings in S1) * (average substring lengthtime for character comparison procedure)
- By performing certain calculations, I came to conclusion that the
time complexity is O(n3 m2)
- Now, our job is to find m in terms of n1.
Attempt to find m in terms of n1.
Tn = (n)(1) + (n-1)(2) + (n-2)(3) + ..... + (2)(n-1) + (1)(n)
where Tn is the sum of lengths of all the substrings.
Average will be the division of this sum by the total number of Substrings produced.
This, simply is a summation and division problem whose solution is as follows O(n)
Therefore...
Running time of my algorithm is O(n^5).
With this in mind I wrote the following code:
package pack.common.substrings;
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
public class FindCommon2 {
public static final Set<String> commonSubstrings = new LinkedHashSet<String>();
public static void main(String[] args) {
printCommonSubstrings("neerajisgreat", "neerajisnotgreat");
System.out.println(commonSubstrings);
}
public static void printCommonSubstrings(String s1, String s2) {
for (int i = 0; i < s1.length();) {
List<String> list = new ArrayList<String>();
for (int j = i; j < s1.length(); j++) {
String subStr = s1.substring(i, j + 1);
if (isSubstring(subStr, s2)) {
list.add(subStr);
}
}
if (!list.isEmpty()) {
String s = list.get(list.size() - 1);
commonSubstrings.add(s);
i += s.length();
}
}
}
public static boolean isSubstring(String s1, String s2) {
boolean isSubstring = true;
int strLen = s2.length();
int strToCheckLen = s1.length();
if (strToCheckLen > strLen) {
isSubstring = false;
} else {
for (int i = 0; i <= (strLen - strToCheckLen); i++) {
int index = i;
int startingIndex = i;
for (int j = 0; j < strToCheckLen; j++) {
if (!(s1.charAt(j) == s2.charAt(index))) {
break;
} else {
index++;
}
}
if ((index - startingIndex) < strToCheckLen) {
isSubstring = false;
} else {
isSubstring = true;
break;
}
}
}
return isSubstring;
}
}
Explanation for my code:
printCommonSubstrings: Finds all the substrings of S1 and
checks if it is also a substring of
S2.
isSubstring : As the name suggests, it checks if the given string
is a substring of the other string.
Issue: Given the inputs
S1 = “neerajisgreat”;
S2 = “neerajisnotgreat”
S3 = “rajeatneerajisnotgreat”
In case of S1 and S2, the output should be: neerajis
and great
but in case of S1 and S3, the output should have been:
neerajis
, raj
, great
, eat
but still I am getting neerajis
and great
as output. I need to figure this out.
How should I design my code?
See Question&Answers more detail:
os