For Android API 26 and above AND a TTS engine that supports onRangeStart (in this case, Google TTS):
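import android.graphics.Color;
import android.os.Bundle;
import android.speech.tts.TextToSpeech;
import android.speech.tts.UtteranceProgressListener;
import android.text.Spannable;
import android.text.SpannableString;
import android.text.Spanned;
import android.text.style.ForegroundColorSpan;
import android.util.Log;
import android.view.View;
import android.widget.TextView;

// Assumes AndroidX; on older projects this is android.support.v7.app.AppCompatActivity
import androidx.appcompat.app.AppCompatActivity;

public class MainActivity extends AppCompatActivity implements TextToSpeech.OnInitListener {

    TextToSpeech tts;
    String sentence = "The Quick Brown Fox Jumps Over The Lazy Dog.";
    TextView textView;

    @Override
    protected void onCreate(Bundle savedInstanceState) {
        super.onCreate(savedInstanceState);
        setContentView(R.layout.activity_main);
        textView = findViewById(R.id.textView);
        textView.setText(sentence);
        tts = new TextToSpeech(this, this);
    }

    // TextToSpeech.OnInitListener (for our purposes, the "main method" of this activity)
    @Override
    public void onInit(int status) {
        // Do not register the listener if the engine failed to initialize
        if (status != TextToSpeech.SUCCESS) {
            Log.e("XXX", "TTS initialization failed, status: " + status);
            return;
        }
        tts.setOnUtteranceProgressListener(new UtteranceProgressListener() {
            @Override
            public void onStart(String utteranceId) {
                Log.i("XXX", "utterance started");
            }

            @Override
            public void onDone(String utteranceId) {
                Log.i("XXX", "utterance done");
            }

            @Override
            public void onError(String utteranceId) {
                Log.i("XXX", "utterance error");
            }

            @Override
            public void onRangeStart(String utteranceId,
                                     final int start,
                                     final int end,
                                     int frame) {
                Log.i("XXX", "onRangeStart() ... utteranceId: " + utteranceId + ", start: " + start
                        + ", end: " + end + ", frame: " + frame);
                // onRangeStart (and all UtteranceProgressListener callbacks) do not run on main thread
                // ... so we explicitly manipulate views on the main thread:
                runOnUiThread(new Runnable() {
                    @Override
                    public void run() {
                        // Rebuild the text with only the range currently being spoken highlighted
                        Spannable textWithHighlights = new SpannableString(sentence);
                        textWithHighlights.setSpan(new ForegroundColorSpan(Color.YELLOW), start, end, Spanned.SPAN_INCLUSIVE_INCLUSIVE);
                        textView.setText(textWithHighlights);
                    }
                });
            }
        });
    }

    // Click handler (e.g. wired via android:onClick in the layout)
    public void startClicked(View ignored) {
        // A non-null utterance ID is required for the listener callbacks to fire
        tts.speak(sentence, TextToSpeech.QUEUE_FLUSH, null, "doesn't matter yet");
    }

    @Override
    protected void onDestroy() {
        // Release the TTS engine when the activity goes away
        if (tts != null) {
            tts.shutdown();
        }
        super.onDestroy();
    }
}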
// -------------------------------------------------------------------
Android API 25 and below:
In theory, the most intuitive way of accomplishing this would be to:
1) Break the string into pieces
2) Detect when each piece has been/is being spoken
3) Highlight that piece accordingly
Unfortunately, when using the Android TextToSpeech class, where the speech output is generated in real time, the smallest unit of speech whose progress you can precisely detect (using UtteranceProgressListener) is an utterance (whatever string you decided to send to the TTS) -- not necessarily a word.
There is no mechanism whereby you can simply send a multi-word string as an utterance, and then somehow detect exactly when each word has been spoken.
Therefore, in order to (easily) highlight each word in order, you would have to either:
A) Send each word to the TTS individually as a single utterance (but this will cause disjointed pronunciation), or
B) Highlight sentence-by-sentence instead, sending each sentence as an utterance (easiest method, but not your desired behaviour).
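To make option B more concrete, here is a minimal sketch of the "break the string into pieces" step using java.text.BreakIterator. The class name SentenceRanges and the method split are hypothetical names made up for illustration; the idea is simply to get the character offsets of each sentence so that the same offsets can be used for both the utterance text and the highlight span.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper (not part of the Android API): finds sentence boundaries in a string.
final class SentenceRanges {

    /** Returns {start, end} character offsets (end exclusive) for each sentence in text. */
    static List<int[]> split(String text, Locale locale) {
        List<int[]> ranges = new ArrayList<>();
        BreakIterator it = BreakIterator.getSentenceInstance(locale);
        it.setText(text);
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            // Skip segments that contain only whitespace
            if (!text.substring(start, end).trim().isEmpty()) {
                ranges.add(new int[] {start, end});
            }
        }
        return ranges;
    }
}
```

One possible way to use this: queue text.substring(start, end) for each range as its own utterance (QUEUE_ADD), passing the range's index as the utterance ID, and then in onStart(utteranceId) look the range back up and apply the span on the main thread. That wiring is an assumption about how you would structure it, not something the TTS API does for you.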
If you really insist on achieving a word-by-word highlighting effect, the only way I can think of (using Android TextToSpeech) is to use sentence-size utterances, but instead of using speak(), use synthesizeToFile()... and then use a media player or sound player of some sort to play the speech back... somehow approximating the timing of the highlights in terms of where the nth word lies relative to the total audio file length. So, for example, if the sentence is 10 words long, and the file is 30% complete, then you would highlight the 4th word. This would be difficult and inexact, but theoretically possible.
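The approximation itself is just a proportion. Here is a rough sketch, assuming you can read the current playback position and total duration in milliseconds from whatever player you use (for example, MediaPlayer.getCurrentPosition() and MediaPlayer.getDuration()). WordProgress and wordIndexAt are hypothetical names, and the sketch assumes every word takes equal time, which is exactly why the result is inexact.

```java
// Hypothetical helper (not part of the Android API): estimates which word is being spoken.
final class WordProgress {

    /**
     * Returns the zero-based index of the word assumed to be playing,
     * treating every word as taking an equal share of the audio.
     */
    static int wordIndexAt(int wordCount, long positionMs, long durationMs) {
        if (wordCount <= 0 || durationMs <= 0) {
            throw new IllegalArgumentException("wordCount and durationMs must be positive");
        }
        // Clamp the position into [0, durationMs] in case the player reports odd values
        long clamped = Math.max(0L, Math.min(positionMs, durationMs));
        // Integer arithmetic avoids floating-point rounding at word boundaries
        long index = clamped * wordCount / durationMs;
        // At exactly 100% the formula yields wordCount, so cap at the last word
        return (int) Math.min(index, wordCount - 1L);
    }
}
```

With the numbers from the example above, wordIndexAt(10, 3000, 10000) returns 3, i.e. the 4th word. You would have to poll the player periodically (say, from a Handler) and re-apply the span whenever the index changes.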
There are obviously apps and games that already exist that do this... games like PaRappa the Rapper, or karaoke apps, but I think the way they do it is by having pre-recorded/static audio files with markers encoded at exact times that trigger the highlights. If your text content is always going to be the same, and only in one language, then you could also do this.
However, if the spoken text is user-entered or unknown until runtime, requiring a TTS, then I don't know of any straightforward solution.
If you decide on one of these more narrowed-down approaches, then I would suggest posting a new question accordingly.