Maybe I don't understand the question, but are you using a program to divide a text into separate subtexts?
Why don't you use train_test_split from sklearn to produce the training and test files? Its signature is:
sklearn.model_selection.train_test_split(*arrays, test_size=None, train_size=None, random_state=None, shuffle=True, stratify=None)
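A minimal sketch of how that could look, assuming your texts are already loaded into a Python list with one string per document. The corpus, the labels, and the train.txt / test.txt file names below are hypothetical placeholders, not something from your setup:

```python
from sklearn.model_selection import train_test_split

# Hypothetical corpus: one string per document (replace with your own texts).
texts = [f"document {i}" for i in range(10)]
# Hypothetical labels, only needed if you want a stratified split.
labels = [i % 2 for i in range(10)]

# Hold out 20% of the documents for testing.
# random_state makes the split reproducible; stratify keeps the
# label proportions the same in the training and test parts.
train_texts, test_texts, train_labels, test_labels = train_test_split(
    texts, labels, test_size=0.2, random_state=42, stratify=labels
)

# Write each split to its own file, one document per line
# (hypothetical file names).
with open("train.txt", "w", encoding="utf-8") as f:
    f.write("\n".join(train_texts) + "\n")
with open("test.txt", "w", encoding="utf-8") as f:
    f.write("\n".join(test_texts) + "\n")
```

If your data has no labels, drop labels and stratify; train_test_split then returns only the two text lists (train_texts, test_texts).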