Here is the code for creating tokens from a list of full names:
import en_core_web_sm
from datetime import datetime
from joblib import Parallel, delayed
from spacy.util import minibatch
from functools import partial
tok_text = [] # OUTPUT for our tokenised corpus
text = ["liam noah", "oliver william", "harper mason", "emma noah", "evelyn ethan", "mia lucas", "amelia benjamin", "isabella james", "sophia mason", "ava elijah"]
nlp = en_core_web_sm.load()
def process_texts(nlp, batch_id, text):
    print(f"{datetime.now()} Processing batch {batch_id}")
    for doc in nlp.pipe(text):
        tok = [t.text for t in doc if t.is_ascii and not t.is_punct and not t.is_space]
        tok_text.append(tok)
batch_size = 2
if __name__ == '__main__':
    print("Creating Parallel batches...")
    partitions = minibatch(text, size=batch_size)
    executor = Parallel(n_jobs=1)  # later will update it to n_jobs=2
    do = delayed(partial(process_texts, nlp))
    tasks = (do(i, batch) for i, batch in enumerate(partitions))
    executor(tasks)
    print("Tokens:: ", tok_text)
The code above generates the following output (sequential batches):
Creating Parallel batches...
2021-01-26 14:20:18.852977 Processing batch 0
2021-01-26 14:20:18.870977 Processing batch 1
2021-01-26 14:20:18.886977 Processing batch 2
2021-01-26 14:20:18.900977 Processing batch 3
2021-01-26 14:20:18.918977 Processing batch 4
Tokens:: [['liam', 'noah'], ['oliver', 'william'], ['harper', 'mason'], ['emma', 'noah'], ['evelyn', 'ethan'], ['mia', 'lucas'], ['amelia', 'benjamin'], ['isabella', 'james'], ['sophia', 'mason'], ['ava', 'elijah']]
When I change that line to n_jobs=2:
executor = Parallel(n_jobs=2)
It runs with no errors, but no work is done:
Creating Parallel batches...
Tokens:: [] <== Empty output
What am I missing?
question from:
https://stackoverflow.com/questions/65907033/python-joblib-parallel-n-jobs-1-is-working-n-jobs-2-is-not-working-no-error-e