I am trying to export my PyTorch network using TorchScript, since that seemed like the most straightforward method to deploy a trained network (inference only, no more training). However, the network uses scikit-learn's TfidfVectorizer, which generates the following error:
ValueError: Unknown type annotation: 'TfidfVectorizer(analyzer='word', binary=False, decode_error='strict',
dtype=<class 'numpy.float64'>, encoding='utf-8',
input='content', lowercase=True, max_df=1.0, max_features=None,
min_df=1, ngram_range=(1, 1), norm='l2', preprocessor=None,
smooth_idf=True, stop_words=None, strip_accents=None,
sublinear_tf=False, token_pattern='(?u)\b\w\w+\b',
tokenizer=None, use_idf=True, vocabulary=None)'
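If I understand the message correctly, it is complaining about the class-level annotation on tfidf_vec rather than about the transform() call itself. A stripped-down module like the one below (just my own minimal sketch; Minimal is a throwaway name) seems to be enough to trigger the same kind of error, since TorchScript apparently only understands its own supported type annotations on module attributes:

import torch
import torch.nn as nn
from sklearn.feature_extraction.text import TfidfVectorizer

class Minimal(nn.Module):
    # The annotation below is a TfidfVectorizer *instance*, which TorchScript
    # cannot map to any type it knows about.
    tfidf_vec: TfidfVectorizer()

    def __init__(self, tfidf_vec):
        super().__init__()
        self.tfidf_vec = tfidf_vec

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x

torch.jit.script(Minimal(TfidfVectorizer()))  # ValueError: Unknown type annotation: 'TfidfVectorizer(...)'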
The code:
import random

import numpy as np
import torch
import torch.nn as nn
from sklearn.feature_extraction.text import TfidfVectorizer

# Before this there is some data pre-processing which creates a TfidfVectorizer
# called tfidf_vec, plus labels and some responses
# (a simplified sketch of that step is shown after the code below).

class Chatbot(nn.Module):
    tfidf_vec: TfidfVectorizer()

    def __init__(self, tfidf_vec, labels, responses):
        super(Chatbot, self).__init__()
        self.tfidf_vec = tfidf_vec
        self.labels = labels
        self.responses = responses
        self.lin11 = nn.Linear(len(tfidf_vec.vocabulary_), 50)
        self.lin12 = nn.Linear(50, len(labels))
        self.sigmoid = nn.Sigmoid()
        self.softmax = nn.Softmax(dim=1)

    def forward(self, input):
        input = self.lin11(input)
        input = self.sigmoid(input)
        input = self.lin12(input)
        input = self.softmax(input)
        return input

    #@torch.jit.export
    def TFIDF_input(self, sentence):  # TF-IDF embed the input based on the Amazon corpus/document
        sentence = [sentence]
        sentence = self.tfidf_vec.transform(sentence)
        sentence = sentence.todense()
        sentence = torch.from_numpy(np.array(sentence)).type(torch.FloatTensor)
        return sentence

    @torch.jit.export
    def chat(self, input):
        input = str(input)
        input = input.replace('!', '.')
        input = input.replace('?', '.')
        sentence_list = input.split('. ')
        return_response = ""
        for i in range(len(sentence_list)):
            tfidf_sentence = self.TFIDF_input(sentence_list[i])
            pred = self.forward(tfidf_sentence)
            item = torch.argmax(pred).item()
            label = self.labels[item]
            response = self.responses[label]
            response = response[random.randint(0, len(response) - 1)]
            return_response = return_response + response + ". "
        return return_response

network = Chatbot(tfidf_vec, labels, responses)
# Train the network
response = network.chat("Hello. can you help me")
print(response)

script_module = torch.jit.script(network)
script_module.save("model.pt")
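For context, the pre-processing mentioned in the comment at the top is roughly along these lines (a simplified sketch with toy stand-in data; the real tfidf_vec, labels and responses are built from the Amazon corpus):

from sklearn.feature_extraction.text import TfidfVectorizer

# Toy stand-ins for the real corpus / intents / replies
corpus = ["hello how are you", "can you help me with my order", "bye see you later"]
labels = ["greeting", "help", "goodbye"]          # index -> intent label
responses = {
    "greeting": ["Hi there!", "Hello!"],          # intent -> list of possible replies
    "help": ["Sure, what do you need?"],
    "goodbye": ["Bye!"],
}

tfidf_vec = TfidfVectorizer()
tfidf_vec.fit(corpus)   # fitting builds tfidf_vec.vocabulary_, which sizes the first Linear layer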
Have I overlooked something? Is this even a viable strategy for exporting the network? Should I use a completely different strategy? I haven't found many good resources on how to deploy this kind of NLP network.
question from:
https://stackoverflow.com/questions/65603202/sklearns-tfidfvectorizer-has-unknown-type-annotation-for-torchscript