machine learning - sklearn's TfidfVectorizer has unknown type annotation for TorchScript

I am trying to export my PyTorch network using TorchScript, since that seemed like the most straightforward way to deploy a trained network (inference only, no further training). However, the network uses sklearn's TfidfVectorizer, which produces the following error.

ValueError: Unknown type annotation: 'TfidfVectorizer(analyzer='word', binary=False, decode_error='strict',
                dtype=<class 'numpy.float64'>, encoding='utf-8',
                input='content', lowercase=True, max_df=1.0, max_features=None,
                min_df=1, ngram_range=(1, 1), norm='l2', preprocessor=None,
                smooth_idf=True, stop_words=None, strip_accents=None,
                sublinear_tf=False, token_pattern='(?u)\b\w\w+\b',
                tokenizer=None, use_idf=True, vocabulary=None)'

The code:

import random

import numpy as np
import torch
import torch.nn as nn
from sklearn.feature_extraction.text import TfidfVectorizer

# Earlier data pre-processing creates a TfidfVectorizer called tfidf_vec,
# plus the labels and responses used below.

class Chatbot(nn.Module):
  tfidf_vec: TfidfVectorizer()  # class-level annotation that triggers the error above

  def __init__(self, tfidf_vec, labels, responses):
    super(Chatbot, self).__init__()
    self.tfidf_vec = tfidf_vec
    self.labels = labels
    self.responses = responses

    self.lin11 = nn.Linear(len(tfidf_vec.vocabulary_), 50)
    self.lin12 = nn.Linear(50, len(labels))
    self.sigmoid = nn.Sigmoid()
    self.softmax = nn.Softmax(dim=1)
  
  def forward(self, input):
    input = self.lin11(input)
    input = self.sigmoid(input)
    input = self.lin12(input)
    input = self.softmax(input)
    return input

  #@torch.jit.export
  def TFIDF_input(self, sentence):  # TF-IDF embed the input based on the Amazon corpus/document
    sentence = [sentence]
    sentence = self.tfidf_vec.transform(sentence)
    sentence = sentence.todense()
    sentence = torch.from_numpy(np.array(sentence)).type(torch.FloatTensor)
    return sentence

  @torch.jit.export
  def chat(self, input):
    input = str(input)
    input = input.replace('!', '.')
    input = input.replace('?', '.')
    sentence_list = input.split('. ')
    return_response = ""
    for i in range(len(sentence_list)):
        tfidf_sentence = self.TFIDF_input(sentence_list[i])
        pred = self.forward(tfidf_sentence)
        item = torch.argmax(pred).item()
        label = self.labels[item]
        response = self.responses[label]
        response = response[random.randint(0, len(response)-1)]
        return_response = return_response + response + ". "
    return return_response

network = Chatbot(tfidf_vec, labels, responses)
# Train the network
response = network.chat("Hello. can you help me")
print(response)

script_module = torch.jit.script(network)
script_module.save("model.pt")

Have I overlooked something? Is this even a viable strategy for exporting the network, or should I use a completely different approach? I haven't found many good resources on how to deploy this kind of NLP network.
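For completeness, the only workaround I can think of so far is to keep the TfidfVectorizer outside the module that gets scripted, so TorchScript only ever sees plain tensor operations, and to do the vectorization and response lookup in ordinary Python around the scripted model. Below is a rough, untested sketch of that idea, reusing the imports and the tfidf_vec/labels/responses objects from above; ChatbotNet and respond are just placeholder names I made up.

class ChatbotNet(nn.Module):
  def __init__(self, vocab_size, num_labels):
    super(ChatbotNet, self).__init__()
    self.lin11 = nn.Linear(vocab_size, 50)
    self.lin12 = nn.Linear(50, num_labels)
    self.sigmoid = nn.Sigmoid()
    self.softmax = nn.Softmax(dim=1)

  def forward(self, input):
    input = self.lin11(input)
    input = self.sigmoid(input)
    input = self.lin12(input)
    return self.softmax(input)

# Vectorization and response lookup stay in plain Python, outside TorchScript.
def respond(net, tfidf_vec, labels, responses, text):
  text = text.replace('!', '.').replace('?', '.')
  answer = ""
  for sent in text.split('. '):
    vec = tfidf_vec.transform([sent]).todense()
    x = torch.from_numpy(np.array(vec)).type(torch.FloatTensor)
    pred = net(x)
    label = labels[torch.argmax(pred).item()]
    answer += random.choice(responses[label]) + ". "
  return answer

# Only the pure-tensor network is scripted and saved.
net = ChatbotNet(len(tfidf_vec.vocabulary_), len(labels))
script_module = torch.jit.script(net)
script_module.save("model.pt")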

question from: https://stackoverflow.com/questions/65603202/sklearns-tfidfvectorizer-has-unknown-type-annotation-for-torchscript


1 Answer

Waiting for answers
