I am trying to use GPT-2 for an Arabic text classification task, as follows:
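from transformers import GPT2Tokenizer, GPT2ForSequenceClassification
# model_path and lab2ind (label-to-index mapping) are defined earlier
tokenizer = GPT2Tokenizer.from_pretrained(model_path)
model = GPT2ForSequenceClassification.from_pretrained(model_path, num_labels=len(lab2ind))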
However, when I use the tokenizer, it converts the Arabic characters to symbols like this: '?ù??aù??±'
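One possible explanation, assuming the tokenizer at model_path is a standard byte-level BPE tokenizer like the original GPT-2 one: the tokens are not Arabic characters but UTF-8 bytes, each displayed as a printable stand-in character. The sketch below reproduces the byte-to-character table from the original GPT-2 encoder without loading any model, and shows that the odd-looking symbols map back to the original text. The sample word and the variable names text, shown, and restored are only illustrative.

```python
def bytes_to_unicode():
    """Byte -> printable character table used by GPT-2's byte-level BPE."""
    # Bytes that are already printable keep their own character.
    bs = (list(range(ord("!"), ord("~") + 1))
          + list(range(ord("¡"), ord("¬") + 1))
          + list(range(ord("®"), ord("ÿ") + 1)))
    cs = bs[:]
    n = 0
    # Every other byte is assigned a code point starting at 256.
    for b in range(2 ** 8):
        if b not in bs:
            bs.append(b)
            cs.append(2 ** 8 + n)
            n += 1
    return dict(zip(bs, [chr(c) for c in cs]))


byte_encoder = bytes_to_unicode()
byte_decoder = {ch: b for b, ch in byte_encoder.items()}

text = "مرحبا"  # illustrative Arabic sample

# How the text looks inside the vocabulary: one stand-in symbol per UTF-8 byte.
shown = "".join(byte_encoder[b] for b in text.encode("utf-8"))

# Reversing the table recovers the original string.
restored = bytes(byte_decoder[ch] for ch in shown).decode("utf-8")

print(shown)
print(restored)
```

If this is what is happening, tokenizer.tokenize(...) will show such symbols, while tokenizer.decode(tokenizer.encode(...)) should give back the Arabic text, so the input_ids passed to the model would be unaffected.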