https://www.research-collection.ethz.ch/handle/20.500.11850/508469
Abstract
An ever-increasing amount of text, in the form of social media posts and news articles, gives rise to new challenges and opportunities for the automatic extraction of socio-political events. In this paper, we present our submission to the Shared Tasks on Socio-Political and Crisis Events Detection, Task 1, Multilingual Protest News Detection, Subtask 2, Event Sentence Classification, of CASE @ ACL-IJCNLP 2021. In our submission, we utilize the RoBERTa model with additional pretraining, and achieve the best F1 score of 0.8532 in event sentence classification in English and the second-best F1 score of 0.8700 in Portuguese via simple translation. We analyze the failure cases of our model. We also conduct an ablation study to show the effect of choosing the right pretrained language model, adding additional training data and data augmentation.
Finetuning
Our model with the second pretraining strategy achieves an F1 score of 0.8395 on the validation set of this subtask. On the evaluation server, we achieve the best performance among all submissions of the shared task, with an F1 score of 0.8532 on the test set. Since our focus is on the English version of the event sentence classification task, we translate the event sentences of other languages into English using Argos Translate (Finlay, 2021). This simple method achieves the second-best F1 score of 0.8670 in Portuguese.
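The translate-then-classify step described above can be sketched as follows. This is a minimal illustration, not the submission's actual code: the helper names `argos_translate_fn` and `translate_sentences` are hypothetical, and the sketch assumes a recent release of the `argostranslate` Python package that exposes `argostranslate.translate.translate(text, from_code, to_code)`, with the required language model already installed.

```python
from typing import Callable, Iterable, List


def argos_translate_fn(from_code: str, to_code: str = "en") -> Callable[[str], str]:
    """Return a sentence translator backed by Argos Translate.

    Assumption: the argostranslate package and the from_code -> to_code
    language model are already installed on this machine.
    """
    # Imported lazily so the rest of the module works without the package.
    import argostranslate.translate

    def _translate(sentence: str) -> str:
        return argostranslate.translate.translate(sentence, from_code, to_code)

    return _translate


def translate_sentences(
    sentences: Iterable[str], translate_fn: Callable[[str], str]
) -> List[str]:
    """Translate each sentence independently, preserving input order."""
    return [translate_fn(sentence) for sentence in sentences]
```

With Argos Translate installed, `translate_sentences(portuguese_sentences, argos_translate_fn("pt"))` would yield English sentences that can be passed to the English classifier unchanged. Passing the translator in as a function keeps the pipeline independent of any particular translation backend.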
5.2 Additional Training Data
In this subsection, we explore adding data from other languages of Subtask 2 as well as from other subtasks. When we add data not originally in English, we translate the sentences into English using Argos Translate (Finlay, 2021). The results are presented in Table 4. For example, “Sub1 ES&PT+” means Spanish and Portuguese data from Subtask 1 with positive labels. While some settings result in better performance on their own, the gain disappears when we use them in conjunction with second pretraining. Thus, we do not include any additional training data when training our model for the final submission.
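A setting such as “Sub1 ES&PT+” amounts to filtering a data pool by subtask, language and label, then translating what remains into English. The sketch below illustrates that selection under stated assumptions: the `Example` record, its field names, the label convention (1 for positive) and the `select_extra_data` helper are all hypothetical and are not taken from the shared task's data format.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Set


@dataclass(frozen=True)
class Example:
    text: str
    label: int      # assumed convention: 1 = positive, 0 = negative
    language: str   # e.g. "en", "es", "pt"
    subtask: int    # e.g. 1 or 2


def select_extra_data(
    pool: Iterable[Example],
    subtask: int,
    languages: Set[str],
    positive_only: bool,
    translators: Dict[str, Callable[[str], str]],
) -> List[Example]:
    """Pick examples matching the setting and translate them into English."""
    selected = []
    for ex in pool:
        if ex.subtask != subtask or ex.language not in languages:
            continue
        if positive_only and ex.label != 1:
            continue
        # English text is kept as-is; other languages go through a translator.
        text = ex.text if ex.language == "en" else translators[ex.language](ex.text)
        selected.append(Example(text, ex.label, "en", ex.subtask))
    return selected
```

Under these assumptions, “Sub1 ES&PT+” would correspond to `select_extra_data(pool, subtask=1, languages={"es", "pt"}, positive_only=True, translators=...)`, with the result appended to the English Subtask 2 training set.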
Proceedings of the 4th Workshop on Challenges and Applications of Automated Extraction of Socio-political Events from Text (CASE 2021)
References
P.J. Finlay. 2021. Argos Translate. Open source neural machine translation software.
Permanent link
https://doi.org/10.3929/ethz-b-000508469
Publication status
published
Date
2021