Improving Text Classifier Accuracy in Automated Conversations
MIT's Laboratory for Information and Decision Systems (LIDS) has developed a novel software approach to enhance the performance of text classifiers used in automated conversations. The method generates "adversarial examples" to probe and improve the classifiers' robustness, and in testing it has substantially reduced the success rate of adversarial attacks.
The software package, named SP-Attack, systematically tests classifiers for weaknesses against single-word adversarial changes. By identifying these vulnerabilities, the team can focus remediation efforts where they matter most. In some applications, the success rate of adversarial attacks fell from around 66% to 33.7%, roughly halving an attacker's odds while preserving practical usability.
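In metric terms, the attack success rate is simply the fraction of inputs whose predicted label an attacker can flip. A minimal sketch of that measurement, using a toy keyword classifier and a hypothetical single-word-swap `attack` function standing in for SP-Attack:

```python
def attack_success_rate(classify, attack, sentences):
    # Fraction of sentences whose predicted label flips after the attack.
    flips = sum(1 for s in sentences if classify(attack(s)) != classify(s))
    return flips / len(sentences)

# Toy demo (illustrative only): a keyword classifier and a one-word swap.
classify = lambda s: "positive" if "good" in s else "negative"
attack = lambda s: s.replace("good", "fine")  # swaps a single word
sentences = ["good service", "bad service", "good rates"]
print(attack_success_rate(classify, attack, sentences))  # two of three flip, ≈0.67
```

Halving this number, as reported above, means half as many inputs remain exploitable.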
To measure robustness against single-word attacks, LIDS uses a testing framework where sentences are perturbed by changing single words to semantically similar or contextually plausible alternatives without altering the true meaning. This method highlights specific words that trigger errors, focusing evaluation on a manageable subset of vulnerabilities rather than an impossible exhaustive search through all word combinations.
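The perturbation loop described above can be sketched as follows. The substitute table and keyword classifier here are illustrative stand-ins, not the actual SP-Attack internals: for each word position, try semantically similar substitutes and record any that flip the label.

```python
def single_word_attacks(classify, sentence, substitutes):
    # Return (original word, substitute, perturbed sentence) triples
    # for every single-word swap that changes the predicted label.
    original = classify(sentence)
    words = sentence.split()
    flips = []
    for i, word in enumerate(words):
        for alt in substitutes.get(word, []):
            perturbed = " ".join(words[:i] + [alt] + words[i + 1:])
            if classify(perturbed) != original:
                flips.append((word, alt, perturbed))
    return flips

# Toy demo with a hypothetical substitute table and keyword classifier.
substitutes = {"refund": ["reimbursement"], "want": ["need"]}
classify = lambda s: "billing" if "refund" in s else "other"
print(single_word_attacks(classify, "i want a refund", substitutes))
```

Because only one word changes per candidate, the search space stays linear in sentence length times substitutes per word, rather than exploding combinatorially.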
Lei Xu PhD '23, a member of the research team, used estimation techniques to identify the words with the greatest power to change a classification. In certain applications these high-impact words account for nearly half of all classification reversals, even though they make up just 0.1% of the system's vocabulary.
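One simple way to surface such high-impact words (a sketch of the idea only, not Xu's actual estimation technique) is to count, across a corpus, how often substituting each word flips the prediction, then rank words by flip count:

```python
from collections import Counter

def rank_flip_words(classify, corpus, substitutes):
    # Count, per vocabulary word, how many single-word substitutions
    # flip the predicted label anywhere in the corpus; rank descending.
    flip_counts = Counter()
    for sentence in corpus:
        original = classify(sentence)
        words = sentence.split()
        for i, word in enumerate(words):
            for alt in substitutes.get(word, []):
                perturbed = " ".join(words[:i] + [alt] + words[i + 1:])
                if classify(perturbed) != original:
                    flip_counts[word] += 1
    return flip_counts.most_common()

# Toy demo: "refund" is the fragile word this classifier leans on.
classify = lambda s: "billing" if "refund" in s else "other"
corpus = ["i want a refund", "please refund me", "hello there"]
substitutes = {"refund": ["reimbursement"], "want": ["need"], "hello": ["hi"]}
print(rank_flip_words(classify, corpus, substitutes))
```

A heavily skewed ranking like this one mirrors the paper's finding: a tiny fraction of the vocabulary drives most of the reversals, so defenses can concentrate there.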
The research team also developed SP-Defense, a software tool that improves a classifier's robustness by generating adversarial sentences and using them to retrain the model. Such retraining helps ensure, for example, that a chatbot does not give financial advice or other improper responses that could expose a company to liability.
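Adversarial retraining of this kind can be sketched as follows. The toy word-overlap classifier and substitute table are assumptions for illustration, not SP-Defense itself: adversarial variants keep their true labels and are folded back into the training set, so the retrained model no longer stumbles on the swapped words.

```python
def train(examples):
    # Toy classifier: score each label by word overlap with its training text.
    vocab = {}
    for text, label in examples:
        for w in text.split():
            vocab.setdefault(w, set()).add(label)
    def classify(sentence):
        scores = {}
        for w in sentence.split():
            for label in vocab.get(w, ()):
                scores[label] = scores.get(label, 0) + 1
        return max(scores, key=scores.get) if scores else None
    return classify

def adversarial_retrain(examples, substitutes):
    # Augment training data with single-word adversarial variants,
    # preserving the true label, then retrain from scratch.
    augmented = list(examples)
    for text, label in examples:
        words = text.split()
        for i, w in enumerate(words):
            for alt in substitutes.get(w, []):
                augmented.append((" ".join(words[:i] + [alt] + words[i + 1:]), label))
    return train(augmented)

examples = [("i want a refund", "billing"), ("reset my password", "support")]
substitutes = {"refund": ["reimbursement"]}
base = train(examples)
hardened = adversarial_retrain(examples, substitutes)
print(base("reimbursement please"))      # unseen word: no prediction
print(hardened("reimbursement please"))  # correctly routed to "billing"
```

The hardened model classifies the substituted word correctly because the adversarial variant appeared in its augmented training set with the right label.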
The team's research results were published in the journal Expert Systems, and the SP-Attack and SP-Defense packages are available as free downloads for anyone who wants to use them. Companies increasingly run such evaluation tools in real time to monitor the output of chatbots used for purposes such as banking and HR.
Across many thousands of examples, a small set of specific words proved to have an outsized influence on classification outcomes. By focusing on these high-impact words, the team can improve the accuracy of text classifiers and make automated conversations more reliable and effective.