# 17-066/III (2017-07-31)

Author(s)
Piyanuch Chaipornkaew, College of Innovative Technology and Engineering, Dhurakij Pundit University, Thailand; Takorn Prexawanprasut, College of Innovative Technology and Engineering, Dhurakij Pundit University, Thailand; Chia-Lin Chang, Department of Applied Economics and Department of Finance, National Chung Hsing University, Taiwan; Michael McAleer, Department of Quantitative Finance, National Tsing Hua University, Taiwan; Discipline of Business Analytics, University of Sydney Business School, Australia; Econometric Institute, Erasmus School of Economics, Erasmus University Rotterdam, The Netherlands
Keywords:
Email, business data, workflow management system, business transactions.
JEL codes:
J24, O31, O32, O33

Email is one of the most powerful internet communication channels. Because employees and their clients communicate primarily via email, much crucial business data is conveyed in email content. Businesses are understandably concerned about this data and need a sophisticated workflow management system to manage their transactions. Such a system should also be able to classify incoming emails into suitable categories. Previous research implemented a system that categorizes emails based on the words found in email messages. Two parameters affected the accuracy of that program: the number of words in a database that are compared with sample emails, and the acceptable percentage for classifying emails. As email has grown in volume and sophistication, this research classifies email messages into a larger number of categories and changes one of the parameters that affect the accuracy of the program. The first parameter, the number of words in a database compared with sample emails, remains unchanged, while the second parameter is changed from an acceptable percentage to the number of matching words. The empirical results suggest setting the number of words in the database compared with sample emails to 11, and the number of matching words required to categorize an email to 7. When these settings are applied to categorize 12,465 emails, the accuracy is approximately 65.3%. The number of database words that yields high accuracy lies between 11 and 13, and the corresponding number of matching words lies between 6 and 8.
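
The two parameters described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function name `classify_email`, the toy word lists in `CATEGORY_WORDS`, the tokenization rule, and the tie-breaking rule (the first category with the highest count wins) are all assumptions made for illustration; the abstract does not specify how database words are selected or how ties are resolved. Only the two parameters and the values 11 and 7 come from the abstract.

```python
import re

# Hypothetical keyword database: each category maps to an ordered word list.
# These words are invented for illustration and do not come from the paper.
CATEGORY_WORDS = {
    "invoice": ["invoice", "payment", "amount", "due", "bank", "transfer",
                "receipt", "tax", "total", "paid", "balance"],
    "shipping": ["shipment", "delivery", "tracking", "courier", "warehouse",
                 "dispatch", "parcel", "address", "arrival", "freight", "customs"],
}


def classify_email(text, category_words, words_per_category=11, min_matches=7):
    """Return the category whose database words match the email best,
    or None when no category reaches `min_matches` matching words."""
    # Assumed tokenization: lowercase runs of letters, digits and apostrophes.
    tokens = set(re.findall(r"[a-z0-9']+", text.lower()))
    best_category, best_count = None, 0
    for category, words in category_words.items():
        # First parameter: how many database words are compared with the email.
        candidates = words[:words_per_category]
        count = sum(1 for word in candidates if word.lower() in tokens)
        # Second parameter: minimum number of matching words to accept a category.
        if count >= min_matches and count > best_count:
            best_category, best_count = category, count
    return best_category


sample = ("Please find the invoice attached. The total amount due is 500 USD "
          "including tax. Payment by bank transfer; a receipt follows once paid.")
label = classify_email(sample, CATEGORY_WORDS)
```

With the defaults of 11 database words and 7 matching words, `sample` contains ten of the eleven hypothetical "invoice" words, so it is assigned to that category. A short message such as "Your parcel is out for delivery." matches only two "shipping" words and is left unclassified unless `min_matches` is lowered.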