Insurance fraud detection using extreme gradient boosting and random forest algorithms

dc.contributor.advisor: Modipa, T. S.
dc.contributor.author: Mabunda, Judith Goodness Khanyisa
dc.date.accessioned: 2024-09-04T13:24:15Z
dc.date.available: 2024-09-04T13:24:15Z
dc.date.issued: 2023
dc.description: Thesis (M.Sc. (e-Science)) -- University of Limpopo, 2023 [en_US]
dc.description.abstract: The rising volume of fraudulent claims is of great concern to insurance companies. In this research work, we developed two machine learning models, Extreme Gradient Boosting (XGBoost) and Random Forest, to detect insurance fraud in auto insurance claims data. The models classify each claim as fraudulent or non-fraudulent. Different data pre-processing techniques are used to clean, explore, and extract relevant features. The effectiveness of the algorithms is assessed using performance evaluation metrics: precision, recall, F1 score, and the confusion matrix. We also introduced the Synthetic Minority Oversampling Technique (SMOTE) and Random Oversampling (ROS) data augmentation techniques to handle the imbalanced data, and compared the results of the models before and after the data was balanced. The comparative results show that the XGBoost model is more effective at fraud detection than the Random Forest model on imbalanced data. In addition, the Random Forest model was effective in predicting fraudulent claims when the data augmentation techniques were applied. [en_US]
dc.format.extent: vii, 61 leaves [en_US]
dc.identifier.uri: http://hdl.handle.net/10386/4568
dc.language.iso: en [en_US]
dc.relation.requires: PDF [en_US]
dc.subject: Insurance fraud detection [en_US]
dc.subject: Gradient boosting [en_US]
dc.subject: Random forest algorithms [en_US]
dc.subject: Insurance claims [en_US]
dc.subject.lcsh: Artificial intelligence [en_US]
dc.subject.lcsh: Application software [en_US]
dc.subject.lcsh: Computer science -- Congresses [en_US]
dc.subject.lcsh: Technology -- Congresses [en_US]
dc.subject.lcsh: Insurance fraud [en_US]
dc.subject.lcsh: Application software -- Development [en_US]
dc.title: Insurance fraud detection using extreme gradient boosting and random forest algorithms [en_US]
dc.type: Thesis [en_US]
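
The abstract mentions balancing the claims data with Random Oversampling (ROS) and scoring the classifiers with precision, recall, and F1. The following is a minimal standard-library sketch of those two steps only, not the thesis's implementation: the function names, the toy rows, and the label encoding (1 = fraudulent, 0 = non-fraudulent) are all assumptions made for illustration, and no XGBoost or Random Forest model is trained here.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of each smaller class until every
    class has as many rows as the largest one (plain ROS, no synthesis)."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) with `positive` treated as the fraud class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    return tp, fp, fn, tn


def precision_recall_f1(y_true, y_pred, positive=1):
    """Precision, recall, and F1 for the positive class (0.0 when undefined)."""
    tp, fp, fn, _ = confusion_counts(y_true, y_pred, positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Hypothetical toy claims: 8 non-fraudulent rows and 2 fraudulent rows.
    toy_rows = [[i] for i in range(10)]
    toy_labels = [0] * 8 + [1] * 2
    _, balanced_labels = random_oversample(toy_rows, toy_labels, seed=0)
    print(Counter(balanced_labels))

    # Hypothetical predictions from some classifier on five held-out claims.
    print(precision_recall_f1([1, 1, 0, 0, 0], [1, 0, 1, 0, 0]))
```

In practice, oversampling is usually applied to the training split only, so that duplicated minority rows do not leak into the data used for evaluation. SMOTE differs from ROS in that it interpolates new minority samples rather than duplicating existing ones.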

Files

Original bundle

Name: mabunda_jgk_2023.pdf
Size: 1.33 MB
Format: Adobe Portable Document Format
Description: Thesis

License bundle

Name: license.txt
Size: 1.61 KB
Format: Item-specific license agreed upon to submission
Description: