Impact of Categorical Feature Encoding on Machine Learning Based Shear Strength Prediction

WOUBISHET ZEWDU; GENDA CHEN

doi:10.12783/shm2025/37535

Abstract

Recent advances in machine learning (ML) offer powerful tools for predicting the structural integrity of civil infrastructure. A critical yet often overlooked aspect of ML modeling is data preprocessing, particularly categorical feature encoding. This study examines how different encoding schemes for interface conditions (monolithic, rough, smooth) affect ML predictions of interfacial shear strength. It compares categorical encodings (one-hot and five label encoding variations) with numerical friction coefficients (monolithic = 1.4, rough = 1.0, smooth = 0.6) across four ML models: eXtreme Gradient Boosting (XGBoost), Random Forest (RF), Support Vector Regression (SVR), and Artificial Neural Network (ANN). Among the 28 resulting models, categorical encodings outperform the numerical representation in 88% of cases. The ensemble models (RF, XGBoost) prove robust overall, although RF is the more sensitive of the two to encoding variations. SVR and ANN exhibit encoding-dependent performance, with some SVR models achieving 12% higher accuracy than their numerical-based counterparts. The findings emphasize the crucial role of encoding choices in ML model performance and support the use of adaptive preprocessing techniques to improve reliability in structural engineering and beyond.
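As a minimal illustration of the three kinds of representation the abstract compares, the sketch below encodes the interface condition as a one-hot vector, as an integer label, and as a friction coefficient. It is not the authors' code: the helper names and the four sample rows are hypothetical, and only the category names and the coefficients (1.4, 1.0, 0.6) come from the abstract. The abstract does not say which five label assignments were studied, so the sketch simply enumerates all six possible orderings of the three categories.

```python
from itertools import permutations

# Category names and friction coefficients as quoted in the abstract.
CATEGORIES = ["monolithic", "rough", "smooth"]
FRICTION = {"monolithic": 1.4, "rough": 1.0, "smooth": 0.6}


def one_hot(value):
    """Return a 0/1 indicator vector with one slot per category."""
    return [1 if value == c else 0 for c in CATEGORIES]


def label_encode(value, order):
    """Return the integer position of `value` in the chosen category order."""
    return order.index(value)


def friction_encode(value):
    """Replace the category with its numerical friction coefficient."""
    return FRICTION[value]


# Hypothetical interface conditions for four specimens (illustration only).
samples = ["rough", "smooth", "monolithic", "rough"]

one_hot_rows = [one_hot(s) for s in samples]
friction_col = [friction_encode(s) for s in samples]

# Three categories admit 3! = 6 integer assignments. Which five the study
# used is not stated in the abstract, so all six are generated here.
label_orders = list(permutations(CATEGORIES))
label_cols = {
    order: [label_encode(s, order) for s in samples] for order in label_orders
}

for order, column in label_cols.items():
    print(order, column)
```

Each column produced this way would stand in for the interface-condition feature when training a model. Label encoding imposes an ordering and equal spacing on the categories, which is one plausible reason why models that rely on distances or weighted sums (such as SVR and ANN) could respond to the choice of assignment.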

STRUCTURAL HEALTH MONITORING 2025