Crafted with pixels & logic. © 2026 Gautham Krishna. All rights reserved.

Predictive Maintenance

Python, PySpark, Spark MLlib, Random Forest, Feature Engineering

An end-to-end distributed machine learning pipeline built with PySpark that predicts equipment failures 7 days in advance, processing 2M+ records from 1,900 machines. The pipeline engineers 1,150 rolling-statistics features and uses stratified down-sampling to handle extreme class imbalance, so that maintenance can be scheduled proactively.
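
The labeling and down-sampling ideas can be illustrated with a minimal plain-Python sketch. It is not the project's PySpark code: the function names, the record layout (machine id, date), the exact window boundaries (1 to 7 days before a failure), and the 1:1 sampling ratio are all assumptions made for illustration.

```python
import random
from datetime import date, timedelta


def over_label(records, failure_dates, horizon_days=7):
    """Label a record 1 if its machine fails 1..horizon_days days later.

    records: list of (machine_id, day) tuples (hypothetical layout).
    failure_dates: dict mapping machine_id -> list of failure dates.
    """
    labeled = []
    for machine_id, day in records:
        positive = any(
            0 < (failed_on - day).days <= horizon_days
            for failed_on in failure_dates.get(machine_id, [])
        )
        labeled.append((machine_id, day, int(positive)))
    return labeled


def down_sample(labeled, negatives_per_positive=1, seed=42):
    """Keep every positive record and a random subset of the negatives."""
    rng = random.Random(seed)
    positives = [r for r in labeled if r[2] == 1]
    negatives = [r for r in labeled if r[2] == 0]
    keep = min(len(negatives), negatives_per_positive * len(positives))
    return positives + rng.sample(negatives, keep)


# Toy data: one machine, 30 daily records, one failure on 2020-01-20.
records = [(1, date(2020, 1, 1) + timedelta(days=i)) for i in range(30)]
labeled = over_label(records, {1: [date(2020, 1, 20)]})
balanced = down_sample(labeled)
print(sum(r[2] for r in labeled), len(balanced))
```

Labeling the whole window before each failure, rather than only the failure day, gives the classifier more positive examples; sampling the majority class per label then brings the two classes closer to balance before training.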

Unique Features

Engineered 1,150 rolling-statistics features (5 windows × 46 features × 5 statistics) for time-series analysis
Implemented an over-labeling technique to create a 7-day advance failure prediction window
Trained a Random Forest classifier (100 trees) with grid search and 3-fold cross-validation
PCA dimensionality reduction (172 → 50 components)
Stratified down-sampling to handle extreme class imbalance (98.5% vs 1.5%)
Processed 1.3 GB of data (2M+ records) from 1,900 machines, covering 4 years
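
To make the feature count concrete, here is a small standard-library sketch of trailing-window statistics and of how 5 windows × 46 base features × 5 statistics yields 1,150 columns. The window sizes, the choice of statistics, and the column naming are hypothetical; the project description does not list them, and the actual pipeline runs on PySpark rather than plain Python lists.

```python
from itertools import product
from statistics import mean, median, pstdev

# Hypothetical window sizes (in time steps) and statistics.
WINDOWS = [3, 7, 14, 30, 90]
STATS = {"mean": mean, "std": pstdev, "min": min, "max": max, "median": median}


def rolling_features(series, windows=WINDOWS, stats=STATS):
    """Compute each statistic over the trailing window ending at every step.

    Windows are shorter at the start of the series, where less history exists.
    """
    rows = []
    for t in range(len(series)):
        row = {}
        for w in windows:
            window = series[max(0, t - w + 1): t + 1]
            for name, fn in stats.items():
                row[f"{name}_{w}"] = fn(window)
        rows.append(row)
    return rows


def feature_names(base_features, windows=WINDOWS, stats=STATS):
    """One column name per (base feature, window, statistic) combination."""
    return [f"{b}_{s}_{w}" for b, w, s in product(base_features, windows, stats)]


base = [f"sensor_{i}" for i in range(46)]  # placeholder feature names
print(len(feature_names(base)))  # 46 * 5 * 5 combinations
print(rolling_features([1, 2, 3, 4])[3]["mean_3"])
```

In a Spark setting the same idea would typically be expressed with window functions partitioned by machine and ordered by time, so each machine's history is aggregated independently.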

Tech Stack

Language: Python
Big Data: PySpark 2.0.2, Spark MLlib
ML: Random Forest, PCA, scikit-learn
Infrastructure: 32-core Linux DSVM (448 GB RAM), Azure Blob Storage, Parquet