A selection of high-impact data engineering initiatives.
01
Enterprise Data Quality Framework
Designed and implemented a centralized data quality (DQ) framework across multiple business domains, with validation checks covering the accuracy, consistency, and reliability of data feeding downstream analytics.
⬇ 40–50% fewer data inconsistencies
PySpark · Redshift · AWS Glue · Step Functions
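A minimal sketch of the kinds of rules such a DQ framework applies. The production framework ran on PySpark; this pure-Python version only illustrates the check logic, and all field names, sample rows, and helper names are hypothetical.

```python
def check_completeness(rows, field):
    """Fraction of rows with a non-null value for `field`."""
    if not rows:
        return 1.0
    non_null = sum(1 for r in rows if r.get(field) is not None)
    return non_null / len(rows)

def check_uniqueness(rows, field):
    """True if no duplicate values appear in `field`."""
    values = [r[field] for r in rows if field in r]
    return len(values) == len(set(values))

# Illustrative sample data, not from the original project.
orders = [
    {"order_id": 1, "amount": 120.0},
    {"order_id": 2, "amount": None},
    {"order_id": 3, "amount": 75.5},
]

completeness = check_completeness(orders, "amount")  # 2 of 3 rows populated
unique_ids = check_uniqueness(orders, "order_id")    # no duplicate IDs
```

In a real deployment, checks like these would run as Step Functions-orchestrated Glue jobs, with results written to a metrics table rather than returned inline.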
02
Unified Customer 360° View
Engineered online-to-offline transaction linking logic, loyalty mapping, and CSAT alignment pipelines, giving the business a complete, analytics-ready customer view.
⚡ Unified view across 3+ source systems
PySpark · Redshift · EventBridge
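The core linking idea can be sketched as grouping transactions from both channels under one customer key. This is an illustrative pure-Python version of logic that in practice would be a PySpark join; the loyalty-ID key and all field names are assumptions, not details from the original project.

```python
def link_by_loyalty(online, offline):
    """Group online and offline transactions under one loyalty ID."""
    view = {}
    for source, rows in (("online", online), ("offline", offline)):
        for r in rows:
            entry = view.setdefault(r["loyalty_id"], {"online": [], "offline": []})
            entry[source].append(r["txn"])
    return view

# Hypothetical sample transactions from two source systems.
online = [{"loyalty_id": "L1", "txn": "web-001"},
          {"loyalty_id": "L2", "txn": "web-002"}]
offline = [{"loyalty_id": "L1", "txn": "pos-101"},
           {"loyalty_id": "L3", "txn": "pos-102"}]

customer_360 = link_by_loyalty(online, offline)
# customer_360["L1"] holds both the web and the POS transaction.
```

Customers present in only one channel still get a complete record with an empty list for the missing side, which keeps the downstream view uniform.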
03
Global Canvas – MDR Migration
Reverse-engineered 20+ undocumented Salesforce datasets and migrated them to a Databricks medallion architecture. Rebuilt the Power BI dashboards with automated monitoring.
⬆ 20+ datasets migrated end-to-end
Databricks · Delta Lake · Salesforce · Power BI
04
Hospitality Bundle Analytics
Built pipelines to identify co-purchased service bundles and to decrypt and normalize complex reservation JSON payloads. Extended the data models with new staging layers for analytical scale.
🔐 Complex JSON payload decryption & normalization
BigQuery · Python · GCP
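The normalization half of that work can be sketched as flattening a nested reservation payload into the dotted column names a staging table expects. The payload shape, key names, and `flatten` helper below are illustrative assumptions; the decryption step is project-specific and omitted here.

```python
import json

def flatten(obj, prefix=""):
    """Flatten nested dicts/lists into dotted column names."""
    out = {}
    if isinstance(obj, dict):
        for k, v in obj.items():
            out.update(flatten(v, f"{prefix}{k}."))
    elif isinstance(obj, list):
        for i, v in enumerate(obj):
            out.update(flatten(v, f"{prefix}{i}."))
    else:
        out[prefix[:-1]] = obj
    return out

# Hypothetical decrypted reservation payload.
payload = json.dumps({
    "reservation": {
        "id": "R-100",
        "guest": {"name": "A. Guest", "tier": "gold"},
        "nights": [{"date": "2024-01-01", "rate": 150}],
    }
})

row = flatten(json.loads(payload))
# row["reservation.guest.tier"] == "gold"
```

A flat dict like `row` maps directly onto a BigQuery staging-table schema, which is why flattening usually precedes the load step.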
05
Hadoop → Azure Cloud Migration
Led the full migration from a Hadoop/Jython/Flume stack to an Azure Databricks and ADF micro-batch architecture. Rebuilt the ETL stages and Cosmos DB insertion logic, and migrated reporting from Tableau to Databricks.
⬆ 40% processing efficiency gain
Databricks · ADF · Cosmos DB · ADLS Gen2
06
Abacus Legacy SQL Migration
Migrated legacy IBM Abacus SQL scripts to Databricks PySpark. Rebuilt the gold tables with validation frameworks and repointed Power BI report connections for the production cutover.
✅ Zero-defect production cutover
PySpark · Databricks · ADLS Gen2 · Power BI
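A validation step behind a cutover like this can be sketched as comparing fingerprints of the legacy output and the rebuilt gold table. This pure-Python version is a sketch under assumed table shapes; the actual framework ran in PySpark, and the sample rows are illustrative.

```python
import hashlib

def table_fingerprint(rows):
    """Order-independent fingerprint: (row count, XOR of per-row hashes)."""
    acc = 0
    for r in rows:
        # Sort items so column order never affects the hash.
        digest = hashlib.sha256(repr(sorted(r.items())).encode()).digest()
        acc ^= int.from_bytes(digest[:8], "big")
    return len(rows), acc

# Hypothetical legacy output vs. rebuilt gold table (same rows, new order).
legacy = [{"id": 1, "total": 10}, {"id": 2, "total": 20}]
migrated = [{"id": 2, "total": 20}, {"id": 1, "total": 10}]

match = table_fingerprint(legacy) == table_fingerprint(migrated)  # True
```

Because the fingerprint ignores row order, it tolerates the nondeterministic ordering of distributed writes while still catching dropped, duplicated, or altered rows.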