Gagan Goyal

Assistant Manager · Data Engineering · WNS Analytics

I architect scalable data platforms and lead end-to-end data engineering initiatives across AWS, Azure, and GCP — turning raw, messy data into reliable, production-grade pipelines.

AWS · Azure · GCP · PySpark · Delta Lake · Databricks · Redshift · BigQuery · ETL/ELT · Data Lakehouse
7+
Years Experience
40%
Data Inconsistency Reduction
1TB+
Daily Data Processed
95%
Pipeline Success Rate

Technical Skills

Multi-cloud data engineering across the full stack — from ingestion to insight.

☁️
AWS
Glue · Redshift · Step Functions · EventBridge · S3
🔷
Azure
Data Factory · Databricks · ADLS Gen2 · Cosmos DB · Azure SQL · Azure DevOps
🟢
GCP
BigQuery · Google Cloud Storage
Big Data & Processing
PySpark · Apache Spark · Hadoop · Delta Lake · Medallion Architecture · Data Lakehouse
🗄️
Databases
SQL Server · Redshift · BigQuery · Cosmos DB · Azure SQL
🛠️
Tools & Practices
Python · SQL · Power BI · Git · Data Governance · Schema Evolution · CDC · NiFi

Experience

Building data infrastructure that businesses depend on.

JUN 2023 – PRESENT
Assistant Manager – Data Engineering
WNS Analytics · Noida, India
  • Architected a centralized Data Quality Framework (PySpark + Redshift), reducing inconsistencies by 40–50%
  • Delivered 25–35% ETL performance gains through PySpark optimization and query tuning
  • Automated workflows via AWS Step Functions & EventBridge, eliminating 70%+ of manual effort
  • Maintained 95%+ pipeline success rate processing 100 GB–1 TB+ of data daily
AWS · PySpark · Redshift · Glue · Databricks · Step Functions
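The rule-driven checks such a data-quality framework applies can be illustrated with a minimal pure-Python sketch (the rule names and record fields here are hypothetical, simplified stand-ins for the PySpark implementation described above):

```python
# Minimal data-quality rule engine: each rule names a field and a predicate,
# and validate() returns the failing (record_index, rule_name) pairs.
# Field names ("order_id", "amount") are illustrative only.

RULES = {
    "order_id_present": ("order_id", lambda v: v is not None and v != ""),
    "amount_non_negative": ("amount", lambda v: isinstance(v, (int, float)) and v >= 0),
}

def validate(records, rules=RULES):
    failures = []
    for i, rec in enumerate(records):
        for name, (field, check) in rules.items():
            if not check(rec.get(field)):
                failures.append((i, name))
    return failures
```

In a PySpark version, each rule would typically become a filter or expectation over a DataFrame, with failing rows routed to a quarantine table for review.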
OCT 2021 – MAY 2023
Sr. Data Engineer
Concentrix · Noida, India
  • Designed ADF pipelines for multi-layer ETL (landing → staging → ODS) with centralized orchestration
  • Migrated 8+ on-premises SQL Server databases to Azure SQL using ADLS Gen2 staging
  • Converted Hadoop (Jython/Flume) pipelines to Databricks, improving efficiency by 40%
ADF · Azure Databricks · ADLS Gen2 · Cosmos DB · PySpark
OCT 2019 – SEP 2021
Data Engineer
Mindtree (now LTIMindtree)
  • Built PySpark + ADF pipelines for large-scale ingestion and Delta Lake loading
  • Improved processing efficiency 30–40% via Delta and SQL table performance tuning
  • Designed Delta Lake data models and built reusable validation frameworks
PySpark · Delta Lake · ADF · Power BI

Key Projects

A selection of high-impact data engineering initiatives.

01
Enterprise Data Quality Framework
Designed and implemented a centralized DQ framework across multiple business domains — covering validation, accuracy, consistency, and reliability for downstream analytics.
⬇ 40–50% fewer data inconsistencies
PySpark · Redshift · AWS Glue · Step Functions
02
Unified Customer 360° View
Engineered online-to-offline transaction linking logic with loyalty mapping and CSAT alignment pipelines — giving the business a complete, analytics-ready customer view.
⚡ Unified view across 3+ source systems
PySpark · Redshift · EventBridge
03
Global Canvas – MDR Migration
Reverse-engineered 20+ undocumented Salesforce datasets and migrated to Databricks medallion architecture. Rebuilt Power BI dashboards with automated monitoring.
⬆ 20+ datasets migrated end-to-end
Databricks · Delta Lake · Salesforce · Power BI
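The medallion layering used in this migration can be sketched as two successive transformations, a bronze-to-silver cleanup and a silver-to-gold aggregation (pure-Python stand-ins for Delta tables; the business key and field names are invented for illustration):

```python
# Medallion architecture sketch: bronze keeps raw rows, silver cleans and
# deduplicates, gold aggregates for reporting. Lists of dicts stand in
# for Delta tables.

def to_silver(bronze_rows):
    """Drop malformed rows and deduplicate on a business key."""
    seen, silver = set(), []
    for row in bronze_rows:
        key = row.get("account_id")  # hypothetical business key
        if key and key not in seen:
            seen.add(key)
            silver.append({**row, "region": (row.get("region") or "UNKNOWN").upper()})
    return silver

def to_gold(silver_rows):
    """Aggregate cleaned rows into a reporting-ready summary."""
    counts = {}
    for row in silver_rows:
        counts[row["region"]] = counts.get(row["region"], 0) + 1
    return counts
```

In Databricks, each layer would instead be a Delta table, with the silver and gold steps expressed as PySpark transformations and `MERGE` operations.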
04
Hospitality Bundle Analytics
Built pipelines to identify co-purchased service bundles and decrypt complex reservation JSON payloads. Extended data models with new staging layers for analytical scale.
🔐 Complex JSON payload decryption & normalization
BigQuery · Python · GCP
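Normalizing a nested reservation payload of the sort mentioned above could look like the following sketch (the payload shape and key names are invented for illustration; decryption is out of scope here):

```python
import json

def flatten(obj, prefix=""):
    """Recursively flatten nested dicts into dotted column names,
    e.g. {"guest": {"name": "X"}} -> {"guest.name": "X"}."""
    flat = {}
    for key, value in obj.items():
        col = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=col + "."))
        else:
            flat[col] = value
    return flat

# Hypothetical decrypted reservation payload:
payload = json.loads('{"reservation": {"id": "R1", "guest": {"tier": "gold"}}, "nights": 2}')
row = flatten(payload)
```

Flattened rows like this map directly onto BigQuery columns, or can be loaded as-is and queried with BigQuery's native JSON functions instead.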
05
Hadoop → Azure Cloud Migration
Led full migration from Hadoop/Jython/Flume to Azure Databricks and ADF micro-batch architecture. Rebuilt the ETL stages and Cosmos DB insertion logic, and migrated Tableau reports to Databricks.
⬆ 40% processing efficiency gain
Databricks · ADF · Cosmos DB · ADLS Gen2
06
Abacus Legacy SQL Migration
Migrated IBM Abacus SQL scripts to Databricks PySpark. Rebuilt gold tables with validation frameworks and updated Power BI report connections for production cutover.
✅ Zero-defect production cutover
PySpark · Databricks · ADLS Gen2 · Power BI

Open to Opportunities

Looking for Senior Data Engineer, Lead Data Engineer, or Data Architect roles. US B1/B2 visa valid until 2028. Based in Noida — open to remote and hybrid work.