Gagan Goyal

Assistant Manager · Data Engineering · WNS Analytics

I architect scalable data platforms and lead end-to-end data engineering initiatives across AWS, Azure, and GCP — turning raw, messy data into reliable, production-grade pipelines.

AWS · Azure · GCP · PySpark · Delta Lake · Databricks · Redshift · BigQuery · ETL/ELT · Data Lakehouse
7+
Years Experience
40%
Data Inconsistency Reduction
1TB+
Daily Data Processed
95%
Pipeline Success Rate

Technical Skills

Multi-cloud data engineering across the full stack — from ingestion to insight.

☁️
AWS
Glue · Redshift · Step Functions · EventBridge · S3
🔷
Azure
Data Factory · Databricks · ADLS Gen2 · Cosmos DB · Azure SQL · Azure DevOps
🟢
GCP
BigQuery · Google Cloud Storage
Big Data & Processing
PySpark · Apache Spark · Hadoop · Delta Lake · Medallion Architecture · Data Lakehouse
🗄️
Databases
SQL Server · Redshift · BigQuery · Cosmos DB · Azure SQL
🛠️
Tools & Practices
Python · SQL · Power BI · Git · Data Governance · Schema Evolution · CDC · NiFi

Experience

Building data infrastructure that businesses depend on.

JUN 2023 – PRESENT
Assistant Manager – Data Engineering
WNS Analytics · Noida, India
  • Architected a centralized Data Quality Framework (PySpark + Redshift), reducing inconsistencies by 40–50%
  • Delivered 25–35% ETL performance gains through PySpark optimization and query tuning
  • Automated workflows via AWS Step Functions & EventBridge, eliminating 70%+ of manual effort
  • Maintained 95%+ pipeline success rate processing 100 GB–1 TB+ of data daily
AWS · PySpark · Redshift · Glue · Databricks · Step Functions
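The rule-driven checks such a data-quality framework applies can be illustrated with a minimal pure-Python sketch (the rule names and record fields here are hypothetical, simplified stand-ins for the PySpark implementation described above):

```python
# Minimal data-quality rule engine: each rule names a field and a predicate,
# and validate() returns the failing (record_index, rule_name) pairs.
# Field names ("order_id", "amount") are illustrative only.

RULES = {
    "order_id_present": ("order_id", lambda v: v is not None and v != ""),
    "amount_non_negative": ("amount", lambda v: isinstance(v, (int, float)) and v >= 0),
}

def validate(records, rules=RULES):
    failures = []
    for i, rec in enumerate(records):
        for name, (field, check) in rules.items():
            if not check(rec.get(field)):
                failures.append((i, name))
    return failures
```

In a PySpark version, each rule would typically become a filter or expectation over a DataFrame, with failing rows routed to a quarantine table for review.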
OCT 2021 – MAY 2023
Sr. Data Engineer
Concentrix · Noida, India
  • Designed ADF pipelines for multi-layer ETL (landing → staging → ODS) with centralized orchestration
  • Migrated 8+ on-premises SQL Server databases to Azure SQL using ADLS Gen2 staging
  • Converted Hadoop (Jython/Flume) pipelines to Databricks, improving efficiency by 40%
ADF · Azure Databricks · ADLS Gen2 · Cosmos DB · PySpark
OCT 2019 – SEP 2021
Data Engineer
Mindtree (now LTIMindtree)
  • Built PySpark + ADF pipelines for large-scale ingestion and Delta Lake loading
  • Improved processing efficiency 30–40% via Delta and SQL table performance tuning
  • Designed Delta Lake data models and built reusable validation frameworks
PySpark · Delta Lake · ADF · Power BI

Key Projects

A selection of high-impact data engineering initiatives.

01
Enterprise Data Quality Framework
Designed and implemented a centralized DQ framework across multiple business domains — covering validation, accuracy, consistency, and reliability for downstream analytics.
⬇ 40–50% fewer data inconsistencies
PySpark · Redshift · AWS Glue · Step Functions
02
Unified Customer 360° View
Engineered online-to-offline transaction linking logic with loyalty mapping and CSAT alignment pipelines — giving the business a complete, analytics-ready customer view.
⚡ Unified view across 3+ source systems
PySpark · Redshift · EventBridge
03
Global Canvas – MDR Migration
Reverse-engineered 20+ undocumented Salesforce datasets and migrated to Databricks medallion architecture. Rebuilt Power BI dashboards with automated monitoring.
⬆ 20+ datasets migrated end-to-end
Databricks · Delta Lake · Salesforce · Power BI
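The medallion layering used in this migration can be sketched as two successive transformations, a bronze-to-silver cleanup and a silver-to-gold aggregation (pure-Python stand-ins for Delta tables; the business key and field names are invented for illustration):

```python
# Medallion architecture sketch: bronze keeps raw rows, silver cleans and
# deduplicates, gold aggregates for reporting. Lists of dicts stand in
# for Delta tables.

def to_silver(bronze_rows):
    """Drop malformed rows and deduplicate on a business key."""
    seen, silver = set(), []
    for row in bronze_rows:
        key = row.get("account_id")  # hypothetical business key
        if key and key not in seen:
            seen.add(key)
            silver.append({**row, "region": (row.get("region") or "UNKNOWN").upper()})
    return silver

def to_gold(silver_rows):
    """Aggregate cleaned rows into a reporting-ready summary."""
    counts = {}
    for row in silver_rows:
        counts[row["region"]] = counts.get(row["region"], 0) + 1
    return counts
```

In Databricks, each layer would instead be a Delta table, with the silver and gold steps expressed as PySpark transformations and `MERGE` operations.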
04
Hospitality Bundle Analytics
Built pipelines to identify co-purchased service bundles and decrypt complex reservation JSON payloads. Extended data models with new staging layers for analytical scale.
🔐 Complex JSON payload decryption & normalization
BigQuery · Python · GCP
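Normalizing a nested reservation payload of the sort mentioned above could look like the following sketch (the payload shape and key names are invented for illustration; decryption is out of scope here):

```python
import json

def flatten(obj, prefix=""):
    """Recursively flatten nested dicts into dotted column names,
    e.g. {"guest": {"name": "X"}} -> {"guest.name": "X"}."""
    flat = {}
    for key, value in obj.items():
        col = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=col + "."))
        else:
            flat[col] = value
    return flat

# Hypothetical decrypted reservation payload:
payload = json.loads('{"reservation": {"id": "R1", "guest": {"tier": "gold"}}, "nights": 2}')
row = flatten(payload)
```

Flattened rows like this map directly onto BigQuery columns, or can be loaded as-is and queried with BigQuery's native JSON functions instead.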
05
Hadoop → Azure Cloud Migration
Led full migration from Hadoop/Jython/Flume to Azure Databricks and ADF micro-batch architecture. Rebuilt the ETL stages and Cosmos DB insertion logic, and migrated Tableau reports to Databricks.
⬆ 40% processing efficiency gain
Databricks · ADF · Cosmos DB · ADLS Gen2
06
Abacus Legacy SQL Migration
Migrated IBM Abacus SQL scripts to Databricks PySpark. Rebuilt gold tables with validation frameworks and updated Power BI report connections for production cutover.
✅ Zero-defect production cutover
PySpark · Databricks · ADLS Gen2 · Power BI

Open to Opportunities

Looking for Senior Data Engineer, Lead Data Engineer, or Data Architect roles. US B1/B2 visa valid until 2028. Based in Noida — open to remote and hybrid work.