Skill Profile
PySpark - Advanced (P3)
AWS - Advanced (P3)
SAS - Foundational (P1)
Key Responsibilities
Technical Delivery
Design, develop, and maintain complex PySpark solutions for ETL/ELT and data mart workloads.
Convert legacy SAS code into optimized PySpark solutions through a combination of automated tooling and manual refactoring (see the sketch after this list).
Build scalable, maintainable, and production-ready data pipelines.
Modernize legacy data workflows into cloud-native architectures.
Ensure data accuracy, quality, integrity, and reliability across transformation processes.
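For illustration, a minimal sketch of the kind of SAS-to-PySpark conversion described above. The dataset, paths, and column names are hypothetical, and the original SAS logic is shown as a comment; a reviewed conversion would confirm that the date arithmetic matches the SAS semantics exactly.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("sas_to_pyspark_example").getOrCreate()

# Original SAS logic (illustrative):
#   data work.active_customers;
#       set src.customers;
#       where status = 'ACTIVE';
#       tenure_years = intck('year', open_date, today());
#   run;

customers = spark.read.parquet("s3://example-bucket/customers/")  # hypothetical path

active_customers = (
    customers
    .filter(F.col("status") == "ACTIVE")
    # Approximation of SAS intck('year', ...); intck counts year-boundary crossings,
    # so the exact rule should be agreed during migration review.
    .withColumn(
        "tenure_years",
        F.floor(F.months_between(F.current_date(), F.col("open_date")) / 12),
    )
)

active_customers.write.mode("overwrite").parquet("s3://example-bucket/active_customers/")
```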
Cloud & Data Engineering (AWS-Focused)
Develop and deploy data pipelines using AWS services such as EMR, Glue, S3, and Athena (a minimal example follows this list).
Optimize Spark workloads for performance, scalability, and cost efficiency, including partitioning strategies.
Implement CI/CD pipelines and Git-based version control for automated deployment.
Collaborate with architects, engineers, and business stakeholders to deliver high-quality cloud data solutions.
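A minimal sketch of the kind of job implied above: plain PySpark that can run as a Glue ETL job or an EMR step, reading raw data from S3 and writing partitioned Parquet that Athena can query via a Glue catalog table. Bucket names, prefixes, and columns are assumptions for illustration only.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("orders_daily_load").getOrCreate()

# Read raw JSON landed in S3 (hypothetical bucket and prefix).
orders = spark.read.json("s3://example-raw-bucket/orders/")

daily_orders = (
    orders
    .withColumn("order_date", F.to_date("order_ts"))
    .filter(F.col("order_status").isNotNull())
)

# Write partitioned Parquet so Athena can prune partitions when querying.
(
    daily_orders
    .repartition("order_date")
    .write
    .mode("overwrite")
    .partitionBy("order_date")
    .parquet("s3://example-curated-bucket/orders_daily/")
)
```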
Core Technical Skills
PySpark & Data Engineering
5+ years of hands-on PySpark experience (Advanced level).
Strong ability to write production-grade, maintainable data engineering code.
Solid understanding of:
ETL/ELT design patterns
Data modelling concepts
Fact and dimension modelling
Data marts
Slowly Changing Dimensions (SCDs)
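A minimal SCD Type 2 sketch in PySpark, assuming hypothetical customer_dim and customer_updates inputs keyed on customer_id. A production implementation would typically add change hashing, late-arriving-record handling, and a transactional merge into the target table.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("scd2_sketch").getOrCreate()

# Hypothetical inputs: current dimension rows and incoming changes.
dim = spark.read.parquet("s3://example-bucket/customer_dim/")        # has is_current, valid_from, valid_to
updates = spark.read.parquet("s3://example-bucket/customer_updates/")

# Identify current rows whose tracked attribute has changed.
changed = (
    dim.filter(F.col("is_current")).alias("d")
    .join(updates.alias("u"), F.col("d.customer_id") == F.col("u.customer_id"))
    .filter(F.col("d.address") != F.col("u.address"))
)

# Expire the changed rows ...
expired = (
    changed.select("d.*")
    .withColumn("is_current", F.lit(False))
    .withColumn("valid_to", F.current_date())
)

# ... and open new current versions from the incoming records.
new_versions = (
    changed.select("u.*")
    .withColumn("is_current", F.lit(True))
    .withColumn("valid_from", F.current_date())
    .withColumn("valid_to", F.lit(None).cast("date"))
)

# The full dimension would also carry forward untouched and previously expired rows.
result = expired.unionByName(new_versions, allowMissingColumns=True)  # Spark 3.1+
```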
Spark Performance & Optimization
Expertise in Spark execution planning, partitioning strategies, and performance tuning.
Experience troubleshooting distributed data pipelines at scale.
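A short sketch of the tuning steps this implies: inspecting the physical plan, broadcasting a small dimension to avoid shuffling a large fact table, and right-sizing shuffle partitions. Table names and sizes are assumptions.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("tuning_sketch").getOrCreate()

transactions = spark.read.parquet("s3://example-bucket/transactions/")  # large fact table
branches = spark.read.parquet("s3://example-bucket/branches/")          # small dimension

# Inspect the physical plan first: look for unnecessary shuffles and sort-merge joins.
transactions.join(branches, "branch_id").explain(mode="formatted")

# Broadcast the small dimension so the large fact table is not shuffled for the join.
enriched = transactions.join(F.broadcast(branches), "branch_id")

# The default of 200 shuffle partitions is often wrong for the data volume at hand.
spark.conf.set("spark.sql.shuffle.partitions", "64")
```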
Python & Engineering Quality
Strong Python programming skills with emphasis on clean, modular, and maintainable code.
Experience applying engineering best practices (see the sketch after this list), including:
Parameterization
Configuration management
Structured logging
Exception handling
Modular design principles
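A minimal sketch of the practices listed above in one place: a parameterised job config, structured logging, exception handling, and a modular entry point. Paths and names are hypothetical; real jobs would typically load the config from arguments or a config file.

```python
import logging
from dataclasses import dataclass

from pyspark.sql import SparkSession

logging.basicConfig(format="%(asctime)s %(levelname)s %(name)s %(message)s", level=logging.INFO)
logger = logging.getLogger("customer_load")


@dataclass(frozen=True)
class JobConfig:
    """Parameterised job settings, typically supplied via job arguments or a config file."""
    source_path: str
    target_path: str
    run_date: str


def run(spark: SparkSession, config: JobConfig) -> None:
    """Single-responsibility entry point: read, transform, write."""
    logger.info("Starting load for run_date=%s", config.run_date)
    try:
        df = spark.read.parquet(config.source_path)
        row_count = df.count()
        df.write.mode("overwrite").parquet(config.target_path)
        logger.info("Load complete: %d rows written", row_count)
    except Exception:
        logger.exception("Load failed for run_date=%s", config.run_date)
        raise


if __name__ == "__main__":
    spark = SparkSession.builder.appName("customer_load").getOrCreate()
    run(spark, JobConfig(
        source_path="s3://example-bucket/raw/customers/",
        target_path="s3://example-bucket/curated/customers/",
        run_date="2024-01-01",
    ))
```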
SAS & Legacy Analytics (Foundational)
Working knowledge of Base SAS, SAS macros, and SAS DI Studio.
Ability to interpret and analyze legacy SAS code for migration to PySpark.
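One possible ingestion path when legacy extracts arrive as .sas7bdat files, sketched below with a hypothetical file and target path: read with pandas, then hand the frame to Spark. Very large datasets usually warrant a distributed SAS reader or an upstream export to CSV/Parquet instead.

```python
import pandas as pd
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("sas_dataset_ingest").getOrCreate()

# Read a legacy SAS dataset with pandas, then convert it for Spark-based processing.
pdf = pd.read_sas("/data/legacy/customers.sas7bdat", format="sas7bdat", encoding="latin-1")
customers = spark.createDataFrame(pdf)

customers.printSchema()
customers.write.mode("overwrite").parquet("s3://example-bucket/migrated/customers/")
```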
Data Engineering & Testing
Understanding of end-to-end data flows, orchestration frameworks, pipelines, and change data capture (CDC).
Experience creating ETL test cases, unit tests, and data comparison/validation frameworks.
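A minimal pytest sketch of the kind of ETL unit test and data comparison described above. The transformation under test (flag_high_value) and its module path are hypothetical; the comparison uses a symmetric exceptAll check so row order does not matter.

```python
import pytest
from pyspark.sql import SparkSession

from my_pipeline.transforms import flag_high_value  # hypothetical transformation under test


@pytest.fixture(scope="session")
def spark():
    return SparkSession.builder.master("local[2]").appName("etl_tests").getOrCreate()


def test_flag_high_value(spark):
    source = spark.createDataFrame(
        [("C1", 15000.0), ("C2", 200.0)],
        ["customer_id", "balance"],
    )
    expected = spark.createDataFrame(
        [("C1", 15000.0, True), ("C2", 200.0, False)],
        ["customer_id", "balance", "high_value"],
    )

    actual = flag_high_value(source, threshold=10000.0)

    # Both set differences must be empty for the frames to contain the same rows.
    assert actual.exceptAll(expected).count() == 0
    assert expected.exceptAll(actual).count() == 0
```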
Engineering Practices
Proficient in Git workflows, branching strategies, pull requests, and code reviews.
Ability to document technical decisions, architecture, and data flows.
Experience with CI/CD tooling for data engineering pipelines.
AWS & Platform Expertise (Advanced)
Strong hands-on experience with:
Amazon S3
EMR and AWS Glue
Glue Workflows
Amazon Athena
IAM
Solid understanding of distributed computing and big data processing in AWS environments.
Experience deploying and operating large-scale data pipelines in the cloud.
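Tying the services above together, a small boto3 sketch that runs an ad-hoc Athena validation query against a curated table registered in the Glue catalog. The region, database, table, and results bucket are illustrative; a production job would add a timeout, backoff, and error handling around the polling loop.

```python
import time

import boto3

athena = boto3.client("athena", region_name="eu-west-1")  # region is illustrative

response = athena.start_query_execution(
    QueryString="SELECT order_date, COUNT(*) AS row_count FROM orders_daily GROUP BY order_date",
    QueryExecutionContext={"Database": "example_curated_db"},           # hypothetical database
    ResultConfiguration={"OutputLocation": "s3://example-athena-results/"},
)
query_id = response["QueryExecutionId"]

# Poll until the query reaches a terminal state.
while True:
    state = athena.get_query_execution(QueryExecutionId=query_id)["QueryExecution"]["Status"]["State"]
    if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
        break
    time.sleep(2)

results = athena.get_query_results(QueryExecutionId=query_id)
```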
Desirable Experience
Experience within banking, financial services, or other regulated industries.
Background in SAS modernization or cloud migration programs.
Familiarity with DevOps practices and infrastructure-as-code tools such as Terraform or CloudFormation.
Experience working in Agile or Scrum delivery environments.