Lead PySpark Engineer

  • SKILLFINDER INTERNATIONAL
  • Feb 16, 2026
  • Contractor
  • Telecommunications

Job Description

Skill Profile

  • PySpark - Advanced (P3)

  • AWS - Advanced (P3)

  • SAS - Foundational (P1)

Key Responsibilities

Technical Delivery

  • Design, develop, and maintain complex PySpark solutions for ETL/ELT and data mart workloads (a minimal pipeline sketch follows this list).

  • Convert legacy SAS code into optimized PySpark solutions using a combination of automated tooling and manual refactoring.

  • Build scalable, maintainable, and production-ready data pipelines.

  • Modernize legacy data workflows into cloud-native architectures.

  • Ensure data accuracy, quality, integrity, and reliability across transformation processes.
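
As a rough sketch of the kind of pipeline this role covers, here is a minimal PySpark ETL job; the bucket names, paths, and columns are hypothetical placeholders:

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("orders_etl").getOrCreate()

# Extract: read raw orders landed as CSV in S3 (hypothetical path).
orders = spark.read.option("header", True).csv("s3://raw-bucket/orders/")

# Transform: type the amount column, derive a load date, drop bad rows.
clean = (
    orders
    .withColumn("amount", F.col("amount").cast("decimal(12,2)"))
    .withColumn("load_date", F.current_date())
    .filter(F.col("order_id").isNotNull())
)

# Load: write partitioned Parquet for downstream data mart queries.
clean.write.mode("overwrite").partitionBy("load_date").parquet(
    "s3://curated-bucket/orders/"
)
```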

Cloud & Data Engineering (AWS-Focused)

  • Develop and deploy data pipelines using AWS services such as EMR, Glue, S3, and Athena.

  • Optimize Spark workloads for performance, scalability, partitioning strategy, and cost efficiency.

  • Implement CI/CD pipelines and Git-based version control for automated deployment.

  • Collaborate with architects, engineers, and business stakeholders to deliver high-quality cloud data solutions.

Core Technical Skills

PySpark & Data Engineering

  • 5+ years of hands-on PySpark experience (Advanced level).

  • Strong ability to write production-grade, maintainable data engineering code.

  • Solid understanding of:

    • ETL/ELT design patterns

    • Data modelling concepts

    • Fact and dimension modelling

    • Data marts

    • Slowly Changing Dimensions (SCDs); a Type 2 sketch follows this list
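
As one concrete example of the modelling topics above, a minimal SCD Type 2 sketch; the dimension, the staged updates, and the column names are all hypothetical, and brand-new keys are left out to keep it short:

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("scd2_sketch").getOrCreate()

# Hypothetical current dimension: one open row per customer (end_date is null).
dim = spark.createDataFrame(
    [(1, "Alice", "Leeds", "2024-01-01", None)],
    "customer_id INT, name STRING, city STRING, start_date STRING, end_date STRING",
)

# Hypothetical staged changes from the source system.
updates = spark.createDataFrame(
    [(1, "Alice", "York")],
    "customer_id INT, name STRING, city STRING",
)

today = F.current_date().cast("string")

# Keys whose tracked attribute changed.
changed = (dim.alias("d").join(updates.alias("u"), "customer_id")
              .filter(F.col("d.city") != F.col("u.city"))
              .select("customer_id"))

# Close the open version of each changed row...
closed = dim.join(changed, "customer_id").withColumn("end_date", today)

# ...and append the new version with an open end_date.
new_rows = (updates.join(changed, "customer_id")
                   .withColumn("start_date", today)
                   .withColumn("end_date", F.lit(None).cast("string")))

untouched = dim.join(changed, "customer_id", "left_anti")
result = untouched.unionByName(closed).unionByName(new_rows)
result.show()
```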

Spark Performance & Optimization

  • Expertise in Spark execution planning, partitioning strategies, and performance tuning (see the sketch after this list).

  • Experience troubleshooting distributed data pipelines at scale.
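
As an illustration of the tuning in scope, a small sketch assuming a large fact table joined to a small dimension; all names and paths are hypothetical:

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("tuning_sketch").getOrCreate()

fact = spark.read.parquet("s3://curated-bucket/orders/")        # large
dim = spark.read.parquet("s3://curated-bucket/customer_dim/")   # small

# Hint a broadcast join so the small dimension is shipped to every
# executor, avoiding a full shuffle of the large fact table.
joined = fact.join(F.broadcast(dim), "customer_id")

# Inspect the physical plan to confirm BroadcastHashJoin was chosen.
joined.explain()

# Repartition by the write key before writing, to control file sizes
# and avoid small-file problems in S3.
(joined.repartition("load_date")
       .write.mode("overwrite")
       .partitionBy("load_date")
       .parquet("s3://curated-bucket/orders_enriched/"))
```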

Python & Engineering Quality

  • Strong Python programming skills with emphasis on clean, modular, and maintainable code.

  • Experience applying engineering best practices (sketched after this list), including:

    • Parameterization

    • Configuration management

    • Structured logging

    • Exception handling

    • Modular design principles
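
A short sketch pulling these practices together, assuming a hypothetical orders job driven by an external JSON config file:

```python
import json
import logging

from pyspark.sql import SparkSession, DataFrame

# Structured logging: configure once at the entry point, one logger per module.
logging.basicConfig(
    format="%(asctime)s %(levelname)s %(name)s %(message)s", level=logging.INFO
)
log = logging.getLogger("orders_etl")


def load_config(path: str) -> dict:
    """Parameterization: paths and names come from config, not hard-coding."""
    with open(path) as fh:
        return json.load(fh)


def read_orders(spark: SparkSession, source_path: str) -> DataFrame:
    """Modular design: one small, testable responsibility per function."""
    return spark.read.parquet(source_path)


def main(config_path: str) -> None:
    cfg = load_config(config_path)
    spark = SparkSession.builder.appName(cfg["app_name"]).getOrCreate()
    try:
        orders = read_orders(spark, cfg["source_path"])
        log.info("read %d rows from %s", orders.count(), cfg["source_path"])
    except Exception:
        # Exception handling: log with context before failing the job.
        log.exception("orders ETL failed for config %s", config_path)
        raise


if __name__ == "__main__":
    main("config/orders.json")  # hypothetical config path
```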

SAS & Legacy Analytics (Foundational)

  • Working knowledge of Base SAS, Macros, and DI Studio.

  • Ability to interpret and analyze legacy SAS code for migration to PySpark (an example translation follows).
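
As a rough illustration of this kind of migration, a typical Base SAS DATA step (shown in comments) and one possible PySpark translation; the datasets and columns are hypothetical:

```python
# Original Base SAS, for reference:
#
#   data work.high_value;
#       set src.orders;
#       where amount > 1000;
#       net_amount = amount * 0.9;
#   run;

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("sas_migration_sketch").getOrCreate()

orders = spark.read.parquet("s3://curated-bucket/orders/")

high_value = (
    orders
    .filter(F.col("amount") > 1000)                    # WHERE clause
    .withColumn("net_amount", F.col("amount") * 0.9)   # derived variable
)
```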

Data Engineering & Testing

  • Understanding of end-to-end data flows, orchestration frameworks, pipelines, and change data capture (CDC).

  • Experience creating ETL test cases, unit tests, and data comparison/validation frameworks (a minimal test follows this list).
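
A minimal example of a data comparison test, assuming pytest and a local SparkSession; the transformation under test, add_net_amount, is a hypothetical helper:

```python
import pytest
from pyspark.sql import SparkSession, functions as F


def add_net_amount(df):
    """Hypothetical transformation under test."""
    return df.withColumn("net_amount", F.col("amount") * 0.9)


@pytest.fixture(scope="session")
def spark():
    return SparkSession.builder.master("local[2]").appName("tests").getOrCreate()


def test_add_net_amount(spark):
    source = spark.createDataFrame([(1, 100.0)], ["order_id", "amount"])
    expected = spark.createDataFrame(
        [(1, 100.0, 90.0)], ["order_id", "amount", "net_amount"]
    )

    actual = add_net_amount(source)

    # Compare by symmetric difference: both directions must be empty.
    assert actual.exceptAll(expected).count() == 0
    assert expected.exceptAll(actual).count() == 0
```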

Engineering Practices

  • Proficient in Git workflows, branching strategies, pull requests, and code reviews.

  • Ability to document technical decisions, architecture, and data flows.

  • Experience with CI/CD tooling for data engineering pipelines.

AWS & Platform Expertise (Advanced)

  • Strong hands-on experience with:

    • Amazon S3

    • EMR and AWS Glue

    • Glue Workflows

    • Amazon Athena

    • IAM

  • Solid understanding of distributed computing and big data processing in AWS environments.

  • Experience deploying and operating large-scale data pipelines in the cloud (see the sketch after this list).
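
As a small sketch of this platform work, driving Athena from Python with boto3; the region, database, query, and result bucket are hypothetical placeholders:

```python
import boto3

athena = boto3.client("athena", region_name="eu-west-1")

# Submit a query; Athena runs it asynchronously and writes results to S3.
response = athena.start_query_execution(
    QueryString="SELECT load_date, count(*) FROM orders GROUP BY load_date",
    QueryExecutionContext={"Database": "curated"},
    ResultConfiguration={"OutputLocation": "s3://query-results-bucket/athena/"},
)

# A real pipeline would poll get_query_execution for completion
# before fetching results.
print(response["QueryExecutionId"])
```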

Desirable Experience

  • Experience within banking, financial services, or other regulated industries.

  • Background in SAS modernization or cloud migration programs.

  • Familiarity with DevOps practices and infrastructure-as-code tools such as Terraform or CloudFormation.

  • Experience working in Agile or Scrum delivery environments.