Pipeline Development: Design, build, and optimize scalable data ingestion pipelines to extract, transform, and load (ETL) data from source systems (Oracle, SQL MI, MongoDB, file feeds) into Databricks.
Medallion Architecture Implementation: Lead the adoption and implementation of the Medallion architecture (Bronze, Silver, and Gold layers) for data processing and reporting in Databricks.
Data Modeling on Silver Layer: Develop and maintain effective data models within the Silver layer to support standardized, cleansed, and business-ready datasets. Ensure that data relationships, normalization, and transformation logic are well-defined to facilitate downstream analytics and reporting.
Testing & Validation: Develop automated tests for data pipelines and transformations to ensure reliability and correctness.
Data Quality & Governance: Ensure data quality, integrity, and consistency throughout the ingestion and transformation processes. Implement data validation, cleansing, and governance best practices.
Collaboration: Work closely with data analysts, data scientists, and business stakeholders to understand requirements and deliver effective data solutions.
Automation & Monitoring: Automate data pipeline workflows, monitor performance, and proactively resolve issues to ensure high reliability and availability.
Documentation: Develop and maintain comprehensive documentation for pipelines, data flows, and architecture decisions.
Qualifications
Education: Bachelor’s degree in Computer Science, Engineering, Information Systems, or related field.
Experience: At least 5 years of hands-on experience in data engineering roles, with a focus on ETL pipeline development and cloud data platforms.
Technical Skills:
Strong proficiency in SQL and in either Python or Scala
Experience with Databricks, Azure Data Factory, Spark, and the Medallion architecture
Proven expertise in database technologies: Oracle, Azure SQL Managed Instance (SQL MI), and MongoDB
Familiarity with file-based ingestion (CSV, Parquet, JSON, etc.)
Expertise in data modeling, especially within the Silver layer of the Medallion architecture
Understanding of data warehousing and reporting concepts
Preferred:
Experience with cloud platforms (Azure preferred)
Knowledge of CI/CD tools and practices
Exposure to data governance and security frameworks
Core Competencies:
Analytical thinking and problem-solving
Strong communication and stakeholder management
Ability to work independently and lead initiatives
Attention to detail and commitment to data quality