Job Description
Key Responsibilities
• Design, develop and optimize scalable data pipelines and ETL workflows using Google Cloud Platform (GCP), particularly leveraging BigQuery, Dataflow, Dataproc and Pub/Sub.
• Design and manage secure, efficient data integrations involving Snowflake and BigQuery.
• Write, test and maintain high-quality Python code for data extraction, transformation and loading (ETL), analytics and automation tasks.
• Use Git for collaborative version control, code reviews and managing data engineering projects.
• Implement infrastructure-as-code practices using Pulumi for cloud resource management and automation within GCP environments (see the illustrative sketch after this list).
• Apply clean room techniques to design and maintain secure data sharing environments in alignment with privacy standards and client requirements.
• Collaborate with cross-functional teams (data scientists, business analysts, product teams) to deliver data solutions, troubleshoot issues and ensure data integrity throughout the lifecycle.
• Optimize performance of batch and streaming data pipelines, ensuring reliability and scalability.
• Maintain documentation on processes, data flows and configurations for operational transparency.
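For illustration, the infrastructure-as-code responsibility above could be met with a short Pulumi program in Python along the lines of the sketch below. The dataset ID analytics_staging, the topic name raw-events and the other resource names are placeholder assumptions, not details of the actual environment.

    # Minimal Pulumi (Python) sketch: provision example GCP data resources.
    # All resource names and IDs below are illustrative placeholders.
    import pulumi
    import pulumi_gcp as gcp

    # BigQuery dataset to hold staged analytics tables.
    analytics_dataset = gcp.bigquery.Dataset(
        "analytics-dataset",
        dataset_id="analytics_staging",
        location="US",
        description="Staging area for ETL output (illustrative)",
    )

    # Pub/Sub topic feeding a streaming pipeline.
    events_topic = gcp.pubsub.Topic("events-topic", name="raw-events")

    # Export identifiers so downstream stacks or pipelines can reference them.
    pulumi.export("dataset_id", analytics_dataset.dataset_id)
    pulumi.export("topic_name", events_topic.name)

Deployment would follow the usual Pulumi workflow (pulumi up) against a GCP project configured for the stack.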
________________________________________
Required Skills
• Strong hands-on experience with GCP core data services: BigQuery, Dataflow, Dataproc and Pub/Sub.
• Proficiency in Python for data engineering development (see the illustrative sketch at the end of this list).
• Deep familiarity with Snowflake, including data modeling, secure data sharing and advanced query optimization.
• Proven experience with Git for source code management and collaborative development.
• Demonstrated ability to use Pulumi (or similar infrastructure-as-code tools) to deploy and support cloud infrastructure.
• Practical understanding of clean room concepts in cloud data warehousing, including privacy and compliance considerations.
• Solid skills in debugging complex issues within data pipelines and cloud environments.
• Effective communication and documentation skills.
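As an illustration of the Python and BigQuery skills above, a minimal extract-and-load sketch using the google-cloud-bigquery client might look like the following. The project, dataset, table and column names (example-project.sales.orders, example-project.reporting.daily_orders, order_id, and so on) are placeholder assumptions.

    # Minimal Python ETL sketch with the google-cloud-bigquery client.
    # All project, dataset and table names are illustrative placeholders.
    import datetime
    from google.cloud import bigquery

    def extract_daily_orders(client: bigquery.Client, ds: datetime.date) -> list[dict]:
        # Pull one day of orders from a (placeholder) source table.
        query = """
            SELECT order_id, customer_id, total_amount
            FROM `example-project.sales.orders`
            WHERE order_date = @ds
        """
        job_config = bigquery.QueryJobConfig(
            query_parameters=[bigquery.ScalarQueryParameter("ds", "DATE", ds)]
        )
        rows = client.query(query, job_config=job_config).result()
        return [dict(row) for row in rows]

    def load_orders(client: bigquery.Client, rows: list[dict]) -> None:
        # Stream transformed rows into a (placeholder) reporting table.
        errors = client.insert_rows_json("example-project.reporting.daily_orders", rows)
        if errors:
            raise RuntimeError(f"BigQuery load failed: {errors}")

    if __name__ == "__main__":
        bq = bigquery.Client()  # uses application-default credentials
        daily = extract_daily_orders(bq, datetime.date(2024, 1, 1))
        load_orders(bq, daily)

In practice a pipeline like this would run under an orchestrator or as a Dataflow/Dataproc job rather than as a standalone script; the sketch only shows the client calls involved.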