This role is for a Healthcare Data QA Engineer focused on testing large-scale data pipelines that transform diverse healthcare data into FHIR R4–compliant resources using tools like PySpark, Flink, and Kafka. You’ll validate both real-time (streaming) and batch processing systems, ensuring accuracy, compliance, and clinical relevance of the data. The position blends big data testing, healthcare domain knowledge, and cloud-based QA automation.
Requirements:
1. Type of Role
Backend Data QA (not UI/web testing)
Specialization in Big Data & Healthcare
Focus on data correctness, compliance, and performance
2. Core Responsibilities
Design and execute test strategies for PySpark/Flink data transformations
Validate streaming and batch jobs under varying load conditions
Work with domain experts to verify FHIR R4 compliance
Automate tests for ingestion, transformation, and output
Perform regression, schema evolution, and data migration testing
Conduct performance/load testing for big data pipelines
3. Must-Have Skills
Big Data & Streaming Testing: PySpark, Flink, Kafka
Healthcare Data Standards: FHIR R4 basics
Python Testing Tools: pytest, unittest, Great Expectations, Pandera
SQL for complex data validation
Familiarity with cloud platforms (AWS/GCP)
Knowledge of QA best practices (CI/CD, Git, TDD)
4. Nice-to-Have
Hands-on experience with FHIR/HL7 and clinical terminologies (SNOMED, LOINC, ICD)
PyFlink testing, Flink SQL validation
Apache Iceberg or other data lake table formats
Knowledge of healthcare compliance (HIPAA, PIPEDA)