SRE-Data Platform

Toronto, ON

Role Overview

The Data Platform team is looking for an experienced Site Reliability Engineer to help us develop a state-of-the-art SRE function. The function will serve as a cornerstone for the growth, reliability, resiliency, availability, and scaling of data onboarding, processing, and analytics. This is a new team, which provides the opportunity to innovate and develop a greenfield product within an established, top-notch engineering organization. The ideal candidate will have strong software engineering and SRE skills, a passion for automation, evangelism, and mentorship, and an entrepreneurial spirit.

 

In this role, you will:

·       Help design, develop and evolve a highly scalable and reliable infrastructure foundation for the application teams

·       Automate application and infrastructure deployments by developing and maintaining CI/CD pipelines

·       Engineer solutions that significantly reduce the number of production issues, and troubleshoot time-sensitive issues when they occur

·       Perform stress tests and DR tests, and enable seamless failovers, to prove out production readiness

·       Work closely with the Operations team to resolve issues and identify opportunities to improve customer experience

 

Core Tech Stack

·       Infrastructure: AWS, Azure, Kubernetes and an on-prem virtualization platform

·       CI/CD: Jenkins, TeamCity, Octopus, Azure DevOps

·       Infrastructure as Code: Terraform

·       OS: Linux and Windows

·       Monitoring/Logging: CloudWatch, Prometheus, Splunk, Grafana, OpenTelemetry

·       Programming Languages: Python, C++, Rust, JavaScript

 

Minimum Qualifications

·       3+ years of hands-on experience architecting and implementing automation pipelines, monitoring solutions and Infrastructure as Code across an organization

·       3+ years of experience working with immutable infrastructure and automation, using tools like Terraform (or AWS CloudFormation) to deploy a complete infrastructure stack

·       3+ years of experience deploying and managing containerized applications in a multi-tenant Kubernetes environment

·       2+ years of experience implementing observability patterns/frameworks and using monitoring tools like Prometheus, Grafana, Datadog

·       Strong knowledge of CI/CD tools such as Jenkins, Spinnaker, Azure DevOps

·       Strong knowledge of Chaos Engineering, containerization technologies (Podman) and configuration management tools (Ansible/Chef/Puppet)

 

Preferred Qualifications

·       Experience using managed Kubernetes Platforms like EKS, AKS or GKE

·       Experience with big data systems (e.g. Hadoop, Spark, Snowflake, K8s)
