On behalf of a client, we are seeking an experienced Software Engineer to join the platform team, focusing on cloud infrastructure, serverless application development, and platform reliability. This role combines hands-on development with technical leadership and requires expertise in AWS, Python, and modern DevOps practices.
Key Responsibilities
Infrastructure & Reliability
Design and implement multi-region AWS disaster recovery solutions, including fallback infrastructure for us-east-1 outages.
Architect and maintain highly available, scalable cloud infrastructure across multiple AWS regions.
Ensure infrastructure resilience through chaos engineering and disaster recovery testing.
Development
Develop and deploy new features using Python and the open-source Serverless Framework.
Build and maintain serverless applications (Lambda, API Gateway, DynamoDB, etc.).
Write clean, maintainable, and well-tested code following best practices.
Contribute to architectural decisions and technical design reviews.
Platform Observability
Design and implement comprehensive observability solutions for production platforms.
Set up monitoring, logging, and alerting using tools such as CloudWatch, Datadog, Grafana, or similar.
Establish SLIs, SLOs, and error budgets to measure platform health.
Create dashboards and on-call runbooks for incident response.
CI/CD & Automation
Design, implement, and maintain CI/CD pipelines for automated testing and deployment.
Automate infrastructure provisioning using Infrastructure as Code (Terraform, CloudFormation, CDK).
Implement security scanning, testing, and compliance checks in deployment pipelines.
Optimize build and deployment processes for speed and reliability.
Team Leadership
Mentor and manage development teams, fostering a culture of technical excellence.
Conduct code reviews and provide constructive feedback.
Facilitate technical discussions and help unblock team members.
Collaborate with product and engineering teams to deliver on roadmap priorities.
Required Qualifications
5+ years of software engineering experience with strong Python development skills.
3+ years of hands-on experience with AWS services (EC2, Lambda, S3, RDS, VPC, IAM, CloudFormation, etc.).
Proven experience building and deploying serverless applications (AWS Lambda, API Gateway, Step Functions).
Strong understanding of multi-region architecture and disaster recovery patterns.
Experience designing and implementing CI/CD pipelines (GitHub Actions, GitLab CI, Jenkins, or similar).
Demonstrated experience setting up observability and monitoring solutions.
Experience managing or mentoring development teams.
Strong understanding of networking, security, and cloud best practices.
Excellent problem-solving skills and ability to debug complex distributed systems.
Preferred Qualifications
Experience with the Serverless Framework (serverless.com).
AWS certifications (Solutions Architect, DevOps Engineer, or similar).
Experience with Infrastructure as Code tools (Terraform, AWS CDK, CloudFormation).
Knowledge of containerization and orchestration (Docker, ECS, Kubernetes).
Experience with observability platforms (Datadog, New Relic, Grafana, Prometheus).
Familiarity with event-driven architectures and message queuing systems (SQS, SNS, EventBridge).
Experience with testing frameworks and test automation.
Background in Agile/Scrum methodologies.
Strong communication skills and experience working with cross-functional teams.