Full-Time Senior Site Reliability Engineer
Job Description
About You
- You have a passion for the DevOps movement, managed cloud-based system infrastructure, and systems automation tools
- Your background in designing, building, and operating large systems with varying scalability, availability, and performance requirements gives you confidence in managing our distributed systems at Wildbit
- You see systems work as a way to serve the broader organization and approach conversations and opportunities with curiosity and thoughtfulness
What will you do?
- Maintain and sustain our current infrastructure and participate in the on-call rotation
- Evolve our operations and infrastructure to meet the needs of our customers as well as our team members
- Troubleshoot operational issues ranging from ECS tasks to MySQL replication lag
- Mentor engineers on how to observe and support their software and be an advocate for our continued adoption of cloud native practices
- Define and drive consistency in the developer toolset across different tech stacks, product lines, and product maturity levels
- Partner with Product and Customer Success to educate and advocate for privacy and security improvements in our products
- Plan long-term system capacity and improve overall system resilience
Ideally you have:
- Experience administering Linux servers
- Deep understanding of networking concepts (DNS, routing, load balancing)
- Experience in at least one programming language
- Experience managing production workloads with one or more public cloud providers
- In-depth knowledge of version control systems
- Familiarity with email-specific topics such as SMTP, SPF, DKIM, or DMARC
- Experience using tools and workflows that support Twelve-Factor app principles
- Experience with various deployment architecture paradigms, such as zero-downtime deployments, canary deployments, and CI/CD pipelines
Within 1 month, you’ll:
- Complete your Wildkit, our onboarding process
- Meet with your immediate team members
- Get the lay of the land on our current infrastructure
- Start understanding our products and their underlying infrastructure
- Begin your first project: containerizing and creating a deployment pipeline for a small, existing web app that is used internally by our team
Within 3 months, you’ll:
- Meet team members across the company
- Complete your first project
- Participate in regular on-call rotations and understand how we manage incidents
- Be familiar with our observability stack
- Begin to understand current pain points and opportunities through conversations with team members
- Be comfortable making changes to production AWS resources using Terraform
Within 6 months, you’ll:
- Be comfortable running incidents and collaborating across teams to pull in the necessary team members to address operational issues
- Own deliverables to improve or streamline parts of our infrastructure and move us closer to more cloud native practices
- Be a contributor to our future vision of what engineering looks like at Wildbit
- Begin understanding our remaining on-premises hardware setup and how to support it
Benefits
- Remote-first team: we optimize for asynchronous communication and creating space for focused work
- 4-day, 32-hour work weeks
- 20 paid days off per year
- Paid family leave
- Quarterly profit sharing
- Company-paid retreats
- Flexible work hours
- Home office allowance
- Books & healthy habits allowance
- Professional development allowance
How to Apply
https://jobs.lever.co/wildbit/4f02229d-0d83-455d-b013-d76174d57dff?lever-origin=applied&lever-source%5B%5D=Pink%20Jobs