Senior Site Reliability Engineer

Posted by Carnegie Learning Human Resources

Company Details

Carnegie Learning Inc

Pittsburgh, PA

Remote Ok

FTE only

Description

WHAT WE SEEK

We are looking for a Site Reliability Engineer (SRE) to join our growing team to ensure that the systems our students and teachers rely on daily are available, reliable, secure, scalable, and satisfying. We apply engineering disciplines to improve user satisfaction and prevent crises, while also responding to the inevitable error. The existing team has a broad base of collective experience, so we can accommodate a range of experience levels for this position. We seek to automate mundane tasks so we can research and implement exciting tools and technologies.

WHAT YOUR DAY WILL LOOK LIKE

  • Define, operate, and refine processes for continuous integration and deployment of application software.

  • Monitor and respond to warnings and alerts from our production application servers and supporting infrastructure.

  • Manage and interpret application data and logs to assist customer support teams with escalations to development.

  • Work with development to recommend and implement strategies to improve application and service availability, performance, and cost.

  • Design and implement mechanisms for proactive monitoring, alerting, trend-analysis and self-healing.

  • Support the definition of non-functional requirements as part of the product life cycle.

  • Assist software developers with fluent use of our development tools.

  • Identify opportunities to improve DevOps processes and collaborate with the team for solutions.

  • Collaborate with development and QA teams to design, maintain and support internal development and test environments.

  • Help define, measure, and report on SLIs and SLOs, drive the organization to meet SLOs, and support the company's ability to provide its customers with SLAs.

  • Participate in post-incident reviews to better expose system or process gaps.

  • Document procedures and site infrastructure.

  • Support limited on-call duties on rotating weekends.

WHAT SHOULD BE IN YOUR BOOKBAG

  • Significant experience ensuring reliability of SaaS solutions within Amazon Web Services.

  • Experience with configuration management, infrastructure-as-code, and orchestration tools (e.g., Chef, OpsWorks, Puppet, Ansible, and Terraform).

  • Experience with techniques/software/platforms for High Availability, Load Balancing and Content Delivery.

  • Excellent spoken and written communication skills.

  • Demonstrated customer service orientation with strong diagnostic and problem-solving skills.

  • Experience with CI/CD concepts and IaaS for cloud native applications.

  • Understanding of concepts of latency, availability, performance, and service level objectives.

  • Strong desire for, and experience with, increasing the speed and reliability of software deployments while reducing complexity and customer interruptions, so we can release anything, anytime.

  • Experience collecting, analyzing and utilizing system logs/metrics and other operational data to drive infrastructure and application insights.

  • Experience with data management and visualization tools (e.g., Prometheus, Grafana, Splunk, and ELK).

  • Excellent knowledge of a scripting language (e.g., Python, Bash, or PowerShell).

  • Experience with software development and CI/CD pipeline and tools (e.g., Bamboo, Jenkins, Azure DevOps, Git, and Jira).

  • Experience with web application and/or JavaScript frameworks and languages (e.g., Laravel, JavaScript, TypeScript, Angular, and Node.js).

  • Familiarity with Application Performance Management tools for troubleshooting and capacity planning (e.g., New Relic, Datadog, and AppDynamics).

  • Eligible to work in the US without visa sponsorship.

  • Bonus points for experience with:

    • DevSecOps, including integrating code analysis and vulnerability scanning tools into the CI/CD pipeline.
    • Containers and Container orchestration (e.g., ECS, EKS, Fargate, Kubernetes, and Docker Swarm).
    • Working within or implementing cybersecurity and regulatory frameworks (e.g., NIST, ISO, HIPAA, and COBIT).

WHAT GIVES US PURPOSE

Carnegie Learning is a leading provider of K-12 education technology, curriculum, and professional learning solutions. With the highest quality, research-based offerings for K-12 math, ELA, world languages, and more, Carnegie Learning is changing the way we think about learning and creating powerful results for teachers and students alike. At Carnegie Learning we strive to create an environment where people want to work - one where the larger team comes first, where trying new things (and sometimes failing) is encouraged, and where we pursue our mission relentlessly.

Carnegie Learning is a major disruptive force in the digital curriculum market, combining world-class research, differentiated technology, and best-in-class content with a world-class, mission-oriented team. This is where you come in! Are you ready to do the best work of your career and shape the future of learning?

WHAT WE PROVIDE

  • Named a Pittsburgh Top Workplace four years in a row

  • Medical, dental, and vision benefits

  • Virtual health services

  • Basic life and disability insurance offered at no cost

  • HSA, FSA, DCSA, and commuter savings accounts

  • 401k with company match

  • Employee assistance program

  • Pet and legal services insurance

  • Generous paid time off and holidays

  • Variable compensation opportunities

  • Business casual work environment

  • Mission-driven culture

  • Flexible working hours, leveraging remote capabilities

WHAT WE BELIEVE

We respect and celebrate the unique attributes, characteristics, and perspectives that make each person who they are. We also believe that bringing diverse individuals together allows us to collectively and more effectively address the issues that face our business and industry. Carnegie Learning is an Equal Opportunity Employer.
