Our Platform Site Reliability team is expanding. This team will help to configure, deploy, monitor, and handle ongoing operations of SPS Commerce's networks.
Does this sound like you?
- You love problem solving and collaborating across technical teams, including application engineering, support and infrastructure. You love to roll up your sleeves and jump into the fray when issues arise.
- You have a basic understanding of the technical foundation of how applications are built and supported.
- You work well under pressure - when things go awry, you can work with a sense of urgency without losing your cool. You are adept at assessing potential issues and isolating the likely root causes.
We solve retail supply chain problems by cutting through inefficiency with innovation and automation. At SPS we empower retailers, suppliers, distributors, grocers and logistics partners to work better together with our people, our process and our tech products.
We have the world’s largest retail network, and we don’t just lead the industry, we are the industry. At SPS, we believe every employee makes a difference. We ensure employees have the tools, resources and training to explore new ideas and execute them. Our success comes from playing as a team and always playing to win. Careers don’t just grow here, they’re made here.
The Day To Day
The Associate Site Reliability Engineer collaborates with the product development teams to deliver market-leading products and services. The Associate SRE is the first point of contact for support-related issues for the SRE team. This position will use automation and other technologies to intelligently cope with challenging failures while collaborating with various engineering organizations to resolve failure risks at the source. The SRE team at SPS treats Operations as a software problem and applies software engineering approaches to solve it.
- Help to support and maintain highly available, secure, and cost-effective container orchestration platforms such as Kubernetes and ECS
- Support our product engineering partners to build and deploy their services
- Support robust monitoring and observability services and patterns to consistently improve the team’s ability to identify, react to, respond to, and recover from complex failures
- Partner with service teams to support and optimize application services in a high-velocity environment
- Write clean and correct code, write test plans and identify code quality improvements when reviewing code
What Skills Do I Need?
- Bachelor’s degree AND 1 year of relevant experience or 4 years of experience without a degree
- Aptitude and aspirations to acquire experience and grow as a technologist
- Basic coding experience with interest in Python and/or Golang preferred
- Understanding of Linux operating systems and networking systems
- Experience working with Agile development methodology and task execution
- Experience with Amazon Web Services, Google Cloud, or Microsoft Azure
- Experience with monitoring solutions such as metrics platforms, logging, distributed tracing, or similar
- Experience in Python and/or Golang with a software engineering mindset
- Exposure to Linux
- Experience with immutable and scalable infrastructure (infrastructure as code concepts)
- Basic understanding of networking systems and various identity and authorization systems
- Problem solving and collaboration skills
EOE, including veterans/disability