20 July 2021

Site Reliability Engineer Us Convenience Store Years T989

Who we are
Imagine working in a place where continuous improvement and innovation are celebrated and rewarded; where fast-paced, high-impact teams come together to positively drive results for one of the largest and most iconic brands in the world. 7-Eleven is a rapidly growing retailer, known for our highly sought-after products, such as Slurpee® and Big Bite®. “Brain Freeze” is a 7-Eleven registered trademark for our 53-year-old Slurpee®, and with more than 70,000 stores globally (more than any other retailer or food service provider), we sell over 14 million a month. But there’s a lot more to our story and much more left to be written. We are transforming our business, ensuring we are customer-obsessed and digitally enabled to seamlessly link our brick-and-mortar stores with digital products and services. Today we are redefining convenience and the customer experience in big ways. We are fundamentally changing our culture, and we want talented, innovative, customer-obsessed, and entrepreneurial people like you to come make history with us.
About This Opportunity
· Strong knowledge of monitoring and observability tools (preferably New Relic).
· Skilled in identifying performance bottlenecks and anomalous system behavior, and in resolving the root cause of service issues.
· Work effectively across teams and functions to influence the design, operation, and deployment of highly available software.
· Strong analytical skills in support of production issue resolution and root cause identification.
· Develop tools and web services that parse application logs, system health information, and network captures to speed up debugging and reduce issue resolution time.
· Maintain and enhance monitoring and debugging tools.
· Develop tools that automate the assessment and monitoring of Key Performance Indicators and provide one-stop interfaces that show the health of applications.
· Automate key SRE metrics and IT Service Operations processes, including customer impact, % availability of critical business flows, SLO/SLI adherence, and error budget. Automate the incident process for IT Service Operations by integrating data with unified communications and alerting/notification systems.
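As a rough illustration of the availability and error-budget metrics named in the last item above, here is a minimal Python sketch that derives them from request counts. The function name, the 99.9% SLO target, and the sample counts are illustrative assumptions, not 7-Eleven's actual tooling or figures.

```python
from dataclasses import dataclass


@dataclass
class ErrorBudgetReport:
    availability_pct: float      # share of requests that succeeded, in percent
    allowed_failures: float      # failures the SLO tolerates over this window
    budget_remaining_pct: float  # unspent error budget, in percent (negative if overspent)


def error_budget(total_requests: int, failed_requests: int,
                 slo_target: float = 0.999) -> ErrorBudgetReport:
    """Compute availability and remaining error budget for one window.

    slo_target is an assumed availability objective (0.999 means 99.9%).
    """
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")
    if not 0 < slo_target < 1:
        raise ValueError("slo_target must be strictly between 0 and 1")
    if not 0 <= failed_requests <= total_requests:
        raise ValueError("failed_requests must be between 0 and total_requests")

    availability = (total_requests - failed_requests) / total_requests
    # The error budget is the fraction of requests the SLO allows to fail.
    allowed_failures = total_requests * (1 - slo_target)
    remaining = 1 - failed_requests / allowed_failures

    return ErrorBudgetReport(
        availability_pct=availability * 100,
        allowed_failures=allowed_failures,
        budget_remaining_pct=remaining * 100,
    )


# Hypothetical sample: 1,000,000 requests in the window, 400 of them failed.
report = error_budget(total_requests=1_000_000, failed_requests=400)
print(f"availability: {report.availability_pct:.2f}%")
print(f"error budget remaining: {report.budget_remaining_pct:.1f}%")
```

With these sample numbers the SLO allows about 1,000 failed requests, so 400 failures leave roughly 60% of the budget unspent. A negative remaining value would mean the budget for the window is overspent.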
Required Skills
Thriving on autonomy and trust, you’re a team player and an innovator who enjoys the challenge of solving problems. And your skills include:
· Must be a self-starter and proactive individual.
· 3+ years of experience with logging, monitoring, and event detection on cloud or distributed platforms.
· Experience with observability/monitoring technologies: Splunk, New Relic, Grafana, Kafka, ELK, Prometheus, CloudWatch.
· At least 5 years of practical experience with most if not all of the following components of AWS: VPC, EC2, ELB, EBS, Route 53, S3, CloudWatch, CloudTrail, IAM, RDS, SNS, SQS, Lambda.
· Experience with high availability and scalability in AWS.

Email: EXPIRED


