Lead SRE Engineer
Company: Cognizant
Location: Plano
Posted on: October 23, 2024
Job Description:
About Cognizant's Digital Engineering Practice: At Cognizant
Digital Engineering, a small cross-functional team comprising a
Product Manager, an Architect, Full-Stack Developers, UI/UX
designers and Big Data analysts builds higher quality software
faster than siloed individuals working independently. Small, nimble
engineering teams generate collective empathy and camaraderie, thus
increasing their ability to anticipate unforeseen development scope
changes and maintain high quality deliverables. Across our US
Studio system or within client development sites, our Digital
Engineering teams ideate and develop innovative cloud-based
solutions following a Lean-Agile process with DevOps culture.
Working in Cognizant Digital Engineering provides DevOps engineers
consistent opportunities to push digital boundaries while growing
their exposure to transformational technologies.
The Role:
Cognizant is looking for an experienced and innovative Lead SRE
Engineer to serve our diverse base of global clients. As
a member of our team, you will build cutting-edge, cloud-based
software that powers modern business. An ideal candidate is someone
who enjoys working in a diverse, collaborative, geographically
distributed team. Similarly, the ideal candidate is an expert
engineer who values the "team", drives continuous improvement and
is unafraid to challenge the legacy status quo with creative
cloud-based solutions.
Location: Plano, Texas
Responsibilities:
- Should be a strong SRE with experience in Java, AWS, DevOps,
deployment strategy, and monitoring tools. Candidates should have
hands-on experience with Dynatrace, Splunk, CI/CD, Grafana,
etc.
- Should possess application troubleshooting experience and
understand core SRE concepts before going to production: uptime vs.
availability, monitoring vs. observability, incident vs. outage, etc.
- Should be familiar with SLO, SLA, SLI or other SRE keywords or
terms.
- Experience deploying through CI/CD pipelines, debugging and
troubleshooting issues, and coordinating with application teams
working in Java, Spring Boot, Python, .NET, etc.
- Ability to perform API performance testing using tools such as
JMeter / Blazemeter.
- Experience performing root cause analysis (RCA) for production
issues in AWS environments with multiple microservices.
- Expertise in Terraform to manage infrastructure as code would
be highly desirable.
- Troubleshoot and resolve technical issues to ensure smooth
operation of applications.
- Demonstrates and champions site reliability culture and
practices and exerts technical influence throughout your team.
- Leads initiatives to improve the reliability and stability of
your team's applications and platforms using data-driven analytics
to improve service levels.
- Collaborates with team members to identify comprehensive
service level indicators, and with stakeholders to establish
reasonable service level objectives and error budgets with customers.
- Demonstrates a high level of technical expertise within one or
more technical domains and proactively identifies and solves
technology-related bottlenecks in your areas of expertise.
- Acts as the main point of contact during major incidents for
your application and demonstrates the skills to identify and solve
issues quickly to avoid financial losses.
- Documents and shares knowledge within your organization via
internal forums and communities.
Required Skills:
- 8+ years of relevant work experience
- Deep proficiency in reliability, scalability, performance,
security, enterprise system architecture, toil reduction, and other
site reliability best practices with the ability to implement these
practices within an application or platform.
- Fluency in Java programming.
- Proficiency and experience in observability, such as white-box and
black-box monitoring, SLO alerting, and telemetry collection using
tools such as Splunk, Grafana, Dynatrace, Prometheus, and Datadog.
- Proficiency in continuous integration and continuous delivery
tools (e.g., Jenkins, GitLab, Terraform, etc.)
- Experience with containers and container orchestration (e.g.,
ECS, Kubernetes, Docker)
Preferred qualifications, capabilities, and skills:
- Experience with infrastructure as code tools such as Terraform,
and experience managing or supporting cloud-based applications (AWS
preferred).
- Excellent communication skills desired.
Benefits:
Cognizant offers the following benefits for this position, subject
to applicable eligibility requirements:
- Medical/Dental/Vision/Life Insurance
- Paid holidays plus Paid Time Off
- 401(k) plan and contributions
- Long-term/Short-term Disability
- Paid Parental Leave
- Employee Stock Purchase Plan
Disclaimer:
The salary, other compensation, and benefits information is accurate
as of the date of this posting. Cognizant reserves the right to
modify this information at any time, subject to applicable law.
Keywords: Cognizant, Grapevine, Lead SRE Engineer, Engineering, Plano, Texas