This position is a key management role within the Site Reliability Operations group and directly manages support staff dedicated to 24x7 operational response for one of the largest eCommerce companies in the world. This manager will empower the team to achieve a high level of quality, including continuous improvement against KPI- and SLA-related targets. A manager in this role must be comfortable with a constantly evolving list of action items and with frequently re-prioritizing work. This manager should also be capable of assisting and advising on tool improvement, runbook creation, and other DevOps functions in partnership with customer teams and tool development groups.
- Directly manage a variety of roles dedicated to supporting production availability and responding to service-impacting events
- Participate in, lead, and drive weekly stand-up meetings to review and supervise stories dedicated to departmental process improvements and enhancements
- Enable resources to resolve and improve workload areas where FTR is not achieved
- Provide training structure and requirements to staff; meet needs of newly on-boarded technologies and tools by ensuring staff readiness
- Assist with the development of runbooks and tool automation with a focus on empowering the team to resolve the majority of incidents and critical issues they own
- Communicate swiftly and effectively across the group
- Analyze the needs of assigned department(s) and establish priorities for systems and/or new monitoring tool and automation implementations
- Hire, terminate, and conduct performance appraisals for assigned staff
- Manage the output and set the strategic direction of assigned function
- Recommend changes to policies, processes or applications that affect immediate organization
- Partner with organizations across the business to drive higher standards and improvements in operational support
- Bachelor's degree in Computer Science, IT, or equivalent work experience
- 5+ years' experience in assigned function, including + years in a managerial role
- Proficient understanding of standard e-commerce platform infrastructure and design, including Microsoft server products, Apache, Linux, load balancing (F5/NetScaler), database solutions, and networking fundamentals
- Hands-on experience with SOX/PCI standards and processes
- Prior end user experience with Agile/Scrum project methodologies and common software applications such as Confluence and JIRA
- Experience with monitoring and alert management solutions such as SCOM, Tivoli (Netcool, etc.), Splunk, AppManager, and Nagios
- Ability to manage multiple priorities and work on multiple simultaneous activities in a fast-paced environment
- Ability to work as a team member on the IT leadership team
- Solid ability to empower team members and relate their function to broader business goals
- Inventive, energetic, and self-confident when dealing with others
- Typically works on problems/issues where implementation of solutions spans 3 to 18 months
- Comfortable with general ITIL standards, with particular focus on incident management processes
Expedia is committed to creating an inclusive work environment with a diverse workforce. All qualified applicants will receive consideration for employment without regard to race, color, religion, gender, gender identity or expression, sexual orientation, national origin, genetics, disability, age, or veteran status. This employer participates in E-Verify. The employer will provide the Social Security Administration (SSA) and, if necessary, the Department of Homeland Security (DHS) with information from each new employee's I-9 to confirm work authorization.