Contract Position Summary
•Supporting and monitoring a 24/7 high-quality video streaming service in a fast-paced, startup-like environment.
•Participating in the on-call rotation and carrying the on-call phone.
•Promptly responding to alerts and other issues raised by customers or internal business units.
•Promptly notifying relevant parties and acknowledging alerts.
•Promptly working with vendors, third-party service providers (content delivery networks, payment gateways, content licensing partners, etc.) and internal teams to resolve production issues.
•Collecting approvals and running software rollouts once they are approved.
•Working on day-to-day tasks like managing user access and supporting internal business units.
•Troubleshooting and fixing infrastructure issues from the hardware layer to the application layer with minimal or no supervision.
•Tuning monitoring and alerting systems to avoid false positives, and updating alert-related settings.
•Reviewing alerts and emails, and updating relevant teams in meetings about issues and outages.
Roles & Responsibilities
•Knowledge of, or at least familiarity with, the Linux shell, system internals, networking, Java applications and MySQL databases.
•Love for debugging: troubleshooting and debugging monitoring alerts and internet connectivity issues.
•Ability to read and understand server and system logs and produce meaningful issue analyses.
•Good analytical and troubleshooting skills, and intuition about probable root causes.
•Familiarity with Splunk, Python, Apache, rsync and monitoring/alerting tools such as Nagios, PagerDuty and OpsGenie is a plus.
Job Title: Sr. Site Reliability – Senior Application Software Engineer