Héctor Bautista Flores

SRE Lead @ FICO · Building AI-augmented operations with Claude Code · AWS · Kubernetes · Grafana

📍 Aguascalientes, Mexico · 📩 hbautista [at] gmail.com · hbautista.com · LinkedIn

Summary

Site Reliability Engineer Lead with 10+ years owning production reliability across AWS and Azure environments at enterprise scale. Experienced in driving incident management, observability pipelines, and operational automation for highly distributed, multi-tenant financial decisioning platforms.

Skilled in Kubernetes, Grafana, Splunk, and AWS — focused on eliminating toil, enforcing SLA/SLO/SLI frameworks, and evangelizing SRE culture across engineering teams.

Skills

Domain	Technologies
Cloud & Infrastructure	AWS (EC2, EKS, CloudWatch, S3), Azure (PaaS, IaaS, DevOps Pipelines)
Containers & Orchestration	Kubernetes, Docker, Amazon EKS, Helm
Observability	Grafana, AppDynamics, Splunk, CloudWatch, Zabbix, Kibana
CI/CD & Automation	Jenkins, Ansible, Puppet, Git, Azure DevOps, Bash, Python, Go
Networking	F5 BIG-IP (GTM/LTM), Load Balancing, SSL/TLS management
ITSM & Collaboration	ServiceNow (ITIL), Jira, Confluence
AI-Assisted Operations	Claude Code (Anthropic) — agentic SRE workflows, runbook automation, incident triage
Operating Systems	RHEL, Amazon Linux, Windows Server, AIX, Debian/Ubuntu

Experience

Site Reliability Engineer Lead · FICO · Feb 2023 – Present

Aguascalientes Metropolitan Area · Remote

Leading end-to-end production reliability for enterprise-scale financial decisioning platforms serving millions of credit and risk decisions per day — across AWS, EKS, and private cloud. Responsible for incident management, observability, automation, and SRE culture across a highly distributed, multi-tenant architecture.

3 Key Results:

Achieved minimal-downtime Kubernetes/EKS rolling restarts across all production financial platforms, eliminating deployment-related SLA risk.
Built full observability coverage across all 4 golden signals (saturation, traffic, latency, error rate) using Grafana, AppDynamics, CloudWatch, and Splunk — closing the blind spots that previously caused SLA breaches, cutting mean time to detect (MTTD) by ~40%.
Introduced Claude Code (Anthropic) for agentic SRE workflows — automating runbook execution and incident triage, reducing manual toil in production response.

How I do it: Own the full SRE lifecycle, from incident management through blameless postmortems and permanent corrective actions (PCAs) to continuous SLI/SLO/SLA tuning. Enforce ITIL processes through ServiceNow for change control and problem management. Evangelize SRE best practices across the engineering org.

Stack: AWS · EKS · Kubernetes · Docker · Grafana · AppDynamics · Splunk · CloudWatch · Ansible · Puppet · Bash · ServiceNow · Jenkins · Jira · Claude Code

Technology Specialist · Amdocs · Jun 2018 – Mar 2023

Zapopan, Jalisco, Mexico · 4 yrs 10 mos

Led middleware infrastructure operations and cloud migration for multi-client enterprise environments — coordinating cross-functional engineering teams across Mexico, USA, and India. Responsible for platform stability, deployment pipelines, and cloud transformation across multiple concurrent enterprise accounts.

3 Key Results:

Reduced operational toil by 50% by building Bash automation scripts for recurring service workflows — freeing the team to focus on higher-value engineering work.
Led end-to-end Azure cloud migration (PaaS/IaaS/SaaS) for on-prem enterprise applications — aligning stakeholders across geographies and business units to deliver on time.
Achieved zero SSL expiration incidents across all accounts by designing and owning a proactive certificate lifecycle management process across multiple environments.

How I worked: Managed middleware performance and resolved escalated incidents for WebLogic, JBoss, WebSphere, and Tomcat environments. Led and coordinated application deployments via Azure DevOps pipelines across non-prod and prod. Mentored team members on CI/CD practices, Git workflows, and Docker/Kubernetes container fundamentals.

Stack: Azure DevOps · WebLogic · JBoss · WebSphere · Tomcat · Docker · Kubernetes · Bash · Git · ServiceNow · Jira

Software Engineer · Softtek · Sep 2016 – Mar 2018

Aguascalientes, Mexico · 1 yr 7 mos

Middleware engineer for Java enterprise applications supporting global financial services and telecom clients — operating IBM WebSphere environments in high-availability, multi-geography production settings.

3 Key Results:

Resolved escalated production incidents for global financial and telecom clients by performing root cause analysis and coordinating cross-geography distributed teams — preventing repeat failures.
Operated IBM PureApplication System across enterprise-scale environments — managing WebSphere Application Server (WAS) and WebSphere Process Server (WPS) for mission-critical Java workloads.
Delivered middleware design reviews and performance analysis in collaboration with application development teams — bridging the gap between dev and ops before DevOps was the norm.

How I worked: Supported high-stakes middleware environments for clients where downtime was not an option. Coordinated incident resolution across distributed teams spanning multiple geographies and time zones — building the cross-functional collaboration and escalation skills that would define my later SRE career.

Stack: IBM WebSphere (WAS/WPS) · IBM PureApplication · Java · ServiceNow · Linux

Technical Lead & Backup Administrator · IBM · Sep 2013 – Sep 2016

Guadalajara / El Salto, Jalisco, Mexico · 3 yrs 1 mo

Technical Lead for middleware infrastructure, coordinating a cross-geography team across Mexico, USA, and India supporting enterprise accounts.

Led technical best practices and onboarding for new team members
Administered IBM Tivoli Storage Manager (TSM) and R1Soft backup solutions for on-prem and SoftLayer cloud servers
Coordinated middleware deployments and change control across geographies

Stack: IBM TSM · R1Soft · SoftLayer · Middleware · Linux · AIX

IT Specialist, WW DST Backups · IBM · Sep 2013 – Sep 2016

Guadalajara / El Salto, Jalisco, Mexico

Sysadmin and backup engineer for IBM's worldwide DST operations — part of a global team managing 7,000+ servers hosting hundreds of enterprise projects.

Administered IBM AIX systems (LPAR/HMC/NIM), Linux (RHEL, SUSE), and Windows Server (2008/2012) environments on-prem and on SoftLayer cloud
Managed IBM System z (zLinux/SUSE): disk cloning, DASD administration, LVM filesystem management
Installed and configured IBM WebSphere (WAS/WPS), InfoSphere, and DB2 databases (v9.5–10.1) across AIX, Linux, and Windows
Investigated and resolved OS and middleware security vulnerabilities at global account level
Developed Bash/ksh automation scripts for server administration and operational efficiency

Stack: AIX · RHEL · SUSE · IBM TSM · WebSphere · DB2 · SoftLayer · Bash/ksh

Education

Computer Systems Engineer (I.S.C.) · Universidad Privada del Sur de México · 2007 – 2010

Higher Technical Degree in Computer Science (T.S.U.I.) · Universidad Tecnológica de la Selva · 1999 – 2001

Certifications

Certification	Issuer	Valid
AWS Certified Cloud Practitioner	Amazon Web Services	Mar 2026 – Mar 2029
Microsoft Certified: Azure Fundamentals	Microsoft	Nov 2020
Claude Code in Action	Anthropic	May 2026

Training & Badges

Badge	Issuer	Earned
AWS Cloud Quest: Cloud Practitioner	Amazon Web Services	Jan 2025

Languages

Spanish — Native or bilingual proficiency
English — Professional working proficiency