Lead Cloud Operations Engineer - Production Support Services

Fannie Mae

Great Falls Crossing (VA)

On-site

USD 121,000 - 158,000

Full time

6 days ago

Job summary

Fannie Mae is seeking a Lead Cloud Operations Engineer to optimize AWS applications and enhance operational resilience. This role involves leading cloud initiatives, mentoring engineers, and driving continuous improvement across teams, ensuring efficient and robust cloud infrastructure.

Qualifications

  • 4 years of experience in production environments.
  • Strong understanding of AWS and cloud maturity.

Responsibilities

  • Lead cloud operations maturity and optimization.
  • Manage incident response and operational excellence.
  • Collaborate for disaster recovery and system resilience.

Skills

Cloud operations
Incident management
AWS services
Scripting (Python, Bash)
CI/CD pipelines

Education

Bachelor's Level Degree

Tools

AWS
Terraform
Jenkins

Job description

At Fannie Mae, the inspiring work we do helps make a home a possibility for millions of homeowners and renters. Every day offers compelling opportunities to impact the future of the housing industry while being part of a collaborative team thriving in an energizing environment. Here, you will grow your career and help create access to affordable housing finance.

THE IMPACT YOU WILL MAKE

As a Lead Cloud Operations Engineer, you will play a pivotal role in enhancing the resilience, efficiency, and performance of our AWS-hosted applications. With our cloud adoption complete, the focus now shifts to optimizing our systems for scalability, observability, and cost-effectiveness. You will lead key initiatives, mentor engineers, and collaborate across teams to ensure our cloud infrastructure is robust and future-ready.

Key Responsibilities

  • Cloud Maturity & Optimization: Partner across teams to elevate cloud operations maturity, focusing on improving resiliency, observability, performance, and cost-effectiveness of AWS-based systems.
  • Monitoring, Incident Response & Operational Excellence: Utilize observability platforms (e.g., CloudWatch, Splunk, Dynatrace, OpenTelemetry) to lead incident triage, root cause analysis, and continuous improvement efforts.
  • Infrastructure & Network Insight: Maintain comprehensive knowledge of application architecture, including firewalls, load balancers, DNS (Route53), WAF, and Layer 3/4 network components to ensure secure and efficient system operations.
  • Collaboration & Stakeholder Engagement: Partner with engineering, architecture, and product teams to influence infrastructure roadmaps, support enterprise-wide changes, and ensure alignment with business goals.
  • Resilience & Disaster Recovery: Collaborate with cross-functional teams to lead disaster recovery planning and execution, ensuring critical systems remain highly available and resilient.

Lead Expectations

  • Operational Leadership: Serve as the escalation point for critical incidents, leading resolution efforts and post-incident reviews. Ensure clear communication with stakeholders and vendors during high-impact events.
  • Strategic Influence: Collaborate with engineering and application teams to shape infrastructure roadmaps and align operational goals with organizational priorities.
  • Mentorship & Knowledge Sharing: Coach and support engineers through training and hands-on guidance. Foster a culture of continuous learning and shared ownership.
  • Change Management: Evaluate and provide guidance on risks associated with enterprise-wide cloud changes, ensuring implementations are resilient, minimally disruptive, and aligned with compliance and governance frameworks.
  • Process Improvement & Best Practices: Drive the development and adoption of operational best practices, including automation, monitoring, and incident response frameworks. Lead initiatives to improve efficiency, reduce risk, and enhance system resilience.

Required Experience

  • 4 years of experience in hands-on incident management in 24x7 production environments, including on-call responsibilities.
  • 4 years of experience in cloud operations with a focus on maturing cloud environments and driving operational excellence.
  • Strong experience in 24x7 operational environments, including on-call rotations.
  • Proven ability to lead cross-functional initiatives and influence technical roadmaps.

Desired Experience

  • Bachelor's degree or equivalent.
  • AWS certification (Solutions Architect, SysOps Administrator, or DevOps Engineer); Azure certification is a plus.

Technical Skills

  • Advanced knowledge of AWS services (EC2, ECS, Lambda, EB, EMR, Glue, Redshift, IAM, CloudTrail, CloudFormation, CloudWatch, VPC, CloudFront, ELB, RDS, SNS/SQS, S3, EFS); working knowledge of Azure services and multi-cloud environments.
  • Advanced scripting skills in Python and Bash; experience with AWS SDKs (e.g., boto3) for automation and custom tooling.
  • Hands-on experience with CI/CD pipelines using tools like Jenkins, GitLab CI, AWS CodePipeline, and GitHub Actions; strong understanding of release automation and deployment strategies.
  • Proficient in Terraform, AWS CloudFormation, and CDK for infrastructure provisioning and management across multiple environments.
  • Familiarity with IAM policies, KMS, Secrets Manager, and AWS Config.
  • In-depth knowledge of enterprise networking concepts including VPC design, subnets, NAT gateways, VPNs, Direct Connect, firewalls, WAF, Route53, DNS, and Layer 3/4 appliances; familiarity with Zero Trust and network segmentation principles.
  • Experience with Docker, Amazon ECS, and EKS (Kubernetes); understanding of container lifecycle management, service discovery, and scaling strategies.
  • Experience with ITSM tools like ServiceNow and Jira; understanding of ITIL practices including incident, change, and problem management.
  • Experience with observability platforms such as Splunk/SignalFX, Dynatrace, OpenTelemetry, and Grafana.

Qualifications

Education:

Bachelor's Level Degree (Required)

The future is what you make it to be. Discover compelling opportunities at Fanniemae.com/careers.

For most roles, employees are encouraged to work onsite on a regular basis at their designated office location. In-office work cadence is determined by your manager. Proximity within a reasonable commute to your designated office location is preferred unless the job is noted as open to remote.

Fannie Mae is an equal opportunity employer and considers qualified applicants for employment without regard to race, color, religion, sex, national origin, disability, age, sexual orientation, gender identity/gender expression, marital or parental status, or any other protected factor. Fannie Mae is committed to providing reasonable accommodations to qualified individuals with disabilities who are employees or applicants for employment, unless to do so would cause undue hardship to the company. If you need assistance using our online system and/or you need a reasonable accommodation related to the hiring/application process, please complete this form.

The hiring range for this role is set forth below. Final salaries will generally vary within that range based on factors that include, but are not limited to, skill set, depth of experience, certifications, and other relevant qualifications. This position is eligible to participate in a Fannie Mae incentive program (subject to the terms of the program). As part of our comprehensive benefits package, Fannie Mae offers a broad range of Health, Life, Voluntary Lifestyle, and other benefits and perks that enhance an employee's physical, mental, emotional, and financial well-being. See more here.

Requisition Compensation

USD 121,000 to 158,000

  • Seniority level: Mid-Senior level
  • Employment type: Full-time
  • Job function: Engineering and Information Technology

