Site Reliability Engineer

United States

A World-Changing Company

Palantir builds the world's leading software for data-driven decisions and operations. By bringing the right data to the people who need it, our platforms empower our partners to develop lifesaving drugs, forecast supply chain disruptions, locate missing children, and more.

The Role

We're looking for Site Reliability Engineers who can help us build, operate, and maintain high-performance, scalable, and reliable services for our production infrastructure across both cloud & on-prem environments. Site Reliability Engineers combine engineering experience and an innate drive to improve existing systems and processes with the creativity to develop novel solutions to evolving challenges. Our team strives to automate processes wherever possible, using whichever tools are best for the job. You'll be the expert on the environments in which you operate infrastructure, helping partner teams build & configure their software to run reliably in them.

We strongly believe that engineering teams should be responsible for operating their services in production. In this role, you'll work closely with engineers to advocate for and participate in sensible, scalable systems design, and you'll share responsibility with them for diagnosing, resolving, and preventing production issues.

Core Responsibilities
 Maintain availability of the cloud & physical Linux servers that power the Palantir platform in air-gapped production environments
 Design, deploy, and operate infrastructure that supports customer & product requirements using modern orchestration & monitoring platforms
 Collaborate closely with product teams on requirements & SLOs for deploying software into air-gapped environments
 Identify, troubleshoot, and resolve network & systems issues
 Write scripts to automate away routine operational tasks

What We Value
 Confidence in troubleshooting complex systems issues independently using observability tools and stack traces
 Ability to identify and automate highly manual tasks
 Comfort with large-scale production systems and technologies, for example load balancing, monitoring, distributed systems, or configuration management
 Proficiency in programming languages such as Java, C++, Python, JavaScript, or similar
 Ability to work with a high level of autonomy and responsibility in a rapidly changing environment, with dynamic objectives and iteration with users
 Demonstrated ability to continuously learn and drive ongoing improvements within and across teams
 An active security clearance, or the ability to obtain one, is a plus

What We Require
 5+ years of experience with Linux system administration (RHEL or equivalent preferred)
 Experience with cloud-based hosting platforms like AWS, Azure, or GCP and/or experience with hardware-based environments
 Familiarity with monitoring tools such as Prometheus, and with writing health checks

Life at Palantir

We want every Palantirian to achieve their best outcomes. That's why we celebrate individuals' strengths, skills, and interests, from your first interview to your long-term growth, rather than relying on traditional career ladders. Paying attention to the needs of our community enables us to optimize our opportunities to grow and helps ensure many pathways to success at Palantir. Promoting health and well-being across all areas of Palantirians' lives is just one of the ways we're investing in our community. Learn more at Life at Palantir and note that our offerings may vary by region.

Consistent with Palantir's values and culture, we believe employees are “better together” and that in-person work affords the opportunity for more creative outcomes. Therefore, we encourage employees to work from our offices to foster connectivity and innovation. Many teams do offer hybrid options (WFH a day or two a week), allowing our employees to strike the right trade-off for their personal productivity. Based on business need, a few roles allow for “Remote” work on an exceptional basis. If you are applying for one of these roles, you must work from the city and/or country in which you are employed. If the posting is specified as Onsite, you are required to work from an office.

Palantir is committed to promoting a culture of diversity, equity, and inclusion. We believe that all Palantirians share the responsibility of upholding our commitment to these values and encourage candidates from a wide range of backgrounds, perspectives, and lived experiences to join us in solving the world's hardest problems.

Palantir is committed to making the job application process accessible to everyone. If you are living with a disability (visible or not visible) and need to request a reasonable accommodation for any part of the application or hiring process, please reach out and let us know how we can help.

Job details

March 31, 2024

Application deadline

April 30, 2024

Job type

ML, AI, Data Science

About the employer

In 2004, when we looked at the available technology, we saw products that were too rigid to handle novel problems, and custom systems that took too long to deploy and required too many services to maintain and improve.

We saw automated approaches that failed against adaptive adversaries, and all-or-nothing access controls that forced organizations to make unacceptable trade-offs between collaborating and securing sensitive data from misuse.

We saw a need for a different kind of technology, and we knew it would take a different kind of company to build it. That’s why we founded Palantir.