Infrastructure - DevOps & Site Reliability Engineer



Software Engineering, Other Engineering
New York, NY, USA
Posted on Friday, October 21, 2022

Blackbird.AI helps organizations discover emergent threats and stay one step ahead of real-world harm through our AI-powered Narrative and Risk Intelligence Platform. Our commitment is to prioritize safety and security, providing the tools to proactively identify potential risks and ensure a safer environment. No matter the job or where it’s located, we’re all connected by a shared vision: to lead and enhance the landscape of risk intelligence.

Reporting to the Head of Infrastructure, you will be instrumental in shaping the infrastructure architecture for a real-time streaming cloud-hosted analytics platform, aiding Blackbird.AI in laying a robust groundwork for the deployment of various microservices, databases, and frameworks. Additionally, you will be involved in integrating performance monitoring tools, continuous integration, and deployment pipelines to support these endeavors.

The DevOps & Site Reliability Engineer will be the driving force behind the architecture of our cutting-edge platform. You’ll work on Linux servers, ensuring their optimal performance. Your expertise in Kubernetes, Docker, and scripting languages will be instrumental in engineering fault tolerance and crafting efficient deployment scripts. Crucially, in this role, you will maintain and optimize essential components like Elasticsearch and Prometheus, all while upholding rigorous security standards.

As the DevOps & Site Reliability Engineer, you’ll have the chance to:

  • Manage both AWS-hosted and self-hosted Linux servers.
  • Demonstrate proactive troubleshooting and clear communication during server infrastructure deployments.
  • Create and maintain Kubernetes clusters that house ETL processes and web services.
  • Engineer fault tolerance mechanisms, backup procedures, and data retention policies.
  • Develop deployment and rollout scripts for seamless processes.
  • Monitor and scale various databases to optimize performance.
  • Develop monitoring and telemetry as needed to ensure comprehensive system observability and alert generation.
  • Maintain diverse web applications through web servers and ingresses.
  • Scale and manage multiple deployments, including Elasticsearch, PostgreSQL, Redis, and more.
  • Support “security by design” to meet infosec objectives, conduct security audits, and oversee security-related aspects such as TLS and firewalls.
  • Automate deployments that are cloud-agnostic and adaptable to AWS or on-premises environments.
  • Collaborate with the data engineering and full-stack development teams to uphold best practices in stack selection and deployment.

What you’ll bring:

  • Bachelor’s degree in Computer Science or equivalent.
  • Proven track record of successfully deploying products in the cloud and SaaS model, emphasizing horizontal scalability and distribution.
  • Expert-level proficiency in Linux systems.
  • Familiarity with infrastructure as code tools, such as Terraform or equivalent.
  • Mastery of the Kubernetes ecosystem and Docker containers.
  • Proficiency in Helm charts, Python, and/or Golang.
  • Strong knowledge of web servers and security-related concepts.
  • Demonstrated experience building and maintaining Prometheus and Grafana and establishing infrastructure monitoring.
  • Good familiarity with Elasticsearch and Metricbeat for log monitoring.
  • Solid background in addressing infrastructure security concerns.
  • 2+ years of hands-on experience in developing with Python and Bash.
  • Experience in managing secret stores, including Vault or similar solutions.
  • Expertise in build automation, continuous integration, and deployment (CI/CD) tools.
  • Experience working with cloud-based services (similar to AWS S3, CloudFront, Route53, and ElastiCache).
  • Proven track record of collaborating with distributed teams.

Helpful to have:

  • Experience with MLOps frameworks like Kubeflow, Seldon, or similar.
  • Technical background or experience in AI/ML deployments.
  • Experience with multi-tenant deployments in AWS or similar environments.
  • Experience obtaining certifications such as SOC 2 and FedRAMP.
  • Familiarity with mainstream ETL tools, such as Airflow or equivalent.
  • Experience handling massive datasets on the order of terabytes.

We’ve outlined specific skills, experience, and requirements for this position, but don’t stress if you don’t meet every single one. Our Talent Team is dedicated to discovering exceptional individuals, and they might identify a relevant aspect of your background that suits this role or another opportunity within Blackbird.AI.

If you have a passion for the role, please still apply.

What’s in it for you:

Blackbird.AI is embarking on an exciting growth journey with numerous opportunities for career development within the company. You will join a nurturing, inclusive, and experienced team.

Join us as we soar to new heights!


At Blackbird.AI, our core values shape how we work and make decisions. Our values inspire us to be authentic and continue improving.

We embrace a strong sense of responsibility to society, recognizing the vital role our services play in empowering governments, communities, and individuals to foster critical thinking and empowerment. We believe in integrating personal and professional lives with societal needs, emphasizing the importance of creating an environment that attracts top talent and provides substantial growth opportunities. We are motivated by the potential of science and technology to impact humanity positively.

Why you’ll love working here:

  • Competitive compensation package, 401(k), and equity - everyone has a stake in our growth!
  • Comprehensive health benefits for you and your loved ones, including wellness days and monthly wellness reimbursements - an apple a day doesn’t always keep the doctor away!
  • Generous vacation policy, encouraging you to take the time you need - we trust you to strike the right work/life balance!
  • A flexible work environment with opportunities to collaborate with your team in person - you can have it all!
  • Inclusion and Impact - soar to new heights!
  • Bi-annual offsites - have fun with your colleagues!
  • Professional development stipend - never stop learning!

Pay Transparency:

For individuals assigned and/or hired to work in New York, Blackbird.AI is required by law to include a reasonable estimate of the compensation range for this role. This compensation range is specific to New York. It takes into account the wide range of factors that are considered in making compensation decisions, including, but not limited to, skill sets, experience and training, licensure and certifications, and other business and organizational needs. At Blackbird.AI, it is not typical for an individual to be hired at or near the top of the range for their role, and compensation decisions are dependent on the facts and circumstances of each case. A reasonable estimate of the current compensation range for this position is $130,000-$170,000. This range may vary for positions outside of New York, as it has not been adjusted for the applicable geographic differential associated with the location where the position may be filled, and it does not consider our bonus and commission structures.

Regardless of location, candidates can expect Blackbird.AI’s Talent Team and Hiring Managers to share any approved budget and details on our competitive bonus and commission packages during the first few conversations.


Equal Opportunity Employer