prepme.io
DevOps Engineer
Vetric · senior
Hands-on labs
Debug the Vetric platform: one pod won't go Ready · easy
A live k3d cluster inside your lab container is running the Vetric scraping platform (postgres, redis, scraper-api, scraper-worker). Three services are healthy; scraper-api never reaches Ready. Find the cause and fix it without restarting the whole cluster.
GitOps CI/CD with GitHub Actions + ArgoCD · medium
Build an end-to-end delivery pipeline where application code in GitHub flows through GitHub Actions into an ArgoCD-managed EKS cluster, with automated promotion and rollback. Simulate a bad deploy and validate recovery.
Observability and Event-Driven Autoscaling for a Data Pipeline · hard
Instrument a simulated high-volume data ingestion workload on Kubernetes with Prometheus/Grafana and scale it dynamically with KEDA based on queue depth. Prove the system stays within SLO under a synthetic load spike.
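For the queue-depth autoscaling lab, it helps to have the scaling arithmetic in mind before touching KEDA. The sketch below approximates what an average-value target works out to: enough replicas that each one handles about a fixed number of queued messages, clamped to a minimum and maximum. The function name, the default bounds, and the target of 100 messages per replica are illustrative assumptions, not values from the lab, and the sketch ignores tolerance bands, cooldown, and stabilization windows.

```python
import math


def desired_replicas(queue_depth: int, target_per_replica: int,
                     min_replicas: int = 1, max_replicas: int = 50) -> int:
    """Replicas needed so each one handles about target_per_replica messages.

    Hypothetical helper: the bounds play the role of a scaler's minimum and
    maximum replica counts.
    """
    if target_per_replica <= 0:
        raise ValueError("target_per_replica must be positive")
    if queue_depth < 0:
        raise ValueError("queue_depth cannot be negative")
    raw = math.ceil(queue_depth / target_per_replica)
    # Clamp so an empty queue keeps the floor and a spike cannot exceed the cap.
    return max(min_replicas, min(max_replicas, raw))


if __name__ == "__main__":
    for depth in (0, 120, 5_000, 100_000):
        print(depth, desired_replicas(depth, target_per_replica=100))
```

The cap is the part worth discussing in an interview: once the queue is deep enough to hit it, backlog drains only as fast as the capped workers allow, so the SLO has to be checked against that ceiling.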
Troubleshooting drills
2 scenarios — run them as interactive practice
hard
A Terragrunt-managed Terraform module change caused drift in production VPC routing and broke egress for a subset of pods. How do you detect, contain, and recover?
medium
Pods in one node group are being OOMKilled during peak scraping bursts while other nodes sit idle. How do you diagnose and fix?
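For the OOMKilled drill, a useful first step is confirming that the kills really cluster on one node group. The sketch below counts containers whose current or last state was terminated with reason OOMKilled, grouped by node. It assumes input shaped like the JSON from `kubectl get pods -o json`; the function names and the sample node names are hypothetical.

```python
from collections import Counter
from typing import Any


def _was_oomkilled(container_status: dict[str, Any]) -> bool:
    """True if the container's current or previous state ended in OOMKilled."""
    for key in ("state", "lastState"):
        terminated = (container_status.get(key) or {}).get("terminated") or {}
        if terminated.get("reason") == "OOMKilled":
            return True
    return False


def oomkills_by_node(pod_list: dict[str, Any]) -> Counter:
    """Count OOMKilled containers per node from a pod-list document."""
    counts: Counter = Counter()
    for pod in pod_list.get("items", []):
        node = (pod.get("spec") or {}).get("nodeName", "<unscheduled>")
        statuses = (pod.get("status") or {}).get("containerStatuses") or []
        for status in statuses:
            if _was_oomkilled(status):
                counts[node] += 1
    return counts


if __name__ == "__main__":
    killed = {"lastState": {"terminated": {"reason": "OOMKilled"}}}
    healthy = {"state": {"running": {}}, "lastState": {}}
    sample = {"items": [
        {"spec": {"nodeName": "burst-node-a"}, "status": {"containerStatuses": [killed]}},
        {"spec": {"nodeName": "burst-node-a"}, "status": {"containerStatuses": [killed, healthy]}},
        {"spec": {"nodeName": "general-node-b"}, "status": {"containerStatuses": [healthy]}},
    ]}
    print(oomkills_by_node(sample).most_common())
```

If the counts concentrate on one node group, the next things to compare are memory requests versus limits on the affected workloads and whatever scheduling constraints pin them to those nodes.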
Stack
18 mentioned · 4 inferred
AWS (EKS, EC2, IAM, VPC) · Kubernetes · Terraform · OpenTofu · Terragrunt · GitHub Actions · Jenkins · ArgoCD · Prometheus · Grafana · ELK Stack · CloudWatch · Bash · Python · JavaScript · Amazon ECS · KEDA · GitHub · Helm · Docker · Kafka or similar streaming · S3
Likely questions
  • architecture · hard
    Walk me through how you would design a multi-AZ EKS cluster to run data-heavy scraping and ETL pipelines with predictable cost and strong blast-radius isolation.
  • troubleshooting · hard
    A Terragrunt-managed Terraform module change caused drift in production VPC routing and broke egress for a subset of pods. How do you detect, contain, and recover?
  • systems · medium
    How would you structure a Terraform/OpenTofu/Terragrunt monorepo for multiple AWS accounts and environments from scratch?
  • architecture · hard
    Design a GitOps CI/CD flow using GitHub Actions and ArgoCD for ~50 microservices across dev/staging/prod with automated rollback.
  • troubleshooting · medium
    Pods in one node group are being OOMKilled during peak scraping bursts while other nodes sit idle. How do you diagnose and fix?
  • systems · medium
    How would you use KEDA to autoscale workers consuming from a queue, and what pitfalls have you hit with event-driven autoscaling?
  • behavioral · easy
    You're the first DevOps hire. Walk through your 30/60/90 plan and how you get engineering buy-in for new standards.
  • security · hard
    How do you secure an EKS cluster end-to-end: IAM roles for service accounts, network policies, secrets, image provenance?
  • scripting · medium
    Write a Bash or Python script that rotates IAM access keys across all AWS accounts and updates them in GitHub Actions secrets.
  • architecture · medium
    What SLOs and observability signals would you define for a data pipeline ingesting billions of records per day?
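For the key-rotation scripting question in the list above, one way to start is by separating the decision logic from the AWS and GitHub calls. The sketch below covers only that logic: it picks keys older than a maximum age and lists a make-before-break order of operations. The `AccessKey` type, the 90-day default, and the example key IDs are all hypothetical; the real IAM and GitHub API calls, cross-account role assumption, and secret encryption are deliberately left out.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class AccessKey:
    """Hypothetical record for one IAM access key."""
    account: str
    user: str
    key_id: str
    created: datetime


def keys_due_for_rotation(keys: list[AccessKey], now: datetime,
                          max_age: timedelta = timedelta(days=90)) -> list[AccessKey]:
    """Return keys whose age is at least max_age (an assumed policy)."""
    return [key for key in keys if now - key.created >= max_age]


def rotation_steps(key: AccessKey) -> list[str]:
    """Make-before-break order: the old key stays valid until the new one works."""
    return [
        f"create a new access key for {key.user} in account {key.account}",
        "write the new key into the matching GitHub Actions secret",
        "verify that a workflow authenticates with the new key",
        f"deactivate old key {key.key_id}",
        f"delete old key {key.key_id} after a grace period",
    ]


if __name__ == "__main__":
    now = datetime(2025, 1, 1, tzinfo=timezone.utc)
    inventory = [
        AccessKey("dev", "ci-bot", "AKIAEXAMPLEOLD", now - timedelta(days=120)),
        AccessKey("prod", "ci-bot", "AKIAEXAMPLENEW", now - timedelta(days=10)),
    ]
    for key in keys_due_for_rotation(inventory, now):
        print(key.account, key.user)
        for step in rotation_steps(key):
            print("  -", step)
```

The ordering matters because an IAM user can hold at most two access keys at a time: a leftover inactive key has to be deleted before the next rotation can create a new one.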
Culture
  • Bootstrapped and profitable: long-term thinking over hype-chasing, expect pragmatic tech choices
  • First DevOps hire with full technical authority: high ownership, greenfield, must be self-directing
  • Small but growing team: your decisions set precedent for years, low bureaucracy
  • Mission-critical customers in cybersecurity/public safety: high reliability bar and likely on-call expectations
  • Engineering-discipline-heavy culture: code quality, standards, and documentation are valued
  • On-site/hybrid in Tel Aviv implied: in-person collaboration expected, not a remote-first org
From the bank
4 for this stack
  • We run Terraform across ~40 modules. What's your strategy for dependency ordering, state isolation, and avoiding drift?
  • Write a Bash one-liner that tails every pod's logs in a namespace and highlights any line containing 'ERROR' or '5xx'.
  • Tell me about a time a deploy went wrong in production. What was the blast radius, how did you recover, and what did you change afterwards?
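The log-highlighting question above asks for Bash; as a stand-in in Python, the sketch below shows the matching and coloring half of the problem. It treats '5xx' as a standalone three-digit number starting with 5, which is an assumption and will also flag unrelated values such as a latency of 512. The function name and sample lines are illustrative.

```python
import re

# 'ERROR' literally, or a standalone 5xx number (assumed to be an HTTP status).
PATTERN = re.compile(r"ERROR|\b5\d{2}\b")
RED = "\033[1;31m"
RESET = "\033[0m"


def highlight(line: str) -> str:
    """Wrap a matching line in ANSI red, keeping its trailing newline outside."""
    if not PATTERN.search(line):
        return line
    body = line.rstrip("\n")
    return f"{RED}{body}{RESET}{line[len(body):]}"


if __name__ == "__main__":
    sample = [
        "GET /healthz 200\n",
        "GET /scrape 503\n",
        "ERROR worker crashed\n",
        "listening on port 5000\n",
    ]
    for line in sample:
        print(highlight(line), end="")
```

To use it as a filter, the same function can be applied to each line of `sys.stdin` with log output piped in; fanning out across every pod in the namespace is the part the Bash one-liner still has to solve.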
Original job description
DevOps Engineer
Engineering · Tel Aviv, Israel · Senior · Full-time
Description
What is Vetric?

Vetric builds large-scale public data infrastructure.

We provide data pipelines that collect, structure, and deliver high-volume public web data for mission-critical companies operating in cybersecurity, public safety and digital risk protection.

Our systems power platforms that detect bad actors, uncover impersonation and fraud, identify coordinated manipulation, and help public safety organizations respond faster to real-world risks.

We don’t build dashboards, and we don’t sell surface-level insights.

We build stable, production-grade data flows that become part of our customers’ core products, with the real impact of saving lives or protecting large, well-known organizations from bad actors.

Operating globally, we serve industry leaders across more than 20 countries who rely on us for scale, reliability, and depth.



Why Vetric?

Vetric has been profitable since day one (fully bootstrapped - we haven’t raised external funding), and we’re building foundational technology - not chasing trends. Because this is infrastructure that matters, we operate with engineering discipline, strong ownership, and long-term thinking.

We’re at a true inflection point: the team is now large enough to require real infrastructure, yet still small enough that what you build will define how things work for the next several years.

This is infrastructure that matters, and how we operate internally matters just as much. You’ll be working with a sharp, focused team that takes engineering discipline seriously and is intentionally building an organization that matches the quality of its product.



Position Overview

We are seeking a DevOps Engineer to lead and own the entire DevOps function at Vetric. 

As our first DevOps hire, you won’t just maintain systems; you will set the vision, establish best practices, and build the foundation of our infrastructure strategy for years to come. This is a unique opportunity to step into an impactful role with full technical authority, influencing architectural decisions and guiding how our engineering teams deliver, scale, and secure our large-scale, data-intensive platform.



Responsibilities:

Define and drive Vetric’s infrastructure strategy across all environments
Architect and operate Kubernetes clusters at production scale with a focus on reliability, resilience, and data-heavy workloads
Lead the adoption of Infrastructure as Code (Terraform, OpenTofu, Terragrunt) and establish automation standards
Implement modern CI/CD pipelines (GitHub Actions, Jenkins, ArgoCD, or similar)
Champion observability, monitoring, and reliability engineering practices
Build and optimize infrastructure that powers data-driven pipelines at massive scale
Serve as the technical authority for all DevOps matters, influencing and aligning engineering teams
Partner with engineering leadership to shape infrastructure roadmaps and technology choices
Requirements
Qualifications:

5+ years of deep, hands-on AWS experience (EKS, EC2, networking, IAM, scaling)
Proven success in senior DevOps / Cloud Engineering leadership roles
Expert knowledge of Terraform and modern IaC tools (OpenTofu, Terragrunt)
Strong Kubernetes expertise at scale (design, scaling, optimization)
Experience running high-scale, production-grade environments handling large data volumes
Excellent communication skills with the ability to influence, guide, and align teams
Solid scripting/automation skills (Bash, JavaScript, Python, or similar)
Familiarity with cloud-native monitoring & logging stacks (Prometheus, Grafana, ELK, CloudWatch, etc.)


We’d be lucky if you have:

Experience with Amazon ECS
Proficiency with GitHub or similar platforms (GitLab, Bitbucket, etc.)
Exposure to event-driven architectures and autoscaling frameworks (KEDA or similar)
prepme.io — DevOps interview prep