ML Ops Engineer

JOB OPENING

ML Ops Engineer

Supporting Country:

India

Location:

India

Vacancy ID:

VAC3612

Job Description

Responsibilities

You'll play a key role in developing our ML & data platform from the ground up, as part of a small but high-performing team. You will influence the scope and technical direction as well as champion best practices within the team. You will continuously pursue clean code practices and contribute towards overall platform architecture, collaborating with our other Engineering and Product teams.
You will:
- Work with engineers, researchers and data scientists to build the next generation of Tractable’s ML & data platform
- Help identify and realise capabilities in our ML & data platform that massively speed up getting research to production across dataset & model management, model training, model serving, labelling, data & ML pipeline orchestration and more
- Support Research and Product Engineers with tools and processes to enable a seamless data flywheel
- Deploy and continuously develop robust infrastructure, using best practices for managing infrastructure-as-code
- Solve cost and performance scalability challenges in both model training and model serving
- Run, monitor and maintain business-critical, production systems
- Adopt open-source technologies to best leverage our in-house resources
- Promote engineering best practices throughout the team
- Suggest, collect and synthesise requirements to create an effective feature roadmap

Description

The ML Foundations team focuses on building tools and services for our internal customers within Tractable: research, product, engineering and operations specialists.
We have three teams that tackle different aspects of this space: ML Applications, Data Operations and ML Infrastructure. You'll collaborate with peer teams to enhance, build and maintain the ML infrastructure stack.
We are looking for a Senior [Data|ML Ops] Engineer to build and support systems that enable the core mission of Tractable, to make applied AI possible, by optimising the end-to-end machine learning lifecycle. The vision of the ML Infrastructure team is to enable researchers to spend 80%+ of their time solving tricky ML problems rather than dealing with engineering/infra/ops challenges.
You will help mature our ML and data platform to a world-class state. You will influence the scope and technical direction as well as champion best practices within the team. You have a relentless focus on user experience (for researchers, data scientists and product engineers), and you care deeply about what your team is building to make sure it will have the biggest impact on your users. You will be a strong mentor, nurturing an encouraging and supportive environment to enable the team to do their best work.

Education and Experience

Skills and Behaviours

We rely heavily on the following tools and technologies, but we are likely to explore new technologies / frameworks as we are building the platform from the ground up. You don't need to have prior experience in all of them, and we actively encourage diverse views on what the best tools for the job are. We’re just keen to know that you're willing to break things, fix things, learn fast and help build a great team that is capable of building a platform that delights our customers.
- Main Infrastructure: AWS (EC2, S3, MSK, Lambda, Step Functions, Glue, IAM, Cognito, Systems Manager, CloudWatch, SQS, Route 53, SageMaker), Apache Kafka (AWS MSK), Kubernetes, Datadog (Metrics, Logs, Synthetics), PagerDuty, Loki, Elasticsearch
- Main CI/CD: Terraform, Docker, Harness
- Main Databases: Postgres / RDS, Redis, DynamoDB
- Main Languages: Python, Node + TypeScript, SQL (Postgres)
- Main Data stack: AWS MSK, AWS Lambda, AWS Redshift, dbt, Airflow, Airbyte, AWS Glue
- Main ML stack: Triton, TFServing, KServe, AWS SageMaker, AWS Lambda, AWS MSK, sync/async APIs, Weights & Biases, TensorFlow, PyTorch, DVC, Dagster/Flyte, Streamlit

We encourage you to drop us a line even if you don’t meet all the points above. That’s a lot of different areas of responsibility! We will help you pick them up because we believe that great people come from all walks of life.

Connect With Us

Fill up the form and our team will get back to you within 24 business hours.

Let's get the conversation started
