Location: Bay Area (Hybrid)
Duration: Long-Term Contract
Note: Local candidates only
Job Overview
We are seeking an experienced MLOps Engineer to design, build, and maintain scalable machine learning operations pipelines that support the full model lifecycle—from development and training to deployment, monitoring, and retraining. This role focuses on enabling production-grade ML systems using modern cloud platforms, CI/CD practices, and MLOps frameworks, ensuring reliability, scalability, governance, and performance of machine learning models in enterprise environments.
Key Responsibilities
ML Pipeline Development & Operations
Develop and maintain robust machine learning pipelines using frameworks such as MLflow, Kubeflow, or Vertex AI.
Automate the end-to-end ML lifecycle, including model training, validation, testing, deployment, and monitoring in cloud environments.
Implement reusable and scalable workflows for model versioning, tracking, and retraining.
CI/CD & Model Lifecycle Management
Design and implement CI/CD pipelines for machine learning models, ensuring seamless integration from development to production.
Manage model versioning, model registry, and deployment pipelines with strong governance practices.
Ensure reproducibility and traceability across ML lifecycle stages.
Cloud, Containers & Deployment
Deploy and manage ML workloads on cloud platforms such as GCP, AWS, or Azure.
Work with containerization technologies like Docker and Kubernetes to provision scalable model serving environments.
Enable low-latency model scoring APIs for real-time inference use cases.
Monitoring, Governance & Compliance
Implement model monitoring and observability frameworks to track performance, drift, and anomalies in production.
Ensure compliance with model risk management (MRM) standards, including documentation, explainability, and audit readiness.
Establish alerts and feedback loops for continuous model improvement and retraining.
Collaboration & Engineering Enablement
Collaborate with data engineering and platform teams to build and optimize data pipelines and ML infrastructure.
Support engineering teams in provisioning scalable environments for ML model development and deployment.
Partner with stakeholders to translate business requirements into ML-driven solutions.
AutoML & Accelerated ML Development
Leverage AutoML tools such as Vertex AI AutoML and H2O Driverless AI to accelerate model development and deployment.
Enable low-code/no-code ML workflows where appropriate, while ensuring production-grade quality and governance.
Required Qualifications
10+ years of experience in software engineering, with at least 3 years focused on AI/ML and MLOps.
Strong programming experience in Python and Java, along with SQL and ML libraries such as scikit-learn, XGBoost, TensorFlow, or PyTorch.
Hands-on experience with cloud platforms (GCP, AWS, or Azure).
Strong knowledge of containerization technologies (Docker, Kubernetes).
Experience with data engineering and workflow orchestration tools such as Airflow and Spark.
Solid understanding of DevOps principles, CI/CD practices, and software engineering best practices.
Strong communication skills with the ability to explain complex ML concepts to both technical and non-technical stakeholders.
Preferred Qualifications
Experience with Vertex AI, MLflow, Kubeflow, or similar MLOps platforms.
Familiarity with model governance frameworks (MRM, model documentation, explainability tools).
Experience building real-time inference systems and scalable ML APIs.
Exposure to enterprise-scale ML systems in regulated industries.