Sr. Machine Learning Operations Engineer
Reports to: Head of AI
Location: Latin America
Hours: Full-time
Compensation: $65K - $95K
About ReflexAI
ReflexAI brings the best in machine learning and natural language processing to mission-driven, people-centric organizations via innovative tools that transform how they train, develop, and empower their frontline teams. The AI/ML team at ReflexAI works on multiple large-scale projects, including ReflexAI’s highly visible development of a tool that trains veterans to support each other through mental health challenges.
About this role
As a Senior Machine Learning Operations Engineer at ReflexAI, you will work on multiple large-scale projects spanning the building, iteration, and maintenance of our ML platform. The purpose of the ML platform is to accelerate the development, deployment, and maintenance of ML models. To be successful, you should have significant experience with ML system design, cloud platforms, and databases. You will also collaborate with software engineering, product, and business team members. You should be excited about being part of a fast-paced startup that makes a real impact across important industries including crisis response, public safety, healthcare, and more.
What you’ll do
- Design, plan, and implement changes and improvements to our ML platform for efficient, large-scale training and maintenance of ML models and multi-platform support
- Work cross-functionally with ML engineers, product managers, and software engineers to understand the various use cases and pipelines needed
- Set standards and procedures for running machine learning workflows
- Implement continuous integration and continuous deployment (CI/CD) for machine learning models and data pipelines
- Expand the data warehouse with new data sources, access features, and dashboards
Requirements for a great fit
- Significant experience (typically 3+ years) with ML or MLOps in a production environment
- Significant experience (typically 3+ years) building end-to-end scalable ML infrastructure and data pipelines on on-premises or cloud platforms, including Google Cloud Platform (GCP), Amazon Web Services (AWS), or Azure
- Significant experience (typically 3+ years) with container-based deployments (e.g., Docker, Kubernetes)
- Proficiency in Python, Terraform, and data warehousing languages and tools (SQL, dbt, BigQuery, Fivetran, etc.)
- Strong teamwork skills including communication and collaboration with both technical and non-technical team members
- Open-mindedness, as demonstrated by the ability to consider other perspectives and feedback, the ability to engage in discussions with professionalism and empathy, and a strong desire to learn