
Machine Learning Ops (MLOps) and Data Science

  1. Introduction. MLOps
  2. MLOps Roadmap
  3. Blogs
  4. ML Infra
  5. Object Detection Libraries
  6. MLFlow
  7. Kubeflow
  8. Flyte
  9. AWS ML
  10. Azure ML
  11. KServe Cloud Native Model Server
  12. Data Science
  13. Machine Learning workloads in kubernetes using Nix and NVIDIA
  14. Other Tools
  15. Debugging ML Jobs
  16. Samples
  17. ML Courses
  18. ML Competitions and Challenges
  19. Polls
  20. Tweets

Introduction. MLOps

MLOps Roadmap


ML Infra

Object Detection Libraries




  • Union Cloud ML and Data Orchestration powered by Flyte
  • MLOps with Flyte: The Convergence of Workflows Between Machine Learning and Engineering
  • Machine Learning in Production. What does an end-to-end ML workflow look like in production? (transcript) 🌟🌟🌟 - Play Recording
    • Kelsey Hightower joined the @flyteorg team to discuss what ML looks like in the real world, from ingesting data to consuming ML models via an API.
    • @kelseyhightower You can’t go swimming in a #data_lake if you actually can’t swim, right? You’re going to drown. 🏊‍♂️
    • @ketanumare Machine Learning products deteriorate in time. If you have the best model today it’s not guaranteed to be the best model tomorrow.
    • @thegautam It’s hard to verify models before you put them in production. We need our systems to be fully reproducible, which is why an #orchestration_tool is important, running multiple models in parallel.
    • @ketanumare We at @union_ai unify the extremely fragmented world of ML and give the choice to users when to use proprietary technology versus when to use open source. (1/2)
    • @ketanumare #Flyte makes it seamless to work on #kubernetes with spark jobs, and that’s a big use case, but you can also use @databricks. Similarly, we are working on Ray and you can also use @anyscalecompute. (2/2)
    • @ketanumare Most machine learning engineers are not distributed systems engineers. This becomes a challenge when you’re deploying models to production. Infrastructure abstraction is key to unlock your team’s potential.
    • @ketanumare on #Machine_Learning workflows: Creating Machine Learning workflows is a team sport. 🤝
    • @arnoxmp: A Machine Learning model is often a blackbox. If you encounter new data, do a test run first.
    • @fabio_graetz In classical software engineering the only thing that changes is the code, in a ML system the data can change. You need to version and test data changes.
    • @Forcebananza This is actually one of the reasons I really like using #Flyte. You can map a cell in a notebook to its own task, and they’re really easy to compose and reuse and copy and paste around. (1/2)
    • @Forcebananza Jupyter notebooks are great for iterating, but moving more towards a standard software engineering workflow and making that easy enough for data scientists is really really important. (2/2)
    • @jganoff Taking snapshots of petabytes of data is expensive, there are tools that version a dataset without having to copy it. Having metadata separate from the data itself allows you to treat a version of a dataset as if it were code.
    • @SMT_Solvers In F500s it is mostly document OCR. Usually batch jobs - an API wouldn’t work - you need the binaries on the server even if it is a sidecar Docker container. One org (not mine) blows $$ doing network transfer from AWS to GCP when GCP could license their OCR in a container.
    • @Forcebananza Flyte creates a way for all these teams to work together partially because writing workflows, writing reusable components… is actually simple enough for data scientists and data engineers to work with.
    • @kelseyhightower We’re now at a stage where we can start to leverage systems like to give us more of an opinionated end-to-end workflow. What we call #ML can become a real discipline where practitioners can use a common set of terms and practices.
  • How is Flyte tailored to “Data and Machine Learning”?
  • Production-Grade ML Pipelines: Flyte™ vs. Kubeflow. Kubeflow and Flyte are both production-grade, Kubernetes-native orchestrators for machine learning. Which is best for ML engineers? Check out this head-to-head comparison.
  • MLOps Simplified: orchestrating ML pipelines with infrastructure abstraction. Enabled by Flyte
  • Who Let the DAGs out? Register an External DAG with Flyte (Chapter 3)
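
The point quoted above about versioning a dataset without copying it can be illustrated with a small sketch. It assumes a dataset is a set of named files and keeps only metadata (a manifest of content hashes), so a version is a short identifier that can be committed like code. The function names (`build_manifest`, `version_id`, `changed_paths`) are hypothetical and do not reflect how any particular tool mentioned in the discussion works; file contents are passed as in-memory bytes to keep the example self-contained.

```python
import hashlib
import json


def file_digest(data: bytes) -> str:
    """Return the SHA-256 hex digest of one file's content."""
    return hashlib.sha256(data).hexdigest()


def build_manifest(files: dict) -> dict:
    """Map each path to the digest of its content (metadata only, no data copy)."""
    return {path: file_digest(content) for path, content in sorted(files.items())}


def version_id(manifest: dict) -> str:
    """Derive a short, deterministic version id from the manifest itself."""
    canonical = json.dumps(manifest, sort_keys=True).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()[:12]


def changed_paths(old: dict, new: dict) -> list:
    """List paths that were added, removed, or modified between two manifests."""
    return sorted(p for p in set(old) | set(new) if old.get(p) != new.get(p))


if __name__ == "__main__":
    v1 = build_manifest({"train.csv": b"1,2", "test.csv": b"3,4"})
    v2 = build_manifest({"train.csv": b"1,2", "test.csv": b"3,5"})
    print(version_id(v1), version_id(v2), changed_paths(v1, v2))
```

Because the manifest is small and deterministic, two versions can be compared or stored in source control without touching the underlying data; a real system would stream hashes from storage rather than hold file contents in memory.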


Azure ML

KServe Cloud Native Model Server

Data Science

Machine Learning workloads in kubernetes using Nix and NVIDIA

Other Tools

Debugging ML Jobs

  • Attach a Visual Debugger to ML-training Jobs on Kubernetes
    • As machine learning models grow in size and complexity, cloud resources are more and more often required for training. However, debugging training jobs running in the cloud can be time-consuming and challenging. In this blog post, we’ll explore how to attach a visual debugger in VSCode to a remote deep learning training environment, making debugging simpler and more efficient.
    • In this tutorial, you’ll deploy a local Kubernetes cluster with k3d, install the MLOps workflow orchestration engine Flyte, create a simple training workflow, and finally visually debug it using VSCode and debugpy.
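
As a rough illustration of the debugpy side of such a setup, the sketch below shows one way a training script might opt in to remote debugging. It is not taken from the linked tutorial: the environment variable names `DEBUGPY_ENABLED` and `DEBUGPY_PORT`, the helper `maybe_attach_debugger`, and the toy `train` function are all assumptions made for illustration. Only `debugpy.listen` and `debugpy.wait_for_client` are actual debugpy calls.

```python
import os


def maybe_attach_debugger(env=None, default_port=5678):
    """Start a debugpy server if DEBUGPY_ENABLED=1; return True if it was started.

    The environment variable names are hypothetical, chosen for this sketch.
    """
    env = os.environ if env is None else env
    if env.get("DEBUGPY_ENABLED") != "1":
        return False
    import debugpy  # imported lazily so normal runs do not need the package

    port = int(env.get("DEBUGPY_PORT", default_port))
    debugpy.listen(("0.0.0.0", port))  # accept connections from outside the pod
    debugpy.wait_for_client()  # block until the IDE attaches
    return True


def train(epochs=3):
    """Toy stand-in for a training loop: halves a fake loss each epoch."""
    loss = 1.0
    for _ in range(epochs):
        loss *= 0.5  # a breakpoint here would pause the remote job
    return loss


if __name__ == "__main__":
    maybe_attach_debugger()
    print(train())
```

With something like this in place, one would typically forward the chosen port from the pod to the local machine (for example with `kubectl port-forward`) and use an "attach" debug configuration in VSCode pointing at that port. Gating the debugger behind an environment variable keeps regular training runs from blocking on `wait_for_client`.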


  • fepegar/vesseg: Brain vessel segmentation using 3D convolutional neural networks
  • MEDICAL-DATA-PROJECT-END2END-WITH-FEW-MLOPS: We are on a mission to transform medical data into actionable insights using the power of machine learning. Whether you are a data scientist, healthcare professional, or an enthusiast in the field, your contributions and ideas are invaluable to us. Join us in making a difference!

ML Courses

ML Competitions and Challenges



MLOps Workflow Scheduler Poll

