Big Data and Kubernetes
Introduction
Apache Spark
- itnext.io: Migrating Apache Spark workloads from AWS EMR to Kubernetes
- towardsdatascience.com: How to guide: Set up, Manage & Monitor Spark on Kubernetes
- tomlous.medium.com: CI/CD for Data Engineers. Reliably Deploying Scala Spark containers for Kubernetes with Github Actions
- datamechanics.co: Apache Spark 3.1 Release: Spark on Kubernetes is now Generally Available
- dzone: Run and Scale an Apache Spark Application on Kubernetes. Learn how to set up Apache Spark on IBM Cloud Kubernetes Service by pushing the Spark container images to IBM Cloud Container Registry.
- dzone: Quickstart: Apache Spark on Kubernetes. See how to run the Apache Spark Operator on Kubernetes.
- dzone: Running Apache Spark on Kubernetes. Covers running Spark on K8s to reduce dependency on cloud providers.
- cloud.redhat.com: Getting Started running Spark workloads on OpenShift
- medium: Running Apache Spark on Kubernetes. Using Spark on K8s to overcome dependency on cloud providers.
- hevodata.com: Building Apache Spark Data Pipeline? Made Easy 101 🌟
- coderstan.com: Apache Spark on Kubernetes: Lessons Learned from Launching Millions of Spark Executors (Databricks Data+AI Summit 2022). A case study of how Apple uses Spark and Kubernetes to process more than 380K jobs per day.
- spot.io: Setting up, Managing & Monitoring Spark on Kubernetes
- levelup.gitconnected.com: Master SparkML: Practical Guide for Machine Learning. A hands-on tutorial on machine learning with SparkML.
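Several of the Spark-on-Kubernetes links above revolve around submitting jobs with `spark-submit` directly against the Kubernetes API server. As a minimal sketch, the helper below assembles such a command line in Python. The function name `build_spark_submit`, the API-server address, image name, namespace, service account and jar path are hypothetical placeholders; the flags and the `spark.kubernetes.*` / `spark.executor.instances` keys are standard Spark configuration options.

```python
import shlex


def build_spark_submit(
    api_server: str,
    image: str,
    app_jar: str,
    main_class: str,
    app_name: str = "spark-app",
    namespace: str = "default",
    service_account: str = "spark",
    executors: int = 2,
) -> list[str]:
    """Hypothetical helper: build spark-submit argv for Kubernetes cluster mode."""
    return [
        "spark-submit",
        # The k8s:// prefix selects Spark's Kubernetes scheduler backend.
        "--master", f"k8s://{api_server}",
        # In cluster mode the driver itself runs inside a pod.
        "--deploy-mode", "cluster",
        "--name", app_name,
        "--class", main_class,
        "--conf", f"spark.kubernetes.namespace={namespace}",
        # Container image used for the driver and executor pods.
        "--conf", f"spark.kubernetes.container.image={image}",
        # Service account the driver pod uses to create executor pods.
        "--conf",
        f"spark.kubernetes.authenticate.driver.serviceAccountName={service_account}",
        "--conf", f"spark.executor.instances={executors}",
        # A local:// URI means the jar is already inside the image.
        app_jar,
    ]


if __name__ == "__main__":
    # Placeholder values (hypothetical); replace with your cluster and registry.
    argv = build_spark_submit(
        api_server="https://kubernetes.example.com:6443",
        image="registry.example.com/spark:3.5.0",
        app_jar="local:///opt/spark/examples/jars/spark-examples.jar",
        main_class="org.apache.spark.examples.SparkPi",
        app_name="spark-pi",
    )
    print(shlex.join(argv))
```

Returning a list rather than a single string lets the arguments be passed straight to `subprocess.run` without shell quoting issues; `shlex.join` is used only to print a copy-pasteable command.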
Databricks
- aprenderbigdata.com: Databricks: Introducción a Spark en la nube (Databricks: an introduction to Spark in the cloud)
- Databricks is the name of the Apache Spark-based data analytics platform developed by the company of the same name. The company was founded in 2013 by the creators and lead developers of Spark. The platform makes Big Data analytics and artificial intelligence with Spark simple and collaborative.
- The platform is available as a cloud service on Microsoft Azure, Amazon Web Services (AWS) and Google Cloud.
- docs.databricks.com: Use scheduler pools for multiple streaming workloads
- github.com/databrickslabs/ucx: Databricks Labs UCX: automated migrations to Unity Catalog
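The scheduler-pools entry concerns running several streaming queries side by side on one cluster. In open-source Spark, with `spark.scheduler.mode=FAIR`, a thread selects its pool via `SparkContext.setLocalProperty("spark.scheduler.pool", ...)`, and pools can optionally be predefined in an XML allocation file referenced by `spark.scheduler.allocation.file`. The sketch below generates such a file with the Python standard library. The helper name `fair_scheduler_xml` and the pool names are hypothetical, and on Databricks the allocation file may not be needed at all, so treat this as an illustration of the open-source Spark mechanism rather than the platform's documented procedure.

```python
import xml.etree.ElementTree as ET


def fair_scheduler_xml(pools: dict[str, dict[str, int]]) -> str:
    """Hypothetical helper: render a Spark fair-scheduler allocation file.

    `pools` maps a pool name to optional "weight" and "minShare" settings.
    """
    root = ET.Element("allocations")
    for name, settings in pools.items():
        pool = ET.SubElement(root, "pool", name=name)
        # FAIR shares the pool's resources among its jobs; FIFO is the alternative.
        ET.SubElement(pool, "schedulingMode").text = "FAIR"
        # weight: the pool's share relative to other pools (Spark's default is 1).
        ET.SubElement(pool, "weight").text = str(settings.get("weight", 1))
        # minShare: minimum CPU cores the pool should get (Spark's default is 0).
        ET.SubElement(pool, "minShare").text = str(settings.get("minShare", 0))
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # One pool per streaming query (pool names are placeholders).
    print(fair_scheduler_xml({"stream_a": {"weight": 2}, "stream_b": {}}))
    # In PySpark, each query would then be started from its own pool, e.g.:
    #   spark.sparkContext.setLocalProperty("spark.scheduler.pool", "stream_a")
    #   query_a = df_a.writeStream.format("console").start()
```

Pools that a job names but that are missing from the file are created on first use with default settings, so the file mainly matters when some queries should get a larger weight or a guaranteed minimum share.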