Blogs

Large Scale Image Processing with Spark through Fugue

Jan 15, 2023 · 6 min.

How Clobotics Runs Distributed Image Processing

Large Scale Data Profiling with whylogs and Fugue on Spark, Ray or Dask

Oct 4, 2022 · 6 min.

Profiling large-scale data for use cases such as anomaly detection, drift detection, and data validation

Why SQL-Like Interfaces are Sub-optimal for Distributed Computing

Aug 23, 2022 · 10 min.

Examining the limitations of the SQL interface

Why Pandas-like Interfaces are Sub-optimal for Distributed Computing

Jun 7, 2022 · 10 min.

A deep look at the assumptions of the Pandas interface

Introducing Fugue — Reducing PySpark Developer Friction

Feb 14, 2022 · 19 min.

Increase developer productivity and decrease costs for big data projects

Scaling PyCaret with Spark (or Dask) through Fugue

Jan 6, 2022 · 7 min.

Run PyCaret functions on each partition of data in a distributed manner

Delivering Spark Big Data Projects Faster and Cheaper with Fugue

Nov 8, 2021 · 7 min.

Increase developer productivity and decrease compute usage for big data projects

Using FugueSQL on Spark DataFrames with Databricks

Nov 5, 2021 · 5 min.

Connecting FugueSQL with Databricks Connect

The Simple Guide to Productionizing Data Workflows with Docker

Sep 9, 2021 · 7 min.

An intro to Python packaging and Docker

Seamlessly Porting Python and Pandas Functions to Spark

Aug 9, 2021 · 5 min.

We can do this the easy way, or the hard way