Dask in one line: it works with data that's too big for memory, while still using Pandas syntax.
Dask is a Python library for data wrangling and parallelization: it helps scale programs that deal with big data or have long computation times by making it easier to distribute those workloads across multiple computer cores, either on a single machine or across a compute cluster.
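As a minimal sketch of what that looks like in practice, the example below uses dask.delayed to turn ordinary Python functions into a lazy task graph that Dask then executes in parallel (the functions themselves are made up for illustration):

```python
import dask

@dask.delayed
def square(x):
    # Stand-in for a slow, independent piece of work.
    return x * x

@dask.delayed
def total(values):
    return sum(values)

# Calling delayed functions only builds a task graph; nothing runs yet.
squares = [square(i) for i in range(10)]
result = total(squares)

# compute() executes the graph, distributing the independent tasks
# across the machine's cores (or across a cluster, if one is configured).
print(result.compute())  # 285
```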
Dask aims to be both simple and powerful. It implements the same APIs as Pandas, NumPy, and scikit-learn, which many programmers already know well. This means Dask code is often faster to write and easier to read than code for other frameworks, such as Spark or Hadoop, which are also often used to scale software across compute clusters.
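For instance, here is a minimal sketch of the Pandas-style dataframe API; the file pattern and column names (data-*.csv, category, amount) are hypothetical:

```python
import dask.dataframe as dd

# Read many CSV files as one logical dataframe, split into partitions.
df = dd.read_csv("data-*.csv")

# Familiar Pandas syntax; this only builds a task graph.
summary = df.groupby("category")["amount"].mean()

# compute() runs the graph in parallel and returns an ordinary Pandas object.
print(summary.compute())
```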
When choosing a framework for a software development or data analysis task, you usually have to make tradeoffs. Frameworks like Spark often bring you more power and scalability, but at the cost of added complexity. You could call this “making the easy things hard and the hard things easy.” You wouldn’t want to use Spark to read a CSV file of 1000 rows: it would create a lot of overhead and complexity, and it wouldn’t offer any benefits. You’d use Pandas instead. But you wouldn’t want to use Pandas to read and analyze 1 million CSV files per hour. It wasn’t designed to be used at that scale.
Dask aims to find a middle way in this tradeoff: the simplicity and familiarity of Pandas, NumPy, and scikit-learn, with Spark-like power to parallelize across cores.
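As a rough illustration of that middle way, the same dataframe code can run on all the cores of one machine or be pointed at a cluster by changing only how the client connects; the scheduler address and file names below are placeholders:

```python
from dask.distributed import Client, LocalCluster
import dask.dataframe as dd

# Use every core on the local machine...
client = Client(LocalCluster())
# ...or connect the same code to a remote scheduler instead:
# client = Client("tcp://scheduler-address:8786")

df = dd.read_csv("data-*.csv")        # hypothetical files
print(df["amount"].sum().compute())   # executed by the workers
client.close()
```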