
Improving Performance In Spark Using Partitions

DatabaseWeekly · 2019-04-12

In this blog post we show how to optimize a Spark job by partitioning the data correctly. To demonstrate this, we use the public College Scorecard dataset, which contains several key data points from colleges across the United States, and compute the average student fees by state.
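As a minimal sketch of that approach, the snippet below loads the dataset, repartitions it by state, and computes the average fees per state. The file path and the column names (STABBR for the state abbreviation, TUITIONFEE_IN for in-state fees) are assumptions for illustration and may not match the actual Scorecard schema.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.avg

object AvgFeesByState {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("AvgFeesByState")
      .getOrCreate()

    // Load the College Scorecard CSV; path and column names are
    // placeholders for illustration.
    val scorecard = spark.read
      .option("header", "true")
      .option("inferSchema", "true")
      .csv("/data/college-scorecard.csv")

    // Repartition on the grouping key so all rows for a given state
    // land in the same partition before the aggregation runs.
    val avgFees = scorecard
      .repartition(scorecard.col("STABBR"))
      .groupBy("STABBR")
      .agg(avg("TUITIONFEE_IN").alias("avg_student_fees"))

    avgFees.show()
    spark.stop()
  }
}
```

Repartitioning on the grouping key hash-partitions the data by state, so the subsequent groupBy can reuse that distribution instead of triggering its own shuffle. Whether this pays off depends on the size of the data and how it was partitioned on load, so it is worth checking the physical plan with explain() before and after.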
