Analyzing the ‘Google Play Store Apps User Reviews’ dataset using
Spark SQL and DataFrames on the Databricks cloud platform for Spark.

Steps to create a cluster on Databricks:

  1. Go to the Clusters tab and click Create Cluster.

Upload dataset on Databricks file system:

Databricks has its own distributed file system (DBFS).

  1. Select the Data tab and click Add Data.

Filtering and cleaning the data:

  1. Read the data as a text file and store it as an RDD of strings.
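
The read-and-clean step can be sketched roughly as follows. This is a minimal sketch, not the original notebook code: the function name `parse_line`, the column positions, and the sample rows are assumptions made up for illustration. On Databricks the lines would typically come from `sc.textFile(...)`, and the function would be applied with `rdd.map(parse_line)` followed by a filter that drops `None`; here it runs on a small in-memory list so it works without a cluster.

```python
import csv

# Assumed column positions (hypothetical; adjust to the actual file header).
APP, RATING, INSTALLS = 0, 2, 5


def parse_line(line):
    """Parse one CSV line into (app, rating, installs), or None if unusable."""
    try:
        fields = next(csv.reader([line]))
    except (csv.Error, StopIteration):
        return None
    if len(fields) <= INSTALLS:
        return None  # blank or truncated row
    try:
        rating = float(fields[RATING])
        # Installs are assumed to look like "1,000,000+"; strip the decoration.
        installs = int(fields[INSTALLS].replace(",", "").rstrip("+"))
    except ValueError:
        return None  # header row or non-numeric values
    if rating != rating or not 0.0 <= rating <= 5.0:
        return None  # NaN or out-of-range rating
    return fields[APP], rating, installs


# Made-up sample lines standing in for the RDD of strings.
lines = [
    "App,Category,Rating,Reviews,Size,Installs",
    'Example Paint,ART_AND_DESIGN,4.5,2156,25M,"50,000,000+"',
    'Broken Row,ART_AND_DESIGN,NaN,12,3M,"1,000+"',
    'Tiny App,TOOLS,3.9,40,2M,"10,000+"',
]
cleaned = [row for row in map(parse_line, lines) if row is not None]
```

Keeping the parser a plain function that returns `None` for bad rows makes it easy to test locally before running it over the full file.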

Now let’s analyze the data using Spark SQL.

SQL query: plot Rating vs. Installs. Generally, apps with higher ratings also have more installs.
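
A query of this shape might look like the sketch below. The view name `apps`, the column names, and the sample rows are assumptions, not taken from the original notebook. On Databricks the query string would be passed to `spark.sql(...)` after registering the DataFrame with `createOrReplaceTempView("apps")`; the standard-library `sqlite3` module stands in for Spark here only so the sketch runs without a cluster.

```python
import sqlite3

# Hypothetical query: average installs for each rating value.
RATING_VS_INSTALLS = """
    SELECT Rating, AVG(Installs) AS avg_installs
    FROM apps
    GROUP BY Rating
    ORDER BY Rating
"""

# In-memory table with made-up rows, standing in for the Spark temp view.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE apps (App TEXT, Rating REAL, Installs INTEGER)")
conn.executemany(
    "INSERT INTO apps VALUES (?, ?, ?)",
    [
        ("Example A", 3.5, 1000),
        ("Example B", 3.5, 3000),
        ("Example C", 4.5, 500000),
    ],
)
rows = conn.execute(RATING_VS_INSTALLS).fetchall()
conn.close()
```

In a Databricks notebook, the result of `spark.sql(RATING_VS_INSTALLS)` could then be passed to `display(...)` to choose a chart type for the plot.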

[Figures: Rating vs. Installs; Normalized Data; Most Popular Apps; Most Popular Games]



Bhushan Gosavi



Big Data Engineer. Software Engineer. Datastax Certified Cassandra Developer. (