Nitendra Gautam

Analyze retail DB using Structured Query Language(SQL)

In this blog post we will load the retail data into MySQL instance and analyze it using Structured Query Language(SQL). SQL is one of the most popular languages among developers,data engineers,data scientists and DBAs.

Data model for the Retail databse

This retail database is obtained from the tutorial that cloudera team provided as part of their Big data developer training .It has mainly 6 tables.

  • Departments
  • Categories
  • Products
  • Order Items
  • Orders
  • Customers
Retail Data Model

Download the retail_db database

Retail Database SQL File

cd /tmp
wget https://gitlab.com/nitendragautam/samp_data_sets/raw/master/retail_data/retail_db.sql

Create database retail_db and user retail_dba for this Database

mysql -root -padmin
mysql>CREATE database retail_db;
mysql>CREATE user retail_dba identified by 'hadoop';

Grant the required permission to the retail_dba user

mysql> GRANT ALL ON retail_db.* to retail_dba;

Flush all the Privileges

mysql> flush privileges;

Login into mysql using the retail_dba user and load the retail_db data

Previously we had set up the retail_dba user using the password hadoop.Now we will login into the mysql using this retail_dba user.

$ mysql -u retail_dba -phadoop

mysql> USE retail_db;

Load the retail_db

Use the directory location where we downloaded the Retail Database /tmp/retail_db.sql to load the file in MySQL databases.

mysql>source /tmp/retail_db.sql;

It will load the data into the MySQL Instance as a retail_dba user.

Display all the tables within the Database

mysql> show tables;
+---------------------+
| Tables_in_retail_db |
+---------------------+
| categories          |
| customers           |
| departments         |
| order_items         |
| orders              |
| products            |
+---------------------+
6 rows in set (0.00 sec)

Give me the Unique Order date in which Order status is Pending Payment State

 SELECT DISTINCT(Year(order_date)) FROM orders WHERE order_status='PENDING_PAYMENT';

Number of Customers whose Order status is Pending Payment

mysql> SELECT COUNT(*) as count FROM customers c JOIN orders o ON(c.customer_id=o.order_customer_id) WHERE o.order_status='PENDING_PAYMENT';
+-------+
| count |
+-------+
| 15030 |
+-------+
1 row in set (0.06 sec)

Number of Customers whose Order status is Closed

mysql> SELECT COUNT(*) as count FROM customers c JOIN orders o ON(c.customer_id=o.order_customer_id) WHERE o.order_status='CLOSED';
+-------+
| count |
+-------+
|  7556 |
+-------+
1 row in set (0.03 sec)

Get Customer details , order status who live in state of texas

SELECT c.customer_fname,c.customer_lname,o.order_date,o.order_status FROM customers c JOIN orders o ON(c.customer_id=o.order_customer_id) WHERE c.customer_state='TX' LIMIT 20;
Customer Details

Complete version of the Script

You can find the complete version of the script at below location.

Load Retail Data to Mysql

To run this script you can follow below steps .

  • Get the Script in you local environment
cd ~
wget https://gitlab.com/nitendragautam/blog_tutorials/raw/master/database/load_retail_data_mysql.sh
  • Make the Script executable using Change Mode chmod command
cd ~
chmod a+x load_retail_data_mysql.sh
  • Run the Script by passing your MySQL user name and password

Syntax: ./load_retail_data_mysql.sh -u <User_Name> -p<Password>

cd ~
./load_retail_data_mysql.sh -u root -padmin

Reference

Cloudera Excercises