In this blog post we will load the retail data into MySQL instance and analyze it using Structured Query Language(SQL). SQL is one of the most popular languages among developers,data engineers,data scientists and DBAs.
Data model for the Retail databse
This retail database is obtained from the tutorial that cloudera team provided as part of their Big data developer training .It has mainly 6 tables.
- Order Items
Download the retail_db database
cd /tmp wget https://gitlab.com/nitendragautam/samp_data_sets/raw/master/retail_data/retail_db.sql
Create database retail_db and user retail_dba for this Database
mysql -root -padmin mysql>CREATE database retail_db; mysql>CREATE user retail_dba identified by 'hadoop';
Grant the required permission to the retail_dba user
mysql> GRANT ALL ON retail_db.* to retail_dba;
Flush all the Privileges
mysql> flush privileges;
Login into mysql using the retail_dba user and load the retail_db data
Previously we had set up the
retail_dba user using the password
hadoop.Now we will login into the mysql using this
$ mysql -u retail_dba -phadoop mysql> USE retail_db;
Load the retail_db
Use the directory location where we downloaded the Retail Database
/tmp/retail_db.sql to load the file in MySQL databases.
It will load the data into the MySQL Instance as a
Display all the tables within the Database
mysql> show tables; +---------------------+ | Tables_in_retail_db | +---------------------+ | categories | | customers | | departments | | order_items | | orders | | products | +---------------------+ 6 rows in set (0.00 sec)
Give me the Unique Order date in which Order status is Pending Payment State
SELECT DISTINCT(Year(order_date)) FROM orders WHERE order_status='PENDING_PAYMENT';
Number of Customers whose Order status is Pending Payment
mysql> SELECT COUNT(*) as count FROM customers c JOIN orders o ON(c.customer_id=o.order_customer_id) WHERE o.order_status='PENDING_PAYMENT'; +-------+ | count | +-------+ | 15030 | +-------+ 1 row in set (0.06 sec)
Number of Customers whose Order status is Closed
mysql> SELECT COUNT(*) as count FROM customers c JOIN orders o ON(c.customer_id=o.order_customer_id) WHERE o.order_status='CLOSED'; +-------+ | count | +-------+ | 7556 | +-------+ 1 row in set (0.03 sec)
Get Customer details , order status who live in state of texas
SELECT c.customer_fname,c.customer_lname,o.order_date,o.order_status FROM customers c JOIN orders o ON(c.customer_id=o.order_customer_id) WHERE c.customer_state='TX' LIMIT 20;
Complete version of the Script
You can find the complete version of the script at below location.
To run this script you can follow below steps .
- Get the Script in you local environment
cd ~ wget https://gitlab.com/nitendragautam/blog_tutorials/raw/master/database/load_retail_data_mysql.sh
- Make the Script executable using Change Mode
cd ~ chmod a+x load_retail_data_mysql.sh
- Run the Script by passing your MySQL user name and password
./load_retail_data_mysql.sh -u <User_Name> -p<Password>
cd ~ ./load_retail_data_mysql.sh -u root -padmin