SQL Notes

All SQL Topic Videos – https://www.scaler.com/topics/sql/

Sites to practice SQL question

  • Data Lemur – https://datalemur.com/
  • HackerRank – https://www.hackerrank.com/
  • LeetCode – https://leetcode.com/

# SQL Questions, Books, Practice, All Knowledge –

https://drive.google.com/drive/u/0/folders/1tycLNS-tkwVs9y32o5QhvqbRrHvScJz8

# Amit Sing LinkedIn Profile – https://www.linkedin.com/in/amit-singh-aba7b6201/

# SQL Grammar – https://github.com/antlr/grammars-v4/blob/master/sql/mysql/Positive-Technologies/MySqlParser.g4

/// HR Data Set – https://drive.google.com/drive/folders/1UQjPyGm96lqq_2feVw0_C3Za87AzwSPY

/// SQL Business Target Case Study – https://docs.google.com/document/d/1NAwI70yGQXPikfqxPh7TWSjEBoeFKkL8dLWrSfcxZcs/edit#

Window Function Concepts and Syntax – https://dev.mysql.com/doc/refman/8.0/en/window-functions-usage.html

Install MySQL Video – https://www.youtube.com/watch?v=2c2fUOgZMmY

SQL Class 1 Notes

SQL Class 1

1. schema design of farmer market

2. concept of keys

3. types of relationship

4. intro to sql, types of sql

DB schema

Q. Where is Data Stored – DBMS

Database Management System

#Relational DBMS – RDBMS- collection of interrelated tables

Database – collection of interrelated tables

example transaction table to customer table

**************

Management System

a) set of operation that helps in managing and update the database

b) example of operation – insert, delete, modify, etc

Example – 

-Insert of a transaction record/row into a transaction database when a new transaction is made.

-Deleting the account data if customer deletes the account

-modifying the phone number of a customer.

**************

Type of Data

1. Structured (SQL) – Tables (data stored in rows and column format)

2. Unstructured (NoSQL)- 

Text – FB Posts, tweet, email, pdf

Image – sticker, photo

audio – songs, audio recording

video – gif, videos

***********

Relational DB – Oracle, MySQL, MS SQL Server, Postgre SQL

Unstructured DB – MongoDB, Casandra, Redis, DynamoDB

redis, dynamoDB – they store data as Key & value format

Key value format – 

1. Name:Rajesh, Qualification: B.Sc.

2. Name: Ramesh, Qualification: B.E., Age:27

***********

#Unstructured Database is also called – No SQL

**********

Another type of Database – Graph DB – Neo4j & Amazon Neptune

Here graph data means network (doesn’t mean bar chart or line chart).

example – friends network, facebook network, LinkedIn Connections, networking

***********

What is DB Schema?

DB schema’s other name – ER (Entity Relation) Diagram

it shows how data is stored. What data you get from the table. What columns are present in each table.

Return table – shows returned orders data

customer – age, address, phone, email

return – return date, reason, return order id

Product – brand, name, cost, color

We don’t store all tables together due to

redundancy (No longer useful), null value, storage, retreival, fast execution, performance.

It helps to efficiently use different data without using lots of storage.

**************

mainly, there are 3 data types moslty used in SQL (there are more also):

1. String 

– char (character) 

char (5) – means limit of character should exactly be 5 character – ROVIN, PRAVE (not more than 5 and less than 5)

– varchar (variable character)

Varchar (5) – Unlike char (5), it doesn’t need to be exactly 5 length, It can be 1 to 5 characters also, with maximum limit of 5.

2. Numeric – 

– Int (Integer) – whole number – 1, negative value

– float – decimal – 3.4

3. Data & Time – 

– Date – 8th Feb

– time – 11:30 pm

****************************

Keys:

Keys will help in identifying the unique record/rows.

It is like giving unique name to a record.

Types of Key:

  1. Primary Key – Only one most appropriate candidate key is primary key
  2. Candidate Key – Minimal set of attributes in super key
  3. Super Key – an attribute or set of attributes to uniquely identify each row.
  4. Foreign Key
  5. Composite key

Super Key – an attribute or set of attributes to uniquely identify each row.

example 

Type 1 super key

– customer id, 

– phone number

Type 2 super key

– name and age (set of attribute)(here we have used more than one attribute)

– cust id and name

– cust id and phone

– cust id and age

Type 3 super key

– cust id, name, phone (3 sets of attribute)

Type 4 super key

– cust id, name, phone, age (4 sets of attribute)

We can add many attributes to a super key as long as it is required to make it a super key.

************

Candidate key:

Minimal set of attributes in super key

– cust id

– phone no.

These are the only 2 possible key for candidate key.

If id and phone is not there, then may be two sets of attribute is required to make Minimal set of attributes

– name and age,

– name 

**************

Primary key

Only one most appropriate candidate key is primary key

It will be cust id.

We can’t choose phone no. as primary key because 

– phone no. may change

– some may not have phone no.

************************

Types of Relationship in Schema:

1. One to One

cust id, cust name

cust id, cust age

In both table, cust id cannot be more than one because same id can’t have two different name or different age.

It will connect one row to one row in both table.

2. One to many

customer info table and customer order table

one customer can have multiple orders

one row is connected to more than one row in order table.

3. many to One

reverse of “one to many”

student table and university table

many student will study in one university

multiple rows are connected to one row

4. many to many

subject table and university table

many subjects are taught in many university. It is not like one subject is only taught in one university. or One university is teaching only one subject.

multiple rows from first table connected to multiple rows of second table

*******************

What does line mean in ER diagram?

It means relationship 

one relationship

many relationship

**************

yellow bulb – primary key

filled diamond and empty diamond

**************

Big Query – Data Warehousing tool (Azure, etc)

DB vs Data Warehouse

-DB – stores raw data in tables (collections of table)

– Data warehouse – this is also a DB, but can be used for analytical operation

ETL – Extract transform load

E-Extract data

T-transform data(summing of each rows and showing as total; every sales of each day is summed to show daily sales)

L-load into warehouse

every transaction is converted to calculate daily sale (aggregated sales) aggregated data.

*******

Project<Dataset<Table

***************

SQL Class 2 Notes

Data Warehouse

It is also like a database of different tables – transaction table, cust table, product table, but it is aggregated to be used for analytical thinking.

The process is called ETL –

  • Extracting data from one database,
  • Transforming data into another type of data for analytical purpose,
  • Loading into another database

Where SQL sits in Data Science?

Data is extracted using SQL
Then Data is cleaned using Python, Excel
Then Data is Visualized and Analyzed using Tableau, Power BI.


Characteristic of Primary Key

  • It is unique
  • It is not Null

Characteristic of Foreign key

  • Primary key in one table and foreign key in another table.
  • There can be duplicate foreign key in one table, but same foriegn key cannot be duplicate in another table where it is acting as primary key.
  • Can be duplicate
  • can be null also

How to interact with DBMS? How to extract ?

By using SQL

SQL – Structured Query Language

Different types of SQL commands: (Specific SQL commands which we should master)

Category of SQL command:
DDL – Data Definition Language
DML – Data Manipulation Language
TCC – Transaction Control Language
DQL – Data Query Language – Most important for DA/DS
DCL – Data Control Language

DQL type of SQL command which is most important for Data analyst and data scientist.


DDL – Data Definition Language
Table level changes
-create table
-drop table
-alter table
-truncate table

DML – Data Manipulation Language
Row level changes

  • insert new record in a table (new registration)
  • update row (changing mobile number)
  • delete rows (deleting from cart)

TCC – Transaction Control Language

  • Commit –
  • Save Point (the point where all changes were saved like in video game or restore point)
  • Rollbakc (going back to the previous saved point)
  • Set constraint

DQL – Data Query Language – Most important for
DA/DS

  • Select

DCL – Data Control Language

  • Grant (grant access to a user)
  • Revoke (remove access of a user)

SQL basic extract syntax

Select all columns from table of this database.
Select *
From DB Name .| Table Name

  • mean all

select *
from DB name.product table

Q. How to extract product table from farmer market database

select * from farmer_market.product

Then we run the query and data is extracted.

How to extract specific column and not all column?
-we will not use “*” (star)

select
product_id, product_name
from
farmer_market.product

ctrl+shift+f – format query – it will arrange in proper reading syntax in structure.

run it – ctrl + enter


Always use column name than using select * because it will load all data which are very huge with thousands of rows and column. It takes lots of time and may even crash.

In organization, it is not recommended to use select all syntax it.


to understand the data, we need to see it. But large data may get crashed and we can’t see that data.

We can load sample of the data with only 10 or 20 rows. It will help us to understand the data. Sample data will also load faster. Then we can decide which column is required for our analysis.
we can also understand the data using sample data.


Use limit 10 in syntax to load only 10 rows

like this –
select
product_id, product_name
from
farmer_market.product
limit 10


Now we know what all columns are present in the table by seeing limited rows.

  • It helps in giving sample data to understand the schema or structure of the data.
  • Inexpensive computationaly – it does not need to fetch all the rows.

Problems of using select *

  • high loading time
  • huge data can crash
  • more memory consumption
  • production pipeline might break if there is an update

List down 10 rows of farmer’s market vendor booth assignment,
displaying the market date, vendor ID, and booth number from the vendor_booth_assignment table.

Limitting 10 rows with limited columns

select
market_date
vendor_id
booth_number
from farmer_market.vendor_booth_assignments


Key pair value is found in No SQL, so SQL does not process it.


Offset

limit 20
offset 10 – skip first 10 rows and show next 20 rows.

limit 3 – first 1,2,3 rows will be shown

limit 3
offset 2 – skip first 2 rows and show 3,4,5 rows


Sorting in SQL – order by

order by market_date
Then data will be sorted based on market date.

mention asc – ascending order which is by default.
mention desc – for descending order


nested sorting based on two column

order by market_date desc, customer_id asc


Sequence of sql syntax:
select
from
orderby
limit
offset


  • Alias – new column name or rename an existing column – “AS” – column 1 AS col1
  • Round – round off numeric value, – round(column,2) – 2 means 2 numbers after decimal.
  • Calculated field – new column – “column 1 * column 2”

.Key pair value is found in No SQL only (unstructured data), It cannot be present in structured data??
2.SQL does not process key pair value?


Junction table – middle table between three tables. first table and third table are connecting using the middle junction table.


Schema – Design of table- column/attribute of the table with data type of each attribute.

Primary Key – student id (only 1 in a table)
candidate key – name and city, name and department
(minimal set of super key) 2 1
superkey –
attribute or set of attribute to identify each rows

alternate primary key – eligible to become primary key, if primary key is absent.
composite key


purpose of date warehouse – analyse date, make prediction, help in decision making by combine different database.
purpoose of database – to store similar type of data

limit 5 – it will show first 5 rows

Using offset with limit
offset 1
limit 5
What it will show?
2nd row to 6th row

SQL Class 3 Notes

SQL Class 3 –

round – gives nearest whole number 11.02 – 11
ceil – give the highest whole number 11.02 – 12
floor – give the lowest whole number 11.02 – 11

ceiling of the house, floor of the house

How to remove .0?


how did you give quotes in shortcut?


concat(first_name, ” “, last_name)
to combine first and last name with space in between


code button on keyboard


function inside function

nested function

upper function
lower function

upper(concat(firstname,” “,lastname)) – function within function


capitalize function – initcap (specific to bigquery, not available in mysql, oracle, etc)

This function will capitalize all the first letter of all words, not only just for first word.


to pull out first 3 character from the word
substr(firstname,1,3)
it means pull out 3 character from 1st position
substr(firstname,2,4)
means pull 4 characters from 2nd position
substr(firstname,2,2)
means pull 2 characters from 2nd position
substr(firstname,2) – if we don’t give character length in the end, then it will pullout all the characters from that position
substr(column_name,starting position,length of character to pullout)


concat(substr(firstname,1,1), substr(firstname,2),” “,(substr(lastname,1,1), substr(lastname,2)


If you put minus in substr, then it will give value from the right side.


Filter

select all product name which are under product id 1

select
from DB.TableName
where

select *
from FM.Product
where product_cat=1
(where product_id > 3) – other way to find more than 3

single equal to “=” in sql, no need of doubt equal to “==”


New order of syntax after including where
select
from
< where
order by
limit
offset


select customer id
from farmermarkt.customer
where id = 4
order by market date, vendor id, product id
limit 5


multiple condition filter
more than 3, less than 8 and equal to 10

where product_id > 3 and product_id<8


find first name = carlos and last name = diaz

where firstname=”Carlos” – in this case is important and it should exactly match

so if the capital letter is no there or if it is small, then itwill not match, because it has to be exact match.


There is a trick here.
we can convert the first name into full upper or full lower case using upper or lower function.

Then we can put full upper or full lower character in the where condition.

where firstname=”carlos”
where firstname=”CARLOS”

Even if it is cArLoS, then it will be converted into carlos or CARLOS, then it will be easy to find. otherwise, w can check all the rows individually to find such unique cases.


If you wanted to find out what booths vendor 1 was assigned to on or before April 20, 2019

select * from farmert_market.vendor_booth_assignment
where vendor_id=1 and market_date =<2019-04-20


distinct function

distinct(booth_number)
also work without work


‘= , <= are correct form =>, =< are wrong form’


Group Study Notes 1

%- many character – “%n”
_(underscore) – single character

%n% – check from both side many character
n – check one character from both side

single quote and double quotes both work in MySQL and BigQuery.

exactly match
contains this word

IN – to find multiple specific words or number. ( It helps to find all words together without writing code separately for each word.)
Between – used for specific range of number

CAST
The CAST() function converts a value (of any type) into a specified datatype.
convert to integer data type (useful to remove decimal also-
cast(quantity as int)
cast(quantity as integer)

no comma in the last column before from. comma means additional column.

Like will only work with String.

Difference between = and like

Find null values

  • where manager_id is null
    where manager_id is not null

Left Join:
select salary,job_title,(salary*12) as Annual_sal from HR_Data.employees left join HR_Data.jobs on employees.job_id = jobs.job_id
where job_title like ‘%Clerk’

1, Calculate the salary increase of rupees 100 ?
2, Calculate the Annual salary with one-time bonus of rupees 1000 ?
3, Display annual salary of all Clerk ?
4, Display all employees with salary increase of Rs.500 whose earning salary less than (or) equal to 5000
5, Display all employees who’s name doesn’t start with A

SELECT *
FROM
HR_Dataset.employees
where first_name not like ‘A%’
order by first_name
LIMIT
100

6, Disp all the employees who joined the company in the year of 1996

SELECT hire_date,
concat(first_name,’ ‘,last_name) as full_name,
— substr(cast(hire_date as string),1,4) as datestring
— EXTRACT(ISOYEAR FROM hire_date) AS isoyear
FROM
scaler-dsml-sql-378013.HR_Dataset.employees
— where EXTRACT(ISOYEAR FROM hire_date) =1996
— where substr(cast(hire_date as string),1,4)=”1996″
LIMIT
1000


https://www.w3schools.com/SQL/sql_like.asp

trunc – only rows are deleted, column heading will remain
drop – delete the entire table including all rows and column
delete – delete any specific rows or record or specific data.

***********************

https://www.hackerrank.com/
Try SQL queries on hackerrank as well. For every question, there is a discussion section and there you will find different approaches and logics done by others for that particular questions you are solving.

You will earn certificate for same. you can mention in resume or LinkedIn Profile. even companies also hire people from HackerRank

***********************

For excel — follow this Youtube Channel

https://www.youtube.com/@chandoo_
amazing content , will surely help us a Data Analyst to tackle various problems

SQL Class 4 Notes

Between Function

Find the booth assignment for vendor 7 for market date between april 3 2019 and may 16 2019 including both dates.

Select *
from farmer_
where vendor_Id=7 and
market_date >= ‘2019-04-03′ and <=’2019-05-16’
order by market_date

format of date is year-month-date (2019-04-30) usually, but it also depend on the database. Check the format of date before creating syntax.

second way to solve it –

where market_date between ‘2019-04-03’ and ‘2019-05-16’

between means equavalent to <= and >=.


We will need to enclose date with quotes like string.
We dont need to enclose integer with quotes.


Q. Return a list of customer with selected last names – Diaz, Edwards and Wilson

select * from customer
Where customer_last_name = ‘Diaz’
OR customer_last_name = ‘Edwards’
OR customer_last_name = ‘Wilson’

Alternate method to do it: ( we will put lowercase and also lower the case for the matching name)

select * from customer
Where lower(customer_last_name) = ‘diaz’
OR lower(customer_last_name) = ‘edwards’
OR lower(customer_last_name) = ‘wilson’

IN Function

Another alternate method to do it: (By using IN Function)

select * from customer
where lower(last_name) IN (‘diaz’,’edwards’,’wilson’)

Here “,” (comma) is like OR condition. It means any one condition matches will be returned. It is not like AND, where both condition needs to meet.

We dont have to write name condition multiple times, it can be easily done using comma.

=(equal to) wont work, because it works with numeric value.


NOT IN Function – To filter out those lastname, earlier we were trying to see only those names,

  • where
    not lower(last_name) IN (‘diaz’,’edwards’,’wilson’)

Not will come before the column name


WildCards – % (Character Length is not Fixed, It can mean 0 to many characters)

  • starts with “man” – man% : Manas, Manual
  • ends with “man” – %man : Superman, Postman
  • contains “man” – %man% :

It is used to extract rows based on few matching character.


Underscore Wildcard – “_” (Fixed Length of Character)

It fixes the length of character. One underscore(_) means one character length.

If you want to find 4 characters, then you will need to put 4 underscores.

4 underscore means 4 characters only:
_ _ _ _ man – aquaman, ironman, postman.

Superman or spiderman cannot come in this.

q _man – Only aquaman


Like Wildcard –

where first_name like “%man”

we can't use '=' because for '=' it has to be exact match ("man").


Q. You want to get data about a customer you knew as “Jerry”, but you aren’t sure if he was listed in the database as “Jerry” or “Jeremy” or “Jeremiah”

Answer:

Where customer_first_name like ‘Jer%’

we can also use as ‘%jer%’ and it will still show same result, because % also means 0 character. If there are no character before ‘Jer’, then it will also show that.


Homework Question:
Question for Practice: Find all rows where middle most character is A in first name (doesn’t matter the length of firstname)??


Null Function

Question: Find all of the products from the product table without sizes.

(There are null and black(empty space) in the rows)
Answer:

Select * from ‘product’
where product_size is NULL
OR trim(product_size) = “”

Trim function


Try – use like ‘ %’ to find unlimited blank space without using trim.


COALESCE Funtion – To Replace Null Value

Null means absence of value.

It will find null value and replace it with the mentioned name.

Coalesce(product_size, ‘size not available’)

another way:
coalesce(product_size,product_name) –
It means it will repalce null value with product name which is mentioned in the same null row.

It means we can use this function to produce constant value also as well as dynamic value.


Homework Question:
can we pass two column name in this fn?? – Practice


Replace function

Replace(product_size, ” “, ‘this is a space’)

Replace(column_name, what to replace, replace with what)

It will replace even if it not exacly matching. It means:
If you put condition to replace red potato to blue potato, then:

  • Red potato – Blue potato
  • small red potato – small blue potato

  • Between ‘A’ and ‘G’ – It will show all the names starting between A and D.
  • Between ‘Ae’ and ‘De’ – show names starting between Ae and De.

Group Study Notes 2

Find Data Type Function

select ’employee_id’, ‘first_name’, ‘last_name’, ‘salary’
from employees
where salary between 5000 and 10000
and department in (50,10,80)
or
commission_pct


mod(col,2) = col %2 (can be also written as ‘col%2’)

— and mod(id,2)<>0
and id %3=0
order by rating desc


<>, != – not equal to

strpos function


1,Display Ram your hiredate is 08/01/1992

SELECT
concat(first_name,” – Your Hire Date is “,hire_date) as col1,
first_name, hire_date
FROM
scaler-dsml-sql-378013.HR_Dataset.employees
where first_name=”Lex”
LIMIT
1000

  1. Disp. Whose name has string N in 4th character using instr

select *
from HR_Dataset.employees
where instr(first_name,’n’,4,1)=4

3,Replace the employe name A to 5 and E to 6 and S to *

select first_name, last_name,
replace(replace(replace(upper(first_name),’A’,’5′),’E’,’6′),’S’,’‘) — replace(upper(first_name),’E’,’6′), — replace(upper(first_name),’S’,’‘)
— replace(original_value, from_value, to_value)
from HR_Dataset.employees

  • Here if we put replace three times to replace 3 different value, then it will show 3 different column.
    But we can put replace function within the replace function and all replacement will be done within one single column.

4,Could you please remove the rightside ‘S’ on String ‘SATHYAS’

  • rtrim(‘satyas’,’s’)

5,Can you Please put (Rs.) for all employees before salary

concat(‘Rs. ‘, salary)

  1. Name
    Mr. Satya – Satya
    Mrs. Sangeeta – Sangeeta
    Ms. Priya – Priya
  2. Disp employee whose name is exactly 5 characters

length(first_name)

  1. Find first name with odd character length:

select first_name, length(first_name) as len
from HR_Dataset.employees
where
mod(length(first_name),2)!=0
order by salary


Instr –

instr(first_name,’e’,1,2) as InStr,
— instr(source_value, search_value [, position[, occurrence ]])

position – what is the position of the letter (case sensitive)

occurence – The character can occur more than 1 time. It will display the position of same character in what position. Example – Steven – e is occuring at 3rd and 5 position. if you put 1, it will show the 1st position that is 3rd. If you put 2, it will show the 2nd position that is 5th.


STRPOS – It gives position of character mention for only 1st time. If the same character is mentioned in the string second time, it will not show. We can use INSTR to find out remaining positions of that character.


TRIM – trim from both side, It won’t work to remove inside character.
LTrim – Trim from Left Side
RTriM – Trim from Right side


It will add the character from left or right side within the maximum length mentioned in the formula.

Even if you add more character to add, it cannot show more than the mentioned length of the character in the formula.

LPAD(cast(salary as string),8,”Rs. “)

Lpad – add from left
Rpad – add from right


cast(columnname as String/Integer)

***********************************************************

Question: Count the ‘e’ in a string

Answer:

select first_name,

length(first_name) – length(replace(first_name,’e’,””))

from HR_Dataset.employees

SQL Class 5 Notes

  1. SUBQUERY

select *
from producttable

where product_category_ID in
(
select product_category_ID
from farmer_market.product_category
where lower(product_category_name) like %fresh%
)


Subquery –
It gives value from a table.

These values can be used as a condition in same table or another table.

Here the column name need not to be same. We just need the value which we can use as a value to use where clause (filter clause).

Example fetching rain date from rain table and checking the sales happened on rainy day by using rain date as filter in where clause.


One more thing, we don’t need to give table in big query, but it is necessary in MySQL. It is a good habit to put a name to each table created using sub-query.


If you are trying to fetch column name in outer query which is not present in the inner query, then it wont work.


  1. IF FUNCTION

Q. Sat & sun as Weekend, rest as weekday

If(If(market_date=”saturday”,weekend,weekday)=’sunday,’weekend,weekday)

This is how it is written in SQL for if condition:

if(lower(market_day) in (“saturday”,’sunday’), ‘Weekend’, ‘Weekday’)

Q. Sat & sun as Weekend, rest as weekday

If else if end

like that he was going to tell,


  1. CASE

select col
case statement
from table

case statement syntax –
case when amount <5 then “less than 5” when amount >5 and <19 then “between 5 & 19”
else “greater than 20”
END


Case when… then, when… then, else, END


It will give name according to condition given.


Case when then, when… then, else, END


1:40 time – check for how the case when was converted into subquery. I missed it.


Sites to practice SQL question

data lemur
hackerank
leetcode


if null – it will fill null value, only for one column at once

coalesce – it can fill null value in multiple column at once.

Coalesce(brand,manufacturer,designer)

If the brand is null, then it will fill with manufacturer,
If manufacturer is also null in that row, then it will take designer value in that row.

In this way, we can add as many as column, to remove the null value.


case when lower(vendor_type) like ‘%fresh%’
then ‘fresh producer’
else others’
end

vendor_type like ‘%fresh%’

You can also use if function becasuse there are two condition.

It is better to use if function for binary condition.

SQL Class 6 Notes

Subquery vs Join

Output will be only from one table in Subquery.

Output will be from more than one table n Joins.


Get details about the vendor selling fruites/veg with product name and the quantity in the inventory

use 3 table – vendor, vendor inventory, product


Self Join and Cross Join

id, employee, manager_id
1, Alan, 2
2, Murph, 3
3, John, null

Find the maanger of each employee

Alan’s manager = 2 = Murph
Murph’s manager = 3 = john
john’s manager = null


Self join splits one table into two table.


Cross Join

color table:
black
blue

Product table:
shirt
pant

Find all possible combination –
black shirt,
black pant,
blue shirt,
blue pant

(we don’t need common or matching column for cross join)

(null values will be ignored)

cartesian products – means all possible combincation


Left Join Syntax

select
from
product_table
left join
product_category_table
on

product.product_category_id

product_category_table.product_category_id

  • what table we write first becomes the left table.
  • second table written will become right table
  • then put the matching column condition to create relationship

Q. Question: list all the products along with their product category name.

select * from product as P left join product_category as PC
on product.product_category_id = product_category.product_category_id

it can be written with alias of table as
P.product_category_id=PC.product_category_id

Select zip code,
from customer_purchase join customer
where purchase_date= date


cp.* – it means it gives all columns from the cp table.


3 Table Joins

Q. Lets say we want all details about all farmer’s and details about booth and booth assignment for every market date
(also want to look at booth which were not assigned)

A.


if you mention only join, it will work as inner join
By default it is inner join.


List all the customers and their associated purchases?


If there are no common column, you cannot do join on table, but you can do cross joins.


in some questions, I saw joins on tab1.col != tab2.col
when this type of joins are used?


We can also use <> or != while matching column in joins.
It will put combine the rows which are not matching with each other.
A with B
, Not A with A


Join can be done using between also, it specify the range

SQL Class 7 Notes

27-Feb SQL Class

Union – No Duplicate
Union All – See Duplicate also

Things to note for Union:

  • Same Data Type
  • Same Number of Column
  • Same Order of Column
    (If the name of column are different, it will still union if number of column is matching.)

union distinct – big query
union – It is enough in MySQL

union all is same for both big query and mysql


Cross join – we simply type cross join without any on condition to match column


we can also join on basis of >=,<=,>,<,between,<>


between syntax

select * from toys join kids on age between min_age and max_age


Its better to join table in one to many relationship than many to many relationship.
The many to many relationship can multiply the data and increase the rows.
If you are doing analysis to find total sales, then total sales may increase in many to many table join and give wrong increased total sales due to increased number of rows.


when we have no unique columns or primary key column in both table, we can combine them using two column combination.

-Like country and product column taken together as candidate key


select * from inventory join product on i.country=p.country and i.product = p.product

-here we are mentioning one more matching column to join the table.


SQL Class 8 Notes

  • Inequi Joins – when we use greater than, smaller than or between, it is called inequi (inequivalent) joins. When we don’t use matching on equal to (=)
  • Equi Joins – When we join table on equal to (=)

Group By

In grouping, many same ID will be combined to get aggregated sum and make a list of unique ID.
There will be no duplicate ID. Probably, it can also be used as primary key in the table.


Syntax of Group By

Select Cust_ID # (Columns to Group)
Sales_Column (To get aggregate or sum of sales based on cust_id)
From Table name
Group By Cust_id #(Grouping based on)

#

Select cust_id, sales
from table_name
group by cust_id


We should try to keep the column name in select statement which we use in group by function.


Alternatively, we can use Distinct function also to get same result because we will get unique values.


Order of syntax with group by function

select
from
join
where
group by
order by
limit
offset


What if there is null value in it when we use group by?


we can also use sum, count, min, max, avg

  • select sum(quantity) from table = sum of quantity
  • select count(quantity) from table = count of quantity or rows
  • select max(quantity), min(quantity), avg(quantity) from table = shows maximum, minimum and average of quantity

Q. Get a list of customer who made purchase on each market date.

select * from customer_purchases
group by customer_id, market_date

so, if it not numeric, we have to add in group by ?
If its dimension, then we need to add that column in group by clause.

If its measure, then we can get aggregation, then we don’t need to add measure column in group by clause.


count(*) in select clause – will show count of all rows.


select order, value, 10
from table

then it will give 10 value for each rows


when there are null value in the column, then it will not count null


Practice union distinct and union all


Null value can also be grouped


Whenever we use group by, we need to use aggregate -sum,min,max,count, etc


SQL Class 9 Notes

04-March SQL Class

Group by is followed by some kind of aggregation like sum, avg, etc.


Difference between Distinct and Group by?
A.

  • Distinct will only give one distinct column and will not aggregate.
  • Group by will aggregate the value with distinct value.

When we have to use filteration on aggregation or groupby , then we use Having.

We can use where clause only where there is no aggregation or group by done.


Having will always be used with group by.
Having is just similar to where clause, but it will only work in aggregated table and where clause on non-aggregated table.


avg(salary) over ( partition by department) as dept_wise_salary

Over will create new column

partition will create total department wise

running total will happen

unbounded preceding means all above will be taken.


over(order by)

range will do sum of all duplicate dates, if we order by date. It will take total of each day in calculation.


IfNull(value,null)

SQL Class 10 Notes

There are two type of windows function:

  1. Analytical
  2. Aggregation

over(partition by vendor_id) (order by price)

Optional:

partition by – is similar to group by

  • It means group on what column
  • It is like pane down, without this, it will be like table down.

order by

  • it means order on what column

Row_number()

It will give row number based on partition.


Rank () – give rank but eat up next ranking based on count of ranking already given.

DenseRank() – It will give same rank to same value, but won’t eat up ranking


Over function is useful to see aggregated value without decreasing granularity.


we can do partition by based on more than one column.

partition by vendor_id, market_date


by default – it will be –
range between unbounded preceding and current row

variations in this-

range between –

  • unbounded preceding and unbounded following
  • unbounded preceding and current row
  • current row and unbounded following
  • 1 row preceding and 1 row following

Rows between unbounded preceding and current row


Order of Execution:

From
Where
Group
Having
Window
Select
Distinct
Union
Order By
Offset
Limit

before all these sub-query will be executed in the order of inner query and outer query.


  • use aggregation with both group by and having function such as sum, min, max, count, etc

SQL Class 11 Notes

window frame means how much range or row we are using for our calculation such as:

  • rows between unbounded preceding and current row.

ntile(amount_column,2) over(order by amt_column)

10 1
10 1
20 1
40 2
50 2
60 2


nth_value

nth_value(salary,2) over (partition by dept order by salary desc)


It will create a new column, where the matching value will be shown in the column.

It will show value based on row number, not rank wise.

it means if first two rows have same value, then

nth value will look value row wise, not rank wise.

Its actually range. 2nd row having 18 then it is counting the range in first row also.

If you change to row between unbounded preceding and current row, then first row will be null even it is same to nth value.


first_value – shows value in first row

last_value – shows value in last row


when using last_value, every row is last value in this syntax –

last_value() over(order by col)

because in every row, it is the last row

It is due to the window framing which is –

range between unbounded preceding and current value


lag() – which column should be used

lag(column,n) – it gives value from previous row of a column

column – which column to look for value
n – value from how many previous row required


lead – it gives leading value means the value which is ahead. It is like opposite of lag function.

If you use lead(column,1), then it will show null in last row, then show last value in second last row.


when you get series of value, it is called analytical function.

when you get one value, then it is called aggregation function.


over function is always used with window function


is this right ?
lag(sale,1) over(partition by date)

where –
sale>previous sale or
sale<previous sale


Today we learned:
lag(col,n)
lead(col,n)
first_value()
last_value()
nth_value()
ntile(col,2)

SQL Class 12 Notes

Date & Time

  • we need to convert to date data type, how to convert?
  • sometimes the format of date will be different in different data.
  • how sql will know which is year, date, month ?
  • then date format will change after concat ?

Table creation syntax

create table database.tablename as

(select * from table)

—- whatever table is there in the bracket will become a permanent table.


parse_datetime(
%Y-%m-%d %I:%M %P

Y – Year
m – Month
d – Date

I – Hour (12 Hour format)
(H-24 hour format)
M – Minute


After converting, it will show T in the middle, which means Timestamp.


SQL Time syntax code:

https://cloud.google.com/bigquery/docs/reference/standard-sql/format-elements


How to extract date ?

extract(date from date_column)

extract(time from

year from

month from

day from

week from

dayofwek
hour
minute
second
nanosecond
quarter


Month Name –
format_datetime(“%B” ,column_name)
as month_name

Day Name –
format_datetime( ,column_name)
as day_name


parse will give only numerical output

1:00:00 – check 2 mins before this related to calendar explaination

1:05:00 – check this also, before this what happened

1:23:00 – Check this from 2 mins back…


can we change the date format to DD-MM-YYYY?


Timeadd(datetime_column,interval like 30 minutes)

this adds 30 mins to datetime column.

time_add(Market_start


date_diff – date difference

date_diff(biggerdate,smaller date, output(week/day)

time_diff – time difference


current_date() – shows current time (it takes server time)


syntax to get data type of a column ??

may be describe


output of concat will always be string ?

SQL Class 13 Notes

INTERSECT – GIVE COMMON VALUE FROM BOTH COLUMN


Including
Using (ID) – It can be used instead of on table1.ID=table2.ID

Intersect – Gives common values in same column from different table. It is like Union but without uncommon values.

we can also use sub query without usin select * from and directly start from table name.


Rank – 1,2,2,4
Dense Rank – 1,2,2,3
sum(salary) over(rows between unbounded preceding and current row) as running_total


Over() function different use case:

  • sum(salary) over() as total_salary,
  • sum(salary) over(partition by department_id) as dept_wise_total_salary,
  • sum(salary) over(order by department_id) as dept_wise_running_total_salary,
  • sum(salary) over(partition by salary order by start_date) as running total in date wise order within salary group.

column_name %2=0 – odd number
column_name %2=1 – evennumber


Date_add (date, interval 15 day)
Date_sub (date, interval 15 day)
Time_add
Time_sub


Class date – 15-March

CTE – Common Table Expression

  • Structure your query
  • Hold the data temporarily until query execution completes
  • Its like creating temporary table.

Syntax:

with table_name as (query 1),
table_name2 as (query 2 using query 1),
table_name3 as (query 3 using query 1 and query 2)

select * from table_name3


the inner query cannot get result from outer query. outer query can get result from inner query. Like a water flow….

It is similar to sub query, but not in a hierarchial way. Here query are separated as separate table.


Here alias will not work without ‘as’.


Every scan costs dollar $.

In AWB, 1 TB row scan, it will cost 5$-6$.

If it is 10 TB data, then it will cost 50$-60$ just for scan.


create table
partition by
for every year


Indexes

where year = 2022 and cus_i=’Aaron’

grouping similar element together (sorting)

  • This is also has to be done while creating a table.

Example of Partition and Indexing –

Partition –
If there are 7 books of Harry Potter, we want to read 3rd book. We directly go and take the 3rd book.

Indexing –
Then if we want to read 10th chapter of the book, we see the page no. in the index which is in the beginning of the book and directly go to the page no. to read 10th chapter.


Writing efficient queries:

  1. select only columns you need

1 column – 1 MB
100 column – 100 MB

  1. Always use the where clause

select column from table
where col1= (may be partitioned)
col 2 = (may be indexed)

  • always use time period wherever possible
  1. Use limit function to explore the data
  2. Joining table 1 and table 2
  • first filter using where clause then do table join.
  • shrink the table as much as possible before joining
  1. Wild card use – % or _ (underscore)

Avoid using wildcard in the beginning.
It will scan every row – %A

But if you mention A%, it will only scan A starting names.


  1. use order by in the last query, it is not need to be done in sub query.

  • data lemur

SQL Target Case Study Project Session Notes

Business Case: Target SQL Intro | Lecture

  • connected all the dots of each topic learned in SQL and solve some real business problems.
  • suraaj @scaler.com
  • Minimum 70% submission
  • only 1 submission allowed.
  • business case required to be eligible to sit in placement.

Pattern of Solution: Question, Query, Table Screenshot, Insight

Q1.
a. put the question
Paste the query and not screenshot of query.

then screenshot of the table with first 10 rows.

write the insight below the query and table output

convert into pdf in the end.

Leave a Reply

Your email address will not be published. Required fields are marked *

Back To Top