Database and database technology have a major impact on the numerous technical fields where computres are used. Some of the most popular areas include business,electronic commerce,engineering, medicine,genetics,law education and libraries.
Database is collection of related data which has certian implicit meanings. Some of the implicit properties of database is given below.
- A database represents some aspect of the real world.Changes to the miniworld are reflected in the database.
- Database is logically coherent collection of data with some inherent coherent meaning.If we have random assortment of data, we cannot referred to it as database.
- Database is designed, built and populated with data for a specific purpose.It has an intended group of users and some preconceived applications in which users are interested.
A Relational Database Management System(RDBMS) is a computer-software application that interacts with end-users, other applications, and the database itself to capture and analyze data. It is a collection of programs that enables users to create and maintain a database. The DBMS is a general-purpose software system that facilitates the processes of defining, constructing, manipulating, and sharing databases among various users and applications.
A database is called a relational data model as it represents the relationship between one or more databases. The relationship is known as the relational database model. It provides flexibility and allows one database to be in relation with another database. It can access the data from many databases at one time over the network.
An application program accesses the database by sending queries or requests for data to the DBMS. A query typically causes some data to be retrieved; a transaction may cause some data to be read and some data to be written into the database.
Relation in a Relational database model
Relation in the relational database model is defined as the set of tuples that have the same attributes.
Tuple represents an object and also the information that the object contains. Objects are basically instances of classes and used to hold the larger picture.
Relation is described as a table and is organized in rows and columns. The data referenced by the relation come in the same domain and have the same constraints as well. Relations in the relational database model can be modified using the commands like insert, delete etc.
Principles of relational database model
There are mainly two principal rules for the relational model.
- Entity integrity: this is used to maintain the integrity at entity level
- Referential integrity: it is used to maintain integrity on all the values which have been referenced.
The differences between them are as follows:
- Entity integrity tells that in a database every entity should have a unique key; on the other hand referential integrity tells that in the database every table values for all foreign keys will remain valid.
- Referential integrity is based on entity integrity but it is not the other way around.
For example: if a table is present and there is a set of column out of which one column has parent key set
Relationship in Database
Relationship in database defines how different tables are connected together.
There are mainly 4 types of relationship in database
- One to One Relationship
- One to Many Relationship
- Many to One Relationship
- Self-Referencing Relationship
A Natural (Business) Key is the column or column(s) in the source data that identify a unique row.It can be exposed on a business report and have business value in the company.
A Surrogate Key is an artificially produced value, most often a system-managed, incrementing counter whose values can range from one to n. It has no business meaning and should not be exposed in reports.
Surrogate vs Natural key
It is recommended to use the natural keys of the table throughout the ETL process.Assigning and maintaining a surrogate key is complex and leads to lot of errors.If invalid surrogate keys are assigned ,it might cause severe data integrity issue in the data ware house .Upon re-loading a table ,it is extremely difficult to guarantee the same Surrogate Key for a particular row.This complexity and risk can be eliminated by using natural keys and keeping the surrogate key from source system as data attribute .
Isolation level in DataBase
There are four isolation levels
- Read uncommitted or Read Dirty Buffers
- Read Committed or repeated reads
- Phantom reads
Synonyms in Database
A synonym is an alias or alternate name for a table, view, sequence, or other schema object. They are used mainly to make it easy for users to access database objects owned by other users. They hide the underlying object’s identity and make it harder for a malicious program or user to target the underlying object. Because a synonym is just an alternate name for an object, it requires no storage other than its definition There are two major uses of synonyms:
- Object invisibility: Synonyms can be created to keep the original object hidden from the user.
- Location invisibility: Synonyms can be created as aliases for tables and other objects that are not part of the local database.
Types of JDBC Statements
1.Statement -It is Used for SQL Queries
It is mainly used for repetitive SQL queries .We cannot use Stored procedure in Prepared Statement . Instances of Prepared Statement contain an SQL statement that has already been compiled. This is what makes a statement “prepared”
Because Prepared Statement objects are precompiled, their execution can be faster than that of Statement objects. The prepared statement is used to execute sql queries
A Callable Statement object provides a way to call stored procedures in a standard way for all RDBMSs. A stored procedure is stored in a database; the call to the stored procedure is what a Callable Statement object contains. It is Used for Stored Procedures and functions
Junction table is a place to store attributes of the relationships between two lists of entities.
A junction table allows us to create the many-to-many relationship and prevents us from adding duplicate entries
Performance tuning of Queries
- Do delete and update in batches
- Tracing facility that enables you to identify bottlenecks in indexing and querying.
- Add Optimizer hints Provide optimizer hints in the SQL that indicate:
- which optimizer mode to use
- whether or not to use available indexes
- the order in which tables are to be joined
- the method by which Oracle should join the tables
Database Performance Optimization
- Optimizer mode
- Use of Index
- Type of Join and Driving Table
- Order of Joins
It outstrips traditional data integration (DI), which tends to focus primarily on the selection, movement, and transformation of strictly-structured data from OLTP or other, similar sources.
Stages of Data Wrangling Steps/Stages
- Discovering what is actually in each of the data sets to determine how they can be used as part of an analysis.
- Assessing the data quickly, to ensure the fit and accuracy of data for analysis to avoid erroneous analytic results downstream.
- Shaping the data to ensure that it fits the requirements and is at the appropriate level of aggregation for the specific analysis being performed.
- Enriching the content of an individual data set by joining multiple data sources together to provide additional context, or create derived fields with calculations that highlight business opportunities or gaps.
- Distilling the data into the format required by the analysis or the downstream analytics tool being used.
Database cursor is a control structure that enables traversal over the records in a database. Cursors facilitate subsequent processing in conjunction with the traversal, such as retrieval, addition and removal of database records. A cursor can be viewed as a pointer to one row in a set of rows.
An index in database creates an entry for each value which makes it faster to retrieve data. It is one of the performance tuning method in database that allows faster retrieval of records from the table.
Types of Indexes
There are 3 types of Indexes in Database
- Unique Index
- Clustered Index
- NonClustered Index
User Defined function in Database
User defined functions are the custom functions that can be used in queries.
- Scalar Functions.
- Inline Table valued functions.
- Multi statement valued functions.
Aggregate Functions in SQL
An aggregate function is used to perform a calculation on a set of values to return a single scalar value. We often use aggregate functions with the GROUP BY and HAVING clauses of the SELECT statement.
Below are the most commonly used SQL aggregate functions:
AVG – calculates the average of a set of values. COUNT – counts rows in a specified table or view. MIN – gets the minimum value in a set of values. MAX – gets the maximum value in a set of values. SUM – calculates the sum of values.
View in SQL
View is like a subset of table which are stored logically in a database. A view is a virtual table. It contains rows and columns similar to a real table. The fields in the view are fields from one or more real tables. Views do not contain data of their own.
CREATE VIEW <view-name> AS SELECT <column_name1,column_name2> FROM <table_name> WHERE <condition>
Constraints in SQL
Constraints are rules you can place on columns or tables to limit the type of data that can be entered into a table. This prevents errors and can improve the accuracy and reliability of the database as a whole.
Common constraints include:
|NOT NULL||It prevents a column from having a NULL value|
|DEFAULT||It specifies a default value for a column where none is specified|
|PRIMARY KEY||It Uniquely identifies rows/records within a database table|
|FOREIGN KEY||Uniquely identifies rows/records from external database tables|
|UNIQUE||It makes sure that all values are unique|
|CHECK||Checks values within a column against certain conditions|
|INDEX||Used to quickly create and retrieve data from a database|
WHERE vs HAVING in SQL
WHERE clause is used when there is a need of condition on individual rows .Where clause is used to fetch data from database that specifies a particular criteria .It cannot be used with aggregrate function.
HAVING clause is used when we need to introduce conditions on aggregrate results . Having clause is used along with ‘GROUP BY’ to fetch data that meets a particular criteria specified by the Aggregate functions. It can be used with aggregrate function
DELETE vs TRUNCATE in SQL
DELETE command is used to remove rows from the table, and WHERE clause can be used for conditional set of parameters. With the execution of DELETE command it is possible to commit and Rollback
TRUNCATE removes all rows from the table such that it cannot be rolled back.
Standard SQL commands
The basic SQL commands can be organized into the following categories:
1. Data Manipulation Language (DML)
Data manipulation Language or DML is used to access or manipulate data in the database.
- INSERT: Creates records. The “Create” in CRUD.
- SELECT: Retrieves/fetch records. The “Read” in CRUD.
- UPDATE: Modifies records. The “Update” in CRUD.
- DELETE: Deletes records. The “Delete” in CRUD.
2. Data Definition Language (DDL)
- CREATE: Creates a new object.
- ALTER: Alters an existing object.
- DROP: Deletes an existing object.
3. Data Control Language: (DCL)
- GRANT: Grants privileges to users.
- REVOKE: Revokes privileges previously granted to a user.
Difference between CHAR and VARCHAR2 datatype in SQL
Even though both of these data types are used for characters ,
VARCHAR2 is used for character strings of variable length whereas
CHAR is used for character strings of fixed length
Sub query vs Coorelated Query
A subquery is a select statement that is embedded in a clause of another select statement. An Example
SELECT employeename ,salary FROM employeeTable WHERE salary > (SELECT salary FROM employeeTable WHERE employeename = 'Nitendra');
A Correlated subquery is a subquery that is evaluated once for each row processed by the outer query or main query. Execute the Inner query based on the value fetched by the Outer query all the values returned by the main query are matched. The INNER Query is driven by the OUTER Query.
An Correlated Query Example:
SELECT employeeNumber ,salary from branchid FROM employeeTable et WHERE salary =(SELECT AVG(salary) FROM employeeTable WHERE branchid = et.brachnid);
First inner query executes and finds a value which is again used by outer query to execute .
Difference among UNION, MINUS and INTERSECT
- UNION combines the results from 2 tables and eliminates duplicate records from the result set.
- MINUS operator when used between 2 tables, gives us all the rows from the first table except the rows which are present in the second table.
- INTERSECT operator returns us only the matching or common rows between 2 result sets.
Difference between clustered and a non-clustered index?
A clustered index is a special type of index that reorders the way records in the table are physically stored. Therefore table can have only one clustered index. The leaf nodes of a clustered index contain the data pages.
A non clustered index is a special type of index in which the logical order of the index does not match the physical stored order of the rows on disk. The leaf node of a non clustered index does not consist of the data pages. Instead, the leaf nodes contain index rows.
Transposing a table using SQL (changing rows to column or vice-versa)
In in SQL we can use CASE statement or DECODE statement to transpose a table
Advantages of using Stored Procedures
- It can reduced network traffic and latency, boosting application performance.
- Stored procedure execution plans can be reused, staying cached in SQL Server’s memory, reducing server overhead.
- It help promote code reuse.
- It can encapsulate logic. You can change stored procedure code without affecting clients. It provide better security to your data.
Fundamentals OF Database Systems Sixth, Edition