Final Exam Review for ISOM 3260 Database Design and Administration

Willis WAN, Chun Yu | 2019-05-11

These are my revision notes from when I prepared for the final exam of ISOM 3260 Database Design and Administration. They are transcribed from my old website, so the formatting might be off, and the content might be outdated.
These notes serve as a reference only, and I have no intention to update them.

Section 1 - Logical Database Design (Relational Database)

Relational Data Structure

  • A relation is a named, two-dimensional table of data. Each relation consists of a set of named columns and an arbitrary number of unnamed rows.
  • A primary key is an attribute or a combination of attributes that uniquely identifies each row in a relation.
    • A composite key is a primary key that consists of more than one attribute.
  • A foreign key is an attribute (possibly composite) in a relation that serves as the primary key of another relation.

Properties of Relations

  1. Each relation in a database must have a unique name.
  2. Each entry at the intersection of each row and column is atomic (or single-valued). No multivalued attributes are allowed in a relation.
  3. Each row is unique.
  4. Each attribute within a table must have a unique name.
  5. The sequence of columns is insignificant.
  6. The sequence of rows is insignificant.

Integrity Constraints

  • The relational data model includes several types of constraints, or rules limiting acceptable values and actions, whose purpose is to facilitate maintaining the accuracy and integrity of data in the database.
  • Domain Constraints: to ensure the entries of an attribute are within a domain.
    • A domain definition may consist of the data type, data length, allowable values, or allowable range.
  • Entity Integrity: to ensure entity instances can be told apart from one another.
    • Each entity instance (row) has a valid primary key (with non-null components).
    • Null: a value that may be assigned to an attribute when the applicable value is unknown.
  • Referential Integrity: to ensure the consistency amongst rows of two relations.
    • Each foreign key value must match a primary key value in the referenced relation, or the foreign key value must be null.
    • Note that when the relationship is compulsory, the foreign key value must not be null. A NOT NULL constraint should be specified when defining the table. (See section 3A)

Transforming (E)ER Diagrams Into Relations

  • Many CASE tools automatically perform the transformation steps. However, CASE tools do not automatically capture the correct business rules, and therefore a quality check must be performed manually.

Step 1: Mapping Strong Entities

  • Convert the strong entity into a relation.
  • Each simple attribute of the entity type is stored as an attribute of the relation.
  • Only the simple components of the composite attribute are included in the new relation as its attributes.
  • When the strong entity type contains a multi-valued attribute, two new relations are created.
    • The first relation contains all attributes except the multi-valued attribute.
    • The second contains the primary key of the previous relation (which is now a foreign key as well), and the multi-valued attribute. They form the primary key of the second relation.
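A minimal sketch of this step in Oracle DDL (the syntax is covered in Section 3A), assuming a hypothetical EMPLOYEE entity with a multi-valued attribute Skill:

CREATE TABLE employee (
    empID      VARCHAR2(10)    NOT NULL,
    empName    VARCHAR2(30)    NOT NULL,
    CONSTRAINT employee_PK PRIMARY KEY (empID)
);
-- Second relation: holds the multi-valued attribute; its primary key is
-- the primary key of the first relation (also a foreign key) plus the attribute
CREATE TABLE employee_skill (
    empID      VARCHAR2(10)    NOT NULL,
    skill      VARCHAR2(30)    NOT NULL,
    CONSTRAINT employee_skill_PK PRIMARY KEY (empID, skill),
    CONSTRAINT employee_skill_FK FOREIGN KEY (empID) REFERENCES employee (empID)
);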

Step 2: Mapping Weak Entities

  • Becomes a separate relation with a foreign key taken from the identifying entity
  • Primary key composed of
    • Partial identifier of weak entity
    • Primary key of the identifying relation (also a foreign key)
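A hedged sketch, assuming a hypothetical weak entity DEPENDENT identified by the employee relation from the sketch above:

CREATE TABLE dependent (
    empID      VARCHAR2(10)    NOT NULL,    -- primary key of the identifying relation (also a foreign key)
    depName    VARCHAR2(30)    NOT NULL,    -- partial identifier of the weak entity
    depDOB     DATE,
    CONSTRAINT dependent_PK PRIMARY KEY (empID, depName),
    CONSTRAINT dependent_FK FOREIGN KEY (empID) REFERENCES employee (empID)
);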

Step 3: Mapping Binary Relationships

  • Note that one of the goals of transforming entities into relations is to reduce data redundancy.
  • One-to-Many: primary key on the ‘one’ side becomes a foreign key on the ‘many’ side.
  • Many-to-Many: create a new relation with the primary keys of the two entities as its primary key. (See the sketch after this list.)
  • One-to-One: primary key on the mandatory side becomes a foreign key of the optional side. (Reducing the need to store null values in the foreign key.)
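For the many-to-many case, a minimal sketch assuming hypothetical STUDENT and COURSE relations (with primary keys sID and courseID) already exist:

CREATE TABLE enrollment (
    sID        VARCHAR2(8)     NOT NULL,
    courseID   VARCHAR2(10)    NOT NULL,
    -- the two foreign keys together form the primary key
    CONSTRAINT enrollment_PK PRIMARY KEY (sID, courseID),
    CONSTRAINT enrollment_FK1 FOREIGN KEY (sID) REFERENCES student (sID),
    CONSTRAINT enrollment_FK2 FOREIGN KEY (courseID) REFERENCES course (courseID)
);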

Step 4: Mapping Associative Entities

  • Associative entity with no pre-defined identifier: the default primary key of the associative relation is the combination of the primary keys of the two entities.
  • Associative entity with a pre-defined identifier: the identifier becomes the primary key. This is used -
    • When default primary key cannot uniquely identify instances of the associative entity
    • When it is natural and familiar to end-users
    • For the ease of programming

Step 5: Mapping Unary Relationships

  • The fundamental principles are the same as mapping binary relationships.
  • One-to-One/One-to-Many:
    • Recursive Foreign Key in the same relation
    • A foreign key in a relation that references the primary key values of the same relation.
  • Many-to-Many: create two relations -
    • one for the entity type
    • one for an associative relation in which the primary key has two attributes, both taken from the primary key of the entity.
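A hedged sketch of both cases, using a hypothetical STAFF entity with a one-to-many ‘supervises’ relationship and a many-to-many ‘mentors’ relationship:

-- One-to-one/one-to-many: recursive foreign key in the same relation
CREATE TABLE staff (
    staffID         VARCHAR2(10)    NOT NULL,
    staffName       VARCHAR2(30)    NOT NULL,
    supervisorID    VARCHAR2(10),             -- recursive FK; NULL for the top manager
    CONSTRAINT staff_PK PRIMARY KEY (staffID),
    CONSTRAINT staff_FK FOREIGN KEY (supervisorID) REFERENCES staff (staffID)
);
-- Many-to-many: an associative relation whose composite primary key
-- takes both attributes from the primary key of STAFF
CREATE TABLE mentorship (
    mentorID   VARCHAR2(10)    NOT NULL,
    menteeID   VARCHAR2(10)    NOT NULL,
    CONSTRAINT mentorship_PK PRIMARY KEY (mentorID, menteeID),
    CONSTRAINT mentorship_FK1 FOREIGN KEY (mentorID) REFERENCES staff (staffID),
    CONSTRAINT mentorship_FK2 FOREIGN KEY (menteeID) REFERENCES staff (staffID)
);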

Step 6: Mapping n-ary Relationships

  • Create n+1 relations: n for the original entities, 1 for the associative entity.
  • Associative entity has foreign keys to each entity in the relationship.
  • Notice the default primary key may not be sufficient in uniquely identifying entity instances → reconsider the primary key!

Step 7: Mapping Supertype/Subtype Relationships

  • Supertype/Subtype relationships are not directly supported by the relational data model. The most commonly employed strategy is:
    • Create a separate relation for the supertype and each subtype.
    • Supertype attributes go into supertype relation.
    • Unique subtype attributes go into each subtype relation; primary key of supertype relation also becomes the primary key of subtype relation.
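A minimal sketch, assuming a hypothetical PERSON supertype with an INSTRUCTOR subtype:

CREATE TABLE person (
    personID     VARCHAR2(10)    NOT NULL,
    personName   VARCHAR2(30)    NOT NULL,    -- shared supertype attribute
    CONSTRAINT person_PK PRIMARY KEY (personID)
);
CREATE TABLE instructor (
    personID     VARCHAR2(10)    NOT NULL,    -- supertype primary key reused as primary key (and foreign key)
    title        VARCHAR2(20),                -- attribute unique to the subtype
    CONSTRAINT instructor_PK PRIMARY KEY (personID),
    CONSTRAINT instructor_FK FOREIGN KEY (personID) REFERENCES person (personID)
);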

Well-structured Relations (Not Tested)

  • A well-structured relation is a relation that contains minimal redundancy and allows users to insert, modify, and delete the rows in a table without errors or inconsistencies.
  • A high amount of redundancies in a relation may result in errors or inconsistencies (called anomalies) when a user attempts to manipulate the data in the relation.
    • Insertion Anomaly: the unnecessary insertion of duplicated data or null values into a poorly designed database.
    • Modification Anomaly: redundant data needs to be modified at the same time, but the modifications are not applied to all of the concerned records or relations accordingly.
    • Deletion Anomaly: the deletion of records in which critical data is lost or data consistency is affected in a database.

Normalization

  • Normalization is the process of successively reducing relations with anomalies to produce smaller, well-structured relations.
  • Goals of normalization:
    • Minimize data redundancy, thereby avoiding anomalies and saving storage space.
    • Simplify the enforcement of referential integrity.
    • Make it easier to maintain data (insert, update, and delete).
    • Provide a better design that is an improved representation of the real world and a stronger basis for future growth.
  • Normalization makes no assumptions about how data will be used. It is based on rules of the business, captured as functional dependencies, and the normal forms.
  • Normalization is a tool to validate and improve a logical design so that it satisfies certain constraints that avoid unnecessary duplication of data.

Functional Dependencies

  • A functional dependency is a constraint between two attributes or two sets of attributes.
    • For any relation R, attribute B is functionally dependent on attribute A if, for every valid instance of A, that value of A uniquely determines the value of B. Denoted A → B.
    • In other words, if you know the value of A, there can only be one value for B.
    • The attribute on the left side of the arrow in a functional dependency (A) is called a determinant.

Candidate Keys

  • A candidate key is an attribute, or a combination of attributes, that uniquely identifies a row in a relation. (Candidate keys are qualified to serve as the primary key.)
  • One of the candidate keys will become the primary key.
  • Each non-key attribute is functionally dependent on every candidate key.

First Normal Form (1NF)

  • A relation is in first normal form (1NF) if and only if -
    • there are no repeating groups in the relation (thus, there is a single fact at the intersection of each row and column of the table),
    • a primary key has been defined, which uniquely identifies each row in the relation,
    • there are no multi-valued attributes, and
    • every attribute value is atomic.

Second Normal Form (2NF)

  • A relation is in second normal form (2NF) if and only if -
    • it is in first normal form, and
    • it contains no partial functional dependency. If such a dependency exists, decompose the table into smaller ones.
  • Partial functional dependency exists when a non-key attribute is functionally dependent on part (but not all) of the primary key.
  • If a relation has a primary key consisting of a single attribute, by definition, there cannot be a partial dependency in such a relation.
  • If there are no non-key attributes in a relation, by definition, there cannot be a partial dependency in such a relation.
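A quick example with a hypothetical relation:

ENROLLMENT (sID, courseID, grade, courseName)    -- primary key: (sID, courseID)
-- courseName depends on courseID alone, a partial dependency, so decompose into:
ENROLLMENT (sID, courseID, grade)
COURSE (courseID, courseName)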

Third Normal Form (3NF)

  • A relation is in third normal form (3NF) if and only if -
    • it is in second normal form, and
    • it contains no transitive functional dependency. If such a dependency exists, decompose the table into smaller ones.
  • Transitive functional dependency is a functional dependency between the primary key and one or more non-key attributes that are dependent on the primary key via another non-key attribute.
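A quick example with a hypothetical relation:

STUDENT (sID, deptID, deptName)    -- primary key: sID
-- sID → deptID and deptID → deptName, so deptName depends on sID transitively; decompose into:
STUDENT (sID, deptID)
DEPARTMENT (deptID, deptName)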

Section 2 - Physical Database Design (Oracle)

Purpose of Physical Database Design

  • translate the logical description of data into the technical specifications for storing and retrieving data
  • create a design for storing data that will provide adequate performance and ensure database integrity, security, and recoverability
  • balance between efficient storage space and processing speed
  • efficient processing tends to dominate as storage is getting cheaper

Physical Design Process

  • Inputs
    • Normalized Relations, Volume Estimates, Frequency of Use Estimates, Attribute Definitions, Response Time Expectations, Data Security, Backup, Recovery, Integrity Requirements, DBMS used, Regulatory Requirements
  • Key Decisions
    • Data Types
    • Indexes
    • Physical Record Descriptions, File Organizations, Database Architectures, Query Optimization

Designing Fields

  • Field
    • Smallest Unit in Data Hierarchy
    • A simple attribute in the relation
  • Field Design
    • Choosing Data Types
    • Controlling/Enforcing Data Integrity
    • Handling Missing Values

Choosing Data Types

  • A data type is considered suitable if it
    • represents all possible values,
    • improves data integrity,
    • supports all data manipulations, and
    • minimizes storage space.
  • Commonly Used Data Types in Oracle 12c
    • VARCHAR2
    • CHAR
    • CLOB
    • BLOB
    • NUMBER
    • DATE

Controlling Data Integrity

  • Default Value
    OrderDate DATE DEFAULT SYSDATE
  • Range Control
    ProductPrice NUMBER CHECK (ProductPrice > 0)
  • Null Value Control
    OrderID NUMBER(11,0) NOT NULL
  • Referential Integrity
    CONSTRAINT Order_FK FOREIGN KEY (CustomerID) REFERENCES CUSTOMER (CustomerID)

Handling Missing Data

  • Using a Default Value
  • Not Permitting the Use of Null Value
  • Substitute an Estimate of the Missing Value
  • Trigger a Report Listing Missing Values
  • Perform Sensitivity Analysis

Designing Physical Database Files

  • A physical file is a named portion of secondary memory (commonly a hard disk or magnetic tape) allocated for the purpose of storing physical records.
  • Two ways to retrieve data from secondary storage: sequential storage and pointers
  • File Organization
    • A technique for physically arranging the records of a file on secondary storage.
    • Aims at improving data retrieval speed, providing higher throughput, efficient use of storage space, etc.
    • Sequential File Organization: the storage of records in a file in sequence according to a primary key value. Inserting a record in between existing records requires resorting. Due to its inflexibility, it is not commonly used in production databases but may be used for backing up data from a database.
    • Indexed File Organization: the storage of records either sequentially or non-sequentially with an index that allows the software to locate individual records. It is more common to create an index on a secondary key (a column in a table used primarily for searching) to speed up data retrieval.
    • Hashed File Organization: a storage system in which the address for each record is determined using a hashing algorithm (a routine that converts a primary key value into a relative record number or relative file address). It improves data retrieval rate by ensuring each hash leads to a smaller number of records.
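As a toy illustration of the hashing idea (the numbers are made up), a simple modulo hash maps a numeric key to one of, say, 100 relative bucket addresses:

-- key 987654 hashes to relative bucket address 54
SELECT MOD(987654, 100) AS bucket FROM DUAL;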

Section 3A - Structured Query Language - Data Definition Language (SQL - DDL)

What is SQL?

  • Structured Query Language (Pronounced as ‘S-Q-L’ or ‘Sequel’)
  • Standards: SQL-92, SQL:2003, SQL:2011, SQL:2016
  • RDBMS has Partial Compliance with Standards

Benefits of a Standardized Relational Language

  • Application Longevity
  • Reduced Dependence on a Single Vendor
  • Cross-system Communication
  • Application Portability

Types of SQL Commands

  • Data Definition Language (DDL)
    • commands used to define a database, including those for creating, altering, and dropping tables and establishing constraints.
    • CREATE DATABASE, CREATE TABLE, ALTER TABLE, DROP, etc.
  • Data Manipulation Language (DML)
    • commands used to maintain and query a database, including those for updating, inserting, modifying, and querying data.
    • INSERT INTO, DELETE FROM, UPDATE, SELECT, etc.
  • Data Control Language (DCL)
    • commands used to control a database, including those for administering privileges.
    • GRANT, REVOKE, etc.
  • COMMIT and ROLLBACK are classified as transaction control language (TCL) which is not mentioned in the lecture.

Creating Tables (Using the CREATE Command)

CREATE TABLE {table_name} (
    {col_name}    {col_definition} [, ...]
    {constraints} [, ...]
);

Explanation

  • {table_name}: the name of the relation to be created.
    • Remember the name of the relation must be unique within a database.
    • The name cannot contain spaces.
  • {col_name}: the name of the column/attribute to be created.
    • Remember the name of the attribute must be unique within a relation.
    • The name cannot contain spaces.
  • {col_definition}: consists of the column data type and the column constraints.
    • Common data types: NUMBER(x, y), VARCHAR2(x), DATE, TIMESTAMP
    • Common Column Constraints: DEFAULT {default_value}, NOT NULL
    • You must specify NOT NULL for any attribute intended to be (part of) the primary key.
  • {constraints}: define the table constraints. Must be in the form CONSTRAINT {constraint_name} {constraint_definition}
    • By convention, constraint_name is the name of the table + _ + shorthand of the constraint to be applied. (e.g. STUDENT_FK2)
    • Common constraints:
      • PRIMARY KEY ({field_name}),
      • FOREIGN KEY ({field_name}) REFERENCES {table_name} ({field_name})
      • CHECK ({condition})

Example

CREATE TABLE department (
    deptID         VARCHAR2(10)     NOT NULL,
    deptName     VARCHAR2(100)     NOT NULL,
    CONSTRAINT department_PK PRIMARY KEY (deptID)
);
CREATE TABLE student (
    -- student IDs are numeric but not used for calculation,
    -- so VARCHAR2 is used instead to save space
    sID         VARCHAR2(8)     NOT NULL,
    firstName     VARCHAR2(30)     NOT NULL,
    middleName     VARCHAR2(30),
    lastName     VARCHAR2(30)     NOT NULL,
    DOB         DATE             NOT NULL,
    -- Notice that student.deptID must have the same
    -- data type as department.deptID
    deptID         VARCHAR2(10)     NOT NULL,
    CONSTRAINT student_PK PRIMARY KEY (sID),
    CONSTRAINT student_FK FOREIGN KEY (deptID) 
                            REFERENCES department (deptID),
    CONSTRAINT student_CHECKNAME CHECK (lastName = UPPER(lastName))
);

Changing Table Definition (Using the ALTER TABLE Command)

ALTER TABLE {table_name}
-- One of the following commands
[ADD {col_name} {col_definition}]
[ADD CONSTRAINT {constraint_name} {constraint_definition}]
[MODIFY {col_name} {changed_col_def}]
[DROP COLUMN {col_name}]
[DROP CONSTRAINT {constraint_name}]
[RENAME COLUMN {col_name} TO {new_col_name}];

Notes

  • When adding a column to a table with existing data, you cannot specify the NOT NULL constraint on the new column, because the existing rows have no value for it yet. Instead, add the column, fill in the values, and then specify the constraint using an ALTER TABLE ... ADD CONSTRAINT ... statement. (See the sketch after these notes.)
  • By the same token, you cannot add a constraint of NOT NULL to an existing column, if the column has NULL values.
  • To remove a NOT NULL constraint, use ALTER TABLE ... MODIFY {col_name} NULL;.
  • To remove a default value, use ALTER TABLE ... MODIFY {col_name} DEFAULT NULL;
  • You cannot change the data type of a column that is referencing, or being referenced by, another column in a FOREIGN KEY relationship. Instead, drop the foreign key constraint, modify the data type of both columns, and add the foreign key constraint again.
  • By the same token, you cannot drop a column that is being referenced by another column.
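A hedged sketch of the add-column workflow described in the first note, using the student table from the earlier example and a made-up column:

ALTER TABLE student ADD nationality VARCHAR2(20);
-- fill in the existing rows first (placeholder value for illustration)
UPDATE student SET nationality = 'Unknown';
-- only then enforce the constraint
ALTER TABLE student ADD CONSTRAINT student_NN_nationality CHECK (nationality IS NOT NULL);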

Removing Tables (Using the DROP TABLE Command)

DROP TABLE {table_name};

Section 3B - Structured Query Language - Data Manipulation Language (SQL - DML) (Part 1)

Inserting Records (Using the INSERT INTO Command)

-- Inserting values from users
INSERT INTO {table_name} [({col_order})]
VALUES {val_order};
-- Inserting values from the result of a query
INSERT INTO {table_name}
{select_stmt};

Explanation

  • {col_order}: specifies the columns to be inserted (the unspecified columns will get NULL)
  • Character values are case-sensitive inside ' '. The quotes can be omitted for numerical values.
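For example, using the department and student tables as defined in Section 3A (the values are made up):

INSERT INTO department (deptID, deptName)
VALUES ('ISOM', 'Information Systems');
INSERT INTO student (sID, firstName, middleName, lastName, DOB, deptID)
VALUES ('20123456', 'John', NULL, 'CHAN', TO_DATE('1999-01-31', 'YYYY-MM-DD'), 'ISOM');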

Deleting Records (Using the DELETE Command)

DELETE FROM {table_name}
[{where_condition}];
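For example (made-up condition; without the WHERE clause, all rows are deleted):

DELETE FROM student
WHERE sID = '20123456';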

Updating Records (Using the UPDATE Command)

UPDATE {table_name}
SET {field_name} = {new_val} [, ...]
[{where_condition}];
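For example (made-up values):

UPDATE student
SET middleName = 'Tai Man'
WHERE sID = '20123456';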

Querying Records (Using the SELECT Command)

SELECT {column_names}
FROM {table_names}
[{where_condition}]
[GROUP BY {column_names}]
[HAVING {condition}]
[ORDER BY {column_name}, ...];

Illustration - PRODUCT Relation

PRODUCT (PRO_ID, PRO_Name, PRO_Desc, PRO_Price, PRO_MemberDiscount)
  1. Write a SQL statement to select all product information.
    SELECT * FROM PRODUCT;
  2. Write a SQL statement to select all products whose name starts with ‘Deluxe’.
    SELECT * FROM PRODUCT
    WHERE PRO_Name LIKE 'Deluxe%';
  3. Write a SQL statement to select the product ID and product name of the products for which a member gets a discount.
    SELECT PRO_ID, PRO_Name FROM PRODUCT
    WHERE PRO_MemberDiscount < 1;
  4. Write a SQL statement to select the product ID and the member price of the product, ordered by the member price in descending order.
    SELECT PRO_ID, PRO_Price * PRO_MemberDiscount AS MemberPrice
    FROM PRODUCT
    ORDER BY PRO_Price * PRO_MemberDiscount DESC;

Using Aggregate Function in Query

COUNT()
MIN()
MAX()
SUM()
AVG()

Illustration - PRODUCT Relation

  1. Write a SQL statement to select the number of products in the system.
    SELECT COUNT(*) AS totalProduct
    FROM PRODUCT;
  2. Write a SQL statement to select the lowest member price in the system.
    SELECT MIN(PRO_Price * PRO_MemberDiscount) AS minMemberPrice
    FROM PRODUCT;
  3. Write a SQL statement to select the number of products in the system, where the name of the product ends with ‘Deluxe’.
    SELECT COUNT(*) AS totalEndDeluxe
    FROM PRODUCT
    WHERE PRO_Name LIKE '%Deluxe';

Using Wildcards

  • * to represent all columns in SELECT statement.
  • % to represent any collection of characters.
  • _ to represent exactly one character.

Using Boolean Operators

  • NOT > AND > OR
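To see why the precedence matters, a sketch against the PRODUCT relation above (hypothetical conditions). The two WHERE clauses below are equivalent, because AND binds more tightly than OR; write explicit parentheses if the other grouping is intended:

WHERE PRO_Price < 100 OR PRO_Price > 500 AND PRO_MemberDiscount < 1
-- is evaluated as
WHERE PRO_Price < 100 OR (PRO_Price > 500 AND PRO_MemberDiscount < 1)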

Using Ranges

SELECT ... FROM ...
WHERE {column_name} BETWEEN {start_val} AND {end_val};
-- Values are inclusive.

Using IN and NOT IN Clause

SELECT ... FROM ...
WHERE {column_name} [NOT] IN ({list_of_val});

Using DISTINCT

SELECT DISTINCT {column_name}
FROM {table_name}
...;

Illustration - PRODUCT Relation

  1. Write a SQL statement to select the product ID of all products with the word ‘Deluxe’ (case-insensitive) in the name, and all ‘desk’ products with a price lower than 200.
    SELECT PRO_ID FROM PRODUCT
    WHERE LOWER(PRO_Name) LIKE '%deluxe%'
    OR (LOWER(PRO_Name) LIKE '%desk%'
    AND PRO_Price < 200);
  2. Write a SQL statement to select the product ID of all products which provide 10% off or 20% off to members.
    SELECT PRO_ID FROM PRODUCT
    WHERE PRO_MemberDiscount IN (0.8, 0.9);
  3. Write a SQL statement to select the product ID of all products which do not provide 50% off or 60% off to members.
    SELECT PRO_ID FROM PRODUCT
    WHERE PRO_MemberDiscount NOT IN (0.4, 0.5);

Aggregating Result with GROUP BY Clause and HAVING Clause

  • If you write queries using the following rule, your queries will work: Each column referenced in the SELECT statement must be referenced in the GROUP BY clause, unless the column is an argument for an aggregate function included in the SELECT clause.
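A quick sketch of the rule against the PRODUCT relation above (hypothetical grouping):

-- Invalid: PRO_Name is neither grouped nor aggregated (ORA-00979 in Oracle)
SELECT PRO_Name, PRO_MemberDiscount, COUNT(*)
FROM PRODUCT
GROUP BY PRO_MemberDiscount;
-- Valid: every non-aggregated column in SELECT appears in GROUP BY
SELECT PRO_MemberDiscount, COUNT(*) AS totalProduct
FROM PRODUCT
GROUP BY PRO_MemberDiscount;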

Illustration - STUDENT Relation

STUDENT (STU_ID, STU_FirstName, STU_LastName, STU_AdmittedYear, MAJ_ID, NAT_ID)
  • Assume that each student has one major only.
  1. Write a SQL statement to select the number of students in each major. Sort the result in descending order of the number of students first, then in ascending order of the major ID.
    SELECT MAJ_ID, COUNT(MAJ_ID) AS totalStudent
    FROM STUDENT
    GROUP BY MAJ_ID
    ORDER BY COUNT(MAJ_ID) DESC, MAJ_ID [ASC];
  2. Modify the query in Example 1 to show only the majors that have over 150 students.
    SELECT MAJ_ID, COUNT(MAJ_ID) AS totalStudent
    FROM STUDENT
    GROUP BY MAJ_ID
    HAVING COUNT(MAJ_ID) > 150
    ORDER BY COUNT(MAJ_ID) DESC, MAJ_ID [ASC];
  3. Modify the query in Example 1 to show all majors except 'IS' and 'OM'.
    -- Method 1: Filtering with WHERE
    SELECT MAJ_ID, COUNT(MAJ_ID) AS totalStudent
    FROM STUDENT
    WHERE MAJ_ID NOT IN ('IS', 'OM')
    GROUP BY MAJ_ID
    ORDER BY COUNT(MAJ_ID) DESC, MAJ_ID [ASC];
    -- Method 2: Filtering through HAVING
    SELECT MAJ_ID, COUNT(MAJ_ID) AS totalStudent
    FROM STUDENT
    GROUP BY MAJ_ID
    HAVING MAJ_ID NOT IN ('IS', 'OM')
    ORDER BY COUNT(MAJ_ID) DESC, MAJ_ID [ASC];
  4. Write a SQL statement to show the number of students in each major who were admitted in 2016.
    SELECT MAJ_ID, COUNT(MAJ_ID) AS totalStudent
    FROM STUDENT
    WHERE STU_AdmittedYear = '2016'
    GROUP BY MAJ_ID;
  5. Write a SQL statement to show the number of students in each major from each country. Order by MAJ_ID in ascending order, then NAT_ID in ascending order.
    SELECT MAJ_ID, NAT_ID, COUNT(MAJ_ID) AS totalStudent
    FROM STUDENT
    GROUP BY MAJ_ID, NAT_ID
    ORDER BY MAJ_ID, NAT_ID;

Section 3C - Structured Query Language - Data Manipulation Language (SQL - DML) (Part 2)

Processing Multiple Tables

  • Join
    • a relational operation that temporarily combines two or more tables with common columns
    • the common columns in joined tables are usually the primary key of the dominant table and the foreign key of the dependent table.
  • Equi-join
    • based on equality between values in the common columns.
  • Natural join
    • an equi-join in which one of the duplicated common columns is eliminated in the result table.
  • Outer Join
    • a join in which rows that do not have matching values in common columns are nevertheless included in the result table.

Subquery

  • Location of possible subquery
    • In WHERE or HAVING clauses for conditional checking
    • As a virtual table of the FROM clause
  • Non-correlated Subquery - execute inner query once for the entire outer query
    • Commonly using IN or = before subquery
  • Correlated Subquery - execute inner query once for each row returned by the outer query
    • Commonly using EXISTS to return a boolean value.
    • Commonly using ANY and ALL with logical operators such as =, >, and <.

Illustration - STUDENT relation, MAJOR relation, DEPARTMENT relation, and NATIONALITY relation

STUDENT (STU_ID, STU_FirstName, STU_LastName, STU_AdmittedYear, STU_GPA, MAJ_ID, NAT_ID)
MAJOR (MAJ_ID, MAJ_Name, MAJ_OfficeRoom, DEP_ID)
DEPARTMENT (DEP_ID, DEP_Name, DEP_HeadName)
NATIONALITY (NAT_ID, NAT_Name)
  • Assume that each student has one major only.
  1. Write a SQL statement to show the name of the student and the name of their major.
    SELECT s.STU_FirstName, s.STU_LastName, m.MAJ_Name
    FROM STUDENT s, MAJOR m
    WHERE s.MAJ_ID = m.MAJ_ID;
  2. Write a SQL statement to show the name of the major and the number of students in each major. Only show those majors with more than 150 students in the system.
    SELECT m.MAJ_Name, COUNT(s.STU_ID) AS totalStudent
    FROM MAJOR m, STUDENT s
    WHERE m.MAJ_ID = s.MAJ_ID
    GROUP BY m.MAJ_Name
    HAVING COUNT(s.STU_ID) > 150;
  3. Write a SQL statement to show the difference between each student’s GPA and the average GPA of all students.
    SELECT STU_GPA - avgGPA diffGPA
    FROM STUDENT, (SELECT AVG(STU_GPA) avgGPA FROM STUDENT);
  4. Write a SQL statement to show the name of the student, and the name of the department head of their respective department.
    SELECT s.STU_FirstName, s.STU_LastName, d.DEP_HeadName
    FROM STUDENT s, MAJOR m, DEPARTMENT d
    WHERE s.MAJ_ID = m.MAJ_ID
    AND m.DEP_ID = d.DEP_ID;
  5. Write a SQL statement to show the name of the department and the number of students in each department. Show only the departments with over 200 students. Order by the number of students in descending order.
    SELECT d.DEP_Name, COUNT(s.STU_ID) AS totalStudent
    FROM DEPARTMENT d, MAJOR m, STUDENT s
    WHERE d.DEP_ID = m.DEP_ID
    AND m.MAJ_ID = s.MAJ_ID
    GROUP BY d.DEP_Name
    HAVING COUNT(s.STU_ID) > 200
    ORDER BY COUNT(s.STU_ID) DESC;
  6. Using a non-correlated subquery, select the name(s) of the student(s) who get(s) the highest GPA.
    SELECT STU_FirstName, STU_LastName
    FROM STUDENT
    WHERE STU_GPA = (SELECT MAX(STU_GPA) FROM STUDENT);
  7. Using a correlated subquery, select the name(s) of the student(s) who get(s) the highest GPA.
    SELECT STU_FirstName, STU_LastName
    FROM STUDENT a
    WHERE a.STU_GPA >= ALL (
        SELECT STU_GPA
        FROM STUDENT b
        WHERE b.STU_GPA != a.STU_GPA
    );
  8. Write a SQL statement to show all nationality names, and the number of students of each nationality. Show all nationalities regardless of whether there are students of that nationality or not.
    SELECT n.NAT_Name, COUNT(s.STU_ID)
    FROM NATIONALITY n
    LEFT OUTER JOIN STUDENT s
    ON n.NAT_ID = s.NAT_ID
    GROUP BY n.NAT_Name;
  9. Using a non-correlated subquery, select the names of the nationalities to which no students belong.
    SELECT NAT_Name
    FROM NATIONALITY
    WHERE NAT_ID NOT IN (
        SELECT DISTINCT NAT_ID
        FROM STUDENT
    );
  10. Using a correlated subquery, select the names of the nationalities to which no students belong.
    SELECT NAT_Name
    FROM NATIONALITY n
    WHERE NOT EXISTS (
        SELECT * FROM STUDENT s
        WHERE s.NAT_ID = n.NAT_ID
    );

Section 4 - Data and Database Administration

Importance of Data/Database Administration

  • Data are corporate assets
  • Important assets of the organization need to be managed properly
  • Data are stored in databases
  • Hence, effective data and database administration provide support for managerial decision-making

Data/Database Administration

  • Data and Database Administration Functions
  • Managing Data Security
  • Backing Up Databases

Data Administration Functions

  • Data Administrator: a senior manager
  • Data Policies, Procedures, Standards
    • Policies: e.g. ‘every user must have a password’
    • Procedures: e.g. backup and recovery procedures
    • Standards: e.g. naming conventions
  • (IS) Planning
    • understand the information requirements of the organization
  • Data Conflict Resolution
    • resolve data ownership issues
  • Internal Marketing of Data Standards
    • reduce resistance to change in data standards
  • Database Administrator (DBA): technical personnel
    • Implement the data policies, procedures, standards decided by data administrator
    • Managing data security, privacy, and integrity
    • Performing data backup and recovery
    • Selection of hardware and software
    • Installing and upgrading the DBMS
    • Tuning database performance

Managing Data Security

  • Database Security
    • protection of the data against accidental or intentional loss, destruction, or misuse
  • Increasingly Difficult to Manage
    • multiple channels for data access
    • corporate intranets
    • the Internet

Threats to Data Security

  • Accidental Losses
    • human error, software failure, hardware failure
  • Theft
  • Improper Data Access
    • loss of privacy (personal data)
    • loss of confidentiality (corporate data)
  • Loss of Data Integrity
    • data becomes invalid or corrupted
  • Loss of Availability
    • destruction of hardware, networks, or applications
    • virus attack

Data Management Software Security Features

  • Integrity Controls
    • enforced by the DBMS during querying and updating
  • Authorization Rules
    • identify users and restrict the actions they may take against a database
    • GRANT SELECT, UPDATE(price) ON PRODUCT_T TO Smith
  • Encryption Procedures
    • encode data in an unrecognizable form
  • Authentication Schemes
    • positively identify a person attempting to gain access to a database
    • fingerprints, voiceprints, eye scans, signatures, secret keys, digital certificates
  • Backup Databases
    • facilitate recovery procedures

Backing Up Databases

  • We back up databases to facilitate database recovery
  • Database recovery is the mechanism for restoring a database quickly and accurately after loss or damage
  • Recovery Manager (a DBMS module)
    • restores the database to a correct condition when a failure occurs
    • resumes processing user requests

Backup Facilities

  • An automatic dump facility that produces a backup copy of the entire database
  • Periodic Backup
  • Cold Backup: database is shut down during backup
  • Hot Backup
    • a selected portion of the database is shut down and backed up at a given time
    • useful when it is a critical system that must always remain available
  • Backups are stored in a secure, off-site location

Journalizing Facilities

  • An audit trail of transactions and database changes
  • Transaction
    • a discrete unit of work that must be completely processed or not processed at all
    • e.g. entering a customer order (see the sketch after this list)
  • Transaction Log
    • record of essential data for each transaction processed against the database
  • Database Change Log
    • before- and after-images of records that have been modified by transactions
    • before-image is a copy of record before modification
    • after-image is a copy of record after modification
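A minimal sketch of a transaction as one unit of work, using hypothetical ORDERS and ORDER_LINE tables (COMMIT and ROLLBACK are the TCL commands noted in Section 3A):

INSERT INTO orders (orderID, customerID, orderDate)
VALUES (1001, 'C001', SYSDATE);
INSERT INTO order_line (orderID, productID, quantity)
VALUES (1001, 'P005', 2);
COMMIT;    -- makes both inserts permanent as a single unit
-- ROLLBACK; would instead undo all changes since the last COMMIT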

Recovery and Restart Procedures

Switch

  • Mirrored Databases
    • at least two copies of the database must be kept and updated simultaneously
    • implemented in RAID 1 systems
  • Procedures
    • when a disk failure occurs, system switches to mirrored disk
    • defective disk can be removed and replaced with a new disk
  • Advantages
    • no disruption in service; fastest recovery
    • popular as cost of secondary storage has dropped
  • Disadvantages
    • does not protect against loss of power or damage to both databases

Restore/Rerun

  • reprocessing the transactions (retrieved from the transaction log) against the backup copy of the database
  • Advantages
    • does not need to create a database change log
    • no special restart procedures required
  • Disadvantages
    • time to reprocess transactions may be long
  • used only as a last resort

Forward Recovery (rollforward)

  • starts with the backup copy of the database
  • apply after-images (the results of good transactions) retrieved from the database change log
  • preferable to restore/rerun
    • do not need to reprocess each transaction
    • only the most recent after-image of a record needs to be applied

Backward Recovery (rollback)

  • apply before-images of records that have been changed to the database
  • restore the database to an earlier state
  • used to reverse the changes made by transactions that have been aborted or terminated abnormally (restore to an earlier state)

Section 5 - Data Warehousing

Data and Competitive Advantage

  • Businesses have a lot of data in their databases
  • Leveraging Business Data
    • market basket analysis (what products tend to be purchased together)
    • risk analysis
    • targeted marketing

Data Warehouse

  • a subject-oriented, integrated, time-variant, non-updatable collection of data used in support of management decision-making processes
  • Subject-oriented: high-level entities, e.g. customers, patients, students, products, time
  • Integrated: consistent naming conventions, formats, encoding structures from multiple data sources (both internal and external)
  • Time-variant: data contain a time dimension, may be used to study trends and changes
  • Non-updatable: read-only by end users, data loaded and periodically refreshed from operational systems

Enterprise Data Warehouse

  • A centralized, integrated data warehouse that is the control point and single source of all data made available to end users for decision support applications.

Data Mart

  • A data warehouse that is limited in scope, i.e. a mini-warehouse

Independent Data Mart

  • Each data mart obtains data from separate extract, transform, and load (ETL) processes from source data systems directly
  • No enterprise data warehouse (EDW)
  • Benefits
    • Shorter time to production
    • Lower complexity to build than EDW
  • Limitations
    • Separate ETL processes required for each data mart, which can yield redundant data and efforts
    • Data access complexity when accessing multiple data marts for an enterprise-wide view of data

Dependent Data Mart

  • a data mart obtains data exclusively from the enterprise data warehouse and its reconciled data

Operational Data Store (ODS)

  • An integrated, subject-oriented, updateable, current-valued, detailed database designed to serve operational users as they do decision support processing

Logical Data Mart

  • A data mart created by a relational view of a data warehouse
  • not physically separate databases
  • new data marts can be created quickly
  • always up-to-date since data is created when the view is referenced

Real-time Data Warehouse

  • An enterprise data warehouse that accepts near real-time feeds of transactional data from the operational systems
  • No need for a separate ODS, simpler architecture
  • Immediately transforms and loads the appropriate data into the warehouse
  • Provides near real-time access for the transaction processing systems to an enterprise data warehouse

The User Interface of a Data Warehouse

  • Powerful, intuitive interface needed to help users easily access and analyze data
  • Querying Tools (e.g. Extended SQL)
  • Online Analytical Processing (OLAP) tools
    • Graphical tools that provide users with multidimensional views of data and allow analysis using simple windowing techniques
  • Data Mining Tools
    • Knowledge discovery (looking for patterns or trends) using techniques from traditional statistics and artificial intelligence
  • Data Visualization
    • Representing data in graphical and multimedia formats for analysis
    • To better observe trends and patterns