This was my revision notes when I prepared for the final exam of ISOM 3260 Database Design and Administration. This is transcribed from my old website so the formatting might be off. The content might not be updated as well.
This is served as a reference only, and I have no intention to update this note.
Section 1 - Logical Database Design (Relational Database)
Relational Data Structure
- A relation is a named, two-dimensional table of data. Each relation consists of a set of named columns and an arbitrary number of unnamed rows.
- A primary key is an attribute or a combination of attributes that uniquely identifies each row in a relation.
- A composite key is a primary key that consists of more than one attribute.
- A foreign key is an attribute (possibly composite) in a relation that serves as the primary key of another relation.
Properties of Relations
- Each relation in a database must have a unique name.
- Each entry at the intersection of each row or column is atomic (or single valued). No multivalued attributes are allowed in a relation.
- Each row is unique.
- Each attribute within a table must have a unique name.
- The sequence of columns is insignificant.
- The sequence of wors is insignificant.
Integrity Constraints
- The relational data model includes several types of constraints, or rules limiting acceptable values and actions, whose purpose is to facilitate maintaining the accuracy and integrity of data in the database.
- Domain Constraints: to ensure the entries of an attribute are within a domain.
- A domain definition may consist of the data type, data length, allowable values, or allowable range.
- Entity Integrity: to ensure entity instances can be told apart from one another.
- Each entity instance (row) has a valid primary key (with non-null components).
- Null: a value that may be assigned to an attribute when the applicable value is unknown.
- Referential Integrity: to ensure the consistency amongst rows of two relations.
- Each foreign key value must match a primary key in the referred relations or the foreign key value must be null.
- Note that when the relationship is compulsory, the foreign key value must not be null. A
NOT NULL
constraint should be specified when defining the table. (See section 3A)
- Many CASE tools automatically perform the transformation steps. However, CASE tools do not automatically capture the correct business rules, and therefore a quality check must be performed manually.
Step 1: Mapping Strong Entities
- Convert the strong entity into a relation.
- Each simple attribute of the entity type is stored as an attribute of the relation.
- Only the simple components of the composite attribute are included in the new relation as its attributes.
- When the strong entity type contains a multi-valued attribute, two new relations are created.
- The first relation contains all attributes except the multi-valued attribute.
- The second contains the primary key of the previous relation (which is now a foreign key as well), and the multi-valued attribute. They form the primary key of the second relation.
Step 2: Mapping Weak Entities
- Becomes a separate relation with a foreign key taken from the identifying entity
- Primary key composed of
- Partial identifier of weak entity
- Primary key of the identifying relation (also a foreign key)
Step 3: Mapping Binary Relationships
- Note that one of the goals of transforming entities into relations is to reduce data redundancy.
- One-to-Many: primary key on the ‘one’ side becomes a foreign key on the ‘many’ side.
- Many-to-Many: create a new relation with the primary keys of the two entities as its primary key.
- One-to-One: primary key on the mandatory side becomes a foreign key of the optional side. (Reducing the need to store null values in the foreign key.)
Step 4: Mapping Associative Entities
- Associative entity with no pre-defined identifier: default primary key of the associative relation is the primary keys of the two entities.
- Associative entity has a pre-defined identifier:
- When default primary key cannot uniquely identify instances of the associative entity
- When it is natural and familiar to end-users
- For the ease of programming
Step 5: Mapping Unary Relationships
- The fundamental principles are the same as mapping binary relations.
- One-to-One/One-to-Many:
- Recursive Foreign Key in the same relation
- A foreign key in a relation that references the primary key values of the same relation.
- Many-to-Many:
- one for the entity type
- one for an associative relation in which the primary key has two attributes, both taken from the primary key of the entity.
Step 6: Mapping n
-ary Relationships
- Create
n+1
relations: n
for the original entities, 1
for the associative entity.
- Associative entity has foreign keys to each entity in the relationship.
- Notice the default primary key may not be sufficient in uniquely identifying entity instances.
→
reconsider the primary key!
Step 7: Mapping Supertype/Subtype Relationships
- Supertype/Subtype relationships are not directly supported by the relational data model. Therefore, this describes the most commonly employed strategy:
- Create a separate relation for the supertype and each subtype.
- Supertype attributes go into supertype relation.
- Unique subtype attributes go into each subtype relation; primary key of supertype relation also becomes the primary key of subtype relation.
Well-structured Relations (Not Tested)
- A well-structured relation is a relation that contains minimal redundancy and allows users to insert, modify, and delete the rows in a table without errors or inconsistencies.
- A high amount of redundancies in a relation may result in errors or inconsistencies (called anomalies) when a user attempts to manipulate the data in the relation.
- Insertion Anomaly: the unnecessary insertion of duplicated data or null values into a poorly designed database.
- Modification Anomaly: redundant data needs to be modified at the same time but the modifications are not applied on all of the concerned records or relations accordingly.
- Deletion Anomaly: the deletion of records in which critical data is lost or data consistency is affected in a database.
Normalization
- Normalization is the process of successively reducing relations with anomalies to produce smaller, well-structured relations.
- Goals of normalization:
- Minimize data redundancy, thereby avoiding anomalies and saving storage space.
- Simplify the enforcement of referential integrity.
- Make it easier to maintain data (insert, update, and delete).
- Provide a better design that is an improved representation of the real world and a stronger basis for future growth.
- Normalization makes no assumptions about how data will be used. It defines rules of business based on normal forms and functional dependencies.
- Normalization is a tool to validate and improve a logical design so that it satisfies certain constraints that avoid unnecessary duplication of data.
Functional Dependencies
- A functional dependency is a constraint between two attributes or two sets of attributes.
- For any relation
R
, attribute B
is functionally dependent on attribute A
if, for every valid instance of A
, that value of A
uniquely determines the value of B
. Denoted A→B
.
- In another word, if you know the value of
A
, there can only be one value for B
.
- The attribute on the left side of the arrow in a functional dependency (
A
) is called a determinant.
Candidate Keys
- A candidate key is an attribute, or a combination of attributes, that uniquely identifies a row in a relation. (They are qualified as the primary key.)
- One of the candidate keys will become the primary key.
- Each non-key attribute is functionally dependent on every candidate key.
- A relation is in first normal form (1NF) if and only if -
there are no repeating groups in the relation (thus, there is a single fact at the intersection of each row and column of the table), and
a primary key has been defined, which uniquely identifies each row in the relation.
- there are no multi-valued attributes, and
- every attribute value is atomic.
- A relation is in second normal form (2NF) if and only if -
- it is in first normal form, and
- it contains no partial functional dependency. If such a dependency exists, decompose the table into smaller ones.
- Partial functional dependency exists when a non-key attribute is functionally dependent on part (but not all) of the primary key.
- If a relation has a single primary key, by definition, there cannot be a partial dependency in such a relation.
- If there are no non-key attribute in a relation, by definition, there cannot be a partial dependency in such a relation.
- A relation is in third normal form (3NF) if and only if -
- it is in second normal form, and
- it contains no transitive functional dependency. If such a dependency exists, decompose the table into smaller ones.
- Transitive functional dependency is a functional dependency between the primary key and one or more non-key attributes that are dependent on the primary key via another non-key attribute.
Section 2 - Physical Database Design (Oracle)
Purpose of Physical Database Design
- translate the logical description of data into the technical specifications for storing and retrieving data
- create a design for storing data that will provide adequate performance and ensure database integrity, security, and recoverability
- balance between efficient storage space and processing speed
- efficient processing tend to dominate as storage is getting cheaper
Physical Design Process
- Inputs
- Normalized Relations, Volume Estimates, Frequency of Use Estimates, Attribute Definitions, Response Time Expectations, Data Security, Backup, Recovery, Integrity Requirements, DBMS used, Regulatory Requirements
- Key Decisions
- Data Types
- Indexes
- Physical Record Descriptions, File Organizations, Database Architectures, Query Optimization
Designing Fields
- Field
- Smallest Unit in Data Hierarchy
- A simple attribute in the relation
- Field Design
- Choosing Data Types
- Controlling/Enforcing Data Integrity
- Handling Missing Values
Choosing Data Types
- A data type is considered suitable if it
- represents all possible values,
- improve data integrity,
- support all data manipulations, and
- minimize storage space.
- Commonly Used Data Types in Oracle 12c
VARCHAR2
CHAR
CLOB
BLOB
NUMBER
DATE
Controlling Data Integrity
- Default Value
OrderDate DATE DEFAULT SYSDATE
- Range Control
ProductPrice NUMBER CHECK (ProductPrice > 0)
- Null Value Control
OrderID NUMBER(11,0) NOT NULL
- Referential Integrity
CONSTRAINT Order_FK FOREIGN KEY (CutomerID) REFERENCES CUSTOMER (CustomerID)
Handling Missing Data
- Using a Default Value
- Not Permitting the Use of Null Value
- Substitute an Estimate of the Missing Value
- Trigger a Report Listing Missing Values
- Perform Sensitivity Analysis
Designing Physical Database Files
- A physical file is a named portion of the secondary memory (commonly harddisk or magnetic tape) allocated for the purpose of storing physical records.
- Two ways to retrieve data from secondary storage: sequential storage, pointers
- File Organization
- A technique for physically arranging the records of a file on secondary storage.
- Aims at improving data retrieval speed, providing higher throughput, efficient use of storage space, etc.
- Sequential File Organization: the storage of records in a file in sequence according to a primary key value. Inserting a record in between existing records require resorting. Due to its inflexibility, it is not commonly used in production databases but may be used for backing up data from a database.
- Indexed File Organization: the storage of records either sequentially or non-sequentially with an index that allows the software to locate individual records. It is more common to create an index on a secondary key (which is a column in a table for primarily used for searching) to speed up data retrieval.
- Hashed File Organization: a storage system in which the address for each record is determined using a hashing algorithm (a routine that converts a primary key value into a relative record number or relative file address). It improves data retrieval rate by ensuring each hash leads to a smaller number of records.
Section 3A - Structured Query Language - Data Definition Language (SQL - DDL)
What is SQL?
- Strctured Query Language (Pronunced as ‘S-Q-L’ or ‘Sequel’)
- Standards: SQL-92, SQL:2003, SQL:2011, SQL:2016
- RDBMS has Partial Compliance with Standards
Benefits of a Standardized Relational Language
- Application Longevity
- Reduced Dependence on a Single Vendor
- Cross-system Communication
- Application Portability
Types of SQL Commands
- Data Definition Language (DDL)
- commands used to define a database, including those for creating, altering, and dropping tables and establishing constraints.
CREATE DATABASE
, CREATE TABLE
, ALTER TABLE
, DROP
, etc.
- Data Manipulation Language (DML)
- commands used to maintain and query a database, including those for updating, inserting, modifying, and querying data.
INSERT INTO
, DELETE FROM
, UPDATE
, SELECT
, etc.
- Data Control Language (DCL)
- commands used to control a database, including those for administering privileges.
GRANT
, REVOKE
, etc.
COMMIT
and ROLLBACK
are classified as transaction control language (TCL) which is not mentioned in the lecture.
Creating Tables (Using the CREATE
Command)
CREATE TABLE {table_name} (
{col_name} {col_definition} [, ...]
{constraints} [, ...]
);
Explanation
{table_name}
: the name of the relation to be created.
- Remember the name of the relation must be unique within a database.
- Name cannot be separated with space.
{col_name}
: the name of the column/attribute to be created.
- Remember the name of the attribute must be unique within a relation.
- Name cannot be separated with space.
{col_definition}
: consists of the column data type and the column constraints.
- Common data types:
NUMBER(x, y)
, VARCHAR2(x)
, DATE
, TIMESTAMP
- Common Column Constraints:
DEFAULT {default_value}
, NOT NULL
- You must put a
NOT NULL
statement for the attribute intended to be a primary key.
{constraints}
: define the table constraints. Must be in the form CONSTRAINT {constraint_name} {constraint_definition}
- By convention,
constraint_name
is the name of the table + _
+ shorthand of the constraint to be applied. (e.g. STUDENT_FK2
)
- Common constraints:
PRIMARY KEY ({field_name})
,
FOREIGN KEY ({field_name}) REFERENCES {table_name} ({field_name})
CHECK ({condition})
Example
CREATE TABLE department (
deptID VARCHAR2(10) NOT NULL,
deptName VARCHAR2(100) NOT NULL,
CONSTRAINT department_PK PRIMARY KEY (deptID)
);
CREATE TABLE student (
-- studentID are numbers but not for calculation
-- so VARCHAR2 is used instead to save space
sID VARCHAR2(8) NOT NULL,
firstName VARCHAR2(30) NOT NULL,
middleName VARCHAR2(30),
lastName VARCHAR2(30) NOT NULL,
DOB DATE NOT NULL,
-- Notice the student.deptID must have the same
-- data type of department.deptID
deptID VARCHAR2(10) NOT NULL,
CONSTRAINT student_PK PRIMARY KEY (sID),
CONSTRAINT student_FK FOREIGN KEY (deptID)
REFERENCES department (deptID),
CONSTRAINT student_CHECKNAME CHECK (lastName = UPPER(lastName))
);
Changing Table Definition (Using ALTER
Command)
ALTER TABLE {table_name}
-- One of the following commands
[ADD {col_name} {col_definition}]
[ADD CONSTRAINT {constraint_name} {constraint_definition}]
[MODIFY {col_name} {changed_col_def}]
[DROP COLUMN {col_name}]
[DROP CONSTRAINT {constraint_name}]
[RENAME COLUMN {col_name} TO {new_col_name}];
Notes
- When adding a column with existing data, you cannot specify the
NOT NULL
constraint on the new column because you cannot fill the value. Instead, add the column, insert the value, and then specify the constraint by using ALTER TABLE ... ADD CONSTRAINT ...
statement.
- By the same token, you cannot add a constraint of
NOT NULL
to an existing column, if the column has NULL
values.
- To remove a
NOT NULL
constraint, use ALTER TABLE ... MODIFY {col_name} NULL;
.
- To remove a default value, use
ALTER TABLE ... MODIFY {col_name} DEFAULT NULL;
- You cannot change the data type of a column that is referencing or being referenced by, another column in a
FOREIGN KEY
relationship. Instead, drop the foreign key constraint, modify the data type of both columns, and add the foreign key constraint again.
- By the same token, you cannot drop a column that is being referenced by another column.
Removing Tables (Using the DROP TABLE
Command)
DROP TABLE {table_name};
Section 3B - Structured Query Language - Data Manipulation Language (SQL - DML) (Part 1)
Inserting Records (Using the INSERT INTO
Command)
-- Inserting values from users
INSERT INTO {table_name} [({col_order})]
VALUES {val_order};
-- Inserting values from the result of a query
INSERT INTO {table_name}
{select_stmt};
Explanation
{col_order}
: specifies the columns to be inserted (the unsepecified columns will get NULL
)
- Characters are case-sensitive inside
' '
. ' '
can be omitted for numerical values.
Deleting Records (Using the DELETE
Command)
DELETE FROM {table_name}
[{where_condition}];
Updating Records (Using the UPDATE
Command)
UPDATE {table_name}
SET {field_name} = {new_val} [, ...]
[{where_condition}];
Querying Records (Using the SELECT
Command)
SELECT {column_names}
FROM {table_names}
[{where_condition}]
[GROUP BY {column_names}]
[HAVING {condtion}]
[ORDER BY {column_name}, ...];
Illustration - PRODUCT
Relation
PRO_ID | PRO_Name | PRO_Desc | PRO_Price | PRO_MemberDiscount |
---|
… | … | … | … | … |
- Write a SQL statement to select all product information.
SELECT * FROM PRODUCT;
- Write a SQL statement to select all products which name starts with ‘Deluxe’.
SELECT * FROM PRODUCT
WHERE PRO_Name LIKE 'Deluxe%';
- Write a SQL statement to select the product ID and product Name that a member cannot get a discount for.
SELECT PRO_ID, PRO_Name FROM PRODUCT
WHERE PRO_MemberDiscount < 1;
- Write a SQL statement to select the product ID and the member price of the product, ordered by the member price in descending order.
SELECT PRO_ID, PRO_Price * PRO_MemberDiscount AS MemberPrice
FROM PRODUCT
ORDER BY PRO_Price * PRO_MemberDiscount DESC;
Using Aggregate Function in Query
COUNT()
MIN()
MAX()
SUM()
AVG()
Illustration - PRODUCT
Relation
- Write a SQL statement to select the number of products in the system.
SELECT COUNT(*) AS totalProduct
FROM PRODUCT;
- Write a SQL statement to select the lowest member price in the system.
SELECT MIN(PRO_Price * PRO_MemberDiscount) AS minMemberPrice
FROM PRODUCT;
- Write a SQL statement to the number of products in the system, where the name of the product ends with ‘Deluxe’.
SELECT COUNT(*) AS totalEndDeluxe
FROM PRODUCT
WHERE PRO_Name LIKE '%Deluxe';
Using Wildcards
*
to represent all columns in SELECT
statement.
%
to represent any collection of characters.
_
to represent exactly one character.
Using Boolean Operators
Using Ranges
SELECT ... FROM ...
WHERE {column_name} BETWEEN {start_val} AND {end_val};
-- Values are inclusive.
Using IN
and NOT IN
Clause
SELECT ... FROM ...
WHERE {column_name} [NOT] IN ({list_of_val});
Using DISTINCT
SELECT DISTINCT {column_name}
FROM {table_name}
...;
Illustration - PRODUCT
Relation
- Write a SQL statement to select the product ID of all products with the word ‘Deluxe’ (case-insensitive) in the name, and all ‘desk’ product with the price lower than
200
.
SELECT PRO_ID FROM PRODUCT
WHERE LOWER(PRO_Name) LIKE '%deluxe%'
OR LOWER(PRO_Name) LIKE '%desk%'
AND PRO_Price < 200;
- Write a SQL statement to select the product ID of all product which provides 10% off or 20% off to members.
SELECT PRO_ID FROM PRODUCT
WHERE PRO_MemberDiscount IN (0.8, 0.9);
- Write a SQL statement to select the product ID of all product which does not provide 50% off or 60% off to members.
SELECT PRO_ID FROM PRODUCT
WHERE PRO_MemberDiscount NOT IN (0.4, 0.5);
Aggregating Result with GROUP BY
Clause and HAVING
Clause
- If you write queries using the following rule, your queries will work: Each column referenced in the
SELECT
statement must be referenced in the GROUP BY
clause, unless the column is an argument for an aggregate function included in the SELECT
clause.
Illustration - STUDENT
Relation
STU_ID | STU_FirstName | STU_LastName | STU_AdmittedYear | MAJ_ID | NAT_ID |
---|
… | … | … | … | … | … |
- Assume that each student has one major only.
- Write a SQL statement to select the number of student in each major. Sort the result in descending order of the major first, then in ascending order with the major ID.
SELECT MAJ_ID, COUNT(MAJ_ID) AS totalStudent
FROM STUDENT
GROUP BY MAJ_ID
ORDER BY COUNT(MAJ_ID) DESC, MAJ_ID [ASC];
- Modify the query in Example 11 to show majors that have over 150 students only.
SELECT MAJ_ID, COUNT(MAJ_ID) AS totalStudent
FROM STUDENT
GROUP BY MAJ_ID
HAVING COUNT(MAJ_ID) > 150
ORDER BY COUNT(MAJ_ID) DESC, MAJ_ID [ASC];
- Modify the query in Example 11 to show majors except
'IS'
and 'OM'
.
-- Method 1: Filtering with WHERE
SELECT MAJ_ID, COUNT(MAJ_ID) AS totalStudent
FROM STUDENT
WHERE MAJ_ID NOT IN ('IS', 'OM')
GROUP BY MAJ_ID
ORDER BY COUNT(MAJ_ID) DESC, MAJ_ID [ASC];
-- Method 2: Filtering through HAVING
SELECT MAJ_ID, COUNT(MAJ_ID) AS totalStudent
FROM STUDENT
GROUP BY MAJ_ID
HAVING MAJ_ID NOT IN ('IS', 'OM')
ORDER BY COUNT(MAJ_ID) DESC, MAJ_ID [ASC];
- Write a SQL statement to show the number of student in each major who is admitted in 2016.
SELECT MAJ_ID, COUNT(MAJ_ID) AS totalStudent
FROM STUDENT
WHERE STU_AdmittedYear = '2016'
GROUP BY MAJ_ID;
- Write a SQL statement to show the number of student in each major from each country. Order by
MAJ_ID
in ascending order, then NAT_ID
in ascending order.
SELECT MAJ_ID, NAT_ID, COUNT(MAJ_ID) AS totalStudent
FROM STUDENT
GROUP BY MAJ_ID, NAT_ID
ORDER BY MAJ_ID, NAT_ID;
Section 3C - Structured Query Language - Data Manipulation Language (SQL - DML) (Part 2)
Processing Multiple Tables
- Join
- a relational operation that temporarily combines two or more tables with common columns
- the common columns in joined tables are usually the primary key of the dominant table and the foreign key of the dependent table.
- Equi-join
- based on equality between values in the common columns.
- Natural join
- an equi-join in which one of the duplicated common columns is eliminated in the result table.
- Outer Join
- a join in which rows that do not have matching values in common columns are nevertheless included in the result table.
Subquery
- Location of possible subquery
- In
WHERE
or HAVING
clauses for conditional checking
- As a virtual table of the
FROM
clause
- Non-correlated Subquery - execute inner query once for the entire outer query
- Commonly using
IN
or =
before subquery
- Correlated Subquery - execute inner query once for each row returned by the outer query
- Commonly using
EXISTS
to return a boolean value.
- Commonly using
ANY
and ALL
with logical operators such as =
, >
, and <
.
Illustration - STUDENT
relation, MAJOR
relation, DEPARTMENT
relation, and NATIONALITY
relation
STU_ID | STU_FirstName | STU_LastName | STU_AdmittedYear | STU_GPA | MAJ_ID | NAT_ID |
---|
… | … | … | … | … | … | … |
MAJ_ID | MAJ_Name | MAJ_OfficeRoom | DEP_ID |
---|
… | … | … | … |
DEP_ID | DEP_Name | DEP_HeadName |
---|
… | … | … |
NAT_ID | NAT_Name |
---|
… | … |
- Assume that each student has one major only.
- Write a SQL statement to show the name of the student and the name of their major.
SELECT s.STU_FirstName, s.STU_LastName, m.MAJ_Name
FROM STUDENT s, MAJOR m
WHERE s.MAJ_ID = m.MAJ_ID;
- Write a SQL statement to show the name of the major and the number of student in each major. Only show those majors with more than 150 students in the system.
SELECT m.MAJ_Name, COUNT(s.STU_ID) AS totalStudent
FROM MAJOR m, STUDENT s
WHERE m.MAJ_ID = s.MAJ_ID
GROUP BY m.MAJ_Name
HAVING COUNT(s.STU_ID) > 150;
- Write a SQL statement to show the difference of GPA for each student against the average of all students.
SELECT STU_GPA - avgGPA diffGPA
FROM STUDENT, (SELECT AVG(STU_GPA) avgGPA FROM STUDENT);
- Write a SQL to show the name of the student, and the name of the department head of their respective department.
SELECT s.STU_FirstName, s.STU_LastName, d.DEP_HeadName
FROM STUDENT s, MAJOR m, DEPARTMENT d
WHERE s.MAJ_ID = m.MAJ_ID
AND m.DEP_ID = d.DEP_ID;
- Write a SQL statement to show the name of the department and the number of student in each department. Show only the department with over 200 students. Order by the number of student in descending order.
SELECT d.DEP_Name, COUNT(s.STU_ID) AS totalStudent
FROM DEPARTMENT d, MAJOR m, STUDENT s
WHERE d.DEP_ID = m.DEP_ID
AND m.MAJ_ID = s.MAJ_ID
GROUP BY d.DEP_NAME
HAVING COUNT(s.STU_ID) > 200
ORDER BY COUNT(s.STU_ID) DESC;
- Using a non-correlated subquery, select the name(s) of the student(s) who get(s) the highest GPA.
SELECT STU_FirstName, STU_LastName
FROM STUDENT
WHERE STU_GPA = (SELECT MAX(STU_GPA) FROM STUDENT);
- Using a correlated subquery, select the name(s) of the student(s) who get(s) the highest GPA.
SELECT STU_FirstName, STU_LastName
FROM STUDENT a
WHERE a.STU_GPA >= ALL (
SELECT STU_GPA
FROM STUDENT b
WHERE b.STU_GPA != a.STU_GPA
);
- Write a SQL statement to show all names of nationality, and the number of student in each nation. Show all nationalities regardless whether there are students in that nation or not.
SELECT n.NAT_Name, COUNT(s.STU_ID)
FROM NATIONALITY n
LEFT OUTER JOIN STUDENT s
ON n.NAT_ID = s.NAT_ID
GROUP BY n.NAT_ID;
- Using a non-correlated subquery, select the name of nationality which no students belong to.
SELECT NAT_Name
FROM NATIONALITY
WHERE NAT_ID NOT IN (
SELECT DISTINCT NAT_ID
FROM STUDENT
);
- Using a correlated subquery, select the name of nationality which no students belong to.
SELECT NAT_Name
FROM NATIONALITY n
WHERE NOT EXISTS (
SELECT * FROM STUDENT s
WHERE s.NAT_ID = n.NAT_ID
);
Section 4 - Data and Database Administration
Importance of Data/Database Administration
- Data are corporate assets
- Important assets of the organization need to be managed properly
- Data are stored in databases
- Hence, effective data and database administration provide support for managerial decision-making
Data/Database Administration
- Data and Database Administration Functions
- Managing Data Security
- Backing Up Databases
Data Administration Functions
- Data Administrator: a senior manager
- Data Policies, Procedures, Standards
- Policies: e.g. ‘every user must have a password’
- Procedures: e.g. backup and recovery procedures
- Standards: e.g. naming conventions
- (IS) Planning
- understand the information requirements of the organization
- Data Conflict Resolution
- resolve data ownership issues
- Internal Marketing of Data Standards
- reduce resistance to change in data standards
- Database Administrator (DBA): a technical personnel
- Implement the data policies, procedures, standards decided by data administrator
- Managing data security, privacy, and integrity
- Performing data backup and recovery
- Selection of hardware and software
- Installing and upgrading the DBMS
- Tuning database performance
Managing Data Security
- Database Security
- protection of the data against accidental or intentional loss, destructino, or misuse
- Increasingly Difficult to Manage
- multiple channels for data access
- corporate intranets
- the Internet
Threats to Data Security
- Accidental Losses
- human error, software failure, hardware failure
- Theft
- Improper Data Access
- loss of privacy (personal data)
- loss of confidentiality (corporate data)
- Loss of Data Integrity
- data becomes invalid or corrupted
- Loss of Availability
- destruction of hardware, networks, or applications
- virus attack
Data Management Software Security Features
- Integrity Controls
- enforced by the DBMS during querying and updating
- Authorization Rules
- identify users and restrict the actions they make take against a database
GRANT SELECT, UPDATE(price) ON PRODUCT_T TO Smith
- Encryption Procedures
- encode data in an unrecognizable form
- Authentication Schemes
- positively identify a person attempting to gain access to a database
- fingerprints, voiceprints, eye scan, signature, secrete keys, digital certificates
- Backup Databases
- facilitate recovery procedures
Backing Up Databases
- We backup databases for facilitating databases recovery
- Database recovery is the mechanism for restoring a database quickly and accurately after loss or damage
- Recovery Manager (a DBMS module)
- resotres the database to a correct condition when a failure occurs
- resumes processing user requests
Backup Facilities
- An automatic dump facility that produces backup copy of the entire database
- Periodic Backup
- Cold Backup: database is shut down during backup
- Hot Backup
- selected portion of the database is shut down and backed up at a given time
- useful when it is a critical system that must always remain available
- Backups are stored in secure, off-site location
Journalizing Facilities
- An audit trail of transactions and database changes
- Transaction
- a discrete unit of work that must be completely processed or not processed at all
- e.g. entering a customer order
- Transaction Log
- record of essential data for each transaction processed against the database
- Database Change Log
- before- and after-images of records that have been modified by transactions
- before-image is a copy of record before modification
- after-image is a copy of record after modification
Recovery and Restart Procedures
Switch
- Mirrored Databases
- at least two copies of the database must be kept and updated simultaneously
- implemented in RAID 1 systems
- Procedures
- when a disk failure occurs, system switches to mirrored disk
- defective disk can be removed and replaced with a new disk
- Advantages
- no disruption in service; fastest recovery
- popular as cost of secondary storage has dropped
- Disadvantages
- does not protect against lost of power or damage to both databases
Restore/Rerun
- reprocessing the transactions (retrieved from the transaction log) against the backup copy of the database
- Advantages
- does not need to create a database change log
- no special restart procedures required
- Disadvantages
- time to reprocess transactions may be long
- used only as a last resort
Forward Recovery (rollforward)
- starts with the backup copy of the database
- apply after-images (the results of good transactions) retrieved from database change log
- preferable to restore/rerun
- do not need to reprocess each transaction
- only the most recent after-image of a record need to be applied
Backward Recovery (rollback)
- apply before-images of record that have been changed to the database
- restore the database to an earlier state
- used to reverse the changes made by transactions that have been aborted to terminated abnormally (restore to an earlier state)
Section 5 - Data Warehousing
Data and Competitve Advantage
- Businesses have a lot of data in their databases
- Leveraging Business Data
- market basket analysis (what products tend to be purchased together)
- risk analysis
- targeted marketing
Data Warehouse
- a subject-oriented, integrated, time-variant, non-updatable collection of data used in support of management decision-making processes
- Subject-oriented: high-level entities, e.g. customers, patients, students, products, time
- Integrated: consistent naming conventions, formats, encoding structures from multiple data sources (both internal and external)
- Time-variant: data contain a time dimension, may be used to study trends and changes
- Non-updatable: read-only by end users, data loaded and periodically refreshed from operational systems
Enterprise Data Warehouse
- A centralized, integrated data warehouse that is the control point and single source of all data made available to end users for decision support applications.
Data Mart
- A data warehouse that is limited in scope, i.e. a mini-warehouse
Independent Data Mart
- Each data mart obtains data from separate extract, transform, and load (ETL) processes from source data systems directly
- No enterprise data warehouse (EDW)
- Benefits
- Shorter time to production
- Lower complexity to build than EDW
- Limitations
- Separate ETL processes required for each data mart, which can yield redundant data and efforts
- Data access complexity when accessing to multiple data marts for an enterprise-wide view of data
Dependent Data Mart
- a data mart obtains data exclusively from the enterprise data warehouse and its reconciled data
Operational Data Store (ODS)
- An integrated, subject-oriented, updateable, current-valued, detailed database designed to serve operational uses as they do decision support processing
Logical Data Mart
- A data mart created by a relational view of a data warehouse
- not physically separate databases
- new data marts can be created quickly
- always up-to-date since data is created when the view is referenced
Real-time Data Warehouse
- An enterprise data warehouse that accepts near real-time feeds of transactional data from the operational systems
- No need for a separate ODS, simpler architecture
- Immediately transforms and loads the appropriate data into the warehouse
- Provides near real-time access for the transaction rpcessing systems to an enterprise data warehouse
The User Interface of a Data Warehouse
- Powerful, intuitive interface needed to help usrs easily access and analyze data
- Querying Tools (e.g. Extended SQL)
- Online Analytical Processing (OLAP) tools
- Graphical tools that provides users with multidimensional views of data and allows analysis using simple windowing techniques
- Data Mining Tools
- Knowledge discovery (looking for patterns or trends) using techniques from traditional statistics, artificial intelligence
- Data Visualization
- Representing data in graphical and multimedia formats for analysis
- To better observe trends and patterns