Final Exam Review for ISOM 3260 Database Design and Administration

Willis WAN, Chun Yu | 2019-05-11

These are my revision notes from when I prepared for the final exam of ISOM 3260 Database Design and Administration. They are transcribed from my old website, so the formatting might be off, and the content might be outdated.
These notes serve as a reference only, and I have no intention to update them.

Section 1 - Logical Database Design (Relational Database)

Relational Data Structure

  • A relation is a named, two-dimensional table of data. Each relation consists of a set of named columns and an arbitrary number of unnamed rows.
  • A primary key is an attribute or a combination of attributes that uniquely identifies each row in a relation.
    • A composite key is a primary key that consists of more than one attribute.
  • A foreign key is an attribute (possibly composite) in a relation that serves as the primary key of another relation.

Properties of Relations

  1. Each relation in a database must have a unique name.
  2. Each entry at the intersection of each row and column is atomic (or single-valued). No multivalued attributes are allowed in a relation.
  3. Each row is unique.
  4. Each attribute within a table must have a unique name.
  5. The sequence of columns is insignificant.
  6. The sequence of rows is insignificant.

Integrity Constraints

  • The relational data model includes several types of constraints, or rules limiting acceptable values and actions, whose purpose is to facilitate maintaining the accuracy and integrity of data in the database.
  • Domain Constraints: to ensure the entries of an attribute are within a domain.
    • A domain definition may consist of the data type, data length, allowable values, or allowable range.
  • Entity Integrity: to ensure entity instances can be told apart from one another.
    • Each entity instance (row) has a valid primary key (with non-null components).
    • Null: a value that may be assigned to an attribute when the applicable value is unknown.
  • Referential Integrity: to ensure the consistency amongst rows of two relations.
    • Each foreign key value must match a primary key value in the referenced relation, or the foreign key value must be null.
    • Note that when the relationship is compulsory, the foreign key value must not be null. A NOT NULL constraint should be specified when defining the table. (See section 3A)

Transforming (E)ER Diagrams Into Relations

  • Many CASE tools automatically perform the transformation steps. However, CASE tools do not automatically capture the correct business rules, and therefore a quality check must be performed manually.

Step 1: Mapping Strong Entities

  • Convert the strong entity into a relation.
  • Each simple attribute of the entity type is stored as an attribute of the relation.
  • Only the simple components of the composite attribute are included in the new relation as its attributes.
  • When the strong entity type contains a multi-valued attribute, two new relations are created.
    • The first relation contains all attributes except the multi-valued attribute.
    • The second contains the primary key of the previous relation (which is now a foreign key as well), and the multi-valued attribute. They form the primary key of the second relation.
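A minimal sketch of this step in Oracle DDL (the syntax is covered in Section 3A), assuming a hypothetical EMPLOYEE entity with a multi-valued attribute Skill:

CREATE TABLE employee (
    empID      VARCHAR2(10)    NOT NULL,
    empName    VARCHAR2(30)    NOT NULL,
    CONSTRAINT employee_PK PRIMARY KEY (empID)
);
-- Second relation: holds the multi-valued attribute; its primary key is
-- the primary key of the first relation (also a foreign key) plus the attribute
CREATE TABLE employee_skill (
    empID      VARCHAR2(10)    NOT NULL,
    skill      VARCHAR2(30)    NOT NULL,
    CONSTRAINT employee_skill_PK PRIMARY KEY (empID, skill),
    CONSTRAINT employee_skill_FK FOREIGN KEY (empID) REFERENCES employee (empID)
);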

Step 2: Mapping Weak Entities

  • Becomes a separate relation with a foreign key taken from the identifying entity
  • Primary key composed of
    • Partial identifier of weak entity
    • Primary key of the identifying relation (also a foreign key)
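A hedged sketch, assuming a hypothetical weak entity DEPENDENT identified by the employee relation from the sketch above:

CREATE TABLE dependent (
    empID      VARCHAR2(10)    NOT NULL,    -- primary key of the identifying relation (also a foreign key)
    depName    VARCHAR2(30)    NOT NULL,    -- partial identifier of the weak entity
    depDOB     DATE,
    CONSTRAINT dependent_PK PRIMARY KEY (empID, depName),
    CONSTRAINT dependent_FK FOREIGN KEY (empID) REFERENCES employee (empID)
);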

Step 3: Mapping Binary Relationships

  • Note that one of the goals of transforming entities into relations is to reduce data redundancy.
  • One-to-Many: primary key on the ‘one’ side becomes a foreign key on the ‘many’ side.
  • Many-to-Many: create a new relation with the primary keys of the two entities as its primary key. (See the sketch after this list.)
  • One-to-One: primary key on the mandatory side becomes a foreign key of the optional side. (Reducing the need to store null values in the foreign key.)
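For the many-to-many case, a minimal sketch assuming hypothetical STUDENT and COURSE relations (with primary keys sID and courseID) already exist:

CREATE TABLE enrollment (
    sID        VARCHAR2(8)     NOT NULL,
    courseID   VARCHAR2(10)    NOT NULL,
    -- the two foreign keys together form the primary key
    CONSTRAINT enrollment_PK PRIMARY KEY (sID, courseID),
    CONSTRAINT enrollment_FK1 FOREIGN KEY (sID) REFERENCES student (sID),
    CONSTRAINT enrollment_FK2 FOREIGN KEY (courseID) REFERENCES course (courseID)
);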

Step 4: Mapping Associative Entities

  • Associative entity with no pre-defined identifier: the default primary key of the associative relation is the combination of the primary keys of the two entities.
  • Associative entity with a pre-defined identifier: the identifier becomes the primary key. This is used -
    • When default primary key cannot uniquely identify instances of the associative entity
    • When it is natural and familiar to end-users
    • For the ease of programming

Step 5: Mapping Unary Relationships

  • The fundamental principles are the same as mapping binary relationships.
  • One-to-One/One-to-Many:
    • Recursive Foreign Key in the same relation
    • A foreign key in a relation that references the primary key values of the same relation.
  • Many-to-Many: create two relations -
    • one for the entity type
    • one for an associative relation in which the primary key has two attributes, both taken from the primary key of the entity.
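A hedged sketch of both cases, using a hypothetical STAFF entity with a one-to-many ‘supervises’ relationship and a many-to-many ‘mentors’ relationship:

-- One-to-one/one-to-many: recursive foreign key in the same relation
CREATE TABLE staff (
    staffID         VARCHAR2(10)    NOT NULL,
    staffName       VARCHAR2(30)    NOT NULL,
    supervisorID    VARCHAR2(10),             -- recursive FK; NULL for the top manager
    CONSTRAINT staff_PK PRIMARY KEY (staffID),
    CONSTRAINT staff_FK FOREIGN KEY (supervisorID) REFERENCES staff (staffID)
);
-- Many-to-many: an associative relation whose composite primary key
-- takes both attributes from the primary key of STAFF
CREATE TABLE mentorship (
    mentorID   VARCHAR2(10)    NOT NULL,
    menteeID   VARCHAR2(10)    NOT NULL,
    CONSTRAINT mentorship_PK PRIMARY KEY (mentorID, menteeID),
    CONSTRAINT mentorship_FK1 FOREIGN KEY (mentorID) REFERENCES staff (staffID),
    CONSTRAINT mentorship_FK2 FOREIGN KEY (menteeID) REFERENCES staff (staffID)
);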

Step 6: Mapping n-ary Relationships

  • Create n+1 relations: n for the original entities, 1 for the associative entity.
  • Associative entity has foreign keys to each entity in the relationship.
  • Notice the default primary key may not be sufficient in uniquely identifying entity instances → reconsider the primary key!

Step 7: Mapping Supertype/Subtype Relationships

  • Supertype/Subtype relationships are not directly supported by the relational data model. The most commonly employed strategy is:
    • Create a separate relation for the supertype and each subtype.
    • Supertype attributes go into supertype relation.
    • Unique subtype attributes go into each subtype relation; primary key of supertype relation also becomes the primary key of subtype relation.
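A minimal sketch, assuming a hypothetical PERSON supertype with an INSTRUCTOR subtype:

CREATE TABLE person (
    personID     VARCHAR2(10)    NOT NULL,
    personName   VARCHAR2(30)    NOT NULL,    -- shared supertype attribute
    CONSTRAINT person_PK PRIMARY KEY (personID)
);
CREATE TABLE instructor (
    personID     VARCHAR2(10)    NOT NULL,    -- supertype primary key reused as primary key (and foreign key)
    title        VARCHAR2(20),                -- attribute unique to the subtype
    CONSTRAINT instructor_PK PRIMARY KEY (personID),
    CONSTRAINT instructor_FK FOREIGN KEY (personID) REFERENCES person (personID)
);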

Well-structured Relations (Not Tested)

  • A well-structured relation is a relation that contains minimal redundancy and allows users to insert, modify, and delete the rows in a table without errors or inconsistencies.
  • A high amount of redundancies in a relation may result in errors or inconsistencies (called anomalies) when a user attempts to manipulate the data in the relation.
    • Insertion Anomaly: the unnecessary insertion of duplicated data or null values into a poorly designed database.
    • Modification Anomaly: redundant data needs to be modified at the same time, but the modifications are not applied to all of the concerned records or relations accordingly.
    • Deletion Anomaly: the deletion of records in which critical data is lost or data consistency is affected in a database.

Normalization

  • Normalization is the process of successively reducing relations with anomalies to produce smaller, well-structured relations.
  • Goals of normalization:
    • Minimize data redundancy, thereby avoiding anomalies and saving storage space.
    • Simplify the enforcement of referential integrity.
    • Make it easier to maintain data (insert, update, and delete).
    • Provide a better design that is an improved representation of the real world and a stronger basis for future growth.
  • Normalization makes no assumptions about how data will be used. It is based on rules of the business, captured as functional dependencies, and the normal forms.
  • Normalization is a tool to validate and improve a logical design so that it satisfies certain constraints that avoid unnecessary duplication of data.

Functional Dependencies

  • A functional dependency is a constraint between two attributes or two sets of attributes.
    • For any relation R, attribute B is functionally dependent on attribute A if, for every valid instance of A, that value of A uniquely determines the value of B. Denoted A → B.
    • In other words, if you know the value of A, there can only be one value for B.
    • The attribute on the left side of the arrow in a functional dependency (A) is called a determinant.

Candidate Keys

  • A candidate key is an attribute, or a combination of attributes, that uniquely identifies a row in a relation. (Candidate keys are qualified to serve as the primary key.)
  • One of the candidate keys will become the primary key.
  • Each non-key attribute is functionally dependent on every candidate key.

First Normal Form (1NF)

  • A relation is in first normal form (1NF) if and only if -
    • there are no repeating groups in the relation (thus, there is a single fact at the intersection of each row and column of the table),
    • a primary key has been defined, which uniquely identifies each row in the relation,
    • there are no multi-valued attributes, and
    • every attribute value is atomic.

Second Normal Form (2NF)

  • A relation is in second normal form (2NF) if and only if -
    • it is in first normal form, and
    • it contains no partial functional dependency. If such a dependency exists, decompose the table into smaller ones.
  • Partial functional dependency exists when a non-key attribute is functionally dependent on part (but not all) of the primary key.
  • If a relation has a primary key consisting of a single attribute, by definition, there cannot be a partial dependency in such a relation.
  • If there are no non-key attributes in a relation, by definition, there cannot be a partial dependency in such a relation.
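A quick example with a hypothetical relation:

ENROLLMENT (sID, courseID, grade, courseName)    -- primary key: (sID, courseID)
-- courseName depends on courseID alone, a partial dependency, so decompose into:
ENROLLMENT (sID, courseID, grade)
COURSE (courseID, courseName)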

Third Normal Form (3NF)

  • A relation is in third normal form (3NF) if and only if -
    • it is in second normal form, and
    • it contains no transitive functional dependency. If such a dependency exists, decompose the table into smaller ones.
  • Transitive functional dependency is a functional dependency between the primary key and one or more non-key attributes that are dependent on the primary key via another non-key attribute.
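A quick example with a hypothetical relation:

STUDENT (sID, deptID, deptName)    -- primary key: sID
-- sID → deptID and deptID → deptName, so deptName depends on sID transitively; decompose into:
STUDENT (sID, deptID)
DEPARTMENT (deptID, deptName)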

Section 2 - Physical Database Design (Oracle)

Purpose of Physical Database Design

  • translate the logical description of data into the technical specifications for storing and retrieving data
  • create a design for storing data that will provide adequate performance and ensure database integrity, security, and recoverability
  • balance between efficient storage space and processing speed
  • efficient processing tends to dominate as storage is getting cheaper

Physical Design Process

  • Inputs
    • Normalized Relations, Volume Estimates, Frequency of Use Estimates, Attribute Definitions, Response Time Expectations, Data Security, Backup, Recovery, Integrity Requirements, DBMS used, Regulatory Requirements
  • Key Decisions
    • Data Types
    • Indexes
    • Physical Record Descriptions, File Organizations, Database Architectures, Query Optimization

Designing Fields

  • Field
    • Smallest Unit in Data Hierarchy
    • A simple attribute in the relation
  • Field Design
    • Choosing Data Types
    • Controlling/Enforcing Data Integrity
    • Handling Missing Values

Choosing Data Types

  • A data type is considered suitable if it
    • represents all possible values,
    • improves data integrity,
    • supports all data manipulations, and
    • minimizes storage space.
  • Commonly Used Data Types in Oracle 12c
    • VARCHAR2
    • CHAR
    • CLOB
    • BLOB
    • NUMBER
    • DATE

Controlling Data Integrity

  • Default Value
    OrderDate DATE DEFAULT SYSDATE
  • Range Control
    ProductPrice NUMBER CHECK (ProductPrice > 0)
  • Null Value Control
    OrderID NUMBER(11,0) NOT NULL
  • Referential Integrity
    CONSTRAINT Order_FK FOREIGN KEY (CustomerID) REFERENCES CUSTOMER (CustomerID)

Handling Missing Data

  • Using a Default Value
  • Not Permitting the Use of Null Value
  • Substitute an Estimate of the Missing Value
  • Trigger a Report Listing Missing Values
  • Perform Sensitivity Analysis

Designing Physical Database Files

  • A physical file is a named portion of secondary memory (commonly a hard disk or magnetic tape) allocated for the purpose of storing physical records.
  • Two ways to retrieve data from secondary storage: sequential storage and pointers
  • File Organization
    • A technique for physically arranging the records of a file on secondary storage.
    • Aims at improving data retrieval speed, providing higher throughput, efficient use of storage space, etc.
    • Sequential File Organization: the storage of records in a file in sequence according to a primary key value. Inserting a record in between existing records requires resorting. Due to its inflexibility, it is not commonly used in production databases but may be used for backing up data from a database.
    • Indexed File Organization: the storage of records either sequentially or non-sequentially with an index that allows the software to locate individual records. It is more common to create an index on a secondary key (a column in a table used primarily for searching) to speed up data retrieval.
    • Hashed File Organization: a storage system in which the address for each record is determined using a hashing algorithm (a routine that converts a primary key value into a relative record number or relative file address). It improves data retrieval rate by ensuring each hash leads to a smaller number of records.
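As a toy illustration of the hashing idea (the numbers are made up), a simple modulo hash maps a numeric key to one of, say, 100 relative bucket addresses:

-- key 987654 hashes to relative bucket address 54
SELECT MOD(987654, 100) AS bucket FROM DUAL;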

Section 3A - Structured Query Language - Data Definition Language (SQL - DDL)

What is SQL?

  • Structured Query Language (Pronounced as ‘S-Q-L’ or ‘Sequel’)
  • Standards: SQL-92, SQL:2003, SQL:2011, SQL:2016
  • RDBMS has Partial Compliance with Standards

Benefits of a Standardized Relational Language

  • Application Longevity
  • Reduced Dependence on a Single Vendor
  • Cross-system Communication
  • Application Portability

Types of SQL Commands

  • Data Definition Language (DDL)
    • commands used to define a database, including those for creating, altering, and dropping tables and establishing constraints.
    • CREATE DATABASE, CREATE TABLE, ALTER TABLE, DROP, etc.
  • Data Manipulation Language (DML)
    • commands used to maintain and query a database, including those for updating, inserting, modifying, and querying data.
    • INSERT INTO, DELETE FROM, UPDATE, SELECT, etc.
  • Data Control Language (DCL)
    • commands used to control a database, including those for administering privileges.
    • GRANT, REVOKE, etc.
  • COMMIT and ROLLBACK are classified as transaction control language (TCL) which is not mentioned in the lecture.

Creating Tables (Using the CREATE Command)

CREATE TABLE {table_name} (
    {col_name}    {col_definition} [, ...]
    {constraints} [, ...]
);

Explanation

  • {table_name}: the name of the relation to be created.
    • Remember the name of the relation must be unique within a database.
    • The name cannot contain spaces.
  • {col_name}: the name of the column/attribute to be created.
    • Remember the name of the attribute must be unique within a relation.
    • The name cannot contain spaces.
  • {col_definition}: consists of the column data type and the column constraints.
    • Common data types: NUMBER(x, y), VARCHAR2(x), DATE, TIMESTAMP
    • Common Column Constraints: DEFAULT {default_value}, NOT NULL
    • You must specify NOT NULL for any attribute intended to be (part of) the primary key.
  • {constraints}: define the table constraints. Must be in the form CONSTRAINT {constraint_name} {constraint_definition}
    • By convention, constraint_name is the name of the table + _ + shorthand of the constraint to be applied. (e.g. STUDENT_FK2)
    • Common constraints:
      • PRIMARY KEY ({field_name}),
      • FOREIGN KEY ({field_name}) REFERENCES {table_name} ({field_name})
      • CHECK ({condition})

Example

CREATE TABLE department (
    deptID         VARCHAR2(10)     NOT NULL,
    deptName     VARCHAR2(100)     NOT NULL,
    CONSTRAINT department_PK PRIMARY KEY (deptID)
);
CREATE TABLE student (
    -- student IDs are numeric but not used for calculation,
    -- so VARCHAR2 is used instead to save space
    sID         VARCHAR2(8)     NOT NULL,
    firstName     VARCHAR2(30)     NOT NULL,
    middleName     VARCHAR2(30),
    lastName     VARCHAR2(30)     NOT NULL,
    DOB         DATE             NOT NULL,
    -- Notice that student.deptID must have the same
    -- data type as department.deptID
    deptID         VARCHAR2(10)     NOT NULL,
    CONSTRAINT student_PK PRIMARY KEY (sID),
    CONSTRAINT student_FK FOREIGN KEY (deptID) 
                            REFERENCES department (deptID),
    CONSTRAINT student_CHECKNAME CHECK (lastName = UPPER(lastName))
);

Changing Table Definition (Using the ALTER TABLE Command)

ALTER TABLE {table_name}
-- One of the following commands
[ADD {col_name} {col_definition}]
[ADD CONSTRAINT {constraint_name} {constraint_definition}]
[MODIFY {col_name} {changed_col_def}]
[DROP COLUMN {col_name}]
[DROP CONSTRAINT {constraint_name}]
[RENAME COLUMN {col_name} TO {new_col_name}];

Notes

  • When adding a column to a table with existing data, you cannot specify the NOT NULL constraint on the new column, because the existing rows have no value for it yet. Instead, add the column, fill in the values, and then specify the constraint using an ALTER TABLE ... ADD CONSTRAINT ... statement. (See the sketch after these notes.)
  • By the same token, you cannot add a constraint of NOT NULL to an existing column, if the column has NULL values.
  • To remove a NOT NULL constraint, use ALTER TABLE ... MODIFY {col_name} NULL;.
  • To remove a default value, use ALTER TABLE ... MODIFY {col_name} DEFAULT NULL;
  • You cannot change the data type of a column that is referencing, or being referenced by, another column in a FOREIGN KEY relationship. Instead, drop the foreign key constraint, modify the data type of both columns, and add the foreign key constraint again.
  • By the same token, you cannot drop a column that is being referenced by another column.
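A hedged sketch of the add-column workflow described in the first note, using the student table from the earlier example and a made-up column:

ALTER TABLE student ADD nationality VARCHAR2(20);
-- fill in the existing rows first (placeholder value for illustration)
UPDATE student SET nationality = 'Unknown';
-- only then enforce the constraint
ALTER TABLE student ADD CONSTRAINT student_NN_nationality CHECK (nationality IS NOT NULL);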

Removing Tables (Using the DROP TABLE Command)

DROP TABLE {table_name};

Section 3B - Structured Query Language - Data Manipulation Language (SQL - DML) (Part 1)

Inserting Records (Using the INSERT INTO Command)

-- Inserting values from users
INSERT INTO {table_name} [({col_order})]
VALUES {val_order};
-- Inserting values from the result of a query
INSERT INTO {table_name}
{select_stmt};

Explanation

  • {col_order}: specifies the columns to be inserted (the unspecified columns will get NULL)
  • Character values are case-sensitive inside ' '. The quotes can be omitted for numerical values.
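For example, using the department and student tables as defined in Section 3A (the values are made up):

INSERT INTO department (deptID, deptName)
VALUES ('ISOM', 'Information Systems');
INSERT INTO student (sID, firstName, middleName, lastName, DOB, deptID)
VALUES ('20123456', 'John', NULL, 'CHAN', TO_DATE('1999-01-31', 'YYYY-MM-DD'), 'ISOM');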

Deleting Records (Using the DELETE Command)

DELETE FROM {table_name}
[{where_condition}];
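For example (made-up condition; without the WHERE clause, all rows are deleted):

DELETE FROM student
WHERE sID = '20123456';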

Updating Records (Using the UPDATE Command)

UPDATE {table_name}
SET {field_name} = {new_val} [, ...]
[{where_condition}];
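For example (made-up values):

UPDATE student
SET middleName = 'Tai Man'
WHERE sID = '20123456';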

Querying Records (Using the SELECT Command)

SELECT {column_names}
FROM {table_names}
[{where_condition}]
[GROUP BY {column_names}]
[HAVING {condition}]
[ORDER BY {column_name}, ...];

Illustration - PRODUCT Relation

PRODUCT (PRO_ID, PRO_Name, PRO_Desc, PRO_Price, PRO_MemberDiscount)
  1. Write a SQL statement to select all product information.
    SELECT * FROM PRODUCT;
  2. Write a SQL statement to select all products whose name starts with ‘Deluxe’.
    SELECT * FROM PRODUCT
    WHERE PRO_Name LIKE 'Deluxe%';
  3. Write a SQL statement to select the product ID and product name of the products for which a member gets a discount.
    SELECT PRO_ID, PRO_Name FROM PRODUCT
    WHERE PRO_MemberDiscount < 1;
  4. Write a SQL statement to select the product ID and the member price of the product, ordered by the member price in descending order.
    SELECT PRO_ID, PRO_Price * PRO_MemberDiscount AS MemberPrice
    FROM PRODUCT
    ORDER BY PRO_Price * PRO_MemberDiscount DESC;

Using Aggregate Function in Query

COUNT()
MIN()
MAX()
SUM()
AVG()

Illustration - PRODUCT Relation

  1. Write a SQL statement to select the number of products in the system.
    SELECT COUNT(*) AS totalProduct
    FROM PRODUCT;
  2. Write a SQL statement to select the lowest member price in the system.
    SELECT MIN(PRO_Price * PRO_MemberDiscount) AS minMemberPrice
    FROM PRODUCT;
  3. Write a SQL statement to select the number of products in the system, where the name of the product ends with ‘Deluxe’.
    SELECT COUNT(*) AS totalEndDeluxe
    FROM PRODUCT
    WHERE PRO_Name LIKE '%Deluxe';

Using Wildcards

  • * to represent all columns in SELECT statement.
  • % to represent any collection of characters.
  • _ to represent exactly one character.

Using Boolean Operators

  • NOT > AND > OR
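To see why the precedence matters, a sketch against the PRODUCT relation above (hypothetical conditions). The two WHERE clauses below are equivalent, because AND binds more tightly than OR; write explicit parentheses if the other grouping is intended:

WHERE PRO_Price < 100 OR PRO_Price > 500 AND PRO_MemberDiscount < 1
-- is evaluated as
WHERE PRO_Price < 100 OR (PRO_Price > 500 AND PRO_MemberDiscount < 1)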

Using Ranges

SELECT ... FROM ...
WHERE {column_name} BETWEEN {start_val} AND {end_val};
-- Values are inclusive.

Using IN and NOT IN Clause

SELECT ... FROM ...
WHERE {column_name} [NOT] IN ({list_of_val});

Using DISTINCT

SELECT DISTINCT {column_name}
FROM {table_name}
...;

Illustration - PRODUCT Relation

  1. Write a SQL statement to select the product ID of all products with the word ‘Deluxe’ (case-insensitive) in the name, and all ‘desk’ products with a price lower than 200.
    SELECT PRO_ID FROM PRODUCT
    WHERE LOWER(PRO_Name) LIKE '%deluxe%'
    OR (LOWER(PRO_Name) LIKE '%desk%'
    AND PRO_Price < 200);
  2. Write a SQL statement to select the product ID of all products which provide 10% off or 20% off to members.
    SELECT PRO_ID FROM PRODUCT
    WHERE PRO_MemberDiscount IN (0.8, 0.9);
  3. Write a SQL statement to select the product ID of all products which do not provide 50% off or 60% off to members.
    SELECT PRO_ID FROM PRODUCT
    WHERE PRO_MemberDiscount NOT IN (0.4, 0.5);

Aggregating Result with GROUP BY Clause and HAVING Clause

  • If you write queries using the following rule, your queries will work: Each column referenced in the SELECT statement must be referenced in the GROUP BY clause, unless the column is an argument for an aggregate function included in the SELECT clause.
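A quick sketch of the rule against the PRODUCT relation above (hypothetical grouping):

-- Invalid: PRO_Name is neither grouped nor aggregated (ORA-00979 in Oracle)
SELECT PRO_Name, PRO_MemberDiscount, COUNT(*)
FROM PRODUCT
GROUP BY PRO_MemberDiscount;
-- Valid: every non-aggregated column in SELECT appears in GROUP BY
SELECT PRO_MemberDiscount, COUNT(*) AS totalProduct
FROM PRODUCT
GROUP BY PRO_MemberDiscount;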

Illustration - STUDENT Relation

STUDENT (STU_ID, STU_FirstName, STU_LastName, STU_AdmittedYear, MAJ_ID, NAT_ID)
  • Assume that each student has one major only.
  1. Write a SQL statement to select the number of students in each major. Sort the result in descending order of the number of students first, then in ascending order of the major ID.
    SELECT MAJ_ID, COUNT(MAJ_ID) AS totalStudent
    FROM STUDENT
    GROUP BY MAJ_ID
    ORDER BY COUNT(MAJ_ID) DESC, MAJ_ID [ASC];
  2. Modify the query in Example 1 to show only the majors that have over 150 students.
    SELECT MAJ_ID, COUNT(MAJ_ID) AS totalStudent
    FROM STUDENT
    GROUP BY MAJ_ID
    HAVING COUNT(MAJ_ID) > 150
    ORDER BY COUNT(MAJ_ID) DESC, MAJ_ID [ASC];
  3. Modify the query in Example 1 to show all majors except 'IS' and 'OM'.
    -- Method 1: Filtering with WHERE
    SELECT MAJ_ID, COUNT(MAJ_ID) AS totalStudent
    FROM STUDENT
    WHERE MAJ_ID NOT IN ('IS', 'OM')
    GROUP BY MAJ_ID
    ORDER BY COUNT(MAJ_ID) DESC, MAJ_ID [ASC];
    -- Method 2: Filtering through HAVING
    SELECT MAJ_ID, COUNT(MAJ_ID) AS totalStudent
    FROM STUDENT
    GROUP BY MAJ_ID
    HAVING MAJ_ID NOT IN ('IS', 'OM')
    ORDER BY COUNT(MAJ_ID) DESC, MAJ_ID [ASC];
  4. Write a SQL statement to show the number of students in each major who were admitted in 2016.
    SELECT MAJ_ID, COUNT(MAJ_ID) AS totalStudent
    FROM STUDENT
    WHERE STU_AdmittedYear = '2016'
    GROUP BY MAJ_ID;
  5. Write a SQL statement to show the number of students in each major from each country. Order by MAJ_ID in ascending order, then NAT_ID in ascending order.
    SELECT MAJ_ID, NAT_ID, COUNT(MAJ_ID) AS totalStudent
    FROM STUDENT
    GROUP BY MAJ_ID, NAT_ID
    ORDER BY MAJ_ID, NAT_ID;

Section 3C - Structured Query Language - Data Manipulation Language (SQL - DML) (Part 2)

Processing Multiple Tables

  • Join
    • a relational operation that temporarily combines two or more tables with common columns
    • the common columns in joined tables are usually the primary key of the dominant table and the foreign key of the dependent table.
  • Equi-join
    • based on equality between values in the common columns.
  • Natural join
    • an equi-join in which one of the duplicated common columns is eliminated in the result table.
  • Outer Join
    • a join in which rows that do not have matching values in common columns are nevertheless included in the result table.

Subquery

  • Location of possible subquery
    • In WHERE or HAVING clauses for conditional checking
    • As a virtual table of the FROM clause
  • Non-correlated Subquery - execute inner query once for the entire outer query
    • Commonly using IN or = before subquery
  • Correlated Subquery - execute inner query once for each row returned by the outer query
    • Commonly using EXISTS to return a boolean value.
    • Commonly using ANY and ALL with logical operators such as =, >, and <.

Illustration - STUDENT relation, MAJOR relation, DEPARTMENT relation, and NATIONALITY relation

STUDENT (STU_ID, STU_FirstName, STU_LastName, STU_AdmittedYear, STU_GPA, MAJ_ID, NAT_ID)
MAJOR (MAJ_ID, MAJ_Name, MAJ_OfficeRoom, DEP_ID)
DEPARTMENT (DEP_ID, DEP_Name, DEP_HeadName)
NATIONALITY (NAT_ID, NAT_Name)
  • Assume that each student has one major only.
  1. Write a SQL statement to show the name of the student and the name of their major.
    SELECT s.STU_FirstName, s.STU_LastName, m.MAJ_Name
    FROM STUDENT s, MAJOR m
    WHERE s.MAJ_ID = m.MAJ_ID;
  2. Write a SQL statement to show the name of the major and the number of students in each major. Only show those majors with more than 150 students in the system.
    SELECT m.MAJ_Name, COUNT(s.STU_ID) AS totalStudent
    FROM MAJOR m, STUDENT s
    WHERE m.MAJ_ID = s.MAJ_ID
    GROUP BY m.MAJ_Name
    HAVING COUNT(s.STU_ID) > 150;
  3. Write a SQL statement to show the difference between each student’s GPA and the average GPA of all students.
    SELECT STU_GPA - avgGPA diffGPA
    FROM STUDENT, (SELECT AVG(STU_GPA) avgGPA FROM STUDENT);
  4. Write a SQL statement to show the name of the student, and the name of the department head of their respective department.
    SELECT s.STU_FirstName, s.STU_LastName, d.DEP_HeadName
    FROM STUDENT s, MAJOR m, DEPARTMENT d
    WHERE s.MAJ_ID = m.MAJ_ID
    AND m.DEP_ID = d.DEP_ID;
  5. Write a SQL statement to show the name of the department and the number of students in each department. Show only the departments with over 200 students. Order by the number of students in descending order.
    SELECT d.DEP_Name, COUNT(s.STU_ID) AS totalStudent
    FROM DEPARTMENT d, MAJOR m, STUDENT s
    WHERE d.DEP_ID = m.DEP_ID
    AND m.MAJ_ID = s.MAJ_ID
    GROUP BY d.DEP_Name
    HAVING COUNT(s.STU_ID) > 200
    ORDER BY COUNT(s.STU_ID) DESC;
  6. Using a non-correlated subquery, select the name(s) of the student(s) who get(s) the highest GPA.
    SELECT STU_FirstName, STU_LastName
    FROM STUDENT
    WHERE STU_GPA = (SELECT MAX(STU_GPA) FROM STUDENT);
  7. Using a correlated subquery, select the name(s) of the student(s) who get(s) the highest GPA.
    SELECT STU_FirstName, STU_LastName
    FROM STUDENT a
    WHERE a.STU_GPA >= ALL (
        SELECT STU_GPA
        FROM STUDENT b
        WHERE b.STU_GPA != a.STU_GPA
    );
  8. Write a SQL statement to show all nationality names, and the number of students of each nationality. Show all nationalities regardless of whether there are students of that nationality or not.
    SELECT n.NAT_Name, COUNT(s.STU_ID)
    FROM NATIONALITY n
    LEFT OUTER JOIN STUDENT s
    ON n.NAT_ID = s.NAT_ID
    GROUP BY n.NAT_Name;
  9. Using a non-correlated subquery, select the names of the nationalities to which no students belong.
    SELECT NAT_Name
    FROM NATIONALITY
    WHERE NAT_ID NOT IN (
        SELECT DISTINCT NAT_ID
        FROM STUDENT
    );
  10. Using a correlated subquery, select the names of the nationalities to which no students belong.
    SELECT NAT_Name
    FROM NATIONALITY n
    WHERE NOT EXISTS (
        SELECT * FROM STUDENT s
        WHERE s.NAT_ID = n.NAT_ID
    );

Section 4 - Data and Database Administration

Importance of Data/Database Administration

  • Data are corporate assets
  • Important assets of the organization need to be managed properly
  • Data are stored in databases
  • Hence, effective data and database administration provide support for managerial decision-making

Data/Database Administration

  • Data and Database Administration Functions
  • Managing Data Security
  • Backing Up Databases

Data Administration Functions

  • Data Administrator: a senior manager
  • Data Policies, Procedures, Standards
    • Policies: e.g. ‘every user must have a password’
    • Procedures: e.g. backup and recovery procedures
    • Standards: e.g. naming conventions
  • (IS) Planning
    • understand the information requirements of the organization
  • Data Conflict Resolution
    • resolve data ownership issues
  • Internal Marketing of Data Standards
    • reduce resistance to change in data standards
  • Database Administrator (DBA): technical personnel
    • Implement the data policies, procedures, standards decided by data administrator
    • Managing data security, privacy, and integrity
    • Performing data backup and recovery
    • Selection of hardware and software
    • Installing and upgrading the DBMS
    • Tuning database performance

Managing Data Security

  • Database Security
    • protection of the data against accidental or intentional loss, destruction, or misuse
  • Increasingly Difficult to Manage
    • multiple channels for data access
    • corporate intranets
    • the Internet

Threats to Data Security

  • Accidental Losses
    • human error, software failure, hardware failure
  • Theft
  • Improper Data Access
    • loss of privacy (personal data)
    • loss of confidentiality (corporate data)
  • Loss of Data Integrity
    • data becomes invalid or corrupted
  • Loss of Availability
    • destruction of hardware, networks, or applications
    • virus attack

Data Management Software Security Features

  • Integrity Controls
    • enforced by the DBMS during querying and updating
  • Authorization Rules
    • identify users and restrict the actions they may take against a database
    • GRANT SELECT, UPDATE(price) ON PRODUCT_T TO Smith
  • Encryption Procedures
    • encode data in an unrecognizable form
  • Authentication Schemes
    • positively identify a person attempting to gain access to a database
    • fingerprints, voiceprints, eye scans, signatures, secret keys, digital certificates
  • Backup Databases
    • facilitate recovery procedures

Backing Up Databases

  • We back up databases to facilitate database recovery
  • Database recovery is the mechanism for restoring a database quickly and accurately after loss or damage
  • Recovery Manager (a DBMS module)
    • restores the database to a correct condition when a failure occurs
    • resumes processing user requests

Backup Facilities

  • An automatic dump facility that produces a backup copy of the entire database
  • Periodic Backup
  • Cold Backup: database is shut down during backup
  • Hot Backup
    • a selected portion of the database is shut down and backed up at a given time
    • useful when it is a critical system that must always remain available
  • Backups are stored in a secure, off-site location

Journalizing Facilities

  • An audit trail of transactions and database changes
  • Transaction
    • a discrete unit of work that must be completely processed or not processed at all
    • e.g. entering a customer order (see the sketch after this list)
  • Transaction Log
    • record of essential data for each transaction processed against the database
  • Database Change Log
    • before- and after-images of records that have been modified by transactions
    • before-image is a copy of record before modification
    • after-image is a copy of record after modification
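A minimal sketch of a transaction as one unit of work, using hypothetical ORDERS and ORDER_LINE tables (COMMIT and ROLLBACK are the TCL commands noted in Section 3A):

INSERT INTO orders (orderID, customerID, orderDate)
VALUES (1001, 'C001', SYSDATE);
INSERT INTO order_line (orderID, productID, quantity)
VALUES (1001, 'P005', 2);
COMMIT;    -- makes both inserts permanent as a single unit
-- ROLLBACK; would instead undo all changes since the last COMMIT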

Recovery and Restart Procedures

Switch

  • Mirrored Databases
    • at least two copies of the database must be kept and updated simultaneously
    • implemented in RAID 1 systems
  • Procedures
    • when a disk failure occurs, system switches to mirrored disk
    • defective disk can be removed and replaced with a new disk
  • Advantages
    • no disruption in service; fastest recovery
    • popular as cost of secondary storage has dropped
  • Disadvantages
    • does not protect against loss of power or damage to both databases

Restore/Rerun

  • reprocessing the transactions (retrieved from the transaction log) against the backup copy of the database
  • Advantages
    • does not need to create a database change log
    • no special restart procedures required
  • Disadvantages
    • time to reprocess transactions may be long
  • used only as a last resort

Forward Recovery (rollforward)

  • starts with the backup copy of the database
  • apply after-images (the results of good transactions) retrieved from the database change log
  • preferable to restore/rerun
    • do not need to reprocess each transaction
    • only the most recent after-image of a record needs to be applied

Backward Recovery (rollback)

  • apply before-images of records that have been changed to the database
  • restore the database to an earlier state
  • used to reverse the changes made by transactions that have been aborted or terminated abnormally (restore to an earlier state)

Section 5 - Data Warehousing

Data and Competitive Advantage

  • Businesses have a lot of data in their databases
  • Leveraging Business Data
    • market basket analysis (what products tend to be purchased together)
    • risk analysis
    • targeted marketing

Data Warehouse

  • a subject-oriented, integrated, time-variant, non-updatable collection of data used in support of management decision-making processes
  • Subject-oriented: high-level entities, e.g. customers, patients, students, products, time
  • Integrated: consistent naming conventions, formats, encoding structures from multiple data sources (both internal and external)
  • Time-variant: data contain a time dimension, may be used to study trends and changes
  • Non-updatable: read-only by end users, data loaded and periodically refreshed from operational systems

Enterprise Data Warehouse

  • A centralized, integrated data warehouse that is the control point and single source of all data made available to end users for decision support applications.

Data Mart

  • A data warehouse that is limited in scope, i.e. a mini-warehouse

Independent Data Mart

  • Each data mart obtains data from separate extract, transform, and load (ETL) processes from source data systems directly
  • No enterprise data warehouse (EDW)
  • Benefits
    • Shorter time to production
    • Lower complexity to build than EDW
  • Limitations
    • Separate ETL processes required for each data mart, which can yield redundant data and efforts
    • Data access complexity when accessing multiple data marts for an enterprise-wide view of data

Dependent Data Mart

  • a data mart obtains data exclusively from the enterprise data warehouse and its reconciled data

Operational Data Store (ODS)

  • An integrated, subject-oriented, updateable, current-valued, detailed database designed to serve operational users as they do decision support processing

Logical Data Mart

  • A data mart created by a relational view of a data warehouse
  • not physically separate databases
  • new data marts can be created quickly
  • always up-to-date since data is created when the view is referenced

Real-time Data Warehouse

  • An enterprise data warehouse that accepts near real-time feeds of transactional data from the operational systems
  • No need for a separate ODS, simpler architecture
  • Immediately transforms and loads the appropriate data into the warehouse
  • Provides near real-time access for the transaction processing systems to an enterprise data warehouse

The User Interface of a Data Warehouse

  • Powerful, intuitive interface needed to help users easily access and analyze data
  • Querying Tools (e.g. Extended SQL)
  • Online Analytical Processing (OLAP) tools
    • Graphical tools that provide users with multidimensional views of data and allow analysis using simple windowing techniques
  • Data Mining Tools
    • Knowledge discovery (looking for patterns or trends) using techniques from traditional statistics and artificial intelligence
  • Data Visualization
    • Representing data in graphical and multimedia formats for analysis
    • To better observe trends and patterns