Aims
To provide detailed theoretical and practical
knowledge of how database management systems (DBMS) are programmed in SQL, how
DBMSs may be linked to form distributed databases, and how DBMSs
operate and are tuned to improve performance.
To provide exposure to how core concepts in databases may be applied and
developed to solve problems such as handling Big Data and Temporal Data.
Learning Outcomes
By the end of the course, a student will understand:
- understand the logic semantics of SQL queries, and how SQL is a
implementation of the relational algebra, and how SQL queries may be
rewritten and executed in different operational ways whilst preserving
their logic semantics.
- understand how to write SQL queries (and design DBMS schemas), and how
to make SQL queries that run efficiently, and understand and tune the operation of DBMS systems.
- understand how distributed databases are implemented, and how applications
can be designed for those distributed databases, scaling up to Big Data sized databases.
- How to query data using US logic (the logic of until and since),
and hence how the relational algebra can be extended to handle new
data modelling requirements.
Syllabus
- Advanced relational algebra and SQL
- Relational algebra primitive and derived operators
- Set vs bag semantics, NULL values, Distinct operator
- Semi join, left join, right join
- SQL constraints and triggers
- Data mining and OLAP operators: Group By, Roll Up, Cube, Pivot
- Relationally complete SQL and temporary tables
Storage and Query Processing
- Data storage in spanned and unspanned pages
- B+-tree indexes
- Block nested loop joins
- Index nested loop joins
- Hash joins
Transactions and Concurrency Control
- Recoverability and serialisabily of transactions
- Anomalies in transactions
- Two-phased locking, and requirements to avoid all types of anomalies
- SQL isolation levels
Distributed databases
- Horizontal/vertical fragmentation, data replication
- Basic distributed query processing
- Semi-join query processing
- Big Data and Pig Latin
- Distributed concurrency control
- Temporal Databases
- Valid time and transaction time
- Discrete bounded linear flow of time and the temporal structure
- US logic and derived modal operators
- Representation of US logic as a temporal relational algebra