Information Technology: The Relational Model

The Relational Model:--

1. The first database systems were based on the network and hierarchical models. The relational model was first proposed by E.F. Codd in 1970 and the first such systems (notably INGRES and System/R) was developed in 1970s. The relational model is now the dominant model for commercial data processing applications.

Structure of Relational Database
1. A relational database consists of a collection of tables, each having a unique name.
A row in a table represents a relationship among a set of values.
Thus a table represents a collection of relationships.
2. There is a direct correspondence between the concept of a table and the mathematical concept of a relation. A substantial theory has been developed for relational databases.

Basic Structure
1. Following figure shows the deposit and customer tables for our banking example.

The deposit and customer relations.
o It has four attributes.
o For each attribute there is a permitted set of values, called the domain of that attribute.
o E.g. the domain of bname is the set of all branch names.
Let denote the domain of bname, and , and the remaining attributes' domains respectively.
Then, any row of deposit consists of a four-tuple where

In general, deposit contains a subset of the set of all possible rows.
That is, deposit is a subset of

In general, a table of n columns must be a subset of

2. Mathematicians define a relation to be a subset of a Cartesian product of a list of domains. We can see the correspondence with our tables.
We will use the terms relation and tuple in place of table and row from now on.
3. Some more formalities:
o let the tuple variable refer to a tuple of the relation .
o We say to denote that the tuple is in relation .
o Then [bname] = [1] = the value of on the bname attribute.
o So [bname] = [1] = ``Downtown'',
o and [cname] = [3] = ``Johnson''.

4. We'll also require that the domains of all attributes be indivisible units.
o A domain is atomic if its elements are indivisible units.
o For example, the set of integers is an atomic domain.
o The set of all sets of integers is not.
o Why? Integers do not have subparts, but sets do - the integers comprising them.
o We could consider integers non-atomic if we thought of them as ordered lists of digits.

Database Scheme
1. We distinguish between a database scheme (logical design) and a database instance (data in the database at a point in time).
2. A relation scheme is a list of attributes and their corresponding domains.
3. Here, we’ve used the following conventions:
o italics for all names
o lowercase names for relations and attributes
o names beginning with an uppercase for relation schemes
For example, the relation scheme for the deposit relation:
o Deposit-scheme = (bname, account#, cname, balance)
We may state that deposit is a relation on scheme Deposit-scheme by writing deposit(Deposit-scheme).
If we wish to specify domains, we can write:
o (bname: string, account#: integer, cname: string, balance: integer).

Following figure shows the E-R diagram for a banking enterprise.

E-R diagram for the banking enterprise
4. The relation schemes for the banking example used throughout the text are:
o Branch-scheme = (bname, assets, bcity)
o Customer-scheme = (cname, street, ccity)
o Deposit-scheme = (bname, account#, cname, balance)
o Borrow-scheme = (bname, loan#, cname, amount)
Note: some attributes appear in several relation schemes (e.g. bname, cname). This is legal, and provides a way of relating tuples of distinct relations.

5. Why not put all attributes in one relation?
Suppose we use one large relation instead of customer and deposit:
o Account-scheme = (bname, account#, cname, balance, street, ccity)
o If a customer has several accounts, we must duplicate her or his address for each account.
o If a customer has an account but no current address, we cannot build a tuple, as we have no values for the address.
o We would have to use null values for these fields.
o Null values cause difficulties in the database.
o By using two separate relations, we can do this without using null values

Keys
1. The notions of superkey, candidate key and primary key all apply to the relational model.
2. For example, in Branch-scheme,
o {bname} is a superkey.
o {bname, bcity} is a superkey.
o {bname, bcity} is not a candidate key, as the superkey {bname} is contained in it.
o {bname} is a candidate key.
o {bcity} is not a superkey, as branches may be in the same city.
o We will use {bname} as our primary key.
3. The primary key for Customer-scheme is {cname}.
4. More formally, if we say that a subset of is a superkey for , we are restricting consideration to relations in which no two distinct tuples have the same values on all attributes in . In other words,
o If and are in , and
o ,
o then .

The Relational Algebra
1. The relational algebra is a procedural query language.

o Six fundamental operations:
 select (unary)
 project (unary)
 rename (unary)
 cartesian product (binary)
 union (binary)
 set-difference (binary)

o Several other operations, defined in terms of the fundamental operations:
 set-intersection
 natural join
 division
 assignment

o Operations produce a new relation as a result.

Fundamental Operations
1. The Select Operation
Select selects tuples that satisfy a given predicate. Select is denoted by a lowercase Greek sigma ( ), with the predicate appearing as a subscript. The argument relation is given in parentheses following the .
For example, to select tuples (rows) of the borrow relation where the branch is ``SFU'', we would write

Let the following figures be the borrow and branch relations in the banking example.

The borrow and branch relations.
The new relation created as the result of this operation consists of one tuple: .
We allow comparisons using =, , <, , > and in the selection predicate.

We also allow the logical connectives (or) and (and). For example:

The client relation.
Suppose there is one more relation, client, shown in figure above, with the scheme

we might write

to find clients who have the same name as their banker.

2. The Project Operation
Project copies its argument relation for the specified attributes only. Since a relation is a set, duplicate rows are eliminated.
Projection is denoted by the Greek capital letter pi ( ). The attributes to be copied appear as subscripts.
For example, to obtain a relation showing customers and branches, but ignoring amount and loan#, we write

We can perform these operations on the relations resulting from other operations.
To get the names of customers having the same name as their bankers,

Think of select as taking rows of a relation, and project as taking columns of a relation.

3. The Cartesian Product Operation
The cartesian product of two relations is denoted by a cross ( ), written

The result of is a new relation with a tuple for each possible pairing of tuples from and .
In order to avoid ambiguity, the attribute names have attached to them the name of the relation from which they came. If no ambiguity will result, we drop the relation name.
The result is a very large relation. If has tuples, and has tuples, then will have tuples.
The resulting scheme is the concatenation of the schemes of and , with relation names added as mentioned.

To find the clients of banker Johnson and the city in which they live, we need information in both client and customer relations. We can get this by writing

However, the customer.cname column contains customers of bankers other than Johnson.
We want rows where client.cname = customer.cname. So we can write

to get just these tuples.
Finally, to get just the customer's name and city, we need a projection:

4. The Rename Operation
The rename operation solves the problems that occur with naming when performing the cartesian product of a relation with itself.
Suppose we want to find the names of all the customers who live on the same street and in the same city as Smith.
We can get the street and city of Smith by writing

To find other customers with the same information, we need to reference the customer relation again:

where is a selection predicate requiring street and ccity values to be equal.

Problem: how do we distinguish between the two street values appearing in the Cartesian product, as both come from a customer relation?
Solution: use the rename operator, denoted by the Greek letter rho ( ).

We write

to get the relation under the name of .
If we use this to rename one of the two customer relations we are using, the ambiguities will disappear.

5. The Union Operation
The union operation is denoted as in set theory. It returns the union (set union) of two compatible relations.
For a union operation to be legal, we require that
o and must have the same number of attributes.
o The domains of the corresponding attributes must be the same.
To find all customers of the SFU branch, we must find everyone who has a loan or an account or both at the branch.
We need both borrow and deposit relations for this:

6. The Set Difference Operation
Set difference is denoted by the minus sign ( ). It finds tuples that are in one relation, but not in another.
Thus results in a relation containing tuples that are in but not in .
To find customers of the SFU branch who have an account there but no loan, we write

We can do more with this operation. Suppose we want to find the largest account balance in the bank.
Strategy:
o Find a relation containing the balances not the largest.
o Compute the set difference of and the deposit relation.
To find , we write

This resulting relation contains all balances except the largest one.

Now we can finish our query by taking the set difference:

Formal Definition of Relational Algebra
1. A basic expression consists of either
o A relation in the database.
o A constant relation.
2. General expressions are formed out of smaller subexpressions using
o select (p a predicate)
o project (s a list of attributes)
o rename (x a relation name)
o union
o set difference
o cartesian product

Additional Operations
1. Additional operations are defined in terms of the fundamental operations. They do not add power to the algebra, but are useful to simplify common queries.

2. The Set Intersection Operation
Set intersection is denoted by , and returns a relation that contains tuples that are in both of its argument relations.
It does not add any power as

To find all customers having both a loan and an account at the SFU branch, we write

3. The Natural Join Operation
Often we want to simplify queries on a cartesian product.
For example, to find all customers having a loan at the bank and the cities in which they live, we need borrow and customer relations:

Our selection predicate obtains only those tuples pertaining to only one cname.
This type of operation is very common, so we have the natural join, denoted by a sign. Natural join combines a cartesian product and a selection into one operation. It performs a selection forcing equality on those attributes that appear in both relation schemes. Duplicates are removed as in all relation operations.
To illustrate, we can rewrite the previous query as

We can now make a more formal definition of natural join.
o Consider and to be sets of attributes.
o We denote attributes appearing in both relations by .
o We denote attributes in either or both relations by .
o Consider two relations and .
o The natural join of and , denoted by is a relation on scheme .
o It is a projection onto of a selection on where the predicate requires for each attribute in .
Formally,

where .
To find the assets and names of all branches which have depositors living in Stamford, we need customer, deposit and branch relations:

To find all customers who have both an account and a loan at the SFU branch:

This is equivalent to the set intersection version we wrote earlier. We see now that there can be several ways to write a query in the relational algebra.
If two relations and have no attributes in common, then , and .

4. The Division Operation
Division, denoted , is suited to queries that include the phrase ``for all''.
Suppose we want to find all the customers who have an account at all branches located in Brooklyn.
Strategy: think of it as three steps.
We can obtain the names of all branches located in Brooklyn by

We can also find all cname, bname pairs for which the customer has an account by

Now we need to find all customers who appear in with every branch name in .
The divide operation provides exactly those customers:

which is simply .

5. The Assignment Operation
Sometimes it is useful to be able to write a relational algebra expression in parts using a temporary relation variable (as we did with and in the division example).
The assignment operation, denoted , works like assignment in a programming language.

No extra relation is added to the database, but the relation variable created can be used in subsequent expressions. Assignment to a permanent relation would constitute a modification to the database.

Information Technology

Wednesday, September 1, 2010

The Relational Model

No comments:

Post a Comment