Overall System Structure:--
1. Database systems are partitioned into modules for different functions. Some functions (e.g. file systems) may be provided by the operating system.
2. Components include:
o File manager manages allocation of disk space and data structures used to represent information on disk.
o Database manager: The interface between low-level data and application programs and queries.
o Query processor translates statements in a query language into low-level instructions the database manager understands. (May also attempt to find an equivalent but more efficient form.)
o DML precompiler converts DML statements embedded in an application program to normal procedure calls in a host language. The precompiler interacts with the query processor.
o DDL compiler converts DDL statements to a set of tables containing metadata stored in a data dictionary.
In addition, several data structures are required for physical system implementation:
o Data files: store the database itself.
o Data dictionary: stores information about the structure of the database. It is used heavily. Great emphasis should be placed on developing a good design and efficient implementation of the dictionary.
o Indices: provide fast access to data items holding particular values.
3. Figure 1.6 shows these components.
The relational model does not use pointers or links, but relates records by the values they contain. This allows a formal mathematical foundation to be defined
Query Languages
1. A query language is a language in which a user requests information from a database. These are typically higher-level than programming languages.
They may be one of:
o Procedural, where the user instructs the system to perform a sequence of operations on the database. This will compute the desired information.
o Nonprocedural, where the user specifies the information desired without giving a procedure for obtaining the information.
2. A complete query language also contains facilities to insert and delete tuples as well as to modify parts of existing tuples.
Functional Dependencies
Basic Concepts
1. Functional dependencies.
o Functional dependencies are a constraint on the set of legal relations in a database.
o They allow us to express facts about the real world we are modeling.
o The notion generalizes the idea of a superkey.
o Let and .
o Then the functional dependency holds on R if in any legal relation r(R), for all pairs of tuples and in r such that , it is also the case that .
o Using this notation, we say K is a superkey of R if .
o In other words, K is a superkey of R if, whenever , then (and thus ).
2. Functional dependencies allow us to express constraints that cannot be expressed with superkeys.
3. Consider the scheme
4. Loan-info-schema = (bname, loan#, cname, amount)
5.
if a loan may be made jointly to several people (e.g. husband and wife) then we would not expect loan# to be a superkey. That is, there is no such dependency
loan# cname
We do however expect the functional dependency
loan# amount
loan# bname
to hold, as a loan number can only be associated with one amount and one branch.
6. A set F of functional dependencies can be used in two ways:
o To specify constraints on the set of legal relations. (Does F hold on R?)
o To test relations to see if they are legal under a given set of functional dependencies. (Does r satisfy F?)
7. Figure 6.2 shows a relation r that we can examine.
Figure 6.2: Sample relation r.
8. We can see that is satisfied (in this particular relation), but is not. is also satisfied.
9. Functional dependencies are called trivial if they are satisfied by all relations.
10. In general, a functional dependency is trivial if .
11. In the customer relation of figure 5.4, we see that is satisfied by this relation. However, as in the real world two cities can have streets with the same names (e.g. Main, Broadway, etc.), we would not include this functional dependency in our list meant to hold on Customer-scheme.
12. The list of functional dependencies for the example database is:
o On Branch-scheme:
o bname bcity
o
o bname assets
o
o On Customer-scheme:
o cname ccity
o
o cname street
o
o On Loan-scheme:
o loan# amount
o
o loan# bname
o
o On Account-scheme:
o account# balance
o
o account# bname
o
There are no functional dependencies for Borrower-schema, nor for Depositor-schema
Closure of a Set of Functional Dependencies
1. We need to consider all functional dependencies that hold. Given a set F of functional dependencies, we can prove that certain other ones also hold. We say these ones are logically implied by F.
2. Suppose we are given a relation scheme R=(A,B,C,G,H,I), and the set of functional dependencies:
A B
A C
CG H
CG I
B H
Then the functional dependency is logically implied.
3. To see why, let and be tuples such that
As we are given A B , it follows that we must also have
Further, since we also have B H , we must also have
Thus, whenever two tuples have the same value on A, they must also have the same value on H, and we can say that A H .
4. The closure of a set F of functional dependencies is the set of all functional dependencies logically implied by F.
5. We denote the closure of F by .
6. To compute , we can use some rules of inference called Armstrong's Axioms:
o Reflexivity rule: if is a set of attributes and , then holds.
o Augmentation rule: if holds, and is a set of attributes, then holds.
o Transitivity rule: if holds, and holds, then holds.
7. These rules are sound because they do not generate any incorrect functional dependencies. They are also complete as they generate all of .
8. To make life easier we can use some additional rules, derivable from Armstrong's Axioms:
o Union rule: if and , then holds.
o Decomposition rule: if holds, then and both hold.
o Pseudotransitivity rule: if holds, and holds, then holds.
9. Applying these rules to the scheme and set F mentioned above, we can derive the following:
o A H, as we saw by the transitivity rule.
o CG HI by the union rule.
o AG I by several steps:
Note that A C holds.
Then AG CG , by the augmentation rule.
Now by transitivity, AG I .
(You might notice that this is actually pseudotransivity if done in one step.)
Closure of Attribute Sets
1. To test whether a set of attributes is a superkey, we need to find the set of attributes functionally determined by .
2. Let be a set of attributes. We call the set of attributes determined by under a set F of functional dependencies the closure of under F, denoted .
3. The following algorithm computes :
4. result :=
5.
6. while (changes to result) do
7.
8. for each functional dependency
9.
10. in F do
11.
12. begin
13.
14. if result
15.
16. then result := result ;
17.
18. end
19.
20. If we use this algorithm on our example to calculate then we find:
o We start with result = AG.
o A B causes us to include B in result.
o A C causes result to become ABCG.
o CG H causes result to become ABCGH.
o CG I causes result to become ABCGHI.
o The next time we execute the while loop, no new attributes are added, and the algorithm terminates.
21. This algorithm has worst case behavior quadratic in the size of F. There is a linear algorithm that is more complicated.
Canonical Cover
1. To minimize the number of functional dependencies that need to be tested in case of an update we may restrict F to a canonical cover .
2. A canonical cover for F is a set of dependencies such that F logically implies all dependencies in , and vice versa.
3. must also have the following properties:
o Every functional dependency in contains no extraneous attributes in (ones that can be removed from without changing ). So A is extraneous in if and
logically implies .
o Every functional dependency in contains no extraneous attributes in (ones that can be removed from without changing ). So A is extraneous in if and
logically implies .
o Each left side of a functional dependency in is unique. That is there are no two dependencies and in such that .
4. To compute a canonical cover for F,
5. repeat
6.
7. Use the union rule to replace dependencies of the form
8.
9. and
10. with
11. .
12.
13. Find a functional dependency with an
14.
15. extraneous attribute in or in .
16.
17. If an extraneous attribute is found, delete it from
18.
19.
20. until F does not change.
21.
22. An example: for the relational scheme R=(A,B,C), and the set F of functional dependencies
23. A BC
24.
25. B C
26.
27. A B
28.
29. AB C
30.
we will compute .
o We have two dependencies with the same left hand side:
o A BC
o
o A B
o
We can replace these two with just A BC .
o A is extraneous in AB C because B C logically implies AB C .
o Then our set is
o A BC
o
o B C
o
o We still have an extraneous attribute on the right-hand side of the first dependency. C is extraneous in A BC because A B and B C logically imply that A BC .
o So we end up with
o A B
o
o B C
Dependency Preservation
1. Another desirable property in database design is dependency preservation.
o We would like to check easily that updates to the database do not result in illegal relations being created.
o It would be nice if our design allowed us to check updates without having to compute natural joins.
o To know whether joins must be computed, we need to determine what functional dependencies may be tested by checking each relation individually.
o Let F be a set of functional dependencies on schema R.
o Let be a decomposition of R.
o The restriction of F to is the set of all functional dependencies in that include only attributes of .
o Functional dependencies in a restriction can be tested in one relation, as they involve attributes in one relation schema.
o The set of restrictions is the set of dependencies that can be checked efficiently.
o We need to know whether testing only the restrictions is sufficient.
o Let .
o F' is a set of functional dependencies on schema R, but in general, .
o However, it may be that .
o If this is so, then every functional dependency in F is implied by F', and if F' is satisfied, then F must also be satisfied.
o A decomposition having the property that is a dependency-preserving decomposition.
2. The algorithm for testing dependency preservation follows this method:
compute
for each schema in D do
begin
:= the restriction of to ;
end
for each restriction do
begin
end
compute ;
if ( ) then return (true)
else return (false);
3. We can now show that our decomposition of Lending-schema is dependency preserving.
o The functional dependency
o bname assets bcity
o
can be tested in one relation on Branch-schema.
o The functional dependency
o loan# amount bname
o
can be tested in Loan-schema.
4. As the above example shows, it is often easier not to apply the algorithm shown to test dependency preservation, as computing takes exponential time.
5. An Easier Way To Test For Dependency Preservation
Really we only need to know whether the functional dependencies in F and not in F' are implied by those in F'.
In other words, are the functional dependencies not easily checkable logically implied by those that are?
Rather than compute and , and see whether they are equal, we can do this:
o Find F - F', the functional dependencies not checkable in one relation.
o See whether this set is obtainable from F' by using Armstrong's Axioms.
o This should take a great deal less work, as we have (usually) just a few functional dependencies to work on.
Use this simpler method on exams and assignments (unless you have exponential time available to you).
Repetition of Information
1. Our decomposition does not suffer from the repetition of information problem.
o Branch and loan data are separated into distinct relations.
o Thus we do not have to repeat branch data for each loan.
o If a single loan is made to several customers, we do not have to repeat the loan amount for each customer.
o This lack of redundancy is obviously desirable.
o We will see how this may be achieved through the use of normal forms.
Boyce-Codd Normal Form
1. A relation schema R is in Boyce-Codd Normal Form (BCNF) with respect to a set F of functional dependencies if for all functional dependencies in of the form , where and , at least one of the following holds:
o is a trivial functional dependency (i.e. ).
o is a superkey for schema R.
2. A database design is in BCNF if each member of the set of relation schemas is in BCNF.
3. Let's assess our example banking design:
Customer-schema = (cname, street, ccity)
cname street ccity
Branch-schema = (bname, assets, bcity)
bname assets bcity
Loan-info-schema = (bname, cname, loan#, amount)
loan# amount bname
Customer-schema and Branch-schema are in BCNF.
4. Let's look at Loan-info-schema:
o We have the non-trivial functional dependency loan# amount, and
o loan# is not a superkey.
o Thus Loan-info-schema is not in BCNF.
o We also have the repetition of information problem.
o For each customer associated with a loan, we must repeat the branch name and amount of the loan.
o We can eliminate this redundancy by decomposing into schemas that are all in BCNF.
5. If we decompose into
Loan-schema = (bname, loan#, amount)
Borrow-schema = (cname, loan#)
we have a lossless-join decomposition. (Remember why?)
To see whether these schemas are in BCNF, we need to know what functional dependencies apply to them.
For Loan-schema, we have loan# amount bname applying.
Only trivial functional dependencies apply to Borrow-schema.
Thus both schemas are in BCNF.
We also no longer have the repetition of information problem. Branch name and loan amount information are not repeated for each customer in this design.
6. Now we can give a general method to generate a collection of BCNF schemas.
result := ;
done := false;
compute ;
while (not done) do
if (there is a schema in result that is not in BCNF)
then begin
let be a nontrivial
functional dependency that holds on
such that is not in ,
and ;
result = (result ;
end
else done = true;
7. This algorithm generates a lossless-join BCNF decomposition. Why?
o We replace a schema with and .
o The dependency holds on .
o .
o So we have , and thus a lossless join.
8. Let's apply this algorithm to our earlier example of poor database design:
Lending-schema = (bname, assets, bcity, loan#, cname, amount)
The set of functional dependencies we require to hold on this schema are
bname assets bcity
loan# amount bname
A candidate key for this schema is {loan#, cname}.
We will now proceed to decompose:
The functional dependency
bname assets bcity
holds on Lending-schema, but bname is not a superkey.
We replace Lending-schema with
Branch-schema = (bname, assets, bcity)
Loan-info-schema = (bname, loan#, cname, amount)
Branch-schema is now in BCNF.
The functional dependency
loan# amount bname
holds on Loan-info-schema, but loan# is not a superkey.
We replace Loan-info-schema with
Loan-schema = (bname, loan#, amount)
Borrow-schema = (cname, loan#)
These are both now in BCNF.
We saw earlier that this decomposition is both lossless-join and dependency-preserving.
9. Not every decomposition is dependency-preserving.
o Consider the relation schema
o Banker-schema = (bname, cname, banker-name)
o
o The set F of functional dependencies is
o banker-name bname
o
o cname bname banker-name
o
o The schema is not in BCNF as banker-name is not a superkey.
o If we apply our algorithm, we may obtain the decomposition
o Banker-branch-schema = (bname, banker-name)
o
o Cust-banker-schema = (cname, banker-name)
o
o The decomposed schemas preserve only the first (and trivial) functional dependencies.
o The closure of this dependency does not include the second one.
o Thus a violation of cname bname banker-name cannot be detected unless a join is computed.
This shows us that not every BCNF decomposition is dependency-preserving.
10. It is not always possible to satisfy all three design goals:
o BCNF.
o Lossless join.
o Dependency preservation.
11. We can see that any BCNF decomposition of Banker-schema must fail to preserve
cname bname banker-name
12. Some Things To Note About BCNF
o There is sometimes more than one BCNF decomposition of a given schema.
o The algorithm given produces only one of these possible decompositions.
o Some of the BCNF decompositions may also yield dependency preservation, while others may not.
o Changing the order in which the functional dependencies are considered by the algorithm may change the decomposition.
o For example, try running the BCNF algorithm on
o
o
o
o
o
o
o
o
Then change the order of the last two functional dependencies and run the algorithm again. Check the two decompositions for dependency preservation.
Normalization Using Multivalued Dependencies
1. Suppose that in our banking example, we had an alternative design including the schema:
2. BC-schema = (loan#, cname, street, ccity)
3.
We can see this is not BCNF, as the functional dependency
cname street ccity
holds on this schema, and cname is not a superkey.
4. If we have customers who have several addresses, though, then we no longer wish to enforce this functional dependency, and the schema is in BCNF.
5. However, we now have the repetition of information problem. For each address, we must repeat the loan numbers for a customer, and vice versa.
Multivalued Dependencies
1. Functional dependencies rule out certain tuples from appearing in a relation.
If A B, then we cannot have two tuples with the same A value but different B values.
2. Multivalued dependencies do not rule out the existence of certain tuples.
Instead, they require that other tuples of a certain form be present in the relation.
3. Let R be a relation schema, and let and .
The multivalued dependency
holds on R if in any legal relation r(R), for all pairs of tuples and in r such that , there exist tuples and in r such that:
4. Figure 7.5 (textbook 6.10) shows a tabular representation of this. It looks horrendously complicated, but is really rather simple. A simple example is a table with the schema (name, address, car), as shown in Figure 7.6.
Figure 7.5: Tabular representation of .
Figure 7.6: (name, address, car) where and .
o Intuitively, says that the relationship between and is independent of the relationship between and .
o If the multivalued dependency is satisfied by all relations on schema R, then we say it is a trivial multivalued dependency on schema R.
o Thus is trivial if or .
5. Look at the example relation bc relation in Figure 7.7 (textbook 6.11).
Figure 7.7: Relation bc, an example of redundancy in a BCNF relation.
o We must repeat the loan number once for each address a customer has.
o We must repeat the address once for each loan the customer has.
o This repetition is pointless, as the relationship between a customer and a loan is independent of the relationship between a customer and his or her address.
o If a customer, say ``Smith'', has loan number 23, we want all of Smith's addresses to be associated with that loan.
o Thus the relation of Figure 7.8 (textbook 6.12) is illegal.
o If we look at our definition of multivalued dependency, we see that we want the multivalued dependency
o cname street ccity
o
to hold on BC-schema.
Figure 7.8: An illegal bc relation.
6. Note that if a relation r fails to satisfy a given multivalued dependency, we can construct a relation r' that does satisfy the multivalued dependency by adding tuples to r.
Theory of Multivalued Dependencies
1. We will need to compute all the multivalued dependencies that are logically implied by a given set of multivalued dependencies.
o Let D denote a set of functional and multivalued dependencies.
o The closure of D is the set of all functional and multivalued dependencies logically implied by D.
o We can compute from D using the formal definitions, but it is easier to use a set of inference rules.
2. The following set of inference rules is sound and complete. The first three rules are Armstrong's axioms from Chapter 5.
1. Reflexivity rule: if is a set of attributes and , then holds.
2. Augmentation rule: if holds, and is a set of attributes, then holds.
3. Transitivity rule: if holds, and holds, then holds.
4. Complementation rule: if holds, then holds.
5. Multivalued augmentation rule: if holds, and and , then holds.
6. Multivalued transitivity rule: if holds, and holds, then holds.
7. Replication rule: if holds, then .
8. Coalescence rule: if holds, and , and there is a such that and and , then holds.
3. An example of multivalued transitivity rule is as follows. and . Thus we have , where .
An example of coalescence rule is as follows. If we have , and , then we have .
4. Let's do an example:
o Let R=(A,B,C,G,H,I) be a relation schema.
o Suppose holds.
o The definition of multivalued dependencies implies that if , then there exists tuples and such that:
o
o
o
o
o
o
o
o
o
o The complementation rule states that if then .
o Tuples and satisfy if we simply change the subscripts.
5. We can simplify calculating , the closure of D by using the following rules, derivable from the previous ones:
o Multivalued union rule: if holds and holds, then holds.
o Intersection rule: if holds and holds, then holds.
o Difference rule: if holds and holds, then holds and holds.
6. An example will help:
Let R=(A,B,C,G,H,I) with the set of dependencies:
We list some members of :
o : since , complementation rule implies that , and R - B - A = CGHI.
o : Since and , multivalued transitivity rule implies that .
o : coalescence rule can be applied. holds, and and , so we can satisfy the coalescence rule with being B, being HI, being CG, and being H. We conclude that .
o : now we know that and . By the difference rule, .
Fourth Normal Form (4NF)
1. We saw that BC-schema was in BCNF, but still was not an ideal design as it suffered from repetition of information. We had the multivalued dependency cname street ccity, but no non-trivial functional dependencies.
2. We can use the given multivalued dependencies to improve the database design by decomposing it into fourth normal form.
3. A relation schema R is in 4NF with respect to a set D of functional and multivalued dependencies if for all multivalued dependencies in of the form , where and , at least one of the following hold:
o is a trivial multivalued dependency.
o is a superkey for schema R.
4. A database design is in 4NF if each member of the set of relation schemas is in 4NF.
5. The definition of 4NF differs from the BCNF definition only in the use of multivalued dependencies.
o Every 4NF schema is also in BCNF.
o To see why, note that if a schema is not in BCNF, there is a non-trivial functional dependency holding on R, where is not a superkey.
o Since implies , by the replication rule, R cannot be in 4NF.
6. We have an algorithm similar to the BCNF algorithm for decomposing a schema into 4NF:
result := ;
done := false;
compute ;
while (not done) do
if (there is a schema in result
that is not in 4NF)
then begin
let be a nontrivial multivalued
dependency that holds on such that
is not in , and
;
result =
end
else done = true;
7. If we apply this algorithm to BC-schema:
o cname loan# is a nontrivial multivalued dependency and cname is not a superkey for the schema.
o We then replace BC-schema by two schemas:
o Cust-loan-schema=(cname, loan#)
o
o Customer-schema=(cname, street, ccity)
o
o These two schemas are in 4NF.
8. We can show that our algorithm generates only lossless-join decompositions.
o Let R be a relation schema and D a set of functional and multivalued dependencies on R.
o Let and form a decomposition of R.
o This decomposition is lossless-join if and only if at least one of the following multivalued dependencies is in :
o
o
o
o We saw similar criteria for functional dependencies.
o This says that for every lossless-join decomposition of R into two schemas and , one of the two above dependencies must hold.
o You can see, by inspecting the algorithm, that this must be the case for every decomposition.
9. Dependency preservation is not as simple to determine as with functional dependencies.
o Let R be a relation schema.
o Let be a decomposition of R.
o Let D be the set of functional and multivalued dependencies holding on R.
o The restriction of D to is the set consisting of:
All functional dependencies in that include only attributes of .
All multivalued dependencies of the form where and is in .
o A decomposition of schema R is dependency preserving with respect to a set D of functional and multivalued dependencies if for every set of relations such that for all i, satisfies , there exists a relation r(R) that satisfies D and for which for all i.
10. What does this formal statement say? It says that a decomposition is dependency preserving if for every set of relations on the decomposition schema satisfying only the restrictions on D there exists a relation r on the entire schema R that the decomposed schemas can be derived from, and that r also satisfies the functional and multivalued dependencies.
11. We'll do an example using our decomposition algorithm and check the result for dependency preservation.
o Let R=(A,B,C,G,H,I).
o Let D be
o
o
o
o
o
o R is not in 4NF, as we have and A is not a superkey.
o The algorithm causes us to decompose using this dependency into
o
o
o
o is now in 4NF, but is not.
o Applying the multivalued dependency (how did we get this?), our algorithm then decomposes into
o
o
o
o is now in 4NF, but is not.
o Why? As is in (why?) then the restriction of this dependency to gives us .
o Applying this dependency in our algorithm finally decomposes into
o
o
o
o The algorithm terminates, and our decomposition is and .
12. Let's analyze the result.
Figure 7.9: Projection of relation r onto a 4NF decomposition of R.
o This decomposition is not dependency preserving as it fails to preserve .
o Figure 7.9 (textbook 6.14) shows four relations that may result from projecting a relation onto the four schemas of our decomposition.
o The restriction of D to (A,B) is and some trivial dependencies.
o We can see that satisfies as there are no pairs with the same A value.
o Also, satisfies all functional and multivalued dependencies since no two tuples have the same value on any attribute.
o We can say the same for and .
o So our decomposed version satisfies all the dependencies in the restriction of D.
o However, there is no relation r on (A,B,C,G,H,I) that satisfies D and decomposes into and .
o Figure 7.10 (textbook 6.15) shows .
o Relation r does not satisfy .
o Any relation s containing r and satisfying must include the tuple .
o However, includes a tuple that is not in .
o Thus our decomposition fails to detect a violation of .
Figure 7.10: A relation r(R) that does not satisfy .
13. We have seen that if we are given a set of functional and multivalued dependencies, it is best to find a database design that meets the three criteria:
o 4NF.
o Dependency Preservation.
o Lossless-join.
14. If we only have functional dependencies, the first criteria is just BCNF.
15. We cannot always meet all three criteria. When this occurs, we compromise on 4NF, and accept BCNF, or even 3NF if necessary, to ensure dependency preservation.
No comments:
Post a Comment