Data Normalization

Data normalization is an important step in any database development process. Through this often tedious process, a developer can eliminate duplication and develop standards by which all data can be measured. This paper addresses the history and function of data normalization as it applies to the course at hand.

In 1970, Dr. E.F. Codd's seminal paper "A Relational Model of Data for Large Shared Data Banks" was published in Communications of the ACM. This paper introduced the topic of data normalization. According to an often-repeated anecdote, Codd chose the name because President Nixon was then normalizing relations with China.

Data normalization is a technique used during logical data modeling to ensure that there is only one way to know a fact, by removing all structures that provide more than one way to know the same fact as represented in a database relation (table). The goal of normalization is to control and eliminate redundancy, and to mitigate the effects of modification anomalies, which are generally classed as insertion, deletion, and update anomalies. (Insertion anomalies occur when a fact about one thing cannot be stored without also supplying an unrelated fact about another. Deletion anomalies occur when the deletion of one fact results in the loss of a second fact. Update anomalies occur when the same fact is stored in several rows and a change reaches only some of them.)
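The anomalies described above can be illustrated with a small sketch. The table layout, names, and values below are hypothetical and chosen only for illustration; plain Python lists and dictionaries stand in for database relations.

```python
# Hypothetical single-table design: each row mixes an enrollment fact
# with a fact about who teaches the course.
enrollments = [
    {"student": "Ann", "course": "DB101", "instructor": "Lee"},
    {"student": "Bob", "course": "DB101", "instructor": "Lee"},
    {"student": "Ann", "course": "CS200", "instructor": "Kim"},
]


def instructor_of(rows, course):
    """Return the set of instructors recorded for a course."""
    return {r["instructor"] for r in rows if r["course"] == course}


# Deletion anomaly: Ann drops CS200, and with that row goes the only
# record that Kim teaches CS200.
after_drop = [
    r for r in enrollments
    if not (r["student"] == "Ann" and r["course"] == "CS200")
]
lost = instructor_of(after_drop, "CS200")  # nothing left to find

# Insertion anomaly: a new course has no students yet, so the only way
# to record its instructor in this design is a row with a missing student.
placeholder = {"student": None, "course": "CS300", "instructor": "Roy"}

# Normalized alternative: one relation per kind of fact.
courses = {"DB101": "Lee", "CS200": "Kim", "CS300": "Roy"}
enrolled = {("Ann", "DB101"), ("Bob", "DB101"), ("Ann", "CS200")}

# Dropping the enrollment no longer touches the course relation.
enrolled_after_drop = enrolled - {("Ann", "CS200")}
```

In the split design, the instructor of CS200 is still known after Ann drops the course, and CS300 can be recorded before anyone enrolls in it.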

Normalization

There are six generally recognized normal forms of a relation: first normal form, second normal form, third normal form, Boyce-Codd normal form, fourth normal form, and fifth normal form, also called projection-join normal form. Other normal forms (e.g., domain-key normal form) exist but will not be discussed here. The normal forms are cumulative, i.e., each normal form includes the requirements of its predecessor. Although many people consider a relation to be normalized only when it is in third normal form, technically speaking, a relation in only first normal form can be considered normalized.

The Normal Forms

First normal form (1NF) - All attributes must be atomic. That is, no attribute may hold a repeating group or a list of values. For example, in a relation that describes a student, the student's classes should not be stored in one field, separated by commas. Rather, the classes should be moved to their own relation, in which each row holds one class and includes a link back to the student relation (called a foreign key).
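The student example can be sketched as follows. The field names (student_id, name, classes, class_code) and the sample values are assumptions made for illustration, and Python dictionaries stand in for rows of a relation.

```python
# Hypothetical unnormalized relation: "classes" packs several values
# into one field, which violates first normal form.
students_unnormalized = [
    {"student_id": 1, "name": "Ann", "classes": "DB101, CS200"},
    {"student_id": 2, "name": "Bob", "classes": "DB101"},
]


def to_first_normal_form(rows):
    """Split the comma-separated 'classes' field into its own relation.

    Returns (students, student_classes). Each row of student_classes
    carries student_id as a foreign key back to the students relation.
    """
    students = []
    student_classes = []
    for row in rows:
        students.append({"student_id": row["student_id"], "name": row["name"]})
        for class_code in row["classes"].split(","):
            class_code = class_code.strip()
            if class_code:  # skip empty entries such as a trailing comma
                student_classes.append(
                    {"student_id": row["student_id"], "class_code": class_code}
                )
    return students, student_classes


students, student_classes = to_first_normal_form(students_unnormalized)
```

After the split, every field holds a single value, and a student's classes are found by matching student_id across the two relations.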

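Second normal form (2NF) - A relation is in second normal form if it is in first normal form and each non-key attribute is fully functionally dependent on the entire primary key. That is, no proper subset of the key can determine a non-key attribute's value. This requirement matters only when the primary key is composite; a relation in first normal form with a single-attribute key is automatically in second normal form.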
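A partial dependency can be spotted in sample data with a short check like the one below. This is only a sketch: the relation, its attribute names, and the helper functions are hypothetical, and sample rows can refute a functional dependency but never prove one, since dependencies are properties of the design and not of any particular data.

```python
from itertools import combinations

# Hypothetical relation with composite key (student_id, class_code).
grades = [
    {"student_id": 1, "class_code": "DB101", "class_title": "Databases", "grade": "A"},
    {"student_id": 2, "class_code": "DB101", "class_title": "Databases", "grade": "B"},
    {"student_id": 1, "class_code": "CS200", "class_title": "Algorithms", "grade": "B"},
]


def determines(rows, lhs, rhs):
    """True if, in these rows, equal values of lhs always come with equal rhs."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        if seen.setdefault(key, row[rhs]) != row[rhs]:
            return False
    return True


def partial_dependencies(rows, key, non_key):
    """List (key_subset, attribute) pairs where a proper subset of the
    key appears to determine a non-key attribute."""
    found = []
    for size in range(1, len(key)):
        for subset in combinations(key, size):
            for attr in non_key:
                if determines(rows, subset, attr):
                    found.append((subset, attr))
    return found


violations = partial_dependencies(
    grades, ("student_id", "class_code"), ("class_title", "grade")
)
```

Here class_title appears to depend on class_code alone, so it would move to a separate class relation keyed by class_code, while grade stays with the full key.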

Third normal form (3NF) - A relation is in third normal form if it is in second normal form and no non-key attribute is transitively dependent on the primary key. That is, each non-key attribute depends directly on the key, and not on any other non-key attribute.
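The following sketch shows one way a transitive dependency might be removed. The relation and its attributes (student_id, name, dept_code, dept_name) are hypothetical; the assumption is that dept_name depends on dept_code, which in turn depends on the key student_id.

```python
# Hypothetical relation in 2NF but not 3NF: dept_name is determined by
# dept_code, a non-key attribute, so it depends on the key only transitively.
students = [
    {"student_id": 1, "name": "Ann", "dept_code": "CS", "dept_name": "Computer Science"},
    {"student_id": 2, "name": "Bob", "dept_code": "CS", "dept_name": "Computer Science"},
    {"student_id": 3, "name": "Cy", "dept_code": "MA", "dept_name": "Mathematics"},
]


def decompose(rows):
    """Move the dept_code -> dept_name fact into its own relation."""
    departments = {}
    slim = []
    for row in rows:
        departments[row["dept_code"]] = row["dept_name"]
        slim.append({k: row[k] for k in ("student_id", "name", "dept_code")})
    return slim, departments


def rejoin(slim, departments):
    """Rebuild the original rows by following the dept_code foreign key."""
    return [dict(r, dept_name=departments[r["dept_code"]]) for r in slim]


slim_students, departments = decompose(students)

# The decomposition loses nothing: joining the parts restores the original.
restored = rejoin(slim_students, departments)

# A department name is now stored once, so one change reaches every student.
renamed = dict(departments, CS="Computing")
after_rename = rejoin(slim_students, renamed)
```

Because each department name is stored in exactly one place after the split, renaming a department is a single change and cannot leave some student rows out of date.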
