More Topics

More Topics
Read More, Do More and Earn More

NORMAL FORMS

Normalization is the process of efficiently organizing data in a database. Ultimately, there are two goals of the normalization process. The first is to remove the same data in more that one table (redundant data). The second is to make sure data dependencies makes sense by having only related data stored in the same table. The two ultimate goals highlighted are important as they help reduce the amount of space a database consumes and ensure that data is logically stored.

Normal Forms and Normalization

  • A normal form is a property of a database schema.
  • When a database schema is un-normalized (that is, does not satisfy the normal form), it allows redundancies of various types which can lead to anomalies and inconsistencies.
  • Normal forms can serve as basis for evaluating the quality of a database schema and constitutes a useful tool for database design.
  • Normalization is a procedure that transforms an un-normalized schema into a normalized one.

Examples of Redundancy

Anomalies

The value of the salary of an employee is repeated in every tuple where the employee is mentioned, leading to a redundancy. Redundancies lead to anomalies:

  • If salary of an employee changes, we have to modify the value in all corresponding tuples (update anomaly.)
  • If an employee ceases to work in projects, but stays with company, all corresponding tuples are deleted, leading to loss of information (deletion anomaly.)
  • A new employee cannot be inserted in the relation until the employee is assigned to a project (insertion anomaly.)

What’s Wrong???

  • We are using a single relation to represent data of very different types.
  • In particular, we are using a single relation to store the following types of entities, relationships and attributes:

=> Employees and their salaries;

=> Projects and their budgets;

=> Participation of employees in projects, along with their functions.

  • To set the problem on a formal footing, we introduce the notion of functional dependency (FD).

Functional Dependencies (FDs) in the Example

  • Each employee has a unique salary. We represent this dependency as

Employee → Salary

and  say  "Salary  functionally  depends  on Employee".

  • This means that everywhere we have the same Employee attribute value, we also get the same Salary attribute value.
  • Likewise,

Project → Budget

i.e., each project has a unique budget.

Another Example


How Do We Eliminate Redundancy?

  • Decomposition: Use two relations to store Person information:

=> Person1 (SI#, Name, Address)

=> Hobbies (SI#, Hobby)

The decomposition is more general: people with hobbies can now be described independently of their name and address.

No update anomalies:

=> Name and address stored once;

=> A hobby can be separately supplied or deleted;

=> We can represent persons with no hobbies.

 

Normalization Through Decomposition

  • A relation that is not in 3NF, can be replaced with one or more normalized relations using normalization.
  • We can eliminate redundancies and anomalies for the example relation

Emp(Employee,Salary,Project,Budget,Function)

if we replace it with the three relations obtained by projections on the sets of attributes corresponding to the three functional dependencies:

=> Employee → Salary;

=> Project → Budget.

=> Employee,Project → Function.


…Start with…

Result of Normalization



Desirable Qualities for Decompositions

Decompositions should always satisfy the properties of lossless decomposition and dependency preservation:

  • Lossless decomposition ensures that the information in the original relation can be accurately reconstructed based on the information represented in the decomposed relations.

  • Dependency preservation ensures that the decomposed relations have the same capacity to represent the integrity constraints as the original relations and therefore to reveal illegal updates.


Normalization Drawbacks


  • By limiting redundancy, normalization helps maintain consistency and saves space.
  • But performance of querying can suffer because related information that was stored in a single relation is now distributed among several


No comments:

Post a Comment