DB Design

engineering<--

Database Design

When a designer start a new project it can start with the user interface (UI) or with Database Design (DB). Most of the time this is done using a design document that describe the database. After the document is approved a development team build the scripts to implement the database, and the applications around it.

Requirements

If you work in a traditional software company probably you will work in a team. Depending how the project is organized (scrum, agile or kanban) you may be asked to create requirements and presentations for the project.

If the project use large amount of data requirements will include a database and will make recommendations about the database structure but will not go in much details. Sometimes requirements will include a "Conceptual Model" (CM).

Database Models

A database model is a graphical representation of database elements. This is used by database designers to communicate high level design of database structure. There are 3 model types you can make before you implement a large database. You need to understand how to read these models if you wish to create a large project.

Conceptual Model
Logical Model
Phisical Model

Each model has its own merits but the most valuable is the Conceptual Model. This is the only model that I will show in details how to make using entry level tools.

Drawing Tools

Most database designers are using tools to create a database model diagrams. These tool are based on IBM UML notation: (Unified Modeling Language) and can connect to existing databases. You can use them to generate code for you. However some companies that are less traditional and more efficient are using paper and pencil for design.

On-line tools are implemented as SaaS (Software as a Service) platform. Yes there are websites that enable you to design on-line for a small price.One of these is DB Designer. Honestly I have no intention to use this tool for my open source projects. It can not export your diagram as SVG so I will pass thank you.

Conceptual Symbols

To represent an entity use a rectangle. To represent relations use a line. If you wish to represent the relation cardinality, use a conceptual notation for the line end. This consist of a bar, a circle or a crow foot like in the picture below:

Conceptual Symbols

Normalization

One of the most strong principles in relational database design is the principle of normalization. This is the process of organizing data in a database. There are two goals of the normalization process:

eliminating redundant data
ensuring comprehensive dependencies

Edgar Codd is the inventor of the relational model (RM). He has introduced the concept of normalization and what we now know as the First normal form (1NF) in 1970. Codd went on to define the Second normal form (2NF) and Third normal form (3NF) in 1971

References to Wikipedia

The normalization process is using several normalization rules:

First normal form (1NF): This is a property of a relation in a relational database. A relation is in first normal form if and only if the domain of each attribute contains only atomic indivisible values, and the value of each attribute contains only a single value from that domain.

First normal form enforces these criteria:

Eliminate repeating groups in individual tables;
Create a separate table for each set of related data;
Identify each set of related data with a primary key.

Second normal form (2NF): A table that is in first normal form (1NF) must meet additional criteria for second normal form. Specifically: a table is in 2NF if it is in 1NF and no non-prime attribute is dependent on any proper subset of any candidate key of the table.

A non-prime attribute of a table is an attribute that is not a part of any candidate key of the table. Attributes that are part of a candidate kye are called prime attributes.

Third normal form (3NF): is a normal form that is used in normalizing a database design to reduce the duplication of data and ensure referential integrity by ensuring that:

1. the entity is in second normal form

2. all the attributes in a table are determined only by the candidate keys of that table and not by any non-prime attributes.

3NF was designed to improve database processing while minimizing storage costs.
3NF data modeling was ideal for online transaction processing (OLTP) applications with heavy order entry type of needs.

Most of applications use Third Normal Form (3NF).

These principles are very scientific and difficult to understand. However a good designer must know how to apply these principles. This comes within experience, with many try and errors.

De-normalization: Highly normalized databases are not always the best choice. Having data spread over many tables require joining them back together to make comprehensive SQL. Sometimes you have to de-normalize a table to improve query performance.

Conceptual Models

In Relational Algebra the concepts are Logical or Abstract. In a Relational database, we implement the Logical or Conceptual model using Tables. So here are some example of conceptual diagrams patterns that you can use do make a database model:

1:1 Relations

Note: The most simple relation is 1:1. That is for one record in main entity table we have one record into related table. This relation can have variations when the relation key is optional (can be null). We call 1:1 a strong relation and 0:0 relation a week relation.

1:M Relations

Note: The most common relation is 1:M. That is for one record in master entity we have zero, one or more records in a detail table.We enforce the cardinality between master and detail table by making mandatory or optional keys.Also some relations require to use triggers that verify the relation when we add or remove records.

Association

Note: The relation M:M can not exist in a relational database. We need to create an associate third table that is containing foreign keys from both other tables.This table can have a surrogate ID or it can use both foreign keys to create a composite primary key.

Inheritance

Note: The inheritance is a complex relation that is very similar to M:M relation, except that we may have multiple detail tables. Between the detail tables there is an exclusivity constraint. This constraint allows only one of detail entity to have associated main record. If this constraint do not exist, the inheritance is not pure.

Extension

Note: The extension model is very curious. It is similar to inheritance, except there are no exclusive constraints. This is called extension but in fact is a non-pure Inheritance. Modern databases enable you to store objects. This may improve your way of implementing Inheritance.

Detail Design

A database design must contain complete and precise information about all database objects. This is necessary for software development team to be able to create DDL scripts for making the database structure for a data centric application.

Content

ER Diagrams
List of tables
List of constraints for each table
List of triggers
List of indexes
List of views
Metadata

Note: Detail design of database do not contain description of stored procedures. This design is too complex to be included in database design document. So most Architects let the development team to decide what packages to implement.

Scripts

For implementing a database, developers will create DDL scripts. This is a subset of SQL that means: "Data Definition Language". This language enable creation and modification of objects in a database.

The order of scripts is very important. You can not create indexes for a table that do not exist. So here is the natural order of making objects in a database. First you make all the tables with primary keys and check constraints. Second you make the indexes. Third you make the foreign key constraints.

Indexes

Warning: A database with no indexes is slow. You need to design every index with a purpose. Too many indexes can slow down a database. There are rules to help you design the indexes.

Primary key must have an associated unique index.
Alternate keys must have an associated unique indexes.
Foreign keys must have associated indexes that are not unique.
Frequent filtered fields with many values must have non unique indexes
Frequent filtered fields with few values must have bitmap index
Associative tables may be index organized tables

Storage

Warning: Storage specification can be part of detailed design document. If this is missing, database administrator will have no choice but to enable default database storage. This may be not good enough for critical performance applications.

Metadata should be stored on small but very fast devices (SSD)
Critical data must be stored on redundant storage. (Mirror, RAID10)
Non critical data can be stored on non redundant storage. (RAID0)

Density

Warning: Ignorance is the seed of disasters. A good database architect must pay attention to data density and update frequency. A compact table is harder to update and can quickly loose performance. A parsed table can become too large and therefore slower than should be.

Tables that are often updated will enable 10-30% provision for every data block.
Metadata tables that are infrequent updated should have maximum 5% free block space.

Security: Critical databases must have good design for data access rules. This is done using a special script that is using DCL (Data Control Language). You will learn this language in our Programming Language tutorial for SQL.

Maintenance: Database administrator will perform no optimizations if this is not required by software development team or database architect. Therefore a good design document should include directives for database maintenance tasks, like frequency for index rebuild, and frequency for compacting and reorganization of table data to reset the free space.

Design Example

Next we present an example of a simple database model for Agile methodology. Let's consider a project management tool for a company that uses Agile Methodology to manage projects. First we need to know how Agile methodology works. Then we design a conceptual model for representing the data. We will show you how to create the Logical and the Conceptual models.

Agile Rules:

In an agile company one team is assigned a single project. A project consist of several epics and each epic is represented by several stories. One story may have several tasks to be completed. A developer can work in a single team and can be assigned to resolve multiple stories. A story need a resolver, so when we create a story it must be assigned immediately, otherwise we can not create stories.

Logic Model

Logic model shows two main entity tables {PROJECT, DEVELOPER}, two secondary entities {EPIC, TASK} and 2 associative tables {TEAM, STORY}. Between the entities we draw references by following Agile rules. For a better understanding of relations we have given some names.

Other Rules:

An epic has a business value from 0 to 10;
A story has a Fibonacci number: 1,2,3,5,8,13;
A story has to be resolved into one sprint;

Advice: If possible do not cross references one over the other. This will clutter the diagram and make it difficult to read. On a complex database not all entities and relation need to be present in a single diagram.

Table Design

At first we design tables with primary key and foreign keys. This is the high level design of database table. Tables may have additional attributes that will be part of Physical Model.

Table Type	Name	Primary Key	Foreign Keys
Entity	PROJECT	ID	OWNER_ID
Entity	DEVELOPER	ID	PROJECT_ID
Association	TEAM	ID	MASTER_ID
Detail	EPIC	ID	PROJECT_ID
Association	STORY	ID	EPIC_ID
Association	STORY	ID	REZOLVER_ID
Detail	TASK	ID	STORY_ID
Detail	TASK	ID	REVIEWER_ID

Relations:

One project has one project owner: Relation 1:1
One project has at least one epic: Relation 1:M
One epic has one or many stories: Relation 1:0-M
Each developer is member into a single team: Relation 1:M
One team is assigned to a single project: Relation 1:1
One story must be assigned to a developer: Relation 1:0-M.
One story has several tasks: Relation 1:0-M.
One story has optional a reviewer: Relation 0-1:0-M.

These relations are implementing several fundamental Agile rules. For some of the rules we need to implement triggers and attribute constraints. This can be done in the physical model. We do not show you the physical model in this article.

ER Diagram

ER diagram is used to show attributes, primary keys and foreign keys. This is also called physical model. This diagram is used by developers to create the database structure. Also can be used by developers who use the database to create SQL queries for applications.

ER-Diagram

Note: I have used drawio app to make this diagram. A good diagram can bust morale in your team and help you avoid many defects. For better productivity I advice you to use better DB tools. There are tools that can generate SQL scripts from diagrams.

Conclusion:

This has demonstrate how to make a data model to resolve a real problem. We have shown the Logical Model, and the Physical Model. Using these models you can understand the design of a database. Having a model will help you build better SQL to analyze your data.

Read next:Cybersecurity