engineering<--

Database Systems

A database is a special software that run on servers as daemon or service. A database is the key ingredient to create: dynamic websites, single page APPs, digital maps or distributed APPs. Database developers are highly appreciated in any software team.

Database applications

A database applications is a large software system in which a database play the most important role. For building such an application we use two tire or three tire architecture. First tire is the database itself, second is the application server and third is the Internet browser for web-app or native app.

To design a data centric application the architect will design the database first. Once the database has a design model, one or more software developers will create scripts and programs for the database.

Design elements include tables, relations between tables, validation rules, domains, triggers and stored procedure requirements. This design is usually done using ER diagrams with a program that can be provided by the database vendor.

Data Manipulation

Data centric applications can use a relational database or a document oriented database. If the programs for the databases are written as stored procedures there is a native manipulation of the data inside the database. When data manipulation is performed using an external program then there is a data transfer between the database and the program.

Data Oriented Languages

Most of computer languages are general purpose language. However for data centric applications there are specific dedicated languages. Every relational database has a different language.

PL/SQL – for Oracle database
pgSQL – for posgreSQL database
Transact-SQL – for Microsoft SQL server database
Microsoft Access - VBa

Other data scientific languages:

R Programming Language
Octave
Julia programming Language
MathLab

Impedance Mismatch

In computer science there are Object Oriented Languages like Java, C++, Python and other languages that can have a data model created in memory using classes. Relational databases can have a different data model using relational tables. The two model are not matching perfectly and there is an "impedance mismatch" that need an adapter to function. This adapter is called Object Relational Mapping (ORM) application.

See also: Object-relational mapping

For most popular computer languages there are available several ORM frameworks available for architects to choose from. For example Java has Hibernate, TopLink and Athena Framework.

See also: ORM Software

The ORM software is a key in data centric applications. Many applications have a bottleneck that is the ORM. Therefore is a good idea to avoid impedance mismatch and use a disruptive technology to do so.

To Avoid Impedance Mismatch we can use 3 different techniques.

Store data as complex objects using an Object Oriented database
Store data as documents using JSON notation into document oriented databases
Use data – oriented computer languages that are not object – oriented.

Data communication

A data-centric application must communicate with the client application. Depending on the client a data-centric application can use a native data protocol or micro-services. Some databases have tools that can create micro-services automatically and expose functions to read and write data into databases.

The most common approach is to use a database driver for example ODBC or JDBC driver. Some databases provide native drivers for C/C++ and other computer languages. In this case the application will send SQL and will receive data packages.

Most of the time an architect must design the interface for the database. This include the protocol and the data format for data exchange. Data protocol for communication can be XML based or JSON based for services.

Examples:

There are diverse types of applications that can be connected to a database.

Internal stored procedures
Batch processing applications
Jobs scheduled applications
Desktop applications with dedicated user interface
Universal tools having standard user interface
Web based user interface or APP
Mobile APPS

All these applications are named OLTP applications. (On-line Transnational Processing). Sometimes a data-centric application has a role of backup or history. In this case the database in an OLAP database or on-line Analytical Processing Applications. These kind of applications are becoming obsolete and are replaced by Big Data Analytic applications.

See also: Data-centric Architecture

Database Features

Next we will describe database features for several database engine. To know what features of databases will help you to make a better decision when you have to select a database. Some databases sacrifice some of the features looking for better performance or better usability.

Depending on the features implemented, a programmer have an easy time programming or not. The end user do not care about the features of your database. End users care about his experience. To be fast, secure and free. So what you do as a developer? Here we tell you. Let's get started with data types:

Data Types

Hold on a minute. What is data in Computer Science? Everything your computer does is to manipulate data. Once you understand what data actually is you will next understand more easy how to create programs that can manipulate your data.

most frequent data types

simple small integer number: 1, 2, 3
a very large integer number: 123456789012345678901234567890
a fractional number: 1.5
a simple character: A, B, C … a, b, c
a string of characters: "This is a string"
a string of Unicode symbols like: "∑∏∫±√×≤≥≠≈"
a very large string or text: "I can't give you an example but you can imagine one"
a date: 30-05-2016
a time: 11:30 AM
a JSON document
an XML document
an Object
an Image
a video file
a group of numbers representing spatial coordinates (GIS)
a bunch of text that represents something for somebody
a bunch of bits that represents something for somebody.

Query Language (SQL)

This is a declarative computer language specific to databases. It has 2 most important dialects: DML=Data Manipulation Language and DDL = Data Definition Language. You can study SQL as a programming language in next section:

Using SQL and PL/SQL we can create code stored with database. This helps to improve code reliability and maintenance. By linking the code with database, we can detect incompatibilities when we update the database structure. Stored code can become invalid and unusable. This is a signal that we need ot update the code.

JSON = Java Script Object Notation

This is a data format that is inspired from JavaScript object representation. It is a text format for data and is better then XML. Check out the definition: json.org. NoSQL databases are using JSON for storing data but there are few exceptions that use other kind of storage.

Document oriented databases can store only JSON objects. This is encapsulating a complex structure that is metadata information and the data itself, stored together as a document.

JSON is easy to parse and light weight. It comes from JavaScript Object Notation. I uses curly brackets and coma separated pairs. It is human readable better then XML.

JSON Example:

{
    ID: 10,
    Name: "John Doe",
    Age: 45,
    Children: ["Ana", "Maria", "Miky"]
}

Document oriented databases are sometimes called big-data or no-sql. These databases are fast due to 2 important techniques:

Distribute data over several computers;
Store related data together so there is no need for a join operation.

No SQL databases are usually very good at data replication and are using distributed storage. That means a database is replicated on multiple nodes and do not have a central server. This technique is sometimes called Map-Reduce or big data.

Triggers (TRG)

Most of databases have rules that can verify database integrity when data is modified or first time introduced into the database. These rules are small programs or expressions that are hosted by the database server and started by a particular event. For example when data is updated, deleted or inserted.

Rules are implemented as triggers and constraints. These can slow down a database input significantly but in general do not have a negative effect over the data search or data output. Sometimes rules are disabled to improve batch processing performance.

Stored procedures (SP)

Older databases in general SQL databases are using a customized language for data manipulation. This language can be used to store a program in the database. Therefore these procedures are called also "stored procedures".

In latest design new databases can use a general purpose language like Java, JavaScript or even Python or C++ to create native programs that are working much faster on server side applications instead of stored procedures. These programs are hosted by an application server as separated services.

Spatial layer (SL)

This feature is one of the most difficult to implement and show the power and the dedication of the company that supports the database.

The spatial feature permits storing x,y and z geographic coordinates or points is space. The storage permits also spatial relations between the points to form geometrical shapes like circle, triangle square, simple lines or curved lines.

Multi-User (MU)

Usually a database is a service application or daemon. This is a program that stay in memory after computer is started and most of the time start automatically with the server host operating system.

Most of databases are multi-user. However SQLite is a single user database and this database do not have a service. It is an embedded database, and this means user program can use the database compiled into first program.

Replication (RPL)

Data replication is a distinguishing feature of professional databases. This is used to copy data from one main server to one or more backup servers. When the main server is not operational the backup server is used.

Sometimes replication is used to improve performance of applications by using distributed computation. Replication can be uni-directional or bi-directional and can have a master server or can be master-less.

Database engines

A database engine is a program that is installed on a server and run automatically when server is started. This is known as a service or daemon. Once loaded in the memory it will perform input-output operations on a set of data files and will listen to requests on a specific TCP-IP port. Applications are connecting to this port and request data using SQL or send data modifications using DML.

Next is list of databases and the most important features. In my opinion these are the top 12 database engines. One database engine can support: SQL Language, JSON Data, Triggers, Stored Procedures, Spatial Functionality, Multi-User access or Replication.

#	Feature	SQL	NoSQL	TRG	SP	SPA	MU	RPL
1	Oracle	Y		Y	Y	Y	Y	Y
2	PostgreSQL	Y		Y	Y	Y	Y	Y
3	SQL Server	Y		Y	Y	Y	Y	Y
4	SQLite	Y
5	MySQL	Y		Y	Y	Y	Y	Y
6	MariaDB	Y		Y	Y	Y	Y	Y
7	MongoDB		Y	Y	Y	Y
8	Casandra		Y	Y	Y	Y
9	CouchDB		Y	Y	Y	Y
10	Neo4j		Y	Y	Y
11	RethinkDB		Y	Y	Y	Y
12	IBM DB2	Y		Y	Y	Y	Y	Y
13	Redis		Y
14	Amazon Aurora	Y		Y	Y	Y	Y	Y
15	Microsoft Access	Y		Y	Y		Y
16	Firebase		Y				Y
17	IBM Informix	Y		Y	Y	Y	Y	Y
18	Sureall		Y
19	Scylla		Y	Y	Y

Relational Databases

A relational database is a collection of data items organized as a set of tables from which data can be accessed or reassembled in many different ways using SQL language. A relational database is usually normalized.

In this kind of databases data is stored in tables. These are database objects used to organize data in rows and columns. Relational databases RDBMS is based on relational algebra. When we design a relational database we deal with normalization rules. Relational databases use SQL query language for data definition and data manipulation.

Popular implementations:

Database	Company	OOP	JSON
MySQL	Oracle Corporation	Yes	Yes (since MySQL 5.7)
PostgreSQL	PostgreSQL Global Development Group	Yes	Yes (with JSON data type)
Microsoft SQL Server	Microsoft	Yes	Yes (with JSON functions)
Oracle Database	Oracle Corporation	Yes	Yes (with JSON functions)
IBM DB2	IBM	Yes (with object-relational features)	No
MariaDB	MariaDB Corporation	Yes	Yes (since MariaDB 10.2)
SQLite	D. Richard Hipp	Yes (with some object-relational features)	Yes (with JSON1 extension)
Firebird	Firebird Project	Yes (with object-relational features)	No
Altibase	Altibase Corp.	Yes (with object-relational features)	No
Virtuoso	OpenLink Software	Yes (with object-relational features)	Yes (with RDF data type)

RDBMS databases are preferred for most applications. Architects prefer to use quality components in the application architecture to create a robust system. Relational databases are robust have good performance and are secure. Since the database can be a single point of failure for the entire system a reliable database is the key for having the system function properly with no interruption.

NoSQL Databases

Also known as Document Oriented or Big Data. These databases are storing information into a JSON format. This is unstructured data or document oriented data. Each record can have it's own structure. Data and structure are stored into a collection of documents.

NoSQL Database	Company	License
MongoDB	MongoDB, Inc.	Free and commercial versions
Couchbase Server	Couchbase, Inc.	Free and commercial versions
Apache Cassandra	Apache Software Foundation	Free and open-source
Redis	Redis Labs	Free and commercial versions
ArangoDB	ArangoDB GmbH	Free and commercial versions
Elasticsearch	Elastic N.V.	Free and commercial versions
RethinkDB	RethinkDB	Free and open-source
Amazon DynamoDB	Amazon Web Services	Commercial
Neo4j	Neo4j, Inc.	Free and commercial versions
MarkLogic	MarkLogic Corporation	Commercial
MongoDB Atlas	MongoDB, Inc.	Free and commercial versions
FaunaDB	Fauna, Inc.	Free and commercial versions

File based

The most primitive databases are file based. You store data in files. Usually each file contain one table that has columns and rows. The most popular file based database are DBase and FoxPro. These are considered obsolete now and there are virtual no companies that are looking for these programmers. I used to be a FoxPro developer but Microsoft has bought this product and has closed it in favor of Microsoft Access and SQL Server.

Graph Databases

There are few databases that store data into a graph. These databases are very good in representing relations between data elements. Neo4j is the most representative in this category. One of the most significant graph example is a network of roads. I have seen once a demonstration on Nokia Hekaton about this. Very promising technology!

In-memory Databases

Most of the databases store data on disk (HDD) very efficient for update/delete/insert operations. However there is a very small number of databases that have data primary organized in memory and very few disk access operations.

In memory databases are very fast (real time fast) but limited in size due to reduce RAM capacity. However in some cases the database is small and has to be ultra fast. Redis database is the most representative in this category:

Object Oriented

This is a database management system in which information is represented in the form of objects as used in object-oriented programming. Object databases are different from relational databases which are table-oriented.

These databases can store complex data types like images, video streams or objects. These databases are called object oriented because they borrow something from OOP (Object Oriented Programming).

Hybrid Databases

This kind of databases can store multiple data types and have support for complex types. Oracle and PostgreSQL are two of most representative hybrid databases. These databases can organize data on rows or columns and can use a combination of local and cloud storage to improve performance for local applications versus remote storage.

By using a hybrid design one can create data centric applications that can respond faster to data retrieval and can improve overall performance of a system between 50% up to 100%. The secret is to store complex objects and avoid SQL JOIN operations.

How to Choose

Choosing the right database for your project can feel like a daunting task, especially with so many options available. However, there are a few key factors to consider when making this decision:

Data Type and Volume

First, consider the type of data you'll be working with and how much of it you'll have. For example, will you be storing mostly structured data or unstructured documents? Will you have millions or billions of records? This information will help you determine which databases are best suited to your project's needs.

Querying Requirements

Next, consider the types of queries you'll be running against your data. Do you need complex joins, aggregations, and data analysis capabilities? Different databases excel at different types of queries, so it's important to choose a database that can handle your specific needs.

Scalability and Performance

Another important factor to consider is scalability and performance. How will your database perform as your project grows and more users access it? Do you need a database that can handle a high volume of reads and writes, or do you require high availability?

Budget and Licensing

Budget is also a consideration when choosing a database. Some databases require commercial licenses, while others are open source and free to use. Be sure to factor in any licensing or hosting costs associated with your chosen database.

Development Requirements

Finally, consider the tools and libraries available for the database you're considering. Are there libraries and APIs available for your chosen language? Do they have good documentation and community support? These factors can make a big difference in the ease and speed of development.

Conclusion: By considering these factors, you can evaluate your options and choose the database that best meets your project's needs.

Comparison

Next we list a comparison between relational databases and noSQL databases. Knowing these things will help you decide for better usability or better performance when you select a database for your project.

Relational Databases

Advantages

Relational databases are highly structured and organized, making it easy to access and manage data.
Relational databases provide a standardized way of accessing data through SQL, which is understood by most data professionals.
Relational databases ensure data integrity by enforcing data constraints and relationships through various types of keys and indexes.

Disadvantages

Relational databases can be restrictive in terms of data flexibility, making it difficult to store unstructured data.
Relational databases can become difficult to scale if the dataset grows beyond a certain size.
Relational databases can be slow when handling complex queries that require many table joins.

Document-Oriented Databases

Advantages

Document-oriented databases are highly flexible and can store unstructured and semi-structured data without a predefined schema.
Document-oriented databases are easy to scale horizontally by adding more nodes to the database cluster.
Document-oriented databases are generally a good fit for projects that require high write speeds and complex queries that need to analyze large amounts of data.

Disadvantages

Document-oriented databases do not enforce any data constraints, which means that there is a higher risk of data inconsistency.
Document-oriented databases require developers to learn a new query language to access data, rather than using SQL, which can be a disadvantage for some teams.
Document-oriented databases can sometimes become slower when the dataset grows too large, as data access times can increase due to indexing overhead.

In summary, while relational databases are highly structured, organized, and well-suited for handling structured data, document-oriented databases are more flexible, scalable, and better suited for unstructured and semi-structured data. Ultimately, the choice between a relational and document-oriented database will depend on the specific needs of your project.

Read next: Database Design