A database applications is a large software system in which a database play the most important role. For building such an application we use two tire or three tire architecture. First tire is the database itself, second is the application server and third is the Internet browser for web-app or native app.
To design a data centric application the architect will design the database first. Once the database has a design model, one or more software developers will create scripts and programs for the database.
Design elements include tables, relations between tables, validation rules, domains, triggers and stored procedure requirements. This design is usually done using ER diagrams with a program that can be provided by the database vendor.
Data centric applications can use a relational database or a document oriented database. If the programs for the databases are written as stored procedures there is a native manipulation of the data inside the database. When data manipulation is performed using an external program then there is a data transfer between the database and the program.
Most of computer languages are general purpose language. However for data centric applications there are specific dedicated languages. Every relational database has a different language.
In computer science there are Object Oriented Languages like Java, C++, Python and other languages that can have a data model created in memory using classes. Relational databases can have a different data model using relational tables. The two model are not matching perfectly and there is an "impedance mismatch" that need an adapter to function. This adapter is called Object Relational Mapping (ORM) application.
See also: Object-relational mapping
For most popular computer languages there are available several ORM frameworks available for architects to choose from. For example Java has Hibernate, TopLink and Athena Framework.
See also: ORM Software
The ORM software is a key in data centric applications. Many applications have a bottleneck that is the ORM. Therefore is a good idea to avoid impedance mismatch and use a disruptive technology to do so.
To Avoid Impedance Mismatch we can use 3 different techniques.
A data-centric application must communicate with the client application. Depending on the client a data-centric application can use a native data protocol or micro-services. Some databases have tools that can create micro-services automatically and expose functions to read and write data into databases.
The most common approach is to use a database driver for example ODBC or JDBC driver. Some databases provide native drivers for C/C++ and other computer languages. In this case the application will send SQL and will receive data packages.
Most of the time an architect must design the interface for the database. This include the protocol and the data format for data exchange. Data protocol for communication can be XML based or JSON based for services.
There are diverse types of applications that can be connected to a database.
All these applications are named OLTP applications. (On-line Transnational Processing). Sometimes a data-centric application has a role of backup or history. In this case the database in an OLAP database or on-line Analytical Processing Applications. These kind of applications are becoming obsolete and are replaced by Big Data Analytic applications.
See also: Data-centric Architecture
Next we will describe database features for several database engine. To know what features of databases will help you to make a better decision when you have to select a database. Some databases sacrifice some of the features looking for better performance or better usability.
Depending on the features implemented, a programmer have an easy time programming or not. The end user do not care about the features of your database. End users care about his experience. To be fast, secure and free. So what you do as a developer? Here we tell you. Let's get started with data types:
Hold on a minute. What is data in Computer Science? Everything your computer does is to manipulate data. Once you understand what data actually is you will next understand more easy how to create programs that can manipulate your data.
most frequent data types
This is a declarative computer language specific to databases. It has 2 most important dialects: DML=Data Manipulation Language and DDL = Data Definition Language. You can study SQL as a programming language in next section:
Using SQL and PL/SQL we can create code stored with database. This helps to improve code reliability and maintenance. By linking the code with database, we can detect incompatibilities when we update the database structure. Stored code can become invalid and unusable. This is a signal that we need ot update the code.
This is a data format that is inspired from JavaScript object representation. It is a text format for data and is better then XML. Check out the definition: json.org. NoSQL databases are using JSON for storing data but there are few exceptions that use other kind of storage.
Document oriented databases can store only JSON objects. This is encapsulating a complex structure that is metadata information and the data itself, stored together as a document.
JSON is easy to parse and light weight. It comes from JavaScript Object Notation. I uses curly brackets and coma separated pairs. It is human readable better then XML.
JSON Example:
{ ID: 10, Name: "John Doe", Age: 45, Children: ["Ana", "Maria", "Miky"] }
Document oriented databases are sometimes called big-data or no-sql. These databases are fast due to 2 important techniques:
Most of databases have rules that can verify database integrity when data is modified or first time introduced into the database. These rules are small programs or expressions that are hosted by the database server and started by a particular event. For example when data is updated, deleted or inserted.
Rules are implemented as triggers and constraints. These can slow down a database input significantly but in general do not have a negative effect over the data search or data output. Sometimes rules are disabled to improve batch processing performance.
Older databases in general SQL databases are using a customized language for data manipulation. This language can be used to store a program in the database. Therefore these procedures are called also "stored procedures".
In latest design new databases can use a general purpose language like Java, JavaScript or even Python or C++ to create native programs that are working much faster on server side applications instead of stored procedures. These programs are hosted by an application server as separated services.
This feature is one of the most difficult to implement and show the power and the dedication of the company that supports the database.
The spatial feature permits storing x,y and z geographic coordinates or points is space. The storage permits also spatial relations between the points to form geometrical shapes like circle, triangle square, simple lines or curved lines.
Usually a database is a service application or daemon. This is a program that stay in memory after computer is started and most of the time start automatically with the server host operating system.
Most of databases are multi-user. However SQLite is a single user database and this database do not have a service. It is an embedded database, and this means user program can use the database compiled into first program.
Data replication is a distinguishing feature of professional databases. This is used to copy data from one main server to one or more backup servers. When the main server is not operational the backup server is used.
Sometimes replication is used to improve performance of applications by using distributed computation. Replication can be uni-directional or bi-directional and can have a master server or can be master-less.
A database engine is a program that is installed on a server and run automatically when server is started. This is known as a service or daemon. Once loaded in the memory it will perform input-output operations on a set of data files and will listen to requests on a specific TCP-IP port. Applications are connecting to this port and request data using SQL or send data modifications using DML.
Next is list of databases and the most important features. In my opinion these are the top 12 database engines. One database engine can support: SQL Language, JSON Data, Triggers, Stored Procedures, Spatial Functionality, Multi-User access or Replication.
# | Feature | SQL | NoSQL | TRG | SP | SPA | MU | RPL |
---|---|---|---|---|---|---|---|---|
1 | Oracle | Y | Y | Y | Y | Y | Y | |
2 | PostgreSQL | Y | Y | Y | Y | Y | Y | |
3 | SQL Server | Y | Y | Y | Y | Y | Y | |
4 | SQLite | Y | ||||||
5 | MySQL | Y | Y | Y | Y | Y | Y | |
6 | MariaDB | Y | Y | Y | Y | Y | Y | |
7 | MongoDB | Y | Y | Y | Y | |||
8 | Casandra | Y | Y | Y | Y | |||
9 | CouchDB | Y | Y | Y | Y | |||
10 | Neo4j | Y | Y | Y | ||||
11 | RethinkDB | Y | Y | Y | Y | |||
12 | IBM DB2 | Y | Y | Y | Y | Y | Y | |
13 | Redis | Y | ||||||
14 | Amazon Aurora | Y | Y | Y | Y | Y | Y | |
15 | Microsoft Access | Y | Y | Y | Y | |||
16 | Firebase | Y | Y | |||||
17 | IBM Informix | Y | Y | Y | Y | Y | Y | |
18 | Sureall | Y | ||||||
19 | Scylla | Y | Y | Y |
In this kind of databases data is stored in tables. These are database objects used to organize data in rows and columns. Relational databases RDBMS is based on relational algebra. When we design a relational database we deal with normalization rules. Relational databases use SQL query language for data definition and data manipulation.
Database | Company | OOP | JSON |
---|---|---|---|
MySQL | Oracle Corporation | Yes | Yes (since MySQL 5.7) |
PostgreSQL | PostgreSQL Global Development Group | Yes | Yes (with JSON data type) |
Microsoft SQL Server | Microsoft | Yes | Yes (with JSON functions) |
Oracle Database | Oracle Corporation | Yes | Yes (with JSON functions) |
IBM DB2 | IBM | Yes (with object-relational features) | No |
MariaDB | MariaDB Corporation | Yes | Yes (since MariaDB 10.2) |
SQLite | D. Richard Hipp | Yes (with some object-relational features) | Yes (with JSON1 extension) |
Firebird | Firebird Project | Yes (with object-relational features) | No |
Altibase | Altibase Corp. | Yes (with object-relational features) | No |
Virtuoso | OpenLink Software | Yes (with object-relational features) | Yes (with RDF data type) |
Also known as Document Oriented or Big Data. These databases are storing information into a JSON format. This is unstructured data or document oriented data. Each record can have it's own structure. Data and structure are stored into a collection of documents.
NoSQL Database | Company | License |
---|---|---|
MongoDB | MongoDB, Inc. | Free and commercial versions |
Couchbase Server | Couchbase, Inc. | Free and commercial versions |
Apache Cassandra | Apache Software Foundation | Free and open-source |
Redis | Redis Labs | Free and commercial versions |
ArangoDB | ArangoDB GmbH | Free and commercial versions |
Elasticsearch | Elastic N.V. | Free and commercial versions |
RethinkDB | RethinkDB | Free and open-source |
Amazon DynamoDB | Amazon Web Services | Commercial |
Neo4j | Neo4j, Inc. | Free and commercial versions |
MarkLogic | MarkLogic Corporation | Commercial |
MongoDB Atlas | MongoDB, Inc. | Free and commercial versions |
FaunaDB | Fauna, Inc. | Free and commercial versions |
The most primitive databases are file based. You store data in files. Usually each file contain one table that has columns and rows. The most popular file based database are DBase and FoxPro. These are considered obsolete now and there are virtual no companies that are looking for these programmers. I used to be a FoxPro developer but Microsoft has bought this product and has closed it in favor of Microsoft Access and SQL Server.
There are few databases that store data into a graph. These databases are very good in representing relations between data elements. Neo4j is the most representative in this category. One of the most significant graph example is a network of roads. I have seen once a demonstration on Nokia Hekaton about this. Very promising technology!
Most of the databases store data on disk (HDD) very efficient for update/delete/insert operations. However there is a very small number of databases that have data primary organized in memory and very few disk access operations.
In memory databases are very fast (real time fast) but limited in size due to reduce RAM capacity. However in some cases the database is small and has to be ultra fast. Redis database is the most representative in this category:
These databases can store complex data types like images, video streams or objects. These databases are called object oriented because they borrow something from OOP (Object Oriented Programming).
This kind of databases can store multiple data types and have support for complex types. Oracle and PostgreSQL are two of most representative hybrid databases. These databases can organize data on rows or columns and can use a combination of local and cloud storage to improve performance for local applications versus remote storage.
By using a hybrid design one can create data centric applications that can respond faster to data retrieval and can improve overall performance of a system between 50% up to 100%. The secret is to store complex objects and avoid SQL JOIN operations.
First, consider the type of data you'll be working with and how much of it you'll have. For example, will you be storing mostly structured data or unstructured documents? Will you have millions or billions of records? This information will help you determine which databases are best suited to your project's needs.
Next, consider the types of queries you'll be running against your data. Do you need complex joins, aggregations, and data analysis capabilities? Different databases excel at different types of queries, so it's important to choose a database that can handle your specific needs.
Another important factor to consider is scalability and performance. How will your database perform as your project grows and more users access it? Do you need a database that can handle a high volume of reads and writes, or do you require high availability?
Budget is also a consideration when choosing a database. Some databases require commercial licenses, while others are open source and free to use. Be sure to factor in any licensing or hosting costs associated with your chosen database.
Finally, consider the tools and libraries available for the database you're considering. Are there libraries and APIs available for your chosen language? Do they have good documentation and community support? These factors can make a big difference in the ease and speed of development.
Conclusion: By considering these factors, you can evaluate your options and choose the database that best meets your project's needs.
Next we list a comparison between relational databases and noSQL databases. Knowing these things will help you decide for better usability or better performance when you select a database for your project.
Read next: Database Design