Data Structures

engineering<--

SC Data Structures

Data structures are specialized formats for organizing and storing data in a computer's memory or on disk. They are used to efficiently perform operations on data in various algorithms and applications. In computer science, data structures are divided into two main categories:

Primitive data types: These are the most basic types of data in computer science, and include simple values like integers, floating-point numbers, and characters.
Abstract data types: Also known as ADTs, these are complex structures that consist of multiple primitive data types and operations that can be performed on them. Examples of ADTs include linked lists, binary trees, hash tables, and stacks and queues.

Well known

The choice of data structure often depends on the specific problem being solved and the operations that are required to be performed on the data. Some structures are better suited to certain kinds of algorithms or data types, and may offer advantages in memory usage or processing time over others. These are most common data structures:

Arrays

An array is a collection of elements of same data type arranged in contiguous memory locations.

Features:

Stores a homogeneous collection of elements
Elements are stored in contiguous memory locations

Advantages:

Accessing elements has constant time complexity
Easy to implement and use

Use cases:

Storing and accessing a collection of specific size
Implementing dynamic programming algorithms

Disadvantages:

Insertion and deletion operations can be slow if the array needs to be resized
Insertion and deletion can be slow because items must often be shifted in memory.

Linked Lists

A list is a series of nodes, each containing some data and a pointer to the next node.

Features:

Stores a homogeneous or heterogeneous collection of elements
Elements are stored in non-contiguous memory locations and linked with pointers

Advantages:

Insertion and deletion have constant time complexity
Memory can be optimized by removing unused space

Use cases:

Implementing dynamic data structures
Implementing queues and stacks

Disadvantages:

Accessing elements has linear time complexity
Extra memory is required to store pointers

Stacks

A stack is a collection of elements, with push and pop operations that allow elements to be added or removed only from the top of the stack.

Features:

Stores a homogeneous or heterogeneous collection of elements
Elements are accessed using LIFO (last-in-first-out) ordering
Supports push and pop operations

Advantages:

Easy to implement and use
Can be used to reverse a sequence of elements

Use cases:

Implementing algorithms involving recursion
Implementing undo-redo functionality

Disadvantages:

Accessing elements other than the topmost element has linear time complexity
The stack size is limited by the available memory

Queues

A queue is a collection of elements, where elements are inserted at the back and removed from the front.

Features:

Stores a homogeneous or heterogeneous collection of elements
Elements are accessed using FIFO (first-in-first-out) ordering
Supports enqueue and dequeue operations

Advantages:

Easy to implement and use
Can be used to implement a buffering system

Use cases:

Implementing a waiting line system
Implementing a cache

Disadvantages:

Accessing elements other than the frontmost element has linear time complexity
The queue size is limited by the available memory

Heaps

A heaps is a complete binary tree in which every level of the tree is completely filled except for the last level, which is filled from left to right, and every node is greater than or equal to (or less than or equal to) its children, as per the type of heap.

Features:

Stores a homogeneous collection of elements
Elements are stored in a binary tree structure
Supports efficient insertion and removal of the minimum or maximum element

Advantages:

Efficient for finding the minimum or maximum element
Can be used to implement priority queues

Use cases:

Implementing algorithms such as heap sort and Dijkstra's algorithm
Implementing a priority queue

Disadvantages:

Accessing elements other than the topmost element has linear time complexity
The heap size is limited by the available memory

These data structures are widely used in computer science for various applications and algorithmic implementations. Let's analyze some features of these algorithms.

Trees

A tree is a hierarchical data structure consisting of nodes connected by edges, with the topmost node known as the root.

Features:

Stores a heterogeneous collection of elements
Elements are arranged in a hierarchical structure with a root node and children nodes

Advantages:

Efficient insertion, deletion, and search operations
Can be used to represent data with a hierarchical relationship

Use cases:

Implementing search algorithms such as binary search
Representing data with a parent-child relationship such as file systems

Disadvantages:

Can be difficult to implement and understand
The height of the tree can greatly affect the performance of operations

Graphs

A graph is a collection of vertices connected by edges, where each edge can have a weight or cost associated with it.

Features:

Stores a heterogeneous collection of elements
Elements are arranged in a network of vertices and edges

Advantages:

Can be used to represent complex relationships between data
Supports traversal algorithms such as DFS and BFS

Use cases:

Representing social network relationships
Implementing search algorithms such as Dijkstra's algorithm

Disadvantages:

Can be difficult to implement and optimize
Complexity can quickly become unmanageable with large graphs

Hash Tables

A Hash tables is an associative array data structure that maps keys to values, with a hash function used to compute an index into an array of buckets or slots.

Features:

Stores a homogeneous or heterogeneous collection of elements
Elements are accessed using key-value pairs
Uses a hash function to map keys to indexes in an array

Advantages:

Efficient insertion, deletion, and search operations on average
Supports dynamic resizing

Use cases:

Implementing a dictionary or mapping data structure
Speeding up search operations by using the right hash function

Disadvantages:

Worst-case time complexity can be slow if there are many hash collisions
Extra memory is required to handle collisions

Exotic Structures

Exotic data structures are less commonly used data structures that have specialized use-cases and are designed to solve specific problems. These data structures are not well-known or widely used like arrays, stacks, queues, linked lists, trees, and graphs, but they are still important concepts in computer science.

Bloom filters

Bloom filters are a probabilistic data structure used to determine whether an element is a member of a set. Bloom filters require minimal memory space and check false positives in return for false negatives.

Features:

Efficient data structure for probabilistic membership testing
Uses a hash table and multiple hash functions to check if an item is probably in a set
False positives are possible, but false negatives are not

Use cases:

Checking if an item is in a large set without storing the entire set
Detecting spam emails without checking every email against a list of known spam emails

HyperLogLog

HyperLogLog is another probabilistic data structure like Bloom filters. It uses a fixed amount of memory to estimate the number of distinct elements in a set with high accuracy even for large datasets.

Features:

Probabilistic data structure for estimating the cardinality of a set
Uses a hash function and bit manipulation to approximate the number of unique elements in a set

Use cases:

Counting distinct elements in a large dataset without having to store every element
Estimating the number of unique visitors to a website

Interval Trees

Interval trees are data structures used for searching intervals that contain or overlap a given point or interval. The Tree structures store intervals as leaf nodes and each internal Node stores the maximum end-point of the descendant intervals. This provides a fast way to find all the intervals in a tree that overlap with a given interval or point.

Features:

Data structure for quickly finding all intervals that overlap with a given interval
Uses a binary search tree and interval comparison to store and query intervals

Use cases:

Finding all schedule conflicts in a list of appointments
Finding all genomic intervals that contain a specific SNP

Tries

Tries are tree-like data structures used to store strings with keys that share common prefixes. Tries are useful for prefix matching and search facilities in a large amount of text data.

Features:

Data structure for storing a set of strings or other values with keys that share prefixes
Uses a tree structure with each node representing a prefix or a complete word

Use cases:

Implementing autocomplete functionality in a text editor or search engine
Storing dictionaries or other sets of words

Wavelet trees

Wavelet Trees are data structures that encode a large array of alphanumeric characters into a sequence of binary digits that can be queried using bitwise operations. Wavelet Trees exploit the repetition of characters to compress the data and to perform queries by multiple bitwise operations in log time.

Features:

Data structure for managing large amounts of data and answering various types of queries
Uses the wavelet matrix encoding for compressing data and representing queries as bit operations

Use cases:

Managing and querying large data sets such as genomic sequences or weblogs
Implementing full-text search functionality in a document database

String Representation

In programming, there are many alternative data structures that can be used as alternatives for internal representations of strings and large strings. Here are a few examples of suitable data structures:

Rope Data Structure: Ropes are a data structure that is used to efficiently store and manipulate large strings. Ropes are binary trees where each node contains a small string or a reference to another node. This data structure allows for fast string concatenation and substring operations.
Array of Strings: When dealing with a small number of fixed-size strings, an array of strings is a suitable data structure. This data structure stores each string as a separate element in an array. This method is straightforward and easy to implement and access.
Trie Data Structure: A Trie data structure is a useful alternative for storing large numbers of strings with common prefixes, like in a dictionary. Trie is tree data structure made of nodes that contains char/words as values or pointers to kids and end of word respectively. This data structure allows for fast retrieval of strings based on their prefix, but it can be memory-intensive if the strings have a large number of unique characters.
Hash Tables: Hash tables are another alternative for storing and manipulating large strings. Hash tables use a hash function to map strings to an array of buckets, with each bucket storing a linked list of strings. This data structure allows for fast string retrieval, but has a higher memory overhead than some other data structures.
Suffix Arrays: Suffix Arrays are useful when a large number of string lookups is required. It stores an array of suffixes of the input string, where each suffix is a sequence of characters that start from each index of the string. The suffixes are sorted into lexographic order. Exact sub-string search can be achieved using binary search. Suffix Arrays are useful for text compression and indexing.

The choice of which data structure to use depends on the specific requirements of the application, such as the size and number of strings, the operations needed to perform on the strings, and the available memory resources.

Read next: Algorithms