SC Data Structures
Data structures are specialized formats for organizing and storing data in a computer's memory or on disk. They are used to efficiently perform operations on data in various algorithms and applications. In computer science, data structures are divided into two main categories:
- Primitive data types: These are the most basic types of data in computer science, and include simple values like integers, floating-point numbers, and characters.
- Abstract data types: Also known as ADTs, these are complex structures that consist of multiple primitive data types and operations that can be performed on them. Examples of ADTs include linked lists, binary trees, hash tables, and stacks and queues.
Page bookmarks
Well known
The choice of data structure often depends on the specific problem being solved and the operations that are required to be performed on the data. Some structures are better suited to certain kinds of algorithms or data types, and may offer advantages in memory usage or processing time over others. These are most common data structures:
Arrays
An array is a collection of elements of same data type arranged in contiguous memory locations.
Features:
- Stores a homogeneous collection of elements
- Elements are stored in contiguous memory locations
Advantages:
- Accessing elements has constant time complexity
- Easy to implement and use
Use cases:
- Storing and accessing a collection of specific size
- Implementing dynamic programming algorithms
Disadvantages:
- Insertion and deletion operations can be slow if the array needs to be resized
- Insertion and deletion can be slow because items must often be shifted in memory.
Linked Lists
A list is a series of nodes, each containing some data and a pointer to the next node.
Features:
- Stores a homogeneous or heterogeneous collection of elements
- Elements are stored in non-contiguous memory locations and linked with pointers
Advantages:
- Insertion and deletion have constant time complexity
- Memory can be optimized by removing unused space
Use cases:
- Implementing dynamic data structures
- Implementing queues and stacks
Disadvantages:
- Accessing elements has linear time complexity
- Extra memory is required to store pointers
Stacks
A stack is a collection of elements, with push and pop operations that allow elements to be added or removed only from the top of the stack.
Features:
- Stores a homogeneous or heterogeneous collection of elements
- Elements are accessed using LIFO (last-in-first-out) ordering
- Supports push and pop operations
Advantages:
- Easy to implement and use
- Can be used to reverse a sequence of elements
Use cases:
- Implementing algorithms involving recursion
- Implementing undo-redo functionality
Disadvantages:
- Accessing elements other than the topmost element has linear time complexity
- The stack size is limited by the available memory
Queues
A queue is a collection of elements, where elements are inserted at the back and removed from the front.
Features:
- Stores a homogeneous or heterogeneous collection of elements
- Elements are accessed using FIFO (first-in-first-out) ordering
- Supports enqueue and dequeue operations
Advantages:
- Easy to implement and use
- Can be used to implement a buffering system
Use cases:
- Implementing a waiting line system
- Implementing a cache
Disadvantages:
- Accessing elements other than the frontmost element has linear time complexity
- The queue size is limited by the available memory
Heaps
A heaps is a complete binary tree in which every level of the tree is completely filled except for the last level, which is filled from left to right, and every node is greater than or equal to (or less than or equal to) its children, as per the type of heap.
Features:
- Stores a homogeneous collection of elements
- Elements are stored in a binary tree structure
- Supports efficient insertion and removal of the minimum or maximum element
Advantages:
- Efficient for finding the minimum or maximum element
- Can be used to implement priority queues
Use cases:
- Implementing algorithms such as heap sort and Dijkstra's algorithm
- Implementing a priority queue
Disadvantages:
- Accessing elements other than the topmost element has linear time complexity
- The heap size is limited by the available memory
These data structures are widely used in computer science for various applications and algorithmic implementations. Let's analyze some features of these algorithms.
Trees
A tree is a hierarchical data structure consisting of nodes connected by edges, with the topmost node known as the root.
Features:
- Stores a heterogeneous collection of elements
- Elements are arranged in a hierarchical structure with a root node and children nodes
Advantages:
- Efficient insertion, deletion, and search operations
- Can be used to represent data with a hierarchical relationship
Use cases:
- Implementing search algorithms such as binary search
- Representing data with a parent-child relationship such as file systems
Disadvantages:
- Can be difficult to implement and understand
- The height of the tree can greatly affect the performance of operations
Graphs
A graph is a collection of vertices connected by edges, where each edge can have a weight or cost associated with it.
Features:
- Stores a heterogeneous collection of elements
- Elements are arranged in a network of vertices and edges
Advantages:
- Can be used to represent complex relationships between data
- Supports traversal algorithms such as DFS and BFS
Use cases:
- Representing social network relationships
- Implementing search algorithms such as Dijkstra's algorithm
Disadvantages:
- Can be difficult to implement and optimize
- Complexity can quickly become unmanageable with large graphs
Hash Tables
A Hash tables is an associative array data structure that maps keys to values, with a hash function used to compute an index into an array of buckets or slots.
Features:
- Stores a homogeneous or heterogeneous collection of elements
- Elements are accessed using key-value pairs
- Uses a hash function to map keys to indexes in an array
Advantages:
- Efficient insertion, deletion, and search operations on average
- Supports dynamic resizing
Use cases:
- Implementing a dictionary or mapping data structure
- Speeding up search operations by using the right hash function
Disadvantages:
- Worst-case time complexity can be slow if there are many hash collisions
- Extra memory is required to handle collisions
Exotic Structures
Exotic data structures are less commonly used data structures that have specialized use-cases and are designed to solve specific problems. These data structures are not well-known or widely used like arrays, stacks, queues, linked lists, trees, and graphs, but they are still important concepts in computer science.
Bloom filters
Bloom filters are a probabilistic data structure used to determine whether an element is a member of a set. Bloom filters require minimal memory space and check false positives in return for false negatives.
Features:
- Efficient data structure for probabilistic membership testing
- Uses a hash table and multiple hash functions to check if an item is probably in a set
- False positives are possible, but false negatives are not
Use cases:
- Checking if an item is in a large set without storing the entire set
- Detecting spam emails without checking every email against a list of known spam emails
HyperLogLog
HyperLogLog is another probabilistic data structure like Bloom filters. It uses a fixed amount of memory to estimate the number of distinct elements in a set with high accuracy even for large datasets.
Features:
- Probabilistic data structure for estimating the cardinality of a set
- Uses a hash function and bit manipulation to approximate the number of unique elements in a set
Use cases:
- Counting distinct elements in a large dataset without having to store every element
- Estimating the number of unique visitors to a website
Interval Trees
Interval trees are data structures used for searching intervals that contain or overlap a given point or interval. The Tree structures store intervals as leaf nodes and each internal Node stores the maximum end-point of the descendant intervals. This provides a fast way to find all the intervals in a tree that overlap with a given interval or point.
Features:
- Data structure for quickly finding all intervals that overlap with a given interval
- Uses a binary search tree and interval comparison to store and query intervals
Use cases:
- Finding all schedule conflicts in a list of appointments
- Finding all genomic intervals that contain a specific SNP
Tries
Tries are tree-like data structures used to store strings with keys that share common prefixes. Tries are useful for prefix matching and search facilities in a large amount of text data.
Features:
- Data structure for storing a set of strings or other values with keys that share prefixes
- Uses a tree structure with each node representing a prefix or a complete word
Use cases:
- Implementing autocomplete functionality in a text editor or search engine
- Storing dictionaries or other sets of words
Wavelet trees
Wavelet Trees are data structures that encode a large array of alphanumeric characters into a sequence of binary digits that can be queried using bitwise operations. Wavelet Trees exploit the repetition of characters to compress the data and to perform queries by multiple bitwise operations in log time.
Features:
- Data structure for managing large amounts of data and answering various types of queries
- Uses the wavelet matrix encoding for compressing data and representing queries as bit operations
Use cases:
- Managing and querying large data sets such as genomic sequences or weblogs
- Implementing full-text search functionality in a document database
String Representation
In programming, there are many alternative data structures that can be used as alternatives for internal representations of strings and large strings. Here are a few examples of suitable data structures:
- Rope Data Structure: Ropes are a data structure that is used to efficiently store and manipulate large strings. Ropes are binary trees where each node contains a small string or a reference to another node. This data structure allows for fast string concatenation and substring operations.
- Array of Strings: When dealing with a small number of fixed-size strings, an array of strings is a suitable data structure. This data structure stores each string as a separate element in an array. This method is straightforward and easy to implement and access.
- Trie Data Structure: A Trie data structure is a useful alternative for storing large numbers of strings with common prefixes, like in a dictionary. Trie is tree data structure made of nodes that contains char/words as values or pointers to kids and end of word respectively. This data structure allows for fast retrieval of strings based on their prefix, but it can be memory-intensive if the strings have a large number of unique characters.
- Hash Tables: Hash tables are another alternative for storing and manipulating large strings. Hash tables use a hash function to map strings to an array of buckets, with each bucket storing a linked list of strings. This data structure allows for fast string retrieval, but has a higher memory overhead than some other data structures.
- Suffix Arrays: Suffix Arrays are useful when a large number of string lookups is required. It stores an array of suffixes of the input string, where each suffix is a sequence of characters that start from each index of the string. The suffixes are sorted into lexographic order. Exact sub-string search can be achieved using binary search. Suffix Arrays are useful for text compression and indexing.
The choice of which data structure to use depends on the specific requirements of the application, such as the size and number of strings, the operations needed to perform on the strings, and the available memory resources.
Read next:
Algorithms