# Data Structures & Algorithms

## Lecture 1: Algorithms and Analysis

### Pseudocode

• Selection Sort
• Input: A collection $$A = \{a_1, \ldots, a_n\}$$
• Output: $$A$$, sorted
• For ($$i = 1\ldots n-1$$)
• $$min \gets i$$
• For ($$j = i+1 \ldots n$$)
• if ($$a_{min} > a_j$$)
• $$min \gets j$$
• swap $$a_{min}$$, $$a_i$$
• output $$A$$
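The pseudocode above, sketched in C (the function name and in-place interface are illustrative):

```c
#include <stddef.h>

/* Selection sort: repeatedly select the minimum of the unsorted suffix
   and swap it into position i. Theta(n^2) comparisons. */
void selection_sort(int *a, size_t n) {
    for (size_t i = 0; i + 1 < n; i++) {
        size_t min = i;                 /* index of the smallest element so far */
        for (size_t j = i + 1; j < n; j++) {
            if (a[min] > a[j])
                min = j;
        }
        int tmp = a[i];                 /* swap a_min and a_i */
        a[i] = a[min];
        a[min] = tmp;
    }
}
```

Tracking the *index* of the minimum (rather than its value) is what makes the final swap work.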

### Analysis

1. Identify the input
2. Identify the input size
• collection: size/cardinality
• tree/graph: number of nodes/edges
3. Identify the elementary operation(s)
• the most common or most expensive operation
• bookkeeping such as loop comparisons, assignments, and increments is ignored
4. Analyze how many times the elementary operation is executed with respect to the input size
5. Provide an asymptotic characterization of the count from step 4

### Example

• Collection A
• Size or cardinality of $$A$$, $$n$$
• The comparison in the inner loop
• $$\sum_{i = 1}^{n-1} (n - i) = \sum_{i = 1}^{n - 1} n - \sum_{i = 1}^{n - 1} i = n(n - 1) - \frac{n(n - 1)}{2} = \frac{n(n-1)}{2}$$
• $$O(n^2)$$, $$\Theta(n^2)$$ (Prefer $$\Theta$$)

### Brute Force

• Given: a set of $$n$$ points, $$P = \{(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)\}$$
• Find: the two points $$p$$ and $$q$$ that are closest to each other with regard to $$L_2$$ norm ($$\sqrt{(x_1-x_2)^2 + (y_1 - y_2)^2}$$)

#### Looping Method

• $$minDist \gets \infty$$
• For ($$i = 1 \ldots n-1$$)
• For($$j = i+1 \ldots n$$)
• If(distance($$P_i$$, $$P_j$$) $$< minDist$$)
• $$p \gets P_i$$
• $$q \gets P_j$$
• $$minDist \gets distance(P_i, P_j)$$
• output $$p$$, $$q$$

#### Simpler

• $$minDist \gets \infty$$
• For Each Pair of Points $$p = (x, y)$$, $$q = (x^\prime, y^\prime)$$
• $$d \gets distance(p, q)$$
• if ($$d < minDist$$)
• $$minDist \gets d$$
• Finish
##### Analysis
• $$\binom{n}{2} = \frac{n!}{(n - 2)!2!} = \frac{n(n-1)}{2} = \Theta(n^2)$$

### Common Big Os

• Constant
• Logarithmic
• Linear
• Iterated Linear
• Polynomial
• Exponential ($$2^n$$)
• Factorial

## Lecture 2: Brute Force Approach

• Examining all possibilities
• optimization problems: find the best solution
• decision problems: find a solution, or determine that none exists
• generally infeasible or inefficient
• may involve:
• generating combinatorial objects
• backtracking
• Ex: What percentage of the $$52!$$ possible card shuffles have ever been generated by humans?
• Ex: Compute $$a^n \bmod m$$ for a very large $$n$$, naively:

```c
/* Theta(n) multiplications: reduce mod m at every step to keep the
   intermediate result small. */
long mod_pow_naive(long a, long n, long m) {
    long result = 1;
    for (long i = 1; i <= n; i++)
        result = (result * a) % m;
    return result;
}
```

• Pairs: 2 nested for loops
• triples: 3 nested for loops
• for each subset of $$k$$ elements: $$k$$ nested for loops, but $$k$$ is a variable, so we need a general algorithm
• Given a set of n elements $$A = \{a_1, a_2, \ldots, a_n\}$$ and an integer $$k$$, $$1 \leq k \leq n$$, output all subsets of $$A$$ with cardinality $$k$$.
• $$\binom{n}{k} \in O(n^k)$$
• It suffices to consider combinations of integers $$1, 2, \ldots, n$$.
• Algorithm:
• Start with $$a_1, a_2, \ldots, a_k$$
• Assume we have $$a_1, a_2, \ldots, a_k$$, we want to generate the next combination
• Locate the last element $$a_i$$ in the current combination such that $$a_i \neq n - k + i$$
• Replace $$a_i$$ with $$a_i + 1$$, $$a_j$$ with $$a_i + j - i$$ for $$j = i+1, i+2, \ldots, k$$
• Ex: $$k = 3$$ on $$\{1, 2, 3, 4, 5\}$$
• $$\{1, 4, 5\}$$
• $$\{2, 3, 4\}$$
• $$\{2, 3, 5\}$$
• $$\{2, 4, 5\}$$
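The next-combination step above, sketched in C with 0-based arrays (function name illustrative):

```c
/* Advance a[0..k-1], an increasing combination of 1..n, to the next
   combination in lexicographic order. Returns 1 on success, 0 if the
   current combination is the last one (n-k+1, ..., n). */
int next_combination(int *a, int k, int n) {
    int i = k - 1;
    while (i >= 0 && a[i] == n - k + i + 1)   /* locate last a_i != n - k + i (1-based) */
        i--;
    if (i < 0)
        return 0;                             /* already at the last combination */
    a[i]++;
    for (int j = i + 1; j < k; j++)           /* reset the suffix to a_i + j - i */
        a[j] = a[i] + j - i;
    return 1;
}
```

Starting from $$\{1, 4, 5\}$$ with $$n = 5$$, $$k = 3$$, this produces $$\{2, 3, 4\}$$, matching the example above.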
• Goal: generate all permutations of a set of size $$n$$, $$A = \{a_1, a_2, \ldots, a_n\}$$
• Steinhaus-Johnson-Trotter Algorithm (generates permutations by adjacent swaps)
• Alternatively, generate all $$n!$$ permutations in lexicographic order
• Start with $$1, 2, 3, \ldots, n$$
• Given $$a_1, a_2, \ldots, a_n$$: find the rightmost ascent $$a_i < a_{i+1}$$, swap $$a_i$$ with the rightmost element larger than it, then reverse the suffix $$a_{i+1}, \ldots, a_n$$
• $$\log^* 64 = 4$$ (iterated logarithm)
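The lexicographic next-permutation step can be sketched in C (this is the lexicographic method, not the Steinhaus-Johnson-Trotter adjacent-swap scheme; the function name is illustrative):

```c
/* Advance a[0..n-1] to the next permutation in lexicographic order.
   Returns 1 on success, 0 if a is already the last (descending) permutation. */
int next_permutation(int *a, int n) {
    int i = n - 2;
    while (i >= 0 && a[i] >= a[i + 1])   /* find the rightmost ascent a_i < a_{i+1} */
        i--;
    if (i < 0)
        return 0;
    int j = n - 1;
    while (a[j] <= a[i])                 /* rightmost element larger than a_i */
        j--;
    int t = a[i]; a[i] = a[j]; a[j] = t; /* swap them */
    for (int l = i + 1, r = n - 1; l < r; l++, r--) {  /* reverse the suffix */
        t = a[l]; a[l] = a[r]; a[r] = t;
    }
    return 1;
}
```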

### Matrix Multiplication

• $$A \cdot B = C$$
• $$c_{ij} = \sum_{k = 1}^{n} a_{ik}\cdot b_{kj}$$
• Traditional Method is $$O(n^3)$$ for multiplications, $$O(n^3)$$ for additions
• Input: an $$n \times n$$ matrix has $$N = n^2$$ entries, so in terms of the input size this is $$O(N^{1.5})$$
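The traditional method as a C sketch, assuming flat row-major storage (names illustrative):

```c
#include <stddef.h>

/* Traditional Theta(n^3) matrix multiplication: C = A * B, all n x n,
   stored row-major in flat arrays. */
void mat_mul(const double *A, const double *B, double *C, size_t n) {
    for (size_t i = 0; i < n; i++) {
        for (size_t j = 0; j < n; j++) {
            double sum = 0.0;
            for (size_t k = 0; k < n; k++)   /* c_ij = sum_k a_ik * b_kj */
                sum += A[i * n + k] * B[k * n + j];
            C[i * n + j] = sum;
        }
    }
}
```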

### Strassen’s Matrix Multiplication Algorithm

• For $$2 \times 2$$:
• $$C_{00} = M_1 + M_4 - M_5 + M_7$$, $$C_{01} = M_3 + M_5$$, $$C_{10} = M_2 + M_4$$, $$C_{11} = M_1 - M_2 + M_3 + M_6$$
• $$M_1 = (A_{00} + A_{11})(B_{00} + B_{11})$$
• $$M_2 = (A_{10} + A_{11})B_{00}$$
• $$M_3 = A_{00}(B_{01} - B_{11})$$
• $$M_4 = A_{11}(B_{10} - B_{00})$$
• $$M_5 = (A_{00} + A_{01})B_{11}$$
• $$M_6 = (A_{10} - A_{00})(B_{00} + B_{01})$$
• $$M_7 = (A_{01} - A_{11})(B_{10} + B_{11})$$
• 7 multiplications, 18 additions/subtractions for Strassen's
• For an $$n \times n$$ matrix, split into quadrants $$A_{00}, A_{01}, A_{10}, A_{11}$$, each $$\frac{n}{2} \times \frac{n}{2}$$, and the same for $$B$$
• Use the previous identities recursively
• $$M(n)$$ is the number of scalar multiplications by strassen on 2 matrices of $$n \times n$$
• $$= 7M(\frac{n}{2})$$
• $$M(2) = 7$$
• By Master Theorem:
• $$a = 7, b = 2, d = 0$$
• $$7 > 2^0$$
• $$\Theta(n^{\log_2 7})$$
• Alternatives:
• Coppersmith-Winograd: $$O(n^{2.375477})$$
• Williams: $$O(n^{2.3727})$$ (2012)
• Algorithm Details
• design and implement a splitting function, or have your functions handle arbitrary index ranges
• matrix combine
• $$M_1, \ldots, M_7$$ functions
• pad-with-zeros function, for when $$n$$ is not a power of 2 and unpad

## Lecture 6: Closest Pair, Convex Hull and Presorting & Gaussian Elimination and Applications

### Closest Pairs, Revisited

• Divide and Conquer
• Divide the points into 2 sets along the $$x$$ axis
• Conquer by recursively finding the two closest points in each partition
• Combine by considering combinations that cross the split
• $$D(n)$$ the number of distance calculations for divide and conquer
• $$2D\left(\frac{n}{2}\right) + \frac{k}{2}n$$
• By Master Theorem $$a = 2$$, $$b = 2$$, $$d = 1$$
• $$a = b^d$$, thus $$\Theta(n\log n)$$
• Use sorting to help reduce work

### Gaussian Elimination

• $$3x + 2y = 35$$
• $$x + 3y = 27$$
• $$a_{11} x_1 + a_{12} x_2 + \cdots + a_{1n} x_n = b_1$$
• $$\vdots$$
• $$a_{n1} x_1 + a_{n2} x_2 + \cdots + a_{nn} x_n = b_n$$
• Provides an $$n \times n$$ matrix, and with $$\overrightarrow{b}$$ added on the rhs, you get an $$n \times (n + 1)$$ augmented matrix
• Goal: transform the augmented matrix into an upper triangular matrix
• Tools: we may perform any elementary row operation (any op that does not change the solution set)
• row multiply by any non-zero constant
• reorder rows
• add any constant multiple of one row to another
• This can be done with Gauss-Jordan elimination
• Gaussian Elimination Algorithm
• Input $$n \times n$$ matrix $$A$$, $$n \times 1$$ vector $$\overrightarrow{b}$$
• $$A^{\prime} \gets \left[ A | \overrightarrow{b} \right]$$
• For $$i = 1 \ldots n - 1$$
• $$\mathrm{pivotrow} \gets i$$
• for $$j = i + 1 \ldots n$$
• if $$|A^{\prime}_{ji}| > |A^{\prime}_{\mathrm{pivotrow},i}|$$
• $$\mathrm{pivotrow} \gets j$$
• for $$k = i \ldots n + 1$$
• swap $$A^{\prime}_{ik}$$, $$A^{\prime}_{\mathrm{pivotrow},k}$$
• for $$j = i + 1 \ldots n$$
• $$t \gets \frac{A^{\prime}_{ji}}{A^{\prime}_{i,i}}$$
• for $$k = i \ldots n + 1$$
• $$A^{\prime}_{jk} \gets A^{\prime}_{jk} - tA^{\prime}_{ik}$$
• return $$A^{\prime}$$
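The forward-elimination loop above, sketched in C with partial pivoting, assuming the $$n \times (n+1)$$ augmented matrix is stored row-major in a flat array (names illustrative; no singularity check):

```c
#include <math.h>
#include <stddef.h>

/* Forward elimination with partial pivoting on an n x (n+1) augmented
   matrix A' with A'[i][k] stored at A[i*(n+1)+k]. Leaves the coefficient
   part upper triangular; back-substitution then recovers x. */
void gaussian_eliminate(double *A, size_t n) {
    for (size_t i = 0; i + 1 < n; i++) {
        size_t pivot = i;
        for (size_t j = i + 1; j < n; j++)      /* pick the row with largest |A'_ji| */
            if (fabs(A[j * (n + 1) + i]) > fabs(A[pivot * (n + 1) + i]))
                pivot = j;
        for (size_t k = i; k <= n; k++) {       /* swap rows i and pivot */
            double t = A[i * (n + 1) + k];
            A[i * (n + 1) + k] = A[pivot * (n + 1) + k];
            A[pivot * (n + 1) + k] = t;
        }
        for (size_t j = i + 1; j < n; j++) {    /* eliminate column i below row i */
            double t = A[j * (n + 1) + i] / A[i * (n + 1) + i];
            for (size_t k = i; k <= n; k++)
                A[j * (n + 1) + k] -= t * A[i * (n + 1) + k];
        }
    }
}
```

On the system $$3x + 2y = 35$$, $$x + 3y = 27$$ from above, one elimination step zeroes the entry below the pivot, after which $$y = \frac{46}{7}$$ falls out by back-substitution.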
• perform a generalized left rotation, with $$c$$ as new root, and $$T_2$$ on right of $$r$$, and a new node on $$T_3$$
• Suppose that $$r$$ has $$c$$ on left, $$T_3$$ on right. $$c$$ has $$T_1$$ and $$T_2$$ on left and right respectively
• Perform generalized right rotation, $$c$$ has $$T_1$$ on left, $$r$$ on right. $$r$$ has $$T_2$$ and $$T_3$$ on left and right respectively
• (r (c T_1 (g T_2 T_3)) T_4)
• (r (g (c T_1 T_2) T_3) T_4) -> (g (c T_1 T_2) (r T_3 T_4))
• Swinging over lets you cut one subtree off and put it on the other side of the tree

## Lecture 9: AVL Trees, 2-3 Trees and Balancing

### AVL Trees

• Insert 8, 4, 7, 3, 2, 5, 1, 10, 6
• (8)
• (8 4)
• (8 (4 nil 7))
• (8 (4 3 7)) -> (8 (7 4 3)) -> (7 (4 3) 8)
• (7 (4 (3 2)) 8) -> (7 (3 2 4) 8)
• (7 (3 2 (4 nil 5)) 8) -> (7 (4 (3 2) 5) 8) -> (4 (3 2) (7 5 8))
• (4 (3 (2 1)) (7 5 8)) -> (4 (2 1 3) (7 5 8))
• (4 (2 1 3) (7 5 (8 nil 10)))
• (4 (2 1 3) (7 (5 nil 6) (8 nil 10)))
• Observations:
1. Search is $$\mathcal{O}(d)$$
• $$d \in \mathcal{O}(\log n)$$
2. Insertion is $$\mathcal{O}(d)$$
• $$d \in \mathcal{O}(\log n)$$
3. A single rotation is $$\mathcal{O}(1)$$
4. Deletion is $$\mathcal{O}(d)$$
• $$\mathcal{O}(d)$$ to find replacement
• May need to rotate all the way up to the root. ($$\mathcal{O}(\log n)$$)
5. The depth is $$\Theta(\log n)$$ in the worst case, so all of these operations are guaranteed $$\mathcal{O}(\log n)$$
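A single rotation is $$\mathcal{O}(1)$$ because it only relinks a constant number of pointers. A generalized left rotation might look like this in C (the node layout is an assumption):

```c
#include <stddef.h>

typedef struct Node {
    int key;
    struct Node *left, *right;
} Node;

/* Generalized left rotation about r: r's right child c becomes the new
   subtree root, and c's old left subtree (T_2) swings over to r's right.
   Returns the new subtree root. */
Node *rotate_left(Node *r) {
    Node *c = r->right;
    r->right = c->left;   /* T_2 moves to the other side */
    c->left = r;
    return c;
}
```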

### B-Trees

• Each node has at most $$m$$ children
• Every node except the root has at least $$\left\lceil\frac{m}{2}\right\rceil$$ children
• The root has at least 2 children unless it is a leaf
• Non-leaf nodes with $$k$$ children have $$k - 1$$ keys
• All leaves are at the same level and non-empty
• commonly used in databases
• Ex:
• Each node has at most 3 children
• Each node has at least 2 children

### 2-3 trees

• This is a specialization of the B-tree where $$m = 3$$
• 2 kinds of nodes:
• 2-nodes: A single key $$k$$ and 2 children, $$\ell, r$$
• 3-nodes: 2 keys $$k_1, k_2$$ such that $$k_1 < k_2$$ and 3 children $$\ell, c, r$$
• Ex: (10 (5 4 7) ((15 20) 12 17 (21 25)))
• Search is simple, but varies based on node size
• Insertion is always done at a leaf
• sometimes requires changing node size and ordering
• sometimes requires reordering of node keys
• Ex: 8 4 7 3 2 5 1 10 6
• (8)
• ((4 8))
• ((4 7 8)) -> (7 4 8)
• (7 (3 4) 8)
• (7 (2 3 4) 8) -> ((3 7) (2 4) 8) -> ((3 7) 2 4 8)
• ((3 7) 2 (4 5) 8)
• ((3 7) (1 2) (4 5) 8)
• ((3 7) (1 2) (4 5) (8 10))
• ((3 5 7) (1 2) (4 6) (8 10)) -> (5 (3 (1 2) 4) (7 6 (8 10)))
• Analysis:
• Min key count depth $$d$$
• all be 2-nodes (1 key per node)
• all nodes are present at all levels (always 2 children)
• $$2^{d + 1} - 1$$ nodes
• $$2^{d + 1} - 1$$ keys
• Max key count for depth $$d$$
• every node is 3-node (2 keys per node, 3 children per node)
• $$\frac{3^{d + 1} - 1}{2}$$ nodes, $$3^{d + 1} - 1$$ keys
• From the minimum: $$n \geq 2^{d+1} - 1$$, so $$d \leq \log_2(n + 1) - 1$$
• From the maximum: $$n \leq 3^{d+1} - 1$$, so $$\log_3(n + 1) - 1 \leq d$$
• Therefore $$d \in \Theta(\log n)$$
• Don’t even think about deletion. It’s a pain.

### Other Balanced BSTs

• Splay Trees
• Tries
• Red-Black Trees (default implementation in Java for TreeSet)
• AA Trees
• Scapegoat Trees
• Treap

### Heaps

• A (min) heap is a binary tree data structure with the following properties
• all nodes are present at all levels except possibly the last level
• at the last level, it is full to the left
• the key of any node is less than both of its children's keys
• The minimum is therefore always at the root
• The max is always in a leaf, towards the bottom. It will only be at the very bottom level if the tree is full.
• 2 efficient operations
• remove min
• insert
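Both operations can be sketched over a plain array, where the children of index $$i$$ live at $$2i+1$$ and $$2i+2$$ (capacity management omitted; names illustrative):

```c
#include <stddef.h>

/* Min-heap insert: append the key, then sift it up while its parent is
   larger. O(log n) swaps along one leaf-to-root path. */
void heap_insert(int *h, size_t *n, int key) {
    size_t i = (*n)++;
    h[i] = key;
    while (i > 0 && h[(i - 1) / 2] > h[i]) {
        int t = h[i]; h[i] = h[(i - 1) / 2]; h[(i - 1) / 2] = t;
        i = (i - 1) / 2;
    }
}

/* Remove-min: take the root, move the last leaf up, then sift it down.
   O(log n) swaps along one root-to-leaf path. Assumes *n > 0. */
int heap_remove_min(int *h, size_t *n) {
    int min = h[0];
    h[0] = h[--(*n)];
    size_t i = 0;
    for (;;) {
        size_t small = i, l = 2 * i + 1, r = 2 * i + 2;
        if (l < *n && h[l] < h[small]) small = l;
        if (r < *n && h[r] < h[small]) small = r;
        if (small == i) break;
        int t = h[i]; h[i] = h[small]; h[small] = t;
        i = small;
    }
    return min;
}
```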

Date: 2017-04-18 Tue 13:40

Created: 2017-07-20 Thu 16:58
