# Data Structures & Algorithms

## Lecture 1: Algorithms and Analysis

### Pseudocode

• Selection Sort
• Input: A collection $$A = \{a_1, \ldots, a_n\}$$
• Output: $$A$$, sorted
• For ($$i = 1\ldots n-1$$)
• $$min \gets i$$
• For ($$j = i+1 \ldots n$$)
• if ($$a_{min} > a_j$$)
• $$min \gets j$$
• swap $$a_{min}$$, $$a_i$$
• output $$A$$

### Analysis

• Identify the input
• Identify the input size
• collection size
• tree/graph number of nodes/edges
• Identify the elementary operation(s)
• most common or most expensive
• typically comparisons; assignments and incrementations are ignored
• Analyze how many times the elementary operation is executed with respect to the input size
• Provide an asymptotic characterization of the count from step 4

### Example

• Collection A
• Size or cardinality of $$A$$, $$n$$
• Comparison on line 4
• $$\sum_{i = 1}^{n-1} (n - i) = \sum_{i = 1}^{n - 1} n - \sum_{i = 1}^{n - 1} i = n(n - 1) - \frac{n(n - 1)}{2} = \frac{n(n-1)}{2}$$
• $$O(n^2)$$, $$\Theta(n^2)$$ (Prefer $$\Theta$$)

### Brute Force

• Given: a set of $$n$$ points, $$P = \{(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)\}$$
• Find: the two points $$p$$ and $$q$$ that are closest to each other with regard to $$L_2$$ norm ($$\sqrt{(x_1-x_2)^2 + (y_1 - y_2)^2}$$)

#### Looping Method

• $$minDist \gets \infty$$
• For ($$i = 1 \ldots n-1$$)
• For($$j = i+1 \ldots n$$)
• If(distance($$P_i$$, $$P_j$$) $$<$$ minDist)
• $$p \gets P_i$$
• $$q \gets P_j$$
• $$minDist \gets distance(P_i, P_j)$$
• output p, q

#### Simpler

• $$minDist \gets \infty$$
• For Each Pair of Points $$p = (x, y)$$, $$q = (x^\prime, y^\prime)$$
• $$d \gets \mathrm{distance}(p, q)$$
• if ($$d < minDist$$)
• $$minDist \gets d$$, remembering $$p$$, $$q$$
• Finish
##### Analysis
• $$\binom{n}{2} = \frac{n!}{(n - 2)!2!} = \frac{n(n-1)}{2} = \Theta(n^2)$$

### Common Big Os

• Constant
• Logarithmic
• Linear
• Iterated Linear
• Polynomial
• Exponential ($$2^n$$)
• Factorial

## Lecture 2: Brute Force Approach

• Examining all possibilities
• optimisation to find the best solution
• decision problems: find a solution, or determine none exists
• generally infeasible or inefficient
• may involve:
• generating combinatorial objects
• backtracking
• Ex: What percentage of 52 card shuffles has been generated by humans
• Ex: Compute $$a^n \mod m$$ for a very large $$n$$: Dumbly

```c
int a, n, m;   /* assumed initialized elsewhere */
int result = 1;

for (int i = 1; i <= n; i++) {
    result = (result * a) % m;   /* reduce mod m at each step */
}
```

• Pairs: 2 for loops
• triples: 3 for loops
• for each subset of $$k$$ elements: would require $$k$$ nested loops, but $$k$$ is a variable, so a general algorithm is needed
• Given a set of n elements $$A = \{a_1, a_2, \ldots, a_n\}$$ and an integer $$k$$, $$1 \leq k \leq n$$, output all subsets of $$A$$ with cardinality $$k$$.
• $$\binom{n}{k} \in O(n^k)$$
• It suffices to consider combinations of integers $$1, 2, \ldots, n$$.
• Algorithm:
• Start with $$a_1, a_2, \ldots, a_k$$
• Assume we have $$a_1, a_2, \ldots, a_k$$, we want to generate the next combination
• Locate the last element $$a_i$$ in the current combination such that $$a_i \neq n - k + i$$
• Replace $$a_i$$ with $$a_i + 1$$, then set $$a_j \gets a_i + (j - i)$$ (using the updated $$a_i$$) for $$j = i+1, i+2, \ldots, k$$
• Ex: $$k = 3$$ on $$\{1, 2, 3, 4, 5\}$$
• $$\{1, 4, 5\}$$
• $$\{2, 3, 4\}$$
• $$\{2, 3, 5\}$$
• $$\{2, 4, 5\}$$
• Goal: generate all permutations of a set of size $$n$$, $$A = \{a_1, a_2, \ldots, a_n\}$$
• Steinhaus–Johnson–Trotter Algorithm
• Generates all $$n!$$ permutations by swapping adjacent elements (a minimal-change order, not lexicographic)
• Start with $$1, 2, 3, \ldots, n$$
• Given $$a_1, a_2, \ldots, a_n$$
• $$\log^* 64 = 4$$ (iterated logarithm)

### Matrix Multiplication

• $$A \cdot B = C$$
• $$c_{ij} = \sum_{k = 1}^{n} a_{ik}\cdot b_{kj}$$
• Traditional Method is $$O(n^3)$$ for multiplications, $$O(n^3)$$ for additions
• Input: Matrix, size $$n^2 = N$$, then $$\mathcal{O}(N^{1.5})$$

### Strassen’s Matrix Multiplication Algorithm

• For $$2 \times 2$$:
• $$C_{00} = M_1 + M_4 - M_5 + M_7$$, $$C_{01} = M_3 + M_5$$, $$C_{10} = M_2 + M_4$$, $$C_{11} = M_1 - M_2 + M_3 + M_6$$
• $$M_1 = (A_{00} + A_{11})(B_{00} + B_{11})$$
• $$M_2 = (A_{10} + A_{11})B_{00}$$
• $$M_3 = A_{00}(B_{01} - B_{11})$$
• $$M_4 = A_{11}(B_{10} - B_{00})$$
• $$M_5 = (A_{00} + A_{01})B_{11}$$
• $$M_6 = (A_{10} - A_{00})(B_{00} + B_{01})$$
• $$M_7 = (A_{01} - A_{11})(B_{10} + B_{11})$$
• 7 multiplications, 18 additions/subtractions for Strassen's
• For an $$n \times n$$ matrix, split into quadrants $$A_{00}, A_{01}, A_{10}, A_{11}$$, each $$\frac{n}{2} \times \frac{n}{2}$$, and the same for $$B$$
• Use the previous identities recursively
• $$M(n)$$ is the number of scalar multiplications by strassen on 2 matrices of $$n \times n$$
• $$= 7M(\frac{n}{2})$$
• $$M(2) = 7$$
• By Master Theorem:
• $$a = 7, b = 2, d = 0$$
• $$7 > 2^0$$
• $$\Theta(n^{\log_2 7})$$
• Alternatives:
• Coppersmith–Winograd: $$O(n^{2.375477})$$
• Williams: $$O(n^{2.3727})$$ (2012)
• Algorithm Details
• design and implement splitting or your functions should handle arbitrary indices
• matrix combine
• $$M_1, \ldots, M_7$$ functions
• pad-with-zeros function, for when $$n$$ is not a power of 2 and unpad

## Lecture 6: Closest Pair, Convex Hull and Presorting & Gaussian Elimination and Applications

### Closest Pairs, Revisited

• Divide and Conquer
• Divide the points into 2 sets along the $$x$$ axis
• Conquer by recursively finding the two closest points in each partition
• Combine by considering combinations that cross the split
• $$D(n)$$ the number of distance calculations for divide and conquer
• $$D(n) = 2D\left(\frac{n}{2}\right) + \frac{k}{2}n$$
• By Master Theorem $$a = 2$$, $$b = 2$$, $$d = 1$$
• $$a = b^d$$, thus $$\mathcal{O}(n\log n)$$
• Use sorting to help reduce work

### Gaussian Elimination

• $$3x + 2y = 35$$
• $$x + 3y = 27$$
• $$a_{11} x_1 + a_{12} x_2 + \cdots + a_{1n} x_n = b_1$$
• $$\vdots$$
• $$a_{n1} x_1 + a_{n2} x_2 + \cdots + a_{nn} x_n = b_n$$
• Provides an $$n \times n$$ matrix, and with $$\overrightarrow{b}$$ added on the rhs, you get an $$n \times (n + 1)$$ augmented matrix
• Goal: transform the augmented matrix into an upper triangular matrix
• Tools: We can perform any operation (any op that does not change the solution)
• row multiply by any non-zero constant
• reorder rows
• add any constant multiple of one row to another
• This can be done with Gauss-Jordan elimination
• Gaussian Elimination Algorithm
• Input $$n \times n$$ matrix $$A$$, $$n \times 1$$ vector $$\overrightarrow{b}$$
• $$A^{\prime} \gets \left[ A | \overrightarrow{b} \right]$$
• For $$i = 1 \ldots n - 1$$
• $$\mathrm{pivotrow} \gets i$$
• for $$j = i + 1 \ldots n$$
• if $$|A^{\prime}_{ji}| > |A^{\prime}_{\mathrm{pivotrow},i}|$$
• $$\mathrm{pivotrow} \gets j$$
• for $$k = i \ldots n + 1$$
• swap $$A^{\prime}_{ik}$$, $$A^{\prime}_{\mathrm{pivotrow},k}$$
• for $$j = i + 1 \ldots n$$
• $$t \gets \frac{A^{\prime}_{ji}}{A^{\prime}_{i,i}}$$
• for $$k = i \ldots n + 1$$
• $$A^{\prime}_{jk} \gets A^{\prime}_{jk} - tA^{\prime}_{ik}$$
• return $$A^{\prime}$$
• Perform a generalized left rotation, with $$c$$ as the new root, $$T_2$$ on the right of $$r$$, and a new node on $$T_3$$
• Suppose that $$r$$ has $$c$$ on left, $$T_3$$ on right. $$c$$ has $$T_1$$ and $$T_2$$ on left and right respectively
• Perform a generalized right rotation: $$c$$ has $$T_1$$ on left, $$r$$ on right. $$r$$ has $$T_2$$ and $$T_3$$ on left and right respectively
• (r (c T_1 (g T_2 T_3)) T_4)
• (r (g (c T_1 T_2) T_3) T_4) -> (g (c T_1 T_2) (r T_3 T_4))
• Swinging over lets you cut one subtree off and put it on the other side of the tree

## Lecture 9: AVL Trees, 2-3 Trees and Balancing

### AVL Trees

• Insert 8, 4, 7, 3, 2, 5, 1, 10, 6
• (8)
• (8 4)
• (8 (4 nil 7))
• (8 (4 3 7)) -> (8 (7 4 3)) -> (7 (4 3) 8)
• (7 (4 (3 2)) 8) -> (7 (3 2 4) 8)
• (7 (3 2 (4 nil 5)) 8) -> (7 (4 (3 2) 5) 8) -> (4 (3 2) (7 5 8))
• (4 (3 (2 1)) (7 5 8)) -> (4 (2 1 3) (7 5 8))
• (4 (2 1 3) (7 5 (8 nil 10)))
• (4 (2 1 3) (7 (5 nil 6) (8 nil 10)))
• Observations:
1. Search is $$\mathcal{O}(d)$$
• $$d \in \mathcal{O}(\log n)$$
2. Insertion is $$\mathcal{O}(d)$$
• $$d \in \mathcal{O}(\log n)$$
3. A single rotation is $$\mathcal{O}(1)$$
4. Deletion is $$\mathcal{O}(d)$$
• $$\mathcal{O}(d)$$ to find replacement
• May need to rotate all the way up to the root. ($$\mathcal{O}(\log n)$$)
5. The height is $$\Theta(\log n)$$ in the worst case, so all of these are guaranteed $$\mathcal{O}(\log n)$$

### B-Trees

• Each node has at most $$m$$ children
• Every node except the root has at least $$\left\lceil\frac{m}{2}\right\rceil$$ children
• The root has at least 2 children unless it is a leaf
• Non-leaf nodes with $$k$$ children have $$k - 1$$ keys
• All leaves are at the same level and non-empty
• commonly used in databases
• Ex:
• Each node has at most 3 children
• Each node has at least 2 children

### 2-3 trees

• This is a specialization of the B-tree where $$m = 3$$
• 2 kinds of nodes:
• 2-nodes: A single key $$k$$ and 2 children, $$\ell, r$$
• 3-nodes: 2 keys $$k_1, k_2$$ such that $$k_1 < k_2$$ and 3 children $$\ell, c, r$$
• Ex: (10 (5 (7 4) 7) ((15 20) 12 17 (21 25)))
• Search is simple, but varies based on node size
• Insertion is always done at a leaf
• sometimes requires changing node size and ordering
• sometimes requires reordering of node keys
• Ex: 8 4 7 3 2 5 1 10 6
• (8)
• ((4 8))
• ((4 7 8)) -> (7 4 8)
• (7 (3 4) 8)
• (7 (2 3 4) 8) -> ((3 7) (2 4) 8) -> ((3 7) 2 4 8)
• ((3 7) 2 (4 5) 8)
• ((3 7) (1 2) (4 5) 8)
• ((3 7) (1 2) (4 5) (8 10))
• ((3 5 7) (1 2) (4 6) (8 10)) -> (5 (3 (1 2) 4) (7 6 (8 10)))
• Analysis:
• Min key count depth $$d$$
• all be 2-nodes (1 key per node)
• all nodes are present at all levels (always 2 children)
• $$2^{d + 1} - 1$$ nodes
• $$2^{d + 1} - 1$$ keys
• Max key count for depth $$d$$
• every node is 3-node (2 keys per node, 3 children per node)
• $$\frac{3^{d + 1} - 1}{2}$$ nodes, $$3^{d + 1} - 1$$ keys
• $$d \leq \log_2 (n + 1) - 1$$
• $$\log_3(n + 1) - 1 \leq d$$
• Therefore $$d \in \Theta(\log n)$$
• Don’t even think about deletion. It’s a pain.

### Other Balanced BSTs

• Splay Trees
• Tries
• Red-Black Trees (the default implementation of Java's TreeSet)
• AA Trees
• Scapegoat Trees
• Treap

### Heaps

• A (min) heap is a binary tree data structure with the following properties
• all nodes are present at all levels except possibly the last level
• at the last level, it is full to the left
• the key of any node is less than both its children
• The minimum is therefore always at the root
• The max is always at a leaf, toward the bottom of the tree
• 2 efficient operations
• remove min
• insert

## Lecture 10: More Heaps

• Insertion. Woo.
• Most operations are $$\mathcal{O}(\log n)$$
• Can implement using array.
• left is $$2i$$
• right is $$2i + 1$$
• parent is $$\left\lfloor \frac{i}{2} \right\rfloor$$
• last element is at index $$n$$
• Will have to occasionally resize the array
• Suppose you have $$n$$ elements, inserting each one into a min heap, then take each out in turn. You get a sorted list.

## Lecture 11: Review

### AVL Tree Rotations

• rotate around any node whose balance factor is $$2$$ or $$-2$$ (or beyond)
• a positive sign means the node skews left; a negative sign, right
• the number gives the magnitude of the skew

Table 1: Determine what rotation to use for an AVL tree

| First | Second | Rotation |
|-------|--------|----------|
| left  | left   | R        |
| left  | right  | LR       |
| right | left   | RL       |
| right | right  | L        |

### Collection, Find Median

• Input $$A = \{a_1, \ldots, a_n\}$$
• Output median of $$A$$
• Sort A
• Output $$a_{\frac{n}{2}}$$
• Input $$A$$, Size $$n$$
• Elementary operation, comparison in sorting algorithm, about $$n\log n$$
• $$\mathcal{O}(n \log n)$$

### Graph cycles of length $$k$$

• Input $$G = (V, E)$$, $$3 \leq k \leq n$$
• Output: True if G contains a cycle of length $$k$$, False otherwise
• For each $$k$$-combination of vertices $$v_1, v_2, \ldots v_k$$:
• For each permutation of $$v_1, \ldots, v_k$$, $$\pi = v_1^{\prime}, \ldots, v_k^{\prime}$$
• $$\mathrm{isCycle} \gets true$$
• For $$i = 1 \ldots k - 1$$
• if $$(v_i^{\prime}, v_{i + 1}^{\prime}) \not\in E$$
• $$\mathrm{isCycle} \gets false$$
• if $$(v_k^{\prime}, v_i^{\prime}) \not\in E$$
• $$\mathrm{isCycle} \gets false$$
• if $$\mathrm{isCycle}$$
• return True
• return False
• is $$\mathcal{O}(n!)$$, as outer loop is $$\binom{n}{k}$$ and inner is $$k!$$, which becomes $$\frac{n!}{(n - k)!}$$

### Develop an algorithm to find the length of the longest cycle in a graph

• Input: $$G = (V, E)$$
• Output: $$\ell$$, the length of the longest cycle
• For $$k = n \ldots 3$$
• if $$A(G, k)$$
• return $$k$$
• return “no cycle”

### Develop an algorithm to solve the hamiltonian cycle problem

• Input $$G = (V, E)$$
• Output True if $$G$$ contains a Hamiltonian Cycle, False otherwise
• Return $$A(G, n)$$
• Called a Turing Reduction

### Integer Multiplication and Time

• $$A, B$$ are 10000 digits long
• 1 million FLOPS ($$10^6$$)
• Grade School ($$\frac{10000^2}{10^6}$$)
• Karatsuba ($$\frac{10000^{\log_2 3}}{10^6}$$)
• Karatsuba is faster

### Gaussian Elimination and Time

• 100 Million variables ($$10^8$$)
• 1 PetaFLOP ($$10^{15}$$)
• Gaussian Elimination is $$\mathcal{O}(n^3)$$
• $$\frac{\left(10^8\right)^3}{10^{15}} = 10^9$$ seconds (about 32 years)

### Matrix Multiplications: 2 recursive on a constant time

• $$M(n) = 2M(n/2)$$
• By Master Theorem ($$a = 2$$, $$b = 2$$, $$d = 0$$), this is $$\Theta(n)$$ multiplications
• Trivial Lower Bound, however, is $$\Omega(n^2)$$
• Therefore, this is not feasible

### AVL Tree

• Label each node with balance factor
• Insert a sequence and re-balance as needed

### 2-3 Trees

• Be careful of insertions!

### 2-3 Tree, calc minimum depth

• 365 keys
• $$n \leq 3^{d + 1} - 1$$
• $$d = \left\lceil\log_3 \left(n + 1\right)\right\rceil - 1$$

### Insert into a min-heap

• Remember to heapify on each insertion
• count comparisons and swaps

## Lecture 12: Graph Algorithms

### Review

• A Graph $$G = (V, E)$$ is a set of vertices $$V = \{v_1, \ldots v_n\}$$ and a set of edges $$E = \{e_1, \ldots e_m\}$$
• $$n$$ is the number of vertices
• $$m$$ is the number of edges, $$m = |E| \in \mathcal{O}(n^2)$$
• Variations:
• $$e = (u, v) = (v, u)$$ is an unordered pair (undirected)
• $$e = (u, v) \neq (v, u)$$ is an ordered pair
• Weighted:
• $$wt : V \to \mathbb{R}$$
• $$wt : E \to \mathbb{R}$$
• Labelled: Vertices and/or edges may or may not have a label
• Pseudo-graphs allow self-loops
• Implementations
• Library
• JGraphT for java
• Boost, LEMON for C++
• NetworkX, PyGraph for Python
• Graph.JS for JavaScript
• graphy for Ruby

### Depth First Search

• Explore a graph as deep as possible before backtracking to explore other areas
• Process each node at most once
• $$\mathcal{O}(n)$$ or more appropriately, $$\mathcal{O}(n + m)$$, max could then be $$\mathcal{O}(n^2)$$
• Is quite simple, must keep track of seen nodes
• Stacks help to keep track of location
• color vertices to show that they’ve been touched
• white: unvisited (all vertices initially)
• grey: discovered and (possibly) processed but not yet off the stack
• black: discovered, processed, off the stack and finished
• Also keep track of discovery and finish “times”
• start an integer counter at 1
• increment the counter each time a vertex is discovered or finished (colored black)
• when a node is discovered, color it grey and stamp with discovery time
• when a node is finished, color it black and stamp with finish time
• Algorithm (Non Recursive)
• Given $$G = (V, E)$$
• For each vertex $$v \in V$$
• color $$v$$ white
• $$\mathrm{count} \gets 1$$
• $$S \gets \mathrm{stack}$$
• $$u \gets$$ start vertex
• color $$u$$ grey
• Push $$u$$ on to $$S$$
• stamp discovery time $$u$$ with $$\mathrm{count}$$
• While $$S$$ is not empty:
• $$\mathrm{count} \gets \mathrm{count} + 1$$
• $$x \gets \mathrm{peek}(S)$$
• $$y \gets$$ next white vertex in $$N(x)$$
• Comment: If $$x$$ has a grey neighbor not equal to its parent, there’s a cycle
• If $$y = \phi$$:
• $$x \gets \mathrm{pop}(S)$$
• process $$x$$
• color $$x$$ black
• Stamp finish time $$x$$ with $$\mathrm{count}$$
• Else:
• $$\mathrm{push}(S, y)$$
• color $$y$$ grey
• stamp $$y$$ with discovery time $$\mathrm{count}$$
• If there are any white vertices, they are part of a disconnected graph
• The backtracking pattern is called a DFS tree, with the edges traversed in it called tree-edges. Other edges are back/forward edges or cross edges
• back/forward edges connect descendants to far ancestors (no distinction in an undirected graph, where they show cycles; back edges show cycles in directed graphs)
• cross edges connect sibling/cousin nodes (only possible on a directed graph)
• Recursive Algorithms:
• Main:
• For each $$v \in V$$, color $$v$$ white
• $$\mathrm{count} \gets 0$$
• For each $$v \in V$$:
• if $$v$$ is white:
• $$\mathrm{DFS}(G, v)$$
• Recursive Function:
• $$\mathrm{count} \gets \mathrm{count} + 1$$
• mark $$v$$ with discovery time
• color $$v$$ grey
• For each $$w \in N(v)$$:
• if $$w$$ is white:
• $$\mathrm{DFS}(G, w)$$
• $$\mathrm{count} \gets \mathrm{count} + 1$$
• mark $$v$$ with finish time
• process $$v$$
• color $$v$$ black

## Lecture 13: Graph Algorithms, Continued

### DFS Analysis

• Input is $$G = (V, E)$$
• Size $$n = |V|$$, $$m = |E| \in \mathcal{O}(n^2)$$
• Elementary Operation:
• Traversing to a node $$\in \mathcal{O}(n)$$
• Examining edges $$\in \mathcal{O}(m)$$
• Processing nodes $$\in \mathcal{O}(n)$$
• May have a total of $$\mathcal{O}(m + n)$$
• Dense graphs $$m \in \Omega(n^2)$$, thus $$\mathcal{O}(n^2)$$
• Sparse graphs $$m \in \mathcal{O}(n)$$
• ex Trees

### Breadth First Search

• Explore closer vertices before distant vertices
• This is done with a queue
• Algorithm (Main)
• For each $$v \in V$$:
• color $$v$$ white
• $$\mathrm{count} \gets 0$$
• for each $$v \in V$$
• if $$v$$ is white:
• $$\mathrm{BFS}(G, v)$$
• Algorithm Subroutine $$\mathrm{BFS}$$
• Outputs a BFS traversal starting at $$v$$
• $$\mathrm{count} \gets \mathrm{count} + 1$$
• $$Q \gets$$ empty queue
• mark $$v$$ with discovery time
• Color $$v$$ grey
• $$\mathrm{enqueue}(Q, v)$$
• While $$Q$$ is not empty
• $$x \gets \mathrm{peek}(Q)$$
• for each $$y \in N(x)$$
• if $$y$$ is white:
• $$\mathrm{count} \gets \mathrm{count} + 1$$
• mark $$y$$ with discover time
• color $$y$$ grey
• $$\mathrm{enqueue}(Q, y)$$
• $$z \gets \mathrm{dequeue}(Q)$$
• $$\mathrm{count} \gets \mathrm{count} + 1$$
• mark $$z$$ with finish time
• color $$z$$ black
• process $$z$$
• Discover and finishing timestamps are less useful, as both are sorted
• BFS trees will have tree edges, and may have cross edges denoting cycles, but will never have back/forward edges in an undirected graph, but may have back edges in a directed graph
• Provide shortest path from the initial vertex to add other connected vertices (but only in an unweighted graph, and only relative to the start vertex)
• Input size $$n = |V|$$, $$m = |E| \in \mathcal{O}(n^2)$$
• Always $$\mathcal{O}(n + m)$$

### Applications of DFS/BFS

• Find Connectivity (directed or undirected)
• Given $$G = (V, E)$$, $$s, t \in V$$
• Output a path $$p$$ from $$s \to t$$ if one exists, $$\phi$$ if no such path exists (Functional Problem)
• Output yes if there exists a path (undirected or directed) from $$s \to t$$, no otherwise (Decision Problem; reduces to the functional problem)
• Suppose a Functional Path Builder that takes a graph $$G$$ and points $$s, t$$ and returns either a path $$p$$ or $$\phi$$ if none exists
• If $$\phi$$ return no, otherwise, return yes
• The Opposite can be done, but it’s a pain and involves an oracle and feedback, and is an adaptive reduction
• Cycle detection involves forward or back edges in DFS, cross edges in BFS
• Bipartite Testing:
• Given a Graph $$G$$
• Output yes if $$G$$ is bipartite, no if not
• Theorem: $$G$$ is bipartite iff $$G$$ contains no odd-length cycles.
• Add blue and red colors, start with red, alternate to blue, keep alternating. If at any point, you find a color in your neighborhood that is the same as yours, there’s a cycle of odd length
• Ex:
• in a directed graph, a sink has no outgoing edges; a source has no incoming edges
• In a DFS of a directed graph, sinks finish first; the finish times in ascending order give a reverse topological order
• In a directed graph $$G$$, a strongly connected component is a subset of vertices $$C \subset V$$ such that any pair of vertices $$x, y \in C$$ is connected by paths in both directions in $$G$$
• To generate a Condensation Graph
• Run DFS
• Compute $$G^{T}$$
• Perform another DFS on $$G^{T}$$ in descending order of the original finish times
• Each Restart defines a new strongly connected component
• If the condensation graph is one node, the original graph is a single strongly connected component

## Lecture 14: Graph Algorithms, Continued

### Condensation graphs

• Reduce a graph but preserve its connectivity
• Given a graph $$G$$ reduce “redundant” connectivity in an optimal way
• many different criteria
• min/max flow
• graph decomposition
• minimum spanning tree

### Minimum Spanning Tree

• Let $$G = (V, E)$$ be an undirected, weighted graph.
• A Spanning Tree $$T = (V, E^{\prime}), E^{\prime} \subseteq E$$ is a set of edges that span the vertex set. $$T$$ is minimal if $$\sum_{e \in E^{\prime}} \mathrm{wt}(e)$$ is no more than that of any other spanning tree.

### Kruskal’s Algorithm

• Greedy algorithm, start taking edges of minimal weight as long as they do not induce a cycle
• For each edge, ordered by minimum weight
• Take the edge if it does not induce a cycle
• Stop when the edge count equals $$n - 1$$ (all vertices are then in the same tree)
• Algorithm, defined:
• Input: An undirected, weighted graph $$G = (V, E)$$
• Output: An MST of $$G$$
• Sort $$E$$ by weight, in non-decreasing order ($$\mathcal{O}(m \log m)$$)
• $$E_T \gets \emptyset$$
• $$k \gets 1$$
• While $$|E_T| < n - 1$$: ($$\mathcal{O}(m)$$)
• If $$(V, E_T \cup \{e_k\})$$ is acyclic ($$\mathcal{O}(n)$$, since $$(V, E_T)$$ is a sparse forest)
• $$E_T \gets E_T \cup \{e_k\}$$
• $$k \gets k + 1$$
• Return $$(V, E_T)$$
• Is $$\mathcal{O}(m(n + \log n)) \in \mathcal{O}(mn) \in \mathcal{O}(n^3)$$, unless a disjoint set is used
• Works well with a disjoint-set data structure

## Lecture 15: Graph Algorithms, Continued

### Prim’s Algorithm

• greedy, using locally optimal decisions to produce a globally optimal solution
• start at an arbitrary vertex, iteratively build a tree by adding the least weighted edge out from the current tree
• Has three categories of vertices: tree vertices (in the tree), fringe vertices (connected to the tree vertices, but not in the tree, choose the least weighted edge next), undiscovered vertices (vertices not yet in the tree or on the fringe)
• Pseudocode:
• Input: Weighted Undirected Graph $$G = (V, E)$$
• Output: A minimum Spanning Tree $$T$$ of $$G$$
• $$E_{T} \gets \emptyset$$
• $$V_T \gets \{v_1\}$$
• $$E^{*} \gets N(v_1)$$
• For $$i = 1 \ldots n - 1$$
• $$e = (u, v) \gets$$ minimum weighted edge in $$E^*$$, with $$u$$ being the endpoint in $$V_T$$ and $$v$$ being the endpoint not in $$V_T$$
• $$V_T \gets V_T \cup \{v\}$$
• $$E_T \gets E_T \cup \{e\}$$
• $$E^* \gets (E^* \cup N(v)) \setminus \{e = (x, y) | x \in V_T \land y \in V_T\}$$
• Output $$V_T$$, $$E_T$$
• Done $$n - 1$$ times; if a heap is used for the fringe, $$\mathcal{O}(m \log n)$$ overall

### Dijkstra’s Algorithm

• Given a weighted graph (directed or not), $$G = (V, E)$$ and a source vertex $$s \in V$$.
• Output the shortest distance paths from $$s$$ to all other vertices in $$V$$.
• Caveat: Don’t consider graphs with negatively weighted cycles
• Start at $$s$$
• incorporate the new vertex’s neighborhood, updating weights and predecessors
• Pseudocode:
• Input: a weighted graph $$G = (V, E)$$ and $$s \in V$$, a source
• Output: the least weighted paths $$s \to v$$ for all $$v \in V$$, with distances $$d_v$$ and predecessors $$p_v$$
• $$Q \gets$$ min-priority queue
• For each $$v \in V \setminus \{s\}$$
• $$d_v \gets \infty$$
• $$p_v \gets \phi$$
• $$enqueue(Q, v, d_v)$$
• $$d_s \gets 0$$
• $$p_s \gets \phi$$
• enqueue($$Q, s, d_s$$)
• $$V_T \gets \emptyset$$
• for $$i = 1 \ldots n$$
• $$u \gets dequeue(Q)$$
• $$V_T \gets V_T \cup \{u\}$$
• Foreach $$v \in N(u) \setminus V_T$$
• if $$d_u + wt(u, v) < d_v$$
• $$d_v \gets d_u + wt(u, v)$$
• $$p_v \gets u$$
• $$decreasePriority(Q, v, d_v)$$ (is $$\mathcal{O}(\log n)$$)
• Output $$d, p$$
• Total will be $$\mathcal{O}(m \log n)$$

### Floyd-Warshall’s

• Given a weighted graph (directed or not), $$G = (V, E)$$
• Output the shortest distance and path for every pair of nodes
• Computes the shortest distance and path from $$v_i \to v_j$$ for every pair simultaneously
• Works with matrices
• Algo:
• dist $$\gets$$ $$n \times n$$ matrix, defaulting to $$\infty$$
• for each $$v \in V$$
• dist[v,v] = 0
• for each edge $$(u, v) \in E$$
• dist[u, v] = wt(u, v)
• for k from 1 to n
• for i from 1 to n
• for j from 1 to n
• if dist[i,j] > dist[i,k] + dist[k,j]
• dist[i,j] = dist[i,k] + dist[k,j]

## Lecture 16: More Graph Algorithms and the start of Dynamic Programming

### Floyd-Warshall’s Algorithm, Continued

• Checks if there’s a midpoint vertex through which the path is shorter than going direct
• Algo
• Input: A Weighted Graph $$G = (V, E)$$
• Output: Shortest Distance Matrix and successor matrix
• $$D \gets (n \times n)$$ matrix
• $$S \gets (n \times n)$$ matrix
• For $$i \leq n, j \leq n$$:
• $$d_{ij} \gets wt(v_i, v_j)$$ ($$\infty$$ if there isn’t an edge)
• $$S_{ij} \gets j$$ for all edges $$(v_i, v_j)$$, $$\phi$$ if no such edge.
• output $$D$$ and $$S$$
• If only one triangle of the matrix is finite (generally the lower is empty), the graph is a DAG

### Dynamic Programming

• Motivation: binomial coefficients: $$(x + y)^n = \binom{n}{0}x^{n} + \ldots + \binom{n}{n}y^n$$
• $$\binom{n}{k} = \frac{n!}{k!(n - k)!}$$
• About $$(n - k - 2) + (k - 2)$$ multiplications and a division
• $$n - 4$$ multiplications, thus $$\mathcal{O}(n)$$ multiplications
• Pascal’s Identity: $$\binom{n}{k} = \binom{n - 1}{k - 1} + \binom{n - 1}{k}$$
• with the base condition $$\binom{n}{0} = \binom{n}{n} = 1$$
• Algo (RecursiveBinomial):
• Takes $$n, k$$
• if $$k = 0 \lor k = n$$
• return 1
• Else
• return RecursiveBinomial($$n - 1$$, $$k - 1$$) + RecursiveBinomial($$n - 1$$, $$k$$)
• Enter Memoization!
• If you compute a value, cache it!
• Use a map
• Or use a tableau, this is dynamic programming
• I.E.
• Set up a recurrence that expresses an optimal solution in terms of subsolutions
• Fill a tableau of values, using previously computed values for future values
• Work forwards, not backwards
• Know what can be ignored
• Algo:
• for $$i \in 0 \ldots n$$
• for $$j \in 0 \ldots \min(i, k)$$
• if $$j = 0 \lor j = i$$
• $$C_{ij} \gets 1$$
• else
• $$C_{ij} \gets C_{i-1,j-1} + C_{i-1,j}$$
• Return $$C_{nk}$$
• Requires $$\mathcal{O}(nk) \subseteq \mathcal{O}(n^2)$$

### Optimal Binary Search Tree

• Suppose you have $$n$$ keys, $$k_1, \ldots, k_n$$ and $$k_1 < k_2 < \cdots < k_n$$
• Suppose you have a probability distribution on the searches of keys, $$p(k_i)$$ is the probability that someone will search for $$k_i$$
• Optimize the BST for locality and probabilities
• Definition: The Catalan Numbers: $$C_n = \frac{1}{n + 1} \binom{2n}{n}$$ corresponds to the number of binary tree configurations with $$n$$ nodes
• Find the best part to drape over

## Lecture 17: Dynamic Programming

### Optimal Binary Search Trees

• Given a set of keys $$k_1 < k_2 < \cdots < k_n$$
• with probabilities $$p(k_i)$$
• Let $$C_{ij}$$ be the number of key comparisons in the obst for keys from $$k_i$$ to $$k_j$$
• We want $$C_{1n}$$ to be minimized
• Specific cases:
• $$C_{i,i-1} = 0$$
• $$C_{ii} = p(k_i)$$
• A tree with root $$k_\ell$$, left subtree on keys $$k_i \ldots k_{\ell - 1}$$, right subtree on $$k_{\ell+1} \ldots k_j$$
• Then the cost is $$C_{i,\ell-1} + C_{\ell+1,j} + \sum_{s = i}^{j}p(k_s)$$
• $$C_{ij} = \min_{i\leq \ell \leq j} \{C_{i,\ell-1} + C_{\ell+1,j}\} + \sum_{s=i}^{j}p(k_s)$$
• Algo:
• Input: A set of ordered keys $$k_1$$ to $$k_n$$ with search probabilities
• Output: A root tableau and a value tableau
• For $$i = 1 \ldots n$$
• $$C_{i, i-1} \gets 0$$
• $$C_{i,i} \gets p(k_i)$$
• $$R_{i,i} \gets i$$
• $$C_{n+1,n} \gets 0$$
• For $$d = 1 \ldots (n - 1)$$
• For $$i = 1 \ldots (n - d)$$
• $$j \gets i + d$$
• $$min \gets \infty$$
• for $$\ell = i \ldots j$$
• $$q \gets C_{i,\ell - 1} + C_{\ell+1,j}$$
• if $$q < min$$
• $$min \gets q$$
• $$R_{i,j} \gets \ell$$
• $$C_{i,j} \gets min + \sum_{s=i}^{j}p(k_s)$$
• Output $$C_{1,n}$$, $$R$$

### 0-1 Knapsack Problem

• Given a set of $$n$$ elements $$A$$, each with a weight $$w_i$$ and a value $$v_i$$, and a total capacity $$W^{\prime}$$
• Take a subset maximizing total value while not exceeding the capacity $$W^{\prime}$$
• $$V_{ij}$$ is the value of the optimal knapsack involving elements from 1 to $$i$$ and having a weight no more than $$j$$
• $$V_{nW^{\prime}}$$ is the optimal solution
• An $$n \times W^{\prime}$$ tableau
• Special Cases
• $$V_{0j} = 0$$
• $$V_{i0} = 0$$
• based on max of a few sums
• Is top to bottom left to right
• Algorithm:
• Takes $$A, w, v, W$$
• Output: completed tableau
• For $$i = 0 \ldots n$$
• for $$j = 0 \ldots w$$
• If $$i = 0 \lor j = 0$$
• $$V_{ij} \gets 0$$
• Else If $$(j - w_i) \geq 0$$
• $$V_{ij} \gets \max(V_{i-1,j}, V_{i-1,j-w_i} + v_i)$$
• Else
• $$V_{ij} \gets V_{i-1j}$$
• Return $$V$$
• This is order $$\mathcal{O}(nW)$$ (pseudo-polynomial)

## Lecture 18: Hash Tables, Hash Functions and Computability

### Hash Tables

• store things in arrays
• indices are determined by a key and a hash
• Amortized constant time for look-up

### Hash Functions

• map a large domain onto a small codomain
• involves cutting up a large domain into a small domain
• Often $$h: \mathbb{Z} \to \mathbb{Z}_m$$
• Not 1 to 1, can have collisions i.e. $$h(a) = h(b)$$
• $$\mathbb{Z}_m = \{0, 1, \ldots, m - 1\}$$ (a ring)
• ex: $$h(x) = x \mod 7$$
• Should be easy to compute
• uniformly distribute elements
• Other applications:
• RC4, MD4, MD5, SHA suite, PBKDF2/scrypt/bcrypt
• a salt makes hashes more secure, helping to defeat rainbow table attacks
• Crypto currencies
• blockchains
• integrity checking/unique identifiers
• derandomization

### Implementation stuff

• objects to be stored in the array
• integer representation of the object is hashed and used for the index in the array
• object is stored
• Ex $$h(k) = 7k + 13 \mod 8$$
• $$\mathbb{Z}_8 = \{0, 1, 2, 3, 4, 5, 6, 7\}$$
• Only 8 elements can be stored
• linear probing: traverse the array to the next available spot
• the table will eventually fill up
• lookup/insertion degrade toward $$\mathcal{O}(n)$$
• linear/quadratic function probing:
• $$h(k, i) = h(k) + c_1 i \mod m$$
• $$h(k, i) = h(k) + c_1i + c_2 i^2 \mod m$$
• secondary hash probing (double hashing):
• $$h(k, i) = h(k) + i \cdot s(k) \mod m$$
• chaining: each index is a reference to a linked list or other general storage data structure
• Strategies to remedy
• larger hash tables reduce collisions, but waste memory
• faster means more memory
• slower means less memory
• Load factor $$d = \frac{n}{m}$$, where $$n$$ is the number of elements and $$m$$ is the size of the array
• high load factors: less memory, slow performance
• low load factor: more memory, fast performance
• when load factor reaches a threshold, often 0.75, rehash with a new hash function and a larger array (larger m)
• increases table
• all elements however must be rehashed ($$\mathcal{O}(n)$$)
• uses
• caches
• sets

### Computability

• Can computers solve any problem?
• No!
• some problems are too hard
• too abstract
• do not halt
• Can algorithms solve any problem?
• Can Turing Machines decide any language?
• Algorithm $$\iff$$ Turing Machine
• Solve $$\iff$$ Decide
• Problem $$\iff$$ Language
• Language:
• An alphabet $$\Sigma$$, the characters, ex $$\Sigma = \{0, 1\}$$
• A string or word is a finite sequence of of characters, or $$\lambda$$, the empty string
• The set of all finite strings, $$\{\lambda, 0, 1, 00, 01, \ldots\}$$, or $$\Sigma^*$$, the Kleene Star
• A language is therefore some subset of $$L \subseteq \Sigma^*$$
• has a constraint of some form in most situations
• $$L$$ is the set of all binary strings that encode a graph that is acyclic
• $$\langle G \rangle$$ is the encoding
• Huffman coding reduces any language to binary
• Simple Model:
• the Finite State Machine: Can be used to determine if in a language
• Starts at start, and based on the input, traverse an edge
• if you are at a reject state at the end, reject the string
• if at an accept state, accept the string

## Lecture 19: Computability, Continued

### Turing Machines

• finite state machines
• set of states
• some accepting
• some rejecting
• transitions (read a bit and go to a state)
• no extra memory
• accept the string if end at an accept state
• define regular languages
• can be expressed with a regular expression
• There exist non-regular languages
• context-free languages
• require a push-down automaton, an FSM plus an unbounded stack
• decidable languages
• requires a Turing Machine
• And the Turing Machine is:
• an FSM with an input tape and a work tape
• can read/write the work tape and go backwards and forwards on both
• $$Q$$ is a non-empty set of states
• has accept and reject states, the machine halts on either accept or reject
• and an initial state
• $$\Sigma$$ is the input alphabet
• $$\Gamma$$ is the work tape alphabet, can equal $$\Sigma$$
• Has a transition function $$\delta : Q \times \Sigma \times \Gamma \to Q \times \Gamma \times \{L,R,\mathbf{-}\}^2$$
• Church-Turing Thesis
• Any algorithm is equivalent to a Turing Machine or the $$\lambda$$ Calculus
• Can a Turing Machine Decide any language?
• No! Turing machines are finite objects
• you can encode not just problems, but machines, $$T \to \langle T \rangle \in \Sigma^*$$
• This set is countably infinite
• The set of all languages ($$L \subset \Sigma^*$$), is uncountable
• There are, therefore, languages for which there exists no Turing Machine

## Lecture 20: Computability, Continued

### Church-Turing Thesis

• “Algorithm Solves Problems” $$\equiv$$ “Turing machines decide languages”
• Turing Machines: finite state control plus a work tape
• Language: subset of $$\Sigma^{*}$$
• Decide: A TM decides language $$L$$ iff for all inputs it:
• halts
• ends in $$q_{\mathrm{acc}} \leftrightarrow x \in L$$

### The Halting Problem

• Does there exist an algorithm that:
• Given a TM $$M$$ and input $$x$$, outputs true if $$M(x)$$ halts, false if $$M(x)$$ inf-loops
• This is the Halting Problem
• Theorem: No such algorithm to solve the halting problem exists
• Proof (By way of contradiction):
• suppose $$\exists$$ a TM $$H(\langle M,x \rangle)$$ that decides (solves) the halting problem: $$H(\langle M,x \rangle) = \begin{cases}1 & M(x) \text{ halts}\\0 & M(x) \text{ does not halt}\end{cases}$$
• Consider running $$M$$ on its own encoding, $$M(\langle M \rangle)$$
• Using the encoding $$\langle M,M \rangle$$, construct $$Q(\langle M \rangle) = \begin{cases}\text{halt} & H(\langle M,M \rangle) = 0\\\text{loop forever} & H(\langle M,M \rangle) = 1\end{cases}$$
• Now run $$Q$$ on its own encoding: $$Q(\langle Q \rangle) = \begin{cases}\text{halt} & H(\langle Q,Q \rangle) = 0\\\text{loop forever} & H(\langle Q,Q \rangle) = 1\end{cases}$$
• This halts iff $$H(\langle Q,Q \rangle) = 0$$ iff $$Q(\langle Q \rangle)$$ does not halt.
• Contradiction, $$H$$ does not exist
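The contradiction above can be sketched in code. `halts` is a hypothetical decider that cannot actually exist; the point is only that any concrete guess for it is refuted by $$Q(\langle Q \rangle)$$:

```python
def make_Q(halts):
    """Build the adversarial machine Q from a claimed halting decider."""
    def Q(prog):
        if halts(prog, prog):   # decider says prog(prog) halts...
            while True:         # ...so Q deliberately loops forever
                pass
        return "halted"         # decider says prog(prog) loops, so Q halts
    return Q

# A decider claiming nothing halts is refuted immediately: Q(Q) halts.
Q = make_Q(lambda prog, arg: False)
# A decider claiming everything halts fails symmetrically: Q(Q) would loop
# forever. No implementation of halts() can answer Q(Q) correctly.
```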
• Restricting to solvable problems
• $$M(x)$$ on an input may use some resources:
• time = each transition is 1 unit of time, $$T(|x|) = T(n)$$
• memory = number of cells written to on the work tape, $$S(|x|) = S(n)$$
• $$x$$ is input, input size is $$|x|$$, or $$n$$
• Suppose a language $$L$$ is decided by a TM $$M$$ such that on an input of length $$n$$, $$M$$ uses at most $$p(n)$$ transitions, where $$p$$ is some polynomial in $$n$$
• then $$L\in P$$
• $$L$$ is a language
• $$P$$ is a complexity class
• $$P$$ is deterministic polynomial time, the set of all languages $$L$$ such that there exists a polynomial time TM that decides $$L$$
• matrix multiplication $$\mathcal{O}(n^3)$$
• connectivity $$\mathcal{O}(n^3)$$ (Floyd-Warshall)
• Searching
• Not in:
• Hamiltonian Path / Cycle
• 0-1 Knapsack
• Satisfiability

### Non-determinism

• theoretical model of computation
• 2 phases, guessing and verification
• Ex: Ham Path
• guess a permutation of vertices, $$v_1, v_2, \ldots, v_n$$
• check if path is valid, if it is, accept, otherwise reject
• A nondeterministic TM accepts if there is any guess that ends up accepting
• intuition: like evaluating a whole tree of guesses in $$\mathcal{O}(n)$$ time without examining each leaf individually
• $$\exists x_1, x_2, \ldots, x_n\, P(x_1, \ldots, x_n)$$ or $$\forall x_1, \ldots, x_n\, P(x_1, \ldots, x_n)$$
• NP, or non-deterministic polynomial time
• the set of all languages $$L$$ for which there exists a non-deterministic polynomial time Turing Machine that decides $$L$$
• $$P \subseteq NP$$
• Is $$P \stackrel{?}{=} NP$$
• And $$NP \subseteq EXP$$ (deterministic exponential time), and $$P \neq EXP$$
• Goal: Characterize the hard problems that we’ve seen
• NPC or NP-Complete is the set of all problems (languages $$L$$) such that the language $$L \in NP$$ and $$L$$ is at least as hard as all other languages in $$NP$$
• Show 0-1 Knapsack is in $$NP$$
• Map to a decision problem (Does there exist a possible knapsack?)
• $$(A, \overrightarrow{v}, \overrightarrow{w}, W, k), 0 \leq k \leq \sum_{a\in A} v(a)$$
• output yes if the optimal solution is at least $$k$$, no otherwise
• Show that the decision version is in $$NP$$
• Algorithm
• ND guess a subset $$S \subseteq A$$ ($$\mathcal{O}(n)$$)
• Verify
• for each $$s \in S$$ ($$\mathcal{O}(n)$$)
• $$\mathcal{V} \gets \mathcal{V} + v(s)$$
• $$\mathcal{W} \gets \mathcal{W} + w(s)$$
• if $$\mathcal{W} > W$$
• reject
• else if $$\mathcal{V} \geq k$$
• accept
• else, reject
• Is in $$NP$$
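The verification phase above can be sketched as a deterministic, linear-time checker (the function name and argument layout are assumptions of this sketch):

```python
def verify_knapsack(subset, values, weights, W, k):
    """Check a guessed subset S: O(n) work, so verification is poly-time."""
    total_v = sum(values[i] for i in subset)    # V <- V + v(s)
    total_w = sum(weights[i] for i in subset)   # W <- W + w(s)
    if total_w > W:
        return False                            # reject: over capacity
    return total_v >= k                         # accept iff value >= k
```

A nondeterministic machine accepts exactly when some guessed subset makes this checker return true, which is what places the decision version in $$NP$$.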

### Reductions

• a decision problem $$P_1$$ is polynomial-time reducible to a decision problem $$P_2$$ if:
• there exists a function $$f$$ such that $$f$$ maps all yes instances of $$P_1$$ to yes instances of $$P_2$$ and no to no
• $$f$$ is computable by a polynomial time algorithm
• Notation: $$P_1 \leq_{P} P_2$$
• $$P_2$$ is at least as hard as $$P_1$$
• $$P_1$$ is no harder than $$P_2$$
• Example:
• Linear System Solving, reduced to Gaussian Elimination, $$LSS \leq_{P} GE$$
• Network Problems: Find fastest route, reduced to shortest graph distance
• Compute the median, reduce to sort
• Want to solve problem $$X$$
• we know how to solve problem $$Y$$ (with algorithm $$A$$)
• we know that $$X \leq_{P} Y$$
• we have $$x$$ as an instance of $$X$$
• $$x \to f_{\leq_{P}}(x) \to y \to A(y)$$
• This is $$A^{\prime}$$
• Goal: show that a problem $$\mathcal{P}$$ is NP-complete
• it suffices to take a known NP-complete problem, $$X$$ and reduce to $$\mathcal{P}$$, $$X \leq_{P} \mathcal{P}$$
• Need at least 1 NP-complete problem to start any reduction
• In 1971, Stephen Cook showed the first NP complete problem: satisfiability is NP-complete
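The composition $$x \to f_{\leq_{P}}(x) \to y \to A(y)$$ can be sketched with the median-via-sorting example from the list above (function names are illustrative):

```python
def reduce_then_solve(x, f, A):
    """A'(x) = A(f(x)): map the X-instance to a Y-instance, then solve Y."""
    return A(f(x))

def median(xs):
    # median <=_P sort: f is the poly-time mapping (sorting),
    # A is the known solver for Y (index the middle element)
    return reduce_then_solve(xs, f=sorted, A=lambda s: s[len(s) // 2])
```

The reduction costs $$\mathcal{O}(n \log n)$$ for the sort, so solving the median this way is no harder than sorting, exactly the $$X \leq_{P} Y$$ relationship.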

## Lecture 21: Computability, Continued

### Reductions: Continued

• Algorithmic analysis only studies algorithms and solutions
• complexity studies the problems
• We use reductions.
• if we show $$P_1 \leq_{P} P_2$$
• $$P_2$$ is at least as hard as $$P_1$$
• $$P_1$$ is no harder than $$P_2$$
• if we show also that $$P_2 \leq_{P} P_1$$
• $$P_1$$ is equivalent to $$P_2$$
• $$L$$ is NP-complete if
• $$L \in NP$$
• All other languages in NP reduce to $$L$$: $$\forall L_2 \in NP,\ L_2 \leq_{P} L$$
• All NP complete problems reduce to each other and have the same complexity
• A Trivial problem: $$L_{NP} = \left\{\langle M, x \rangle \biggr\vert \begin{matrix}\textrm{M is an NP Turing Machine}\\\textrm{that accepts } x \textrm{ in polynomial time.}\end{matrix}\right\}$$
• This is shown as $$L_{NP} \leq_{P} SAT$$
• 3SAT is a restricted variation of SAT
• Given a boolean expression on $$n$$ variables, $$x_1, \ldots, x_n$$, in 3CNF
• 3CNF: Conjunctive Normal Form, a conjunction of disjunctions (clauses), each clause containing 3 variables or their negations
• Ex: $$(x_1 \lor x_2 \lor \lnot x_4) \land (\lnot x_1 \lor x_2 \lor x_3) \land (x_1 \lor \lnot x_3 \lor \lnot x_4)$$
• Clique Problem
• Given an undirected graph $$G = (V, E)$$ and an integer $$k$$
• Output true if $$G$$ contains a clique of size $$k$$ or greater
• A Clique is a complete subgraph
• No efficient solution, we will show that it is NP-complete
• Show that $$\mathrm{3SAT} \leq_{P} \mathrm{Clique}$$

### Clique and NP Completeness

1. Clique is in NP
• Given: $$\langle G, k \rangle$$
• Guess: a subset of vertices $$C \subseteq V$$ of size $$k$$ (non-deterministic, $$\mathcal{O}(k) \leq \mathcal{O}(n)$$)
• Verify: Is the guessed subset $$C$$ a clique?
• For each pair $$(u, v) \in C$$
• if ($$(u, v) \not\in E$$)
• halt and reject
• halt and accept
2. Choose a known NP-complete problem to reduce from to Clique
• 3SAT
3. Define a mapping from instances of $$P_1$$ to $$P_2$$ that preserve solutions
• $$\phi \to G, k$$
• Mapping as follows:
• Let $$\phi = C_1 \land C_2 \land \cdots \land C_k$$ be a 3CNF formula of $$k$$ clauses
• For each $$C_i= x_1^i \lor x_2^i \lor x_3^i$$
• create 3 vertices $$v_1^i, v_2^i, v_3^i$$
• Edges connect vertices $$e = (v_a^i, v_b^j)$$ iff the following both hold
• the variables are from different clauses, $$i \neq j$$
• and the variables are consistent, $$v_a^i \neq \lnot v_b^j$$
• Example: $$(x_1 \lor \lnot x_2 \lor x_3) \land (x_1 \lor \lnot x_2 \lor \lnot x_3) \land (x_1 \lor x_2 \lor x_3)$$
4. Prove that solutions are preserved
• That $$\phi$$ is satisfiable iff $$G$$ has a clique of size $$k$$ or greater
• Outline
• Let $$\phi$$ be satisfiable
• Each clause has a variable (or a negation) that evaluates to true.
• Each true variable corresponds to some vertex in $$G$$, $$\therefore k$$ vertices
• Observation:
• they must be in different clauses, thus must be in different vertex groups
• none of them are negations of each other
• therefore each of the $$k$$ vertices are connected.
• Then the $$k$$ vertices form a clique
• Let $$G$$ be a graph with a clique of size $$k$$
• show that $$\phi$$ is satisfiable
5. Show that the reduction can be computed in deterministic polynomial time.
• 3SAT $$\leq_{P}$$ Clique
• $$3k$$ to generate vertices, at most $$\mathcal{O}(n)$$
• $$\binom{3k}{2}$$ to generate edges, $$\mathcal{O}(n^2)$$
6. Thus, Clique is NP-Complete
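The clause-to-vertex mapping in step 3 can be sketched as follows; representing a literal as a `(variable, negated)` pair is an assumption of this sketch:

```python
from itertools import combinations

def sat_to_clique(clauses):
    """Map a CNF formula (a list of clauses of literals) to a Clique instance."""
    # one vertex per (clause index, literal) occurrence
    vertices = [(i, lit) for i, clause in enumerate(clauses) for lit in clause]
    edges = set()
    for (i, l1), (j, l2) in combinations(vertices, 2):
        same_var = l1[0] == l2[0]
        consistent = not (same_var and l1[1] != l2[1])  # not negations of each other
        if i != j and consistent:                       # must span different clauses
            edges.add(((i, l1), (j, l2)))
    return vertices, edges, len(clauses)                # clique target k = #clauses
```

A satisfying assignment picks one true literal per clause; those vertices span different clauses and are pairwise consistent, so they form the $$k$$-clique, matching the argument in step 4.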

### Reduction Tree

• $$L_{NP} \to SAT \to 3SAT \to Clique$$ are all NP-complete
• By composing mappings, polynomial time reductions in $$NP$$ are reflexive and transitive; among NP-complete problems they also run in both directions
• Thus an equivalence class for NP-complete problems

### Vertex Cover Reduction

1. Definitions
• Given an unweighted graph $$G = (V, E)$$ and an integer $$k$$, output yes if $$G$$ has a vertex cover of size at most $$k$$
• A vertex cover is $$V^{\prime} \subseteq V$$ such that every edge has at least one of its endpoints in $$V^{\prime}$$
2. Show that the problem is in NP
• Guess a subset of size at most $$k$$
• for each $$e = (u, v) \in E$$
• if ($$u \not\in V^{\prime} \land v \not\in V^{\prime}$$)
• halt and reject
• halt, accept
3. Choose NP complete problem to reduce from
• Clique $$\leq_{P}$$ Vertex Cover
4. Define mapping
• $$\langle G, k \rangle$$ is a Clique instance
• $$\langle G, k \rangle \to \langle \overline{G}, n - k \rangle$$
5. Prove that solutions are preserved
• Fact: $$G$$ has a clique of size $$k$$ iff $$\overline{G}$$ has a vertex cover of size $$n - k$$
• Let $$G$$ have a clique $$C \subseteq V$$ of size $$k$$
• Let $$e = (u, v) \in \overline{E}$$
• $$\forall e \in \overline{E} \to e \not\in E \to u \not\in C \lor v \not\in C \to u \in V \setminus C \lor v \in V \setminus C$$, thus $$(u, v) \in \overline{E}$$ is covered by $$V \setminus C$$, and therefore $$V \setminus C$$ is a vertex cover of size $$n - k$$
• Let $$C \subseteq V$$ be a vertex cover in $$\overline{G}$$ of size $$n - k$$
• $$\forall u, v \in V \left[ (u, v) \in \overline{E} \to u \in C \lor v \in C \right] \equiv \forall u, v \in V \left[ (u \not\in C \land v \not\in C) \to (u, v) \in E \right]$$, therefore every pair of vertices in $$V \setminus C$$ is connected, thus $$V \setminus C$$ is a clique of size $$k$$.
6. Show the reduction can be computed in deterministic polynomial time
• $$\mathcal{O}(n^2)$$ to flip adjacency matrix
• $$\mathcal{O}(1)$$ to calculate $$n - k$$
• Thus, $$\mathcal{O}(n^2)$$
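Steps 4 and 6 combined: the mapping is just graph complementation plus one subtraction. Representing edges as ordered integer pairs is an assumption of this sketch:

```python
def clique_to_vertex_cover(n, edges, k):
    """Map a Clique instance <G, k> to a Vertex Cover instance <G-bar, n - k>."""
    # complement: every vertex pair NOT connected in G becomes an edge
    complement = {(u, v) for u in range(n) for v in range(u + 1, n)
                  if (u, v) not in edges and (v, u) not in edges}
    return complement, n - k     # O(n^2) pairs examined, O(1) arithmetic
```

This mirrors the running time bound: $$\mathcal{O}(n^2)$$ to flip the adjacency information and $$\mathcal{O}(1)$$ for $$n - k$$.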

### Overview of Complexity classes

• $$P \subseteq NP$$ and $$NPC \subseteq NP$$; $$coNP$$ overlaps $$NP$$ and contains all of $$P$$; $$coNPC \subseteq coNP$$; $$R$$ (Recursive, all languages that have a halting Turing Machine) contains all of these; $$RE$$ contains $$R$$, and $$coRE$$ overlaps $$RE$$

## Lecture 22: Final Review

• DFS
• BFS
• Dijkstra's
• MST

### Cut Edge

• Let $$G = (V, E)$$ be a connected, undirected graph
• A cut edge $$e \in E$$ is an edge such that if removed from $$G$$ would disconnect it
• Design an algorithm given $$G$$ that determines if it contains a cut edge
• Test each edge
• use DFS to check if disconnected
• if yes, stop and output true
• output false
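The test-each-edge algorithm above can be sketched directly: drop one edge at a time, DFS from any vertex, and check whether every vertex was reached (roughly $$\mathcal{O}(|E| \cdot (|V| + |E|))$$ overall):

```python
def has_cut_edge(n, edges):
    """Brute force: remove each edge in turn and DFS to test connectivity."""
    def connected(edge_list):
        adj = {v: [] for v in range(n)}
        for u, v in edge_list:
            adj[u].append(v)
            adj[v].append(u)
        seen, stack = {0}, [0]
        while stack:                      # iterative DFS from vertex 0
            for w in adj[stack.pop()]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        return len(seen) == n
    # output true as soon as removing some edge disconnects G
    return any(not connected([f for f in edges if f != e]) for e in edges)
```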

### Hashing

• repeat quiz 4 for $$h(k) = (2k + 7) \mod 13$$ for 4, 42, 26, 5, 8, 18, 14 with linear probing
• Use chaining to resolve collisions
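A worked sketch of the linear probing part of the exercise, assuming a table of size $$m = 13$$; the resulting table below is computed here, not taken from a quiz key:

```python
def linear_probe_insert(keys, m=13):
    """Insert keys with h(k) = (2k + 7) mod 13, probing forward on collision."""
    table = [None] * m
    for k in keys:
        i = (2 * k + 7) % m
        while table[i] is not None:   # linear probing: step to the next slot
            i = (i + 1) % m
        table[i] = k
    return table

# For 4, 42, 26, 5, 8, 18, 14: only 18 collides (with 5 at index 4)
# and probes forward to index 5.
```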

### OBST

• Suppose $$T$$ is an OBST with $$n$$ keys. What is max depth? – $$n - 1$$
• Suppose we have $$n$$ keys each with uniform probability $$\frac{1}{n}$$, what would the depth of the OBST be? – $$\lceil\log_2 n\rceil$$
• Given a root tableau and keys, build the OBST

### P and NP

• Let $$P_1, P_2, P_3$$ be problems
• suppose $$P_1 \leq_{P} P_2$$
• $$P_2$$ is NP complete
• can you conclude $$P_1$$ is NP complete
• No. Anything in NP reduces to an NP complete problem
• $$P_1 \leq_{P} P_2$$
• $$P_2 \in NP$$
• Is $$P_2$$ NP Complete
• No. For $$P_2$$ to be NP complete, every language in NP would have to reduce to it; knowing $$P_1 \leq_{P} P_2$$ only helps if $$P_1$$ is NP complete, and we don't know that.
• $$P_1 \in P$$ and $$P_2 \in NPC$$
• Can we conclude that $$P_1 \leq_{P} P_2$$?
• Yes. Anything in NP reduces to NP complete
• $$P_1 \leq_{P} P_2$$ and $$P_2 \in NPC$$
• Can we say that $$P_1 \in NP$$?
• Not necessarily. Not enough information is given.
• Although given this class, yes, as anything that reduces from NPC is in NP (for this class)