# Lecture Notes

CSE 332

## Table of contents

- Lecture 1: Intro, Stacks, and Queues
- Lecture 2: Algorithmic Analysis
- Lecture 3: Algorithmic Analysis II, Amortization
- Lecture 4: Priority Queues
- Lecture 5: `buildHeap` and Recurrences
- Lecture 6: Tree Recurrence Method, Dictionary ADT
- Lecture 7: Binary Search Trees
- Lecture 8: AVL Trees
- Lecture 9: B-Trees
- Lecture 10: Hashing
- Lecture 11: More Hashing
- Lecture 12: Comparison Sorts
- Lecture 13: Non-comparison Sorts
- Lecture 14: Introduction to Graphs
- Lecture 15: Graph Traversals
- Lecture 16: Graph Traversals
- Lecture 17: Multithreading and Fork-Join
- Lecture 18: Analysis of Fork-Join Parallel Programs
- Lecture 19: Parallel Prefix, Pack, and Sorting
- Lecture 20: Shared-Memory Concurrency and Mutual Exclusion
- Lecture 21: Race Conditions and Deadlock
- Lecture 22: Topological Sort
- Lecture 23: Minimum Spanning Trees
- Lecture 24: P vs NP
- Lecture 25: More NP vs P

## Lecture 1: Intro, Stacks, and Queues

Data structures

- A clever way to organize information to enable efficient computation over that information
- Accessing elements in array – `O(1)`
- Accessing elements in linked list – `O(n)`
- Adding element at the front of a linked list – `O(1)`
- Adding element at the front of an array – `O(n)`
- Binary search tree, find max element – `O(log n)` (if the tree is balanced)
- Abstract Data Type (ADT) – mathematical description of a thing and set of operations
- Data structure – specific organization of data and algorithms for implementing an ADT
- Implementation – actual code implementation of a data structure
- Stack ADT – push, pop, isEmpty, etc.
- LIFO – last in first out

- Queue
- FIFO – first in first out
- Operations: enqueue, dequeue, isEmpty, etc.
- Circular array queue data structure – enqueue and dequeue are `O(1)`

- Can also use a doubly linked list
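The circular array queue mentioned above can be sketched as follows. This is a minimal illustration with a fixed capacity; the class and method names are assumptions for this sketch, not taken from the course code.

```python
class CircularArrayQueue:
    """Fixed-capacity FIFO queue backed by a circular array (illustrative sketch)."""

    def __init__(self, capacity):
        self.data = [None] * capacity
        self.front = 0   # index of the oldest element
        self.size = 0    # number of stored elements

    def is_empty(self):
        return self.size == 0

    def enqueue(self, item):
        if self.size == len(self.data):
            raise OverflowError("queue is full")
        back = (self.front + self.size) % len(self.data)  # wrap around the end
        self.data[back] = item
        self.size += 1

    def dequeue(self):
        if self.is_empty():
            raise IndexError("queue is empty")
        item = self.data[self.front]
        self.data[self.front] = None
        self.front = (self.front + 1) % len(self.data)
        self.size -= 1
        return item
```

Both operations touch a constant number of array slots, which is where the `O(1)` bound comes from.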

## Lecture 2: Algorithmic Analysis

- What do we care about?
- Correctness
- Performance: speed (time complexity) and memory (space complexity)

- How to compare two algorithms? Timing them directly has a lot of variability, doesn’t capture the worst case, and doesn’t tell us what happens as \(n\) grows.
- We want to evaluate the algorithm rather than the implementation specifically used

Counting basic code constructs

- Assume basic operations take the same amount of time

Code Construct | Time |
---|---|
consecutive statements | sum of times |
loops | num iterations \(\times\) loop body time |
conditionals | time of condition \(+\) slowest branch |
method calls | time of function’s body |
recursion | solve recurrence relation |

- Two scenarios: worst-case and best-case complexity
- Usually focus on worst-case complexity – max # steps the algorithm takes for the most challenging input of size \(n\); assume \(n\) is big.
- Why not use the ‘average’ case? It is not clear what an ‘average’ input means without assuming a distribution over inputs.
- Amortized-case complexity, e.g. resizing an array.
- Asymptotic analysis: getting the big-O
- Eliminate all low-order terms and eliminate constant coefficients.

- We use \(\mathcal{O}\) on a function \(g(n)\) to mean the set of functions whose asymptotic growth is less than or equal to that of \(g(n)\).
- Suppose \(f: \mathbb{N} \to \mathbb{R}, g: \mathbb{N} \to \mathbb{R}\) are two functions. Then,

\[f(n) \in \mathcal{O}(g(n)) \iff \exists\, c > 0,\ n_0 \in \mathbb{N} \text{ such that } f(n) \le c \cdot g(n) \text{ for all } n \ge n_0\]

- \(n_0\) represents the threshold past which \(c \cdot g(n)\) stays at or above \(f(n)\) for the rest of the domain.
- Technically, \(\log n \in \mathcal{O}(n)\) is correct. But we want a tighter bound: \(\log n \in \mathcal{O}(\log n)\)
- We want to remain invariant to differences in scaling, so we have a \(c\).
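As a small worked example of picking \(c\) and \(n_0\) (the function \(3n + 5\) is chosen purely for illustration and does not come from the lecture):

\[\begin{aligned} f(n) &= 3n + 5, \quad g(n) = n \\ 3n + 5 &\le 3n + n = 4n \quad \text{for all } n \ge 5 \\ \text{so with } c = 4,\ n_0 = 5:&\quad f(n) \le c \cdot g(n) \text{ for all } n \ge n_0, \text{ hence } 3n + 5 \in \mathcal{O}(n) \end{aligned}\]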

## Lecture 3: Algorithmic Analysis II, Amortization

- Counting code constructs
- How to show that \(f(n) \in \mathcal{O}(g(n))\)? Pick \(c\) to account for constant factors and \(n_0\) to cover lower-order terms
- Informal way to understand big O is in terms of droppable terms which characterize the class (invariant to scaling)
- Big omega – same as Big-Oh, but with \(f(n) \ge c \cdot g(n)\). Therefore \(f(n) \in \Omega (g(n))\) iff \(c \cdot g(n)\) eventually constitutes a lower bound on \(f(n)\).
- Tight bound \(\Theta\)

- Little oh and omega: no ‘equals to’
- Big Theta – a function is in \(\Theta(g(n))\) if there is a tight bound, it is being squeezed between big oh and big omega.
- Worst and best – all about case / scenario
- Assuming a particular scenario, what happens as \(n \to \infty\)?
- Big-Oh: what’s the worst growth rate this algorithm could have?
- Big-Omega: what’s the best growth this algorithm could have?
- Although these two are technically incorrect.
- Usually concerned with worst and amortized for tight or upper asymptotic analysis
- Motivation: the worst case is too pessimistic.
- Amortization: max total # steps an algorithm takes on the \(M\) most challenging consecutive inputs of size \(n\), divided by \(M\): it averages the running times of operations in a worst-case sequence over that sequence
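To make the amortized argument concrete for the array-resizing example, here is a hedged sketch that counts how many element copies a doubling array performs over a sequence of appends. The function name and the doubling policy are assumptions for illustration.

```python
def total_copy_cost(num_appends, initial_capacity=1):
    """Count element copies caused by resizing when the array doubles on overflow."""
    capacity = initial_capacity
    size = 0
    copies = 0
    for _ in range(num_appends):
        if size == capacity:
            copies += size      # copy every existing element into the new array
            capacity *= 2
        size += 1
    return copies
```

Under these assumptions the total number of copies stays below \(2n\) for \(n\) appends, so the cost per append averages out to a constant even though a single resize is linear.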

## Lecture 4: Priority Queues

- “Highest priority, first out”
- Operations:
- `insert` (adds item at the end, enqueue equivalent)
- `deleteMin` (finds, returns, and removes minimum element in priority queue, dequeue equivalent)
- Break ties arbitrarily

- Implemented by a heap
- Holds comparable data, in the sense that every element has a priority

Data structure | `insert` | `deleteMin` |
---|---|---|
unsorted array | `O(1)` | `O(n)` |
unsorted linked list | `O(1)` | `O(n)` |
sorted circular array | `O(n)` | `O(1)` |
sorted linked list | `O(n)` | `O(1)` |
binary search tree | `O(n)` | `O(n)` |
binary min heap | `O(log n)` | `O(log n)` |

- We pay for functionality needed
- Don’t want to scan all the items, but we also don’t want to maintain a fully sorted list
- Visualize heap as a tree

- Height: count arrows from node to deepest descendant
- Depth: count the arrows from root to node
- Binary tree: every node has at most 2 children
- n-ary tree: every node has at most \(n\) children
- Complete tree: every row is completely full except for the bottom row, bottom row is filled left to right
- Perfect tree: every row is completely full
- Height is \(\mathcal{O}(\log n)\)

- Binary min heaps
- Smallest element is at the root
- A complete binary tree
- Heap order property: every non-root node has a priority value greater than or equal to its parent

- Strategy for every operation:
- Preserve complete tree structure property
- May break heap order property
- Percolate to restore heap order property

- Percolate up: swap with the parent until you reach the root or the heap order property is fulfilled
- Deletion:
- Grab minimum element (root) and delete it
- Only option is to move the bottom-rightmost element as the root
- This probably breaks the heap order property, so percolate down to restore it
- When percolating down, swap with the *smallest child*

- Array representation of heap
- Root at index 1, children at \(2i\) and \(2i + 1\) for any \(i\)
- Parent is given by `i / 2` (integer division)

- Array implementation:
- Minimal wasted space – using tree node objects requires pointers, which are expensive
- Fast lookups: index is very nice
- Resizing is an issue, but not by that much.

- Increase and decrease priority operations: change index value and then percolate up or down
- Can conceive of deleting any particular key as decreasing its priority value to negative infinity, percolating it up to the top, and then calling `deleteMin`.
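The array representation and the two percolate operations can be sketched as below. This is a minimal illustration under the 1-indexed layout described above; the class name `MinHeap` and its method names are assumptions, not the course's reference implementation.

```python
class MinHeap:
    """Binary min-heap stored in a 1-indexed array (index 0 is unused)."""

    def __init__(self):
        self.a = [None]

    def __len__(self):
        return len(self.a) - 1

    def insert(self, x):
        self.a.append(x)                 # keep the tree complete
        self._percolate_up(len(self))

    def delete_min(self):
        if len(self) == 0:
            raise IndexError("heap is empty")
        smallest = self.a[1]
        last = self.a.pop()              # bottom-rightmost element
        if len(self) > 0:
            self.a[1] = last             # move it to the root
            self._percolate_down(1)
        return smallest

    def _percolate_up(self, i):
        # swap with the parent (i // 2) until heap order holds or we reach the root
        while i > 1 and self.a[i] < self.a[i // 2]:
            self.a[i], self.a[i // 2] = self.a[i // 2], self.a[i]
            i //= 2

    def _percolate_down(self, i):
        n = len(self)
        while 2 * i <= n:
            child = 2 * i
            if child + 1 <= n and self.a[child + 1] < self.a[child]:
                child += 1               # pick the smaller child
            if self.a[i] <= self.a[child]:
                break
            self.a[i], self.a[child] = self.a[child], self.a[i]
            i = child
```

Each percolate walks at most one root-to-leaf path, so both operations are bounded by the height of the tree.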

## Lecture 5: `buildHeap` and Recurrences

- Scenario: \(n\) elements into a blank heap. Calling insert \(n\) times gives \(\mathcal{O}(n \log n)\).
- We can do better with Floyd’s `buildHeap` (linear time)
- Put \(n\) elements in the array, any order is OK
- Percolate down, starting from the last non-leaf node and working up to the root. (Leaves are already valid heaps on their own.)

- Runtime efficiency calculations
- Given completeness, the total loop iterations is \(\frac{n}{2}\)
- For half of its iterations (bottom full row), percolate at most 1 step; \(\frac{1}{2}\) cost
- For a quarter of its iterations (second to bottom full row), percolate at most 2 steps, \(\frac{2}{4}\) cost
- Summed cost per iteration: \(\frac{1}{2} + \frac{2}{4} + \frac{3}{8} + \cdots = 2\)
- Total: \(2 \cdot \frac{n}{2} = n \in \mathcal{O}(n)\)

- Intuition for correctness: build a hierarchy of nodes that we promise are going to be correct.
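Floyd's method described above can be sketched as follows, again assuming a 1-indexed array. The function names `build_heap` and `percolate_down` are illustrative.

```python
def build_heap(items):
    """Floyd's buildHeap: returns a 1-indexed min-heap array (index 0 unused)."""
    a = [None] + list(items)
    n = len(a) - 1
    for i in range(n // 2, 0, -1):   # last non-leaf node, working up to the root
        percolate_down(a, i, n)
    return a


def percolate_down(a, i, n):
    """Push a[i] down until both children are no smaller than it."""
    while 2 * i <= n:
        child = 2 * i
        if child + 1 <= n and a[child + 1] < a[child]:
            child += 1               # pick the smaller child
        if a[i] <= a[child]:
            break
        a[i], a[child] = a[child], a[i]
        i = child
```

The loop runs \(\frac{n}{2}\) times, matching the iteration count used in the runtime argument.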

Counting recursive code

- Analogy: performing computation recursively on a list of size \(n\)
- Each recursive call performs some non-recursive work \(w(n)\) and then calls the method on a smaller resource \(T(n - 1)\)
- We reach a base case with work \(b(n)\)
- Total work is given by \(T(n) = w(n) + T(n-1)\), \(T(1) = b(n)\)
- Recurrence function / relation: piecewise function which mathematically models the runtime of a recursive algorithm
- Unrolling method
- Expand via substitution
- Find the closed form – find when the base case occurs and get to the base case.
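As a worked instance of the unrolling method, assume constant non-recursive work \(w(n) = c\) and constant base-case work \(b\) (these simplifications are for illustration only):

\[\begin{aligned} T(n) &= T(n-1) + c \\ &= T(n-2) + 2c \\ &= T(n-k) + kc \\ &= T(1) + (n-1)c \quad \text{(base case reached at } k = n - 1\text{)} \\ &= b + (n-1)c \in \Theta(n) \end{aligned}\]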

## Lecture 6: Tree Recurrence Method, Dictionary ADT

Last time – algorithmic analysis of recursive code

- Write recurrences
- Solve recurrences
- Perform asymptotic analysis or find big-theta.
- Unrolling method: keep on substituting until a pattern emerges.
- Closed form: get to the very base case and then rewrite it in terms of some \(n\).
- ‘Binary sum’ recurrence relation: \(T(n) = 2T\left(\frac{n}{2}\right) + c\)
- Be careful: if you have something like `return 2 * binarySum(arr, lo, mid)` – the resulting recursive term is not \(2 \cdot T\left(\frac{n}{2}\right)\), because `2` is just a constant factor: it is actually just \(T\left(\frac{n}{2}\right)\) because we only make the recursive call once.
- Tree method: you want to find the general formula and then the closed form.
- General formula
- Initialize table
- Draw actual tree
- Add miscellaneous details – recursive calls, # nodes, sum work, etc.
- Find the base case
- Do work calculation

- The sum work is the number of nodes at each recursive level multiplied by the amount of work at each node
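A recursive binary sum might look like the sketch below; the signature (half-open range `lo`, `hi`) is an assumption for illustration. It makes two recursive calls on halves plus constant work, which is where \(T(n) = 2T(n/2) + c\) comes from.

```python
def binary_sum(arr, lo, hi):
    """Sum arr[lo:hi] by splitting the range in half."""
    if hi - lo == 0:
        return 0
    if hi - lo == 1:
        return arr[lo]                      # base case: constant work
    mid = (lo + hi) // 2
    # two recursive calls on halves plus constant work: T(n) = 2T(n/2) + c
    return binary_sum(arr, lo, mid) + binary_sum(arr, mid, hi)
```

In the tree method, level \(i\) of this recursion has \(2^i\) nodes each doing constant work, so the levels sum to a linear total.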

Common recurrences

Common recurrence relation | Order of growth | Example |
---|---|---|
\(T(n) = T(n/2) + c\) | \(\log n\) | binary search |
\(T(n) = 2T(n/2) + n\) | \(n \log n\) | merge sort |
\(T(n) = T(n/2) + n\) | \(n\) | |
\(T(n) = 2 T(n/2) + c\) | \(n\) | recursive binary sum |
\(T(n) = T(n-1) + c\) | \(n\) | recursive sum |
\(T(n) = T(n - 1) + n\) | \(n^2\) | |
\(T(n) = 2T(n-1) + c\) | \(2^n\) | |

Dictionary ADTs

- ADTs so far: stack, queue, priority queue
- Dictionary/map: one of the most important ADTs
- A set of unique key-value pairs – the keyset must be unique
- A set ADT is similar to a dictionary ADT without any values
- A set: does a key exist or not? (no duplicates)

Dictionary runtimes with primitive data structures

Data structure | insert | find | delete |
---|---|---|---|
unsorted linked list | \(\mathcal{O}(n)\) | \(\mathcal{O}(n)\) | \(\mathcal{O}(n)\) |
unsorted array | \(\mathcal{O}(n)\) | \(\mathcal{O}(n)\) | \(\mathcal{O}(n)\) |
sorted linked list | \(\mathcal{O}(n)\) | \(\mathcal{O}(n)\) | \(\mathcal{O}(n)\) |
sorted array | \(\mathcal{O}(n)\) | \(\mathcal{O}(\log n)\) | \(\mathcal{O}(n)\) |

## Lecture 7: Binary Search Trees

- Dictionary ADT: set of unique key-value pairs.
- Lazy deletion – instead of deleting the physical element in the array, include an extra bit in each field indicating whether the element is deleted or not.
- Advantages: simpler, can remove in batches, just unmark if re-added
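- Disadvantages: extra space for the flag, deleted elements still take up room, and `find` takes \(\mathcal{O}(\log m)\) time where \(m \ge n\) is the size including deleted elements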

- Height is normally \(\log n\), but we need guaranteed balancing of search trees to prevent worst case linear time.
- Trees offer speedups because of branching factors
- We can improve worst-case to \(\log n\) for all operations (search, add, delete) by using a binary search tree
- We need a data structure which uses a binary search mechanism
- A binary tree is either empty or a root with a left subtree and a right subtree. Each node is data with a left pointer and a right pointer. For a dictionary, the data will be a key-value pair.
- For a binary tree of height \(h\),
- the maximum number of leaves is \(2^h\)
- maximum number of nodes is \(2^{h+1} - 1\)
- minimum number of leaves is \(1\)
- minimum number of nodes is \(h + 1\)

- Height: distance from root to the furthest leaf – recursive code to check left and right child
- Pre-order, in-order, post-order
- Pre-order: root, left, right
- In-order: left, root, right
- Post-order: left, right, root

- Recursing through left subtree, root, right subtree
- A lot of code with trees is done through recursion
- Structure (each node has at most 2 children) and order (all keys in the left subtree are less than the node’s key, and all keys in the right subtree are larger)
- Keys must be comparable
- No duplicates are allowed
- Comparisons are strict: no ‘less than or equal to’

- `find` method: recursive. If key is less than root key, return find for left subtree. If key is greater than root key, return find for right subtree. If key is equal to root key, return root.
- Minimum node: keep on going left until the left child is null
- Maximum node: keep on going right until the right child is null
- `insert`: each insert is inserting a leaf node. Call `find`, and then create a new leaf node where the search ended.
- `delete`: 3 cases. Important thing is to keep the ordering property met.
- Node is a leaf node: just delete it
- Node has one child: replace node with child
- Node has two children: replace node with successor (minimum of right subtree) or predecessor (maximum of left subtree)
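The three operations above can be sketched as follows for an unbalanced BST. This is a hedged illustration: the names `Node`, `find`, `insert`, `delete`, `find_min`, and `in_order` are assumptions, and the two-children case uses the successor (the predecessor would work equally well).

```python
class Node:
    def __init__(self, key, value):
        self.key, self.value = key, value
        self.left = self.right = None


def find(node, key):
    if node is None:
        return None
    if key < node.key:
        return find(node.left, key)
    if key > node.key:
        return find(node.right, key)
    return node


def insert(node, key, value):
    """Insert into the subtree rooted at node; returns the subtree's root."""
    if node is None:
        return Node(key, value)          # new nodes are always leaves
    if key < node.key:
        node.left = insert(node.left, key, value)
    elif key > node.key:
        node.right = insert(node.right, key, value)
    else:
        node.value = value               # no duplicate keys: overwrite
    return node


def find_min(node):
    while node.left is not None:
        node = node.left
    return node


def delete(node, key):
    """Delete key from the subtree rooted at node; returns the subtree's root."""
    if node is None:
        return None
    if key < node.key:
        node.left = delete(node.left, key)
    elif key > node.key:
        node.right = delete(node.right, key)
    elif node.left is None:              # leaf, or only a right child
        return node.right
    elif node.right is None:             # only a left child
        return node.left
    else:                                # two children: copy up the successor
        succ = find_min(node.right)
        node.key, node.value = succ.key, succ.value
        node.right = delete(node.right, succ.key)
    return node


def in_order(node):
    if node is None:
        return []
    return in_order(node.left) + [node.key] + in_order(node.right)
```

An in-order traversal of a BST yields the keys in sorted order, which is a convenient check that the ordering property survived each operation.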

- Balancing conditions which both don’t work, because the subtrees can still degenerate into linked lists:
- Left and right subtree of just the root have the same number of nodes
- Left and right subtree of just root have the same height

- Balancing conditions – every node has the same number of nodes or every node has the same height, this is TOO STRONG – because it requires only perfect trees. Certain numbers of elements don’t even work.
- Instead, we meet in the middle. Left and right tree heights differ by at most one for every node.

AVL balance property: \(\left| \text{height of left subtree} - \text{height of right subtree} \right| \le 1\)

- Always ensures that the root height is \(\log n\), and is efficient to maintain – \(\Theta(1)\) rotations.
- AVL Tree stands for Adelson-Velskii and Landis
- AVL tree is a BST which is balanced
- With an AVL Tree node, we also add a height field into our node so we can check the balance property without recomputing heights.

## Lecture 8: AVL Trees

- Use B-Trees and Quicksort sorting presented in class
- AVL Tree is a BST which is balanced
- P is the problem node where an imbalance occurs. There are four cases.
- Left-left: left child of left subtree
- Left-right: right child of left subtree
- Right-right: right child of right subtree
- Right-left: left child of right subtree

- Cases 1 and 3 are solved by a single rotation; cases 2 and 4 are solved by a double rotation
- When going back up, use backtracking and rebalance the deepest imbalanced node
- Single-rotation (left left and right right):
- Move the child of p to the position of p
- p becomes the ‘other’ child
- Other subtrees move based on what BST allows

Left-left:

```
      p              A
     / \            / \
    A   Z    ->    X   p
   / \                / \
  X   Y              Y   Z
```

Right-right:

```
    p                  A
   / \                / \
  Z   A      ->      p   X
     / \            / \
    Y   X          Z   Y
```

- Double rotation (left-right and right-left)
- Rotate p’s child and grandchild
- Rotate p and p’s new child

```
  p          p
   \          \               X
    A   ->     X     ->      / \
   /            \           p   A
  X              A
```
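The single and double rotations drawn above can be sketched in code as follows. This is a minimal illustration, not the course implementation: the names `AVLNode`, `rotate_right`, `rotate_left`, `rebalance`, and `insert` are assumptions, and heights are counted so that a single node has height 0.

```python
class AVLNode:
    def __init__(self, key):
        self.key = key
        self.left = self.right = None
        self.height = 0                  # a single node has height 0


def height(node):
    return node.height if node else -1   # an empty subtree has height -1


def update(node):
    node.height = 1 + max(height(node.left), height(node.right))


def rotate_right(p):
    """Left-left case: p's left child becomes the subtree root."""
    a = p.left
    p.left, a.right = a.right, p
    update(p)
    update(a)
    return a


def rotate_left(p):
    """Right-right case: p's right child becomes the subtree root."""
    a = p.right
    p.right, a.left = a.left, p
    update(p)
    update(a)
    return a


def rebalance(p):
    update(p)
    balance = height(p.left) - height(p.right)
    if balance > 1:
        if height(p.left.left) < height(p.left.right):    # left-right case
            p.left = rotate_left(p.left)
        return rotate_right(p)
    if balance < -1:
        if height(p.right.right) < height(p.right.left):  # right-left case
            p.right = rotate_right(p.right)
        return rotate_left(p)
    return p


def insert(node, key):
    if node is None:
        return AVLNode(key)
    if key < node.key:
        node.left = insert(node.left, key)
    elif key > node.key:
        node.right = insert(node.right, key)
    return rebalance(node)               # rebalance on the way back up
```

Because `rebalance` runs as the recursion unwinds, the deepest imbalanced node is the first one fixed.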

- AVL efficiency:
- find is in \(\mathcal{O}(\log n)\)
- insert is in \(\mathcal{O}(\log n)\)
- rotation is in \(\mathcal{O}(1)\)
- Deepest descendant is in \(\mathcal{O}(\log n)\)

- buildTree is in \(\mathcal{O}(n \log n)\)
- Lazy and nonlazy deletion are both in \(\mathcal{O}(\log n)\)

- DSA is all about maintaining invariants / assumptions
- Pros of AVL Trees:
- All operations are worst-case \(\mathcal{O}(\log n)\) because trees are always balanced
- height balancing adds no more than a constant factor

- Cons of AVL Trees:
- Difficult to program and debug
- More space needed for the height field
- Asymptotically faster, but rebalancing does take time
- Most large searches are done in database systems, where the data is stored on disk. Disk access is slow, so we want to minimize the number of disk accesses. AVL trees are not good for this because they are not cache-friendly.

- B-trees try to exploit the fact that disk access is slow with high branching and use block sizes (accessing contiguous chunks of memory at a time)

## Lecture 9: B-Trees

- Disk accesses are expensive, how can we minimize disk accesses?
- Moving data up the memory hierarchy is slow because of latency
- Nearby memory is sent because it’s easy and will be asked for soon, esp in the context of arrays
- Temporal locality – locality in time
- Spatial locality – locality in space
- Arrays vs linked list, which can better take advantage of the spatial locality?
- Array, obv

- AVL has worst-case \(\mathcal{O}(\log n)\), but it’s not cache-friendly
- B-trees are better for large dictionaries, e.g. over 1 GB
- Disk to memory: block size, page size. Memory to cache: line size.
- Arrays benefit over linked lists because of contiguous memory access
- BSTs: looking things up in binary search trees is \(\log n\), but still disk accesses matter. The tree might not be able to fit entirely in memory even if the tree is very shallow.
- We want a balanced tree even shallower than AVL trees so we can minimize disk accesses and exploit disk block size by *increasing the branching factor*
- We have a search tree with a branching factor \(M\) and an array of sorted children; \(M\) is chosen to fit nicely into the disk block such that we have 1 access per array.
- Say that you have block size \(d\), pointer size \(p\), key size \(k\), key-value pair size \(v\). You need an internal node to fit in the block:

\[(M - 1) \cdot k + M \cdot p \le d \implies M \le \frac{d + k}{p + k} \implies M = \left\lfloor \frac{d + k}{p + k} \right\rfloor\]

- Likewise, a leaf holding \(L\) key-value pairs needs to fit in the block:

\[v \cdot L \le d \to L = \left\lfloor \frac{d}{v} \right\rfloor\]

- `find` requires visiting \(\log_M(n)\) nodes
- If it’s balanced, the runtime is actually \(\mathcal{O}(\log_2 M \log_M n)\)
- \(\log_M n\) is the height we traverse
- \(\log_2 M\) is finding the correct child branch to take using binary search among the M options
## Lecture 10: Hashing

- Dictionary data structures: unsorted linked list, unsorted array, sorted linked list, sorted array, balanced tree
- Hash table: giant array. Fit as many key-value pairs as possible.
- A hash function converts a key into an integer. Modding it by the table size gives me my index.
- What if multiple keys map to the same index? This is a collision. We want to avoid collisions when possible, but some collisions are unavoidable.
- How do we resolve collisions?
- Hash table runtimes are all going to be `O(1)` on average, assuming few collisions
- We can no longer efficiently implement findMin, findMax, predecessor, successor, sort, etc. as opposed to trees
- Typically the size is smaller than possible key space

Hash Function

- An ideal hash function is fast to compute and rarely hashes two used keys to the same index
- Hash tables are generic
- The client implements the hash function, implementer mods it
- Using prime numbers for table size is common
- Client should aim for different ints for expected items, e.g. avoid wasting any part of E or the 32 bits for `int`
- What to hash? We will focus on ints and strings
- Identifying fields should contribute to the hash function to avoid collisions
- But don’t add too many to lengthen the hash function time too much
- Hashing time vs collision avoidance trade-off
- Simple hash function: `h(x) = x`, library `g(x) = h(x) % TableSize`

- Technique for prime: real-life data tends to follow a pattern, so we want to avoid that pattern
- What if the key is not an int?
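- Strings: sum of ASCII values, or sum of ASCII values times powers of some number so that permutations of the same characters hash differently
- JDK `String.hashCode`: each character value is multiplied by a power of 31 and the products are summed, \(s_0 \cdot 31^{n-1} + s_1 \cdot 31^{n-2} + \cdots + s_{n-1}\), again using 31 as a prime number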
- Use all 32 bits, which includes negative numbers
- If keys are known ahead of time, choose a perfect hash
- You can hash each of the fields and combine with a string-like formula
- Linear probing: if there is a collision, go to the next available spot
- Quadratic probing: if there is a collision, go to the next available spot, but skip by a quadratic function
- Double hashing: if there is a collision, go to the next available spot, but skip by a function of the key
- Separate chaining: all keys mapping to the same table location are kept in a list
- Some data structure engineering can improve constant factors, such as move-to-front or linked list or array, etc.
- Lambda: load factor, \(\lambda = \frac{n}{m}\) where \(n\) is the number of items and \(m\) is the table size
- Unsuccessful `find` compares against \(\lambda\) items, successful `find` compares against \(\frac{\lambda}{2}\) items (on average)
- To keep runtime constant, resize TableSize to keep \(\lambda\) constant
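Separate chaining with a load-factor-triggered resize can be sketched as below. Everything here is an illustrative assumption: the class name, the `max_load` threshold, the `2 * size + 1` growth rule (a real table would more likely pick a prime), and the `string_hash` helper, which follows the base-31 polynomial idea but keeps the result unsigned rather than signed as in Java.

```python
def string_hash(s):
    """Polynomial string hash with base 31, truncated to 32 bits (unsigned)."""
    h = 0
    for ch in s:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF
    return h


class ChainingHashTable:
    def __init__(self, table_size=11, max_load=1.0):
        self.buckets = [[] for _ in range(table_size)]
        self.count = 0
        self.max_load = max_load

    def _bucket(self, key):
        return self.buckets[string_hash(key) % len(self.buckets)]

    def insert(self, key, value):
        bucket = self._bucket(key)
        for i, (k, _) in enumerate(bucket):
            if k == key:
                bucket[i] = (key, value)     # keys are unique: overwrite
                return
        bucket.append((key, value))
        self.count += 1
        if self.count / len(self.buckets) > self.max_load:
            self._resize()

    def find(self, key):
        for k, v in self._bucket(key):
            if k == key:
                return v
        return None

    def _resize(self):
        old = [pair for bucket in self.buckets for pair in bucket]
        self.buckets = [[] for _ in range(2 * len(self.buckets) + 1)]
        for k, v in old:                      # rehash every key into the new table
            self._bucket(k).append((k, v))
```

Keeping \(\lambda\) below a fixed threshold bounds the expected chain length, which is what keeps `find` constant on average.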

## Lecture 11: More Hashing

- Hashing has very fast average runtimes – an array is a hash table, use a hash function
- A lot of different choices: hash function, table size, collision resolution strategy
- Open addressing with linear probing: try the next available spot mod table size
- Open addressing: resolve collisions by trying a sequence of other positions in the table
- Probe function \(f\): the `i`th probe is `(h(key) + f(i)) % TableSize`

- Open addressing does poorly with a high load factor \(\lambda\)
- For the `find` operation, we store values with keys, and we keep going along the probe sequence until we hit an empty slot
- For `delete`, we risk stopping the `find` probe early at emptied slots. Use lazy deletion to mark that there used to be a key there
- Linear probing is a bad idea because it tends to form clusters, which lead to long probe sequences. Also called primary clustering. Each insert operation either appends to the end of a cluster or creates a new cluster.
- Quadratic probing: \(f(i) = i^2\)
- You might end up with an infinite loop though
- Proof: use a prime TableSize, so quadratic probing will find an empty slot in at most TableSize divided by 2 probes. So if lambda is less than half, no need to detect cycles.
- Quadratic probing does not resolve collisions between keys that initially hash to the same index: secondary clustering
- Double hashing: \(f(i) = i \cdot h'(key)\) where \(h'\) is a second hash function. If two keys hash to the same index, it is unlikely that they will follow the same probe sequence.
- While we prevent secondary clustering (possibly), we run the risk of infinite cycles
- We no longer have a convenient theorem for double hashing
- Some functions `h` and `g` have proven resistance to infinite cycles
- As lambda approaches 1, resize the table because you don’t want to do too much probing
- Advantage is less memory allocation – less chaining, it’s just in one array
- Separate chaining is more intuitive and easier to implement, also linear rather than exponentially increasing as lambda approaches one
- Can optimize the structure of storage within a single bucket
- Rehashing: when you resize the table, you need to rehash all the keys
- Separate chaining: you can decide what too full means, probably keeping the load factor reasonable
- Open addressing: half full is a good rule of thumb

- Good idea to double. But you should choose a prime if you can.
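Open addressing with linear probing (\(f(i) = i\)) and lazy deletion can be sketched as follows. This is a simplified illustration with no resizing; the class name and the `EMPTY` / `DELETED` sentinels are assumptions, and Python's built-in `hash` stands in for the client's hash function.

```python
EMPTY, DELETED = object(), object()   # never-used slot vs. lazily deleted slot


class LinearProbingTable:
    def __init__(self, table_size=11):
        self.keys = [EMPTY] * table_size
        self.values = [None] * table_size

    def _probe(self, key):
        """Yield the probe sequence (h(key) + i) % TableSize for i = 0, 1, ..."""
        start = hash(key) % len(self.keys)
        for i in range(len(self.keys)):
            yield (start + i) % len(self.keys)

    def insert(self, key, value):
        first_free = None
        for idx in self._probe(key):
            slot = self.keys[idx]
            if slot is EMPTY:
                if first_free is None:
                    first_free = idx
                break
            if slot is DELETED:
                if first_free is None:
                    first_free = idx     # reusable, but keep looking for the key
            elif slot == key:
                self.values[idx] = value
                return
        if first_free is None:
            raise OverflowError("table is full")
        self.keys[first_free], self.values[first_free] = key, value

    def find(self, key):
        for idx in self._probe(key):
            slot = self.keys[idx]
            if slot is EMPTY:            # a never-used slot ends the search
                return None
            if slot is not DELETED and slot == key:
                return self.values[idx]
        return None

    def delete(self, key):
        for idx in self._probe(key):
            slot = self.keys[idx]
            if slot is EMPTY:
                return
            if slot is not DELETED and slot == key:
                self.keys[idx] = DELETED  # tombstone keeps later probes alive
                self.values[idx] = None
                return
```

Swapping the `start + i` step for `start + i * i` would give quadratic probing under the same structure.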

## Lecture 12: Comparison Sorts

- Why sorting? – common to need data to be sorted
- Goals: stable (preserve original ordering in the case of ties), in-place (no extra memory), fast (typically \(\mathcal{O}(n \log n)\))
- Insertion sort: beginning is sorted, end is unsorted, loop through and find the best way to put it
- Selection sort: find the smallest element and append it to the end of the sorted part.
- Stable and in place, but not fast – best case and worst case are both \(\mathcal{O}(n^2)\)

- Selection sort, most work is done in the unsorted part (selecting)
- Insertion sort, most work is done in the sorted part (inserting)
- Bubble Sort – pretend it doesn’t exist, really bad asymptotic complexity and bad constant factors, literally should not be used ever.
- Heap Sort – use a heap, put all elements into a heap with build-heap (\(O(n)\)) and then remove elements one by one and put them back in the array (\(O(n \log n)\))
- This can be done in place: treat the initial array as a heap, when you delete the `i`th element, you put it in `arr[n-i]`. When you `deleteMin`, there’s always an empty block at the end of the heap’s portion of the array, because heaps are a complete tree.
- AVL Sort: not good, insert all elements into balanced tree and then do an in-order traversal. It’s not in-place and also has worse constant factors, with all the balancing etc. – holding an entire data structure just to sort something. Heap sort is just better than AVL sort.
- Merge sort: sort the left half of the elements recursively, sort the right half of the elements recursively, merge the two sorted halves into a sorted whole (linear-time). Runs in \(\mathcal{O}(n \log n)\)
- Stable, not in-place
- Merge step is linear time
- Recursive calls are \(\mathcal{O}(n \log n)\)
- Merge sort is asymptotically optimal for comparison sorts
- Use 3 pointers and 1 more array and copy into an array, and then copy from the array back to the original.
- Stable? Yes, prioritize left array.
- In place? No, \(O(n)\) space.
- Fast? Yes, it has worse constant factors though. Copying, merging, etc.
- Why is it \(O(n \log n)\)? The merge sort case, we have \(2T(n/2) + c_1 n + c_2\).
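Merge sort as described above can be sketched as follows. For brevity this version returns a new list instead of copying back into the original array, and the optional `key` parameter is an addition for illustration so that stability can be observed on records.

```python
def merge_sort(arr, key=lambda x: x):
    """Return a new sorted list; stable, uses O(n) extra space."""
    if len(arr) <= 1:
        return list(arr)
    mid = len(arr) // 2
    left = merge_sort(arr[:mid], key)
    right = merge_sort(arr[mid:], key)
    return merge(left, right, key)


def merge(left, right, key):
    """Linear-time merge of two sorted lists."""
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if key(left[i]) <= key(right[j]):   # ties take from the left: stable
            out.append(left[i])
            i += 1
        else:
            out.append(right[j])
            j += 1
    out.extend(left[i:])
    out.extend(right[j:])
    return out
```

The two recursive calls and the linear merge correspond to the \(2T(n/2)\) and \(c_1 n\) terms of the recurrence.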

- Quick sort: Pick a pivot element, which is hopefully the median element. Divide the elements into 2 halves, less than pivot and greater than pivot, and recursively sort the two halves.
- The pivot you pick can make the difference between \(O(n^2)\) and \(O(n \log n)\)
- Warning, many different versions of quick sort on the internet.
- How do we pick a good pivot? This is the majority of the runtime.
- Pick lowest or highest element. This can be disastrous. Fast but likely worst-case
- Ideally we want to pick a random element, but pseudorandomness is actually very expensive in terms of constant factors.
- In practice we use median of 3, choose the median of the low, middle, and high index of the array.

- Partitioning issue – given a pivot, how do you split the array into two? Ideally we want it to be fast (\(\mathcal{O}(n)\), linear time) and also an in-place operation. Of course linear runtime is more important.
- “Hoare” partitioning approach: swap pivot with `arr[lo]`, move it out of the way. Use two pointers `l` and `r` starting at `lo + 1` and `hi - 1`. Move `l` and `r` inward until `arr[l]` should be on the right of the pivot and `arr[r]` should be on the left of the pivot, swap them, and repeat until the pointers cross.
- Quicksort is not stable because of the partitioning step – doing a bunch of swaps where we don’t know where elements end up at
- Fast? In terms of asymptotics, kind of – best case and average case is \(\mathcal{O}(n \log n)\), worst case is \(\mathcal{O}(n^2)\)
- Poor worst case, but in practice it’s fast and used often, e.g. for primitive arrays in Java’s `Arrays.sort`
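Median-of-3 pivot selection and the two-pointer partition can be sketched as follows. This is one of many quicksort variants, written as a hedged illustration: the function names and the half-open range convention (`hi` exclusive) are assumptions.

```python
def quick_sort(arr, lo=0, hi=None):
    """Sort arr[lo:hi] in place."""
    if hi is None:
        hi = len(arr)
    if hi - lo <= 1:
        return
    p = partition(arr, lo, hi)
    quick_sort(arr, lo, p)
    quick_sort(arr, p + 1, hi)


def median_of_three(arr, lo, hi):
    """Index of the median among the low, middle, and high elements."""
    mid = (lo + hi - 1) // 2
    candidates = sorted([(arr[lo], lo), (arr[mid], mid), (arr[hi - 1], hi - 1)])
    return candidates[1][1]


def partition(arr, lo, hi):
    pivot_index = median_of_three(arr, lo, hi)
    arr[lo], arr[pivot_index] = arr[pivot_index], arr[lo]   # move pivot out of the way
    pivot = arr[lo]
    l, r = lo + 1, hi - 1
    while l <= r:
        if arr[l] <= pivot:
            l += 1                            # arr[l] already belongs on the left
        elif arr[r] > pivot:
            r -= 1                            # arr[r] already belongs on the right
        else:
            arr[l], arr[r] = arr[r], arr[l]   # both are on the wrong side
            l += 1
            r -= 1
    arr[lo], arr[r] = arr[r], arr[lo]         # put the pivot between the halves
    return r
```

The partition makes one pass over the range and uses no auxiliary array, so it is linear time and in place.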

## Lecture 13: Non-comparison Sorts

- Insertion and selection sort have good constant factors; heap sort, merge sort, and quick sort have good asymptotic time
- When the array is small, sort with insertion sort; when it is large, sort with quick sort
- \(\Omega (n \log n)\) is the best runtime for a sorting algorithm which only interacts with its input by comparing elements.
- Tree “proof” intuition: comparisons can be mapped to a tree structure, height and node counts contribute to \(n \log n\) factor

- Bucket Sort / Bin Sort
- Given an array of numbers spanning a small range of integers
- Find the min and max value
- Make an aux array to represent range between min and max
- Go through original array and tally each number
- Copy aux into original array
- Stable: can be made stable
- In place: no, need aux array
- Fast: yes, \(\mathcal{O}(n + k)\)
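The tally-based procedure above can be sketched as follows for plain integers. The function name is illustrative, and this simple version rebuilds values from counts, so it does not carry any attached data along (a stable variant would keep a list per bucket).

```python
def bucket_sort(arr):
    """Counting-style bucket sort for integers spanning a small range."""
    if not arr:
        return []
    lo, hi = min(arr), max(arr)
    counts = [0] * (hi - lo + 1)         # one bucket per value in [lo, hi]
    for x in arr:
        counts[x - lo] += 1              # tally each number
    out = []
    for offset, c in enumerate(counts):
        out.extend([lo + offset] * c)    # emit each value as many times as seen
    return out
```

Here \(k\) in the \(\mathcal{O}(n + k)\) bound is the size of the range, `hi - lo + 1`.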

- Radix sort
- Intuition: exploit integer digits
- Sort by least significant digit, then second least significant digit, etc. Works because bucket sort is stable.
- Stable? Yes.
- In place? No.
- Fast? Yes, worst case is \(\mathcal{O}(d \cdot (n + k))\)
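A least-significant-digit radix sort can be sketched as below, assuming non-negative integers; the function name and the default base of 10 are illustrative choices.

```python
def radix_sort(arr, base=10):
    """LSD radix sort for non-negative integers."""
    if not arr:
        return []
    out = list(arr)
    largest = max(arr)
    place = 1
    while largest // place > 0:
        buckets = [[] for _ in range(base)]
        for x in out:                     # appending in order keeps each pass stable
            buckets[(x // place) % base].append(x)
        out = [x for bucket in buckets for x in bucket]
        place *= base
    return out
```

Each pass is a stable bucket sort on one digit, so there are \(d\) passes of \(\mathcal{O}(n + k)\) work with \(k\) equal to the base.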

## Lecture 14: Introduction to Graphs

- A graph \(G\) is a pair of sets (\(V\), \(E\)) where
- \(V = \{v_1, v_2, ..., v_n\}\) is a set of vertices
- \(E = \{e_1, e_2, ..., e_m\}\) is a set of edges
- where each edge is a pair of vertices \(e_i = (v_j, v_k)\)

- Undirected graphs: edges have no specific direction, edges are “two-way”

- Degree of a vertex: the number of edges containing that vertex, i.e. the number of adjacent vertices
- Directed graphs: edges have a direction; a pair \((v, u) \in E\) means \(v\) is the source and \(u\) is the destination
- In-degree: number of in-bound edges, i.e. edges \((v, w)\) where the vertex \(w\) is the destination
- Out-degree: number of out-bound edges

- Self-edges: we pretend they don’t exist
- Weighted graphs: each edge has a weight or a cost, typically a number

## Lecture 15: Graph Traversals

- A walk: a sequence of adjacent vertices
- Path, simple path: a walk that doesn’t repeat a vertex
- Cycle: a walk that doesn’t repeat a vertex except the first and last
- Length: number of edges
- Cost: sum of weights from each edge
- Undirected graph
- Connected if for all pairs of vertices there exists a path from one to the other
- Complete (fully-connected) if for all pairs of vertices there exists an edge from one to the other, including self edges

- Directed graph
- Strongly connected: path from every vertex to every other vertex
- Weakly connected if there is a path from every vertex to another ignoring direction
- Fully connected / complete if for every vertex there is an edge from it to every other vertex

- Every tree is a graph – which is undirected, acyclic, and connected.
- Rooted trees – just pick a root. So it becomes directed.
- Directed Acyclic Graphs: DAGs. A directed graph with no cycles. Come up very often in graph problems. Every DAG is a directed graph but not vice versa.
- Undirected: \(0 \le \vert E \vert < \vert V \vert^2\)
- Directed: \(0 \le \vert E \vert \le \vert V \vert^2\)
- Given \(\vert V \vert\) vertices, the
- minimum number of edges is 0
- Maximum number of undirected edges is \(\frac{\vert V \vert (\vert V \vert - 1)}{2}\)
- Maximum number of directed edges is \(\vert V \vert^2\)

- Sparse: \(\vert E \vert \in \Theta \left( \vert V \vert \right)\), few edges
- Dense: \(\vert E \vert \in \Theta \left( \vert V \vert^2 \right)\), many edges
- Graphs, the data structure – many data structures and trade-offs, exploits graph properties
- Common operations: is (\(v, u\)) an edge? What are neighbors of \(v\)?

- Adjacency matrix: 2-d array of booleans
- Better for dense graphs
- Large space requirements: \(\Theta \left( \vert V \vert^2 \right)\)
- Fast edge insert, delete, and search
- Slow vertex add and delete
- Pretty fast neighbor search

- Adjacency List
- Assign each node a number from 0 to \(\vert V \vert - 1\)
- Array of length \(\vert V \vert\), each element is a linked list of neighbors
- To decide if some edge exists, is \(\mathcal{O}(d)\), where \(d\) is the out-degree of the source vertex
- Space requirements: \(\Theta \left( \vert V \vert + \vert E \vert \right)\)
- Better for sparse graphs

- Graphs are often very sparse, so adjacency lists are often better
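The two representations can be sketched side by side. This assumes a directed graph with vertices numbered from 0 and edges given as `(source, destination)` pairs; the function names are illustrative.

```python
def build_adjacency_list(num_vertices, edges):
    """Directed graph as a list of neighbor lists: Theta(|V| + |E|) space."""
    adj = [[] for _ in range(num_vertices)]
    for v, u in edges:
        adj[v].append(u)
    return adj


def build_adjacency_matrix(num_vertices, edges):
    """Directed graph as a 2-d array of booleans: Theta(|V|^2) space."""
    matrix = [[False] * num_vertices for _ in range(num_vertices)]
    for v, u in edges:
        matrix[v][u] = True
    return matrix
```

Checking an edge is a single lookup `matrix[v][u]` in the matrix, while the list needs a scan of `adj[v]`, which is the out-degree cost noted above.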
- Graphs: Algorithms.
- Depth-first graph search and breadth-first graph search.
- Shortest path
- Topological sort

- Traversals
- In a graph, find all nodes from a node source, is there a path from the source to a specific node?
- Basic idea: keep following nodes, mark nodes after visiting them such that it processes each node once
- Processing (print it) vs visiting (just getting there)
- Visiting always happens before processing

## Lecture 16: Graph Traversals

- Depth-First Search
- Uses a stack
- Recursively explore far away from the source node first
- Add source to stack. While stack is not empty, pop from the stack, for each adjacent node, if the node is not marked, mark it as visited and push it onto the stack.

- Breadth-First Search
- Uses a queue
- Explore everything near the source first
- Add source to queue. While queue is not empty, dequeue from the queue, for each adjacent node, if the node is not marked, mark it as visited and enqueue it.
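
The two traversals above can be sketched as follows. This assumes the graph is an adjacency list indexed by vertex number, and that the source is marked when it is first added; the class name `Traversals` and the choice to return the processing order are illustrative assumptions.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

class Traversals {
    // Stack-based traversal following the steps in the notes.
    // Returns the nodes in the order they are processed.
    static List<Integer> dfs(List<List<Integer>> adj, int source) {
        boolean[] marked = new boolean[adj.size()];
        List<Integer> order = new ArrayList<>();
        Deque<Integer> stack = new ArrayDeque<>();
        marked[source] = true;
        stack.push(source);
        while (!stack.isEmpty()) {
            int v = stack.pop();
            order.add(v); // "process" v
            for (int u : adj.get(v)) {
                if (!marked[u]) {
                    marked[u] = true; // mark on push so u is added only once
                    stack.push(u);
                }
            }
        }
        return order;
    }

    // Same loop with a FIFO queue instead of a stack.
    static List<Integer> bfs(List<List<Integer>> adj, int source) {
        boolean[] marked = new boolean[adj.size()];
        List<Integer> order = new ArrayList<>();
        Deque<Integer> queue = new ArrayDeque<>();
        marked[source] = true;
        queue.addLast(source);
        while (!queue.isEmpty()) {
            int v = queue.removeFirst();
            order.add(v);
            for (int u : adj.get(v)) {
                if (!marked[u]) {
                    marked[u] = true;
                    queue.addLast(u);
                }
            }
        }
        return order;
    }
}
```

The only difference between the two methods is the order in which pending nodes are removed, which is what makes one explore far from the source first and the other explore near it first.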

- DFS tends to use less memory compared to BFS because it only needs to store nodes along the current branch. Applications in topological sorting and cycle detection.
- BFS tends to use more memory, needs to store all nodes at the current level before moving to the next level. Applications for shortest paths
- Third option: iterative deepening DFS (IDDFS), uses DFS with increasing depth limits
- Traversal, saving the path: store each node’s predecessor along the path. When you’re done searching, follow the predecessors backwards to where you started.
- Problem: BFS does not find shortest paths in a weighted graph, because the lowest-cost path may not be the one with the fewest edges.
- Negative weights are a further problem: if a cycle has negative total cost, going around it again always lowers the path cost, so Dijkstra’s algorithm assumes non-negative weights
- Dijkstra’s algorithm
- Start node A has cost 0 and all other nodes have cost infinity
- At each step
- Pick closest unvisited vertex V
- Add it to a cloud of visited vertices
- Update distances for nodes with edges from V

- Terminate when all nodes are visited
- Dijkstra’s keeps a current best cost for every node. Once a node has been added to the cloud, its current best cost is final (given non-negative weights)

## Lecture 17: Multithreading and Fork-Join

Recap of Dijkstra’s

- Dijkstra’s is worst case \(\mathcal{O}(\vert V \vert^2 + \vert E \vert)\)
- The unoptimized part: how do we find the lowest-cost unvisited node?
- If we use a heap, we can speed this up because we have a priority queue. We can get the lowest cost in \(\mathcal{O}(\log \vert V \vert)\)
- Steps:
- Initialization is \(\mathcal{O}(\vert V \vert)\)
- Iteration and getting lowest cost is \(\mathcal{O}(\vert V \vert \log \vert V \vert)\)
- Updating the cloud of points and visiting neighbors is \(\mathcal{O}(\vert E \vert \log \vert V \vert)\)
- Dense graph: \(\mathcal{O}(\vert V \vert^2 \log \vert V \vert)\)
- Sparse graph: \(\mathcal{O}(\vert V \vert \log \vert V \vert)\)

- Total: \(\mathcal{O}(\vert V \vert \log \vert V \vert + \vert E \vert \log \vert V \vert)\)

Dijkstra’s Algorithm Pseudocode:

```
Dijkstra's(Graph G, Node src):
    for each node v: v.cost = infinity
    src.cost = 0
    heap = buildHeap(every node)
    while (heap is not empty):
        v = heap.deleteMin()
        mark v as visited
        for each edge (v, u) with weight w in G:
            if (u is not marked):
                potentialBest = v.cost + w
                currBest = u.cost
                if (potentialBest < currBest):
                    changePriority(u, potentialBest)
                    u.pred = v
```
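
A runnable Java sketch of the pseudocode, under some assumptions. `java.util.PriorityQueue` has no `changePriority` operation, so this version swaps in a common substitute: it inserts a fresh heap entry whenever a cost improves and skips stale entries when they are removed. It also starts the heap with only the source instead of calling `buildHeap` on every node. The edge encoding (`int[] {u, w}`) and the class name `Dijkstra` are illustrative, and predecessor tracking is omitted for brevity.

```java
import java.util.Arrays;
import java.util.List;
import java.util.PriorityQueue;

class Dijkstra {
    // adj.get(v) holds edges {u, w}: an edge from v to u with non-negative weight w.
    // Returns the lowest cost from src to every node (Long.MAX_VALUE if unreachable).
    static long[] shortestCosts(List<List<int[]>> adj, int src) {
        int n = adj.size();
        long[] cost = new long[n];
        Arrays.fill(cost, Long.MAX_VALUE); // stands in for infinity
        cost[src] = 0;
        boolean[] visited = new boolean[n];
        // Heap entries are {cost, node}, ordered by cost.
        PriorityQueue<long[]> heap =
            new PriorityQueue<>((a, b) -> Long.compare(a[0], b[0]));
        heap.add(new long[] {0, src});
        while (!heap.isEmpty()) {
            int v = (int) heap.poll()[1];
            if (visited[v]) {
                continue; // stale entry: v was already added to the cloud
            }
            visited[v] = true;
            for (int[] edge : adj.get(v)) {
                int u = edge[0];
                long potentialBest = cost[v] + edge[1];
                if (!visited[u] && potentialBest < cost[u]) {
                    cost[u] = potentialBest;
                    // Substitute for changePriority: add a new, cheaper entry.
                    heap.add(new long[] {potentialBest, u});
                }
            }
        }
        return cost;
    }
}
```

With this substitute the heap can hold up to one entry per edge, so the bound is \(\mathcal{O}(\vert E \vert \log \vert E \vert)\), which is still \(\mathcal{O}(\vert E \vert \log \vert V \vert)\) because \(\vert E \vert \le \vert V \vert^2\).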

Introduction to Multithreading and Fork-Join Parallelism

- Major assumption so far: sequential programming, where one thing happens at a time
- Removing this assumption creates major challenges
- What to do with multiple processors
- We will do multiple things in one program – how do we implement a HashMap or rethink algorithmic complexity?
- Parallelism: use extra resources to solve a problem faster
- Concurrency: correctly and efficiently manage access to shared resources.
- A program is like a recipe for a cook.
- Parallelism: let’s cook faster! Concurrency: how can you access the fridge without fighting?
- Concurrency for a shared chaining hash table: prevent bad interleavings, lock different parts of the hash table
- Shared memory with threads
- A set of threads with their own program counter and call stack
- No access to other threads’ local variables
- Threads can implicitly share static fields and objects
- To communicate, write values to a shared location that another thread can read from.
- Heap: shared memory that every thread, each with its own call stack, can read from and write to

- Other models besides shared memory: message-passing, dataflow, data parallelism.
- Message-passing is expensive
- `java.lang.Thread` is a class that represents a thread of execution
- To start a thread:
- Define a subclass C of `java.lang.Thread`, overriding `run`
- Create an object of class C
- Call that object’s `start` method, which starts a new thread

- Summing an array:
- Create 4 thread objects
- Call `start` on each thread object so they run in parallel
- Wait for threads to finish with `join`
- Add together final answers

## Lecture 18: Analysis of Fork-Join Parallel Programs

- To get a new thread running:
- Define a subclass overriding `run`
- Create an object of that subclass
- Call `start` on that object

- Parallelism idea: split the problem size into separate pieces, do the work separately, and combine it

```
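class SumThread extends java.lang.Thread {
    int lo;      // start of this thread's range (inclusive)
    int hi;      // end of this thread's range (exclusive)
    int[] arr;
    int ans = 0; // result, read by the caller after join()

    SumThread(int[] a, int l, int h) {
        lo = l; hi = h; arr = a;
    }

    // Sums arr[lo..hi); runs in the new thread once start() is called.
    public void run() {
        for (int i = lo; i < hi; i++) {
            ans += arr[i];
        }
    }
}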
```

```
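// join() can throw InterruptedException, so the method declares it.
int sum(int[] arr) throws InterruptedException {
    int len = arr.length;
    int ans = 0;
    SumThread[] ts = new SumThread[4];
    for (int i = 0; i < 4; i++) { // fork: one thread per quarter of the array
        ts[i] = new SumThread(arr, i*len/4, (i+1)*len/4);
        ts[i].start();
    }
    for (int i = 0; i < 4; i++) { // join: wait for each thread, then combine
        ts[i].join();
        ans += ts[i].ans;
    }
    return ans;
}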
```

- Problems:
- Limited mobility (not generalizable)
- Load imbalance

- A better approach: cut up our work into many many more pieces, split it off into a smaller number of processors. This requires changing our algorithm and abandoning Java threads for constant factor reasons.
- Better approach: mergesort-like algorithm which recursively splits
- Even better approach: instead of a big thread that starts a small left thread and a small right thread, the big thread starts only the small left thread and does the right half itself. This saves about half the threads.

```
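// Do this: fork only the left half and run the right half in the current thread.
left.start();
right.run();
left.join();
ans = left.ans + right.ans;

// NOT this: forking both halves leaves the current thread idle while it waits.
left.start();
right.start();
left.join();
right.join();
ans = left.ans + right.ans;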

- Java’s threads are too heavyweight
- ForkJoin framework does it better.
- Subclass `RecursiveTask<V>`
- Override `compute`
- Call `join`, which returns the answer

- Call `.compute()` on one half to run it directly in the current thread; `.join()` waits for the other half. If we don’t do that, the current thread ends up just waiting.
- `POOL.invoke`: starts the given task and waits for it to finish
- We can do reduce operations in \(\mathcal{O}(\log n)\) time, given enough processors!
- Finding the max element in an array
- Finding the sum of an array
- Check if elements are in sorted order
- Check if an element satisfies a condition

- Reduce operations:
- How to compute the answer at the cut-off?
- How to merge the results of two subarrays?
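
A sketch of a sum reduction in the ForkJoin framework, showing where the two questions above are answered in code. The cut-off value and the names `SumTask`, `CUTOFF`, and `sum` are assumptions for illustration; `POOL` follows the name used in these notes.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

class SumTask extends RecursiveTask<Integer> {
    static final int CUTOFF = 1000; // hypothetical sequential cut-off
    static final ForkJoinPool POOL = new ForkJoinPool();

    final int[] arr;
    final int lo; // inclusive
    final int hi; // exclusive

    SumTask(int[] arr, int lo, int hi) {
        this.arr = arr;
        this.lo = lo;
        this.hi = hi;
    }

    @Override
    protected Integer compute() {
        if (hi - lo <= CUTOFF) {
            // Answer at the cut-off: a plain sequential loop.
            int ans = 0;
            for (int i = lo; i < hi; i++) {
                ans += arr[i];
            }
            return ans;
        }
        int mid = lo + (hi - lo) / 2;
        SumTask left = new SumTask(arr, lo, mid);
        SumTask right = new SumTask(arr, mid, hi);
        left.fork();                      // run the left half in parallel
        int rightAns = right.compute();   // do the right half in this thread
        int leftAns = left.join();        // wait for the left half
        return leftAns + rightAns;        // merge the two sub-results
    }

    static int sum(int[] arr) {
        return POOL.invoke(new SumTask(arr, 0, arr.length));
    }
}
```

Swapping the sequential loop and the final `+` for another pair (for example a max loop and `Math.max`) gives a different reduction with the same structure.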

- Map operation: operates on each element of the collection independently to create a new collection of the same size, e.g. vector addition
- Extend `RecursiveAction` as opposed to `RecursiveTask` (which is for reduce operations)
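
A sketch of the vector addition map using `RecursiveAction`. The class name `VectorAddAction`, the cut-off, and the use of the common pool are assumptions for illustration.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveAction;

class VectorAddAction extends RecursiveAction {
    static final int CUTOFF = 1000; // hypothetical sequential cut-off

    final int[] a;
    final int[] b;
    final int[] out;
    final int lo; // inclusive
    final int hi; // exclusive

    VectorAddAction(int[] a, int[] b, int[] out, int lo, int hi) {
        this.a = a;
        this.b = b;
        this.out = out;
        this.lo = lo;
        this.hi = hi;
    }

    @Override
    protected void compute() {
        if (hi - lo <= CUTOFF) {
            // Each output element depends only on the inputs at the same index.
            for (int i = lo; i < hi; i++) {
                out[i] = a[i] + b[i];
            }
            return;
        }
        int mid = lo + (hi - lo) / 2;
        VectorAddAction left = new VectorAddAction(a, b, out, lo, mid);
        VectorAddAction right = new VectorAddAction(a, b, out, mid, hi);
        left.fork();
        right.compute();
        left.join(); // nothing to combine: results are already in out
    }

    // Assumes a and b have the same length.
    static int[] add(int[] a, int[] b) {
        int[] out = new int[a.length];
        ForkJoinPool.commonPool().invoke(new VectorAddAction(a, b, out, 0, a.length));
        return out;
    }
}
```

Unlike a reduction, there is no value to return or merge, which is why `compute` returns `void` here.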

Analyzing Algorithms

- Let \(T_P\) be the running time if there are \(P\) processors available
- Work: how long it would take 1 processor, \(T_1\). Just sequentialize the recursive forking.
- Span: how long it would take with infinitely many processors, \(T_\infty\).
- Hypothetical ideal for parallelization, fully parallel.
- Longest dependence chain in the computation.

Directed Acyclic Graph

- A program execution using `fork` and `join` can be seen as a DAG.
- Nodes are pieces of work.
- Edges: source must finish before destination starts.
- A `fork` ends a node and makes two outgoing edges: a new thread and the continuation of the current thread
- A `join` ends a node and makes a node with two incoming edges.

## Lecture 19: Parallel Prefix, Pack, and Sorting

- Speed-up on \(P\) processors: \(T_1 / T_P\)
- If the speed-up is \(P\) as we vary \(P\), it is perfect linear speed-up, which means doubling \(P\) halves the running time.

- Parallelism, the maximum possible speed-up: \(T_1 / T_\infty\)
- Parallel algorithms decrease span without increasing work too much
- At some point, adding processors doesn’t help
- The optimal \(T_P\) is given by

\[T_P = \mathcal{O}\left( \frac{T_1}{P} + T_\infty \right)\]

- Work divided by \(P\), plus span, is the total time
- Work term dominates for small \(P\), span dominates for large \(P\)
- The framework writer must assign work to available processors to avoid idling
- Some things don’t parallelize well, like reading from a linked list, printing, etc.

Amdahl’s law: the theoretical overall speedup with \(P\) processors is

\[\text{speedup} = \frac{T_1}{T_P} = \frac{1}{S + (1 - S) / P}\]

- \(S\) is the fraction of the program that is serial (not parallelizable)
- Normalize the sequential running time to \(T_1 = S + (1 - S) = 1\), so that \(T_P = S + (1 - S) / P\)
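
As a worked example with assumed numbers (not from the lecture), take \(S = 1/4\) and \(P = 4\):

\[\text{speedup} = \frac{1}{\frac{1}{4} + \frac{3/4}{4}} = \frac{1}{7/16} = \frac{16}{7} \approx 2.29\]

Even as \(P \to \infty\), the speedup only approaches \(1 / S = 4\), because the serial fraction never shrinks.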

Parallel patterns

- Prefix sum problem: given an array of numbers, compute the sum of all prefixes
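- Parallel-prefix algorithm does two passes, each with \(\mathcal{O}(\log n)\) span and \(\mathcal{O}(n)\) work
- We’re building a tree
- Up pass: build a binary tree bottom-up, propagating sums upward, so that each node holds the sum over a specific range and the root holds the sum of the whole array
- Down pass: pass each node the sum of everything to the left of its range, so that each leaf can write its prefix sum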
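
A sketch of the two-pass algorithm using the ForkJoin framework. The first pass builds the tree of range sums; the second pass pushes down, for each node, the sum of everything to the left of its range. All names here (`ParallelPrefix`, `Node`, `BuildTask`, `FillAction`, `fromLeft`) are illustrative assumptions, and the leaves cover single elements for simplicity, whereas a practical version would use a sequential cut-off.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveAction;
import java.util.concurrent.RecursiveTask;

class ParallelPrefix {
    // Tree node covering input[lo, hi) and holding the sum of that range.
    static class Node {
        int lo, hi, sum;
        Node left, right;
    }

    // First pass: build the tree bottom-up, computing each range sum.
    static class BuildTask extends RecursiveTask<Node> {
        final int[] input; final int lo, hi;
        BuildTask(int[] input, int lo, int hi) {
            this.input = input; this.lo = lo; this.hi = hi;
        }
        @Override
        protected Node compute() {
            Node node = new Node();
            node.lo = lo;
            node.hi = hi;
            if (hi - lo == 1) {
                node.sum = input[lo];
                return node;
            }
            int mid = lo + (hi - lo) / 2;
            BuildTask leftTask = new BuildTask(input, lo, mid);
            leftTask.fork();
            node.right = new BuildTask(input, mid, hi).compute();
            node.left = leftTask.join();
            node.sum = node.left.sum + node.right.sum;
            return node;
        }
    }

    // Second pass: fromLeft is the sum of all elements before node.lo.
    static class FillAction extends RecursiveAction {
        final Node node; final int fromLeft; final int[] output;
        FillAction(Node node, int fromLeft, int[] output) {
            this.node = node; this.fromLeft = fromLeft; this.output = output;
        }
        @Override
        protected void compute() {
            if (node.left == null) { // leaf: write the inclusive prefix sum
                output[node.lo] = fromLeft + node.sum;
                return;
            }
            FillAction leftAction = new FillAction(node.left, fromLeft, output);
            leftAction.fork();
            // The right child also has the whole left subtree to its left.
            new FillAction(node.right, fromLeft + node.left.sum, output).compute();
            leftAction.join();
        }
    }

    static int[] prefixSum(int[] input) {
        int[] output = new int[input.length];
        if (input.length == 0) {
            return output;
        }
        ForkJoinPool pool = ForkJoinPool.commonPool();
        Node root = pool.invoke(new BuildTask(input, 0, input.length));
        pool.invoke(new FillAction(root, 0, output));
        return output;
    }
}
```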

- Parallel pack, filter
- Given an array input, produce an array output containing only elements such that it passes some condition.
- Step 1: parallel map to compute a bit-vector for true elements
- Step 2: parallel-prefix sum on the bit vector to compute the output indices
- Step 3: parallel map to produce the output: each element whose bit is true is written to the output at the index computed in step 2
- Pack is \(\mathcal{O}(\log n)\) span and \(\mathcal{O}(n)\) work
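
The three steps can be sketched with the JDK's built-in parallel array helpers rather than hand-written ForkJoin tasks. `Arrays.parallelSetAll`, `Arrays.parallelPrefix`, and parallel `IntStream` are standard library calls; the class name `ParallelPack` and the `keep` parameter are assumptions for illustration.

```java
import java.util.Arrays;
import java.util.function.IntPredicate;
import java.util.stream.IntStream;

class ParallelPack {
    // Returns the elements of input that satisfy keep, in their original order.
    static int[] pack(int[] input, IntPredicate keep) {
        int n = input.length;
        if (n == 0) {
            return new int[0];
        }
        // Step 1: parallel map to a bit vector (1 = keep this element).
        int[] bits = new int[n];
        Arrays.parallelSetAll(bits, i -> keep.test(input[i]) ? 1 : 0);

        // Step 2: parallel prefix sum over the bits.
        // bitsum[i] is the number of kept elements in input[0..i].
        int[] bitsum = bits.clone();
        Arrays.parallelPrefix(bitsum, Integer::sum);

        // Step 3: parallel map; each kept element writes to its own output slot.
        int[] output = new int[bitsum[n - 1]];
        IntStream.range(0, n).parallel().forEach(i -> {
            if (bits[i] == 1) {
                output[bitsum[i] - 1] = input[i];
            }
        });
        return output;
    }
}
```

Step 3 is safe to run in parallel because the prefix sums give every kept element a distinct output index.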

- It is possible to do filtering with \(\mathcal{O}(\log n)\) span.
- Sorting: quick sort and merge sort can be parallelized very easily
- We can reduce the span of sorting from \(\mathcal{O}(n \log n)\) to \(\mathcal{O}(n)\) with parallelization
- Do the two recursive calls in parallel.
- We can do better than \(\mathcal{O}(n)\) if we change it from in-place to using auxiliary arrays. Just do filtering (a parallel pack) for the partition
- Partition all data into elements less than pivot, and then elements bigger than the pivot
- Repeat and we’re done

- This gives us \(\mathcal{O}(\log^2 n)\) span and \(\mathcal{O}(n \log n)\) work
- Parallelizing the merge step: you are given two sorted subarrays, and the sequential merge walks them with three pointers. But we can use a totally different algorithm here
- Pick the median element of the larger array.
- Use binary search to find the first element \(\ge\) that median in the smaller array.
- In parallel, merge the half of the larger array from the median upwards with the upper part of the smaller array, and merge the half of the larger array below the median with the lower part of the smaller array.

## Lecture 20: Shared-Memory Concurrency and Mutual Exclusion

- Parallel code is accessing heap memory
- Concurrency: correctly managing access to shared resources.
- Highly nondeterministic. There’s a potential for different results.
- Interleaving: one possible ordering over time of the operations that two or more threads execute.
- Bad interleaving: an execution sequence which causes unexpected behavior
- Values can become stale.
- Mutual exclusion: rewrite code so at most one thread can use a resource at a time. We need to identify the critical section.
- Locks: acquiring a lock is a single operation which checks if busy is false and, if so, sets it to true, with no other thread able to interrupt in between
- Operations are atomic if no other threads can interrupt or interleave with them
- Must use the same lock: mutual exclusion only works when using the same lock
- Example: using the same lock on every account
- When you throw an exception, you still hold the lock! So you need to release it when you throw an exception.
- `java.util.concurrent.locks.ReentrantLock` – has `lock()` and `unlock()`
- `synchronized` keyword: a method or block is atomic and mutually exclusive, basically a re-entrant lock. It synchronizes on an expression, which must evaluate to an object; objects are locks in Java.
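
A sketch of the exception problem and its usual fix, assuming a hypothetical `BankAccount` class: putting `unlock()` in a `finally` block releases the lock whether the critical section returns normally or throws.

```java
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical account class; the names are for illustration only.
class BankAccount {
    private final ReentrantLock lock = new ReentrantLock();
    private int balance = 0;

    void deposit(int amount) {
        lock.lock();
        try {
            balance += amount;
        } finally {
            lock.unlock();
        }
    }

    void withdraw(int amount) {
        lock.lock();
        try {
            if (amount > balance) {
                throw new IllegalStateException("insufficient funds");
            }
            balance -= amount;
        } finally {
            lock.unlock(); // runs even when the exception above is thrown
        }
    }

    int getBalance() {
        lock.lock();
        try {
            return balance;
        } finally {
            lock.unlock();
        }
    }
}
```

Declaring these methods `synchronized` instead would give the same mutual exclusion, with the release on exceptions handled by the language.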

## Lecture 21: Race Conditions and Deadlock

- A race condition: a mistake in your program such that whether the program behaves correctly or not depends on the order in which the threads execute
- Results when a computation result depends on scheduling (how threads are interleaved)
- Race condition is the general category; a race condition is not necessarily a data race.
- Bad interleaving: a race condition where, because the result depends on scheduling and shared state is being manipulated, we get a different result than what we expected.
- Data races can crash the code, bad interleavings give a bad result.
- We need synchronization to disallow bad interleavings. We need a larger critical section, so that the intermediate state of peek is protected. Use re-entrant locks, which allow the calls to push and pop from inside peek
- Bad interleavings are defined by the spec: they expose bad intermediate states to other threads, leading to behavior we find incorrect.
- Data races are simultaneous read/write or write/write to the same location.
- For every memory location, you must obey at least one of:
- thread-local
- immutable
- shared and mutable, with access controlled by synchronization
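
A sketch of the peek example, assuming a hypothetical `SyncStack` class whose `peek` is written as a pop followed by a push. Because Java's `synchronized` locks are re-entrant, `peek` can hold the lock and still call the synchronized `pop` and `push`.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stack class for illustration.
class SyncStack<E> {
    private final List<E> items = new ArrayList<>();

    synchronized boolean isEmpty() {
        return items.isEmpty();
    }

    synchronized void push(E item) {
        items.add(item);
    }

    synchronized E pop() {
        if (items.isEmpty()) {
            throw new IllegalStateException("stack is empty");
        }
        return items.remove(items.size() - 1);
    }

    // If peek were not synchronized, another thread could run between the
    // pop and the push and observe the stack with its top element missing.
    synchronized E peek() {
        E top = pop();
        push(top);
        return top;
    }
}
```

Without `synchronized` on `peek` there is still no data race, since `pop` and `push` are each protected, but the intermediate state becomes visible to other threads.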

- Lock granularity
- Coarse grained: fewer locks, more objects per lock. Simpler to implement.
- Fine grained: more locks, fewer objects per lock. More simultaneous access.
- Coarse (correctness) over fine (efficiency)

- Start with critical sections as large as possible, and then make them smaller only if needed.
- Use thread-safe libraries as much as you can

## Lecture 22: Topological Sort

- Given a DAG, output all the vertices in an order such that no vertex appears before any other vertex that has an edge to it
- Topological Sort: intuition is to pick a vertex with zero in-degree
- Find the in-degrees of all the vertices. The time-complexity of finding the in-degree of a vertex in an adjacency list is \(\mathcal{O}(V + E)\). Finding the indegree for all vertices is \(\mathcal{O}(V + E)\) too.
- Choose an arbitrary vertex of in-degree zero, output it, and remove it in concept from the graph. So all corresponding nodes lose one in-degree.
- Repeat with all other zero in-degree nodes

- You need a vertex with in-degree 0 to start, so the graph must have no cycles.
- Ties between in-degree zero nodes can be arbitrarily broken
- Any flat graph (e.g. linked list) has only one topological ordering
- Topological sort runtime: \(\mathcal{O}(E + V^2) = \mathcal{O}(V^2)\)
- Doing better: avoid searching for a zero in-degree node every time. Keep pending zero in-degree nodes in a stack or queue, so add/remove is \(\mathcal{O}(1)\)
- Optimized topological sort: \(\mathcal{O}(V + E)\)
- Topological sort is used for dependency graphs and order of execution
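
A sketch of the optimized version, assuming the graph is an adjacency list indexed by vertex number. The class name `TopologicalSort` and the choice to throw on a cycle are illustrative assumptions.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

class TopologicalSort {
    // Returns one valid topological order; throws if the graph has a cycle.
    static List<Integer> sort(List<List<Integer>> adj) {
        int n = adj.size();
        // Compute all in-degrees in one pass over the edges: O(V + E).
        int[] inDegree = new int[n];
        for (List<Integer> outs : adj) {
            for (int u : outs) {
                inDegree[u]++;
            }
        }
        // Pending vertices with in-degree zero; push and pop are O(1).
        Deque<Integer> pending = new ArrayDeque<>();
        for (int v = 0; v < n; v++) {
            if (inDegree[v] == 0) {
                pending.push(v);
            }
        }
        List<Integer> order = new ArrayList<>();
        while (!pending.isEmpty()) {
            int v = pending.pop();
            order.add(v);
            // "Remove" v by decrementing the in-degree of its neighbors.
            for (int u : adj.get(v)) {
                inDegree[u]--;
                if (inDegree[u] == 0) {
                    pending.push(u);
                }
            }
        }
        if (order.size() != n) {
            throw new IllegalArgumentException("graph has a cycle");
        }
        return order;
    }
}
```

Every vertex is pushed and popped once and every edge is examined once, which gives the \(\mathcal{O}(V + E)\) bound.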

## Lecture 23: Minimum Spanning Trees

- Given an undirected graph, find a subgraph such that each edge is in the original graph, there are no cycles, the graph is connected, and the sum of the weights is minimized
- Prim’s algorithm: repeatedly pick the vertex with the lowest cost and expand the set until we get the MST. No reconsidering choices.
- Both Prim’s and Kruskal’s work with negative edges and with negative cost cycles.
- Kruskal’s algorithm: edge-based greedy algorithm, builds the MST by greedily adding edges
- Initialize with an empty MST; repeatedly pick the lowest cost edge and mark it, skipping any edge that would create a cycle
- Disjoint set ADT, Union and Find across elements.
- Union can be done in constant time. Find is amortized constant time. Worst case \(\mathcal{O}(\log n)\)
- Runtime is \(\mathcal{O}(E \log E) = \mathcal{O}(E \log V)\)
- Both Kruskal’s and Prim’s have the same runtime
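
A sketch of Kruskal’s algorithm with a simple disjoint-set forest. The edge encoding (`int[] {u, v, weight}`), the class name `Kruskal`, and the choice to return only the total weight are assumptions for illustration; the union-by-size and path-shortening details are one common way to implement Union and Find, not necessarily the lecture's.

```java
import java.util.Arrays;

class Kruskal {
    // Find the representative of x's set, shortening the path as we go.
    static int find(int[] parent, int x) {
        while (parent[x] != x) {
            parent[x] = parent[parent[x]];
            x = parent[x];
        }
        return x;
    }

    // edges[i] = {u, v, weight}. Returns the total weight of an MST,
    // assuming the graph is connected.
    static long mstWeight(int numVertices, int[][] edges) {
        int[][] sorted = edges.clone();
        Arrays.sort(sorted, (a, b) -> Integer.compare(a[2], b[2]));

        int[] parent = new int[numVertices];
        int[] size = new int[numVertices];
        for (int v = 0; v < numVertices; v++) {
            parent[v] = v; // every vertex starts in its own set
            size[v] = 1;
        }

        long total = 0;
        for (int[] e : sorted) {
            int ru = find(parent, e[0]);
            int rv = find(parent, e[1]);
            if (ru == rv) {
                continue; // same set: this edge would create a cycle
            }
            // Union by size: attach the smaller tree under the larger one.
            if (size[ru] < size[rv]) {
                int tmp = ru;
                ru = rv;
                rv = tmp;
            }
            parent[rv] = ru;
            size[ru] += size[rv];
            total += e[2];
        }
        return total;
    }
}
```

Sorting the edges dominates the running time, which is where the \(\mathcal{O}(E \log E)\) term comes from.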

## Lecture 24: P vs NP

- Decision problem: a problem that takes in some input and the output is either true or false.
- Why talk about decision problems?
- Easier to analyze than the set of all problems.
- Most problems can be reduced to a (series of) decision problem(s).

- P: polynomial. A problem is in P iff
- it’s a decision problem
- there is a polynomial time algorithm that solves it

- Euler circuit problem: given an undirected graph G, return true iff there is a cycle in G that visits every single edge exactly once.
- Euler showed that a graph has an Euler circuit iff the graph is connected and the degree of every vertex is even.

- NP: Nondeterministic Polynomial. A problem is in NP iff
- it’s a decision problem
- For every input where the output is true (i.e. solutions), there exists some certificate that we can use to verify that the output is true in polynomial time.
- Rephrased, NP is interested in quickly verifying the truth of a solution.

- Euler Circuit is also in NP because there is a certificate for every solution input
- It turns out that every problem in P is also in NP; that is, P is a subset of NP.
- Certificate: the input itself.
- Verification: there is a polynomial time algorithm which can solve it. And this can be used in the verification process itself. Run the algorithm on the input.

- The big question is: is P = NP? Most people think that P \(\neq\) NP.
- Hamiltonian Circuit Problem. Input: a graph G. Output: True if there exists a cycle in G that visits every vertex exactly once.
- Much harder than the Euler circuit problem
- Is NP-complete.
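
To illustrate the certificate idea for this problem, here is a sketch of a polynomial-time verifier. It assumes the certificate is a proposed ordering of the vertices and that the graph is given as an adjacency matrix; the class name `HamiltonianVerifier` is hypothetical.

```java
class HamiltonianVerifier {
    // adj[u][v] is true iff there is an edge between u and v.
    // cycle is the certificate: a proposed order in which to visit the vertices.
    static boolean verify(boolean[][] adj, int[] cycle) {
        int n = adj.length;
        if (n == 0 || cycle.length != n) {
            return false;
        }
        // Every vertex must appear exactly once.
        boolean[] seen = new boolean[n];
        for (int v : cycle) {
            if (v < 0 || v >= n || seen[v]) {
                return false;
            }
            seen[v] = true;
        }
        // Consecutive vertices, including last back to first, must be adjacent.
        for (int i = 0; i < n; i++) {
            if (!adj[cycle[i]][cycle[(i + 1) % n]]) {
                return false;
            }
        }
        return true;
    }
}
```

Checking a certificate takes time linear in the number of vertices, even though no polynomial-time algorithm is known for finding one.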

- NP-Complete: the hardest problems in NP.
- What does it mean for a problem to be hard?
- Reduction. A reduces to B if we can solve A using a solution to problem B.
- A reduces to B in polynomial time if we can solve A using polynomial additional work and a polynomial number of calls to B.
- If \(B \in P\) and \(A \le_P B\) (reduces to B in Polynomial time), then \(A \in P\)

- NP complete: A problem is NP-complete iff
- It’s in NP
- Every problem in NP is polynomial-time reducible to it

- How to show P = NP.
- Take an NP-complete problem
- Find a polynomial time algorithm for it

- There are thousands of NP-complete problems… and we can’t find a polynomial time algorithm for any of them
- NP-hard problems are a superset of NP-complete problems, do not need to be in NP.
- Example of an NP-hard problem: given a position on an \(n\) by \(n\) chessboard, determine whether a move is the best possible move.
- Not necessarily verifiable in polynomial time. But every problem in NP can be reduced to that problem.

- Confirm that a problem is hard:
- Take an NP-complete problem A
- Take your problem B
- Show that A is reducible in polynomial time to B
- Solving B in polynomial time would then amount to showing P = NP.

- Workarounds
- Consider using an approximation algorithm
- Consider using a randomized algorithm

## Lecture 25: More NP vs P

- NP-hard is the same as NP-complete, but there’s no requirement that the problem itself is in NP.
- A problem is NP-hard iff it’s a decision problem and every problem in NP is polynomial-time reducible to it.
- A problem is NP-complete iff it is in NP and NP-hard.
- Three-colorable problem
- Input is an undirected graph
- Output is true iff you can color each vertex of the graph one of three colors such that no two adjacent vertices have the same color.
- This problem is NP-complete. It’s in NP, because I can verify that a three-coloring scheme is valid in polynomial time.
- It’s also NP-hard

- Two-colorable problem
- Solution: in general, all vertices at an even distance from the start must be colored differently from the vertices at an odd distance.
- Polynomial time solution. In \(P\).
- Two-colorable \(\le_P\) three-colorable.
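
The even/odd distance idea can be sketched as a BFS that alternates colors level by level. This assumes an undirected graph stored as an adjacency list in which each edge appears in both endpoints' lists; the class name `TwoColoring` is hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.List;
import java.util.Queue;

class TwoColoring {
    // adj is undirected: u appears in adj.get(v) iff v appears in adj.get(u).
    static boolean isTwoColorable(List<List<Integer>> adj) {
        int[] color = new int[adj.size()];
        Arrays.fill(color, -1); // -1 means not yet colored
        // Restart from every uncolored vertex so disconnected graphs are handled.
        for (int start = 0; start < adj.size(); start++) {
            if (color[start] != -1) {
                continue;
            }
            color[start] = 0;
            Queue<Integer> queue = new ArrayDeque<>();
            queue.add(start);
            while (!queue.isEmpty()) {
                int v = queue.remove();
                for (int u : adj.get(v)) {
                    if (color[u] == -1) {
                        color[u] = 1 - color[v]; // opposite color of v
                        queue.add(u);
                    } else if (color[u] == color[v]) {
                        return false; // an edge joins two same-colored vertices
                    }
                }
            }
        }
        return true;
    }
}
```

The traversal visits each vertex and edge a constant number of times, so the check runs in polynomial (in fact linear) time.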

- Two-colorable reducible to three-colorable in polynomial time: add a dummy vertex and add an edge from the dummy to all other vertices. The dummy uses up one color, so return true iff three-colorable outputs true on the new graph.
- SAT problem. 3-SAT. NP-complete.
- literal: a boolean variable or its negation
- clause: a series of literals which are all OR’d
- Input: a series of clauses AND’d together, each clause has at most three literals.
- Output: True iff there is an assignment of the boolean variables such that the expression evaluates to true.
- The general problem of satisfiability (any number of literals and clauses) is reducible to 3-SAT.

- Vertex Cover problem.
- Input: a graph G, integer \(k\).
- Output: true iff there is a set of vertices of size \(k\) so that every edge has at least one vertex in the set.
- If a graph is 2-colorable, there is a known polynomial time algorithm, max-flow / min-cut
- If the graph is a tree, there is a known linear time solution
- In a general graph, there is an approximation algorithm for finding the minimum vertex cover, which gives you a vertex cover that is at most 2 times bigger than the optimal one.
- While there are edges in your graph, pick an arbitrary one. It has two endpoints, and at least one of them has to be in any cover. Put both of them in the vertex cover. Then delete the edge and every edge incident to either endpoint.
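
A sketch of the approximation algorithm, assuming the graph is given as a list of edges. Instead of physically deleting edges, it skips any edge that already has an endpoint in the cover, which has the same effect. The class name `VertexCoverApprox` is hypothetical.

```java
import java.util.LinkedHashSet;
import java.util.Set;

class VertexCoverApprox {
    // edges[i] = {u, v}. Returns a vertex cover at most twice the minimum size.
    static Set<Integer> approxCover(int[][] edges) {
        Set<Integer> cover = new LinkedHashSet<>();
        for (int[] e : edges) {
            // An edge with an endpoint already in the cover counts as deleted.
            if (cover.contains(e[0]) || cover.contains(e[1])) {
                continue;
            }
            // Otherwise take both endpoints of this edge.
            cover.add(e[0]);
            cover.add(e[1]);
        }
        return cover;
    }
}
```

The chosen edges share no endpoints, and any cover needs at least one vertex per chosen edge, so taking both endpoints at most doubles the optimal size.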