CSE 351

• A numeral is a number represented using a series of digits in a particular base.
• Convert to decimal using $$d \times b^i$$ for all indices $$i$$, base $$b$$, and digit $$d$$ at index $$i$$.
• Convert from decimal into another base by recursive division.
• Common bases - binary (base-2), decimal (base-10), hex (base-16).
• You can directly convert from binary to hexadecimal and vice versa.

## 2. Memory and Data I

• CPU - executes computer instructions.
• Word size - fixed width of data used by the CPU to execute instructions.
• Memory (RAM) - stores data for the CPU while instructionsa re being executed. Memory is a large array of bytes.
• Data is moved and manipulated in fixed-length chunks. Leading zeros are therefore significant.
• Leftmost bit - most-significant bit (MSB)
• Rightmost bit - least-significant bit (LSB)
• Multibyte data in memory covers several addresses; the address of the first byte (smallest) is used by convention.
• Any chunk of data can be specified by its address and its size.
• Endianness - choice of ordering.
• Big-endian - LS byte stored in the highest address
• Little-endian - LS byte stored in the lowest address
• Pointers are variables that store addresses; the size of the pointer is the word size.
• The pointer must also encode the size information.

## 3. Memory & Data II

• Pointers are special variables of word size which store addresses.
type* ptr;

• The address-of operator & goes in front of a variable, e.g. int q; int*p = &q;.
• Dereference operator * used to access the data pointed to by the pointer.
• Null - symbolic constant used for pointers.
• Pointers can be represented with bow-and-arrow memory diagrams. Every box has a value, address, and variable name.
• Pointer arithmetic - automatically scale the arithmetic operation across the minimum data size (4 bits for ints, 8 bits for longs).
• Arrays: sets of contiguous locations in memory.array[n] referenced through *(array + n). There is no bounds checking in C.
• C does not support an explicit string dataset. In C, strings are arrays of characters terminated by the null character.
• 1 byte per character.

## 4. Memory & Data III

• Bitwise operators apply boolean operations across bits of the operand(s) and can only be used on primitive/integral data types.
• Logical operators (&&, ||, !) apply boolean operations to the value of the operands; 0 is False and anything else is true.
• C uses short-circuit evaluation
• Unsigned integers - nonnegative integers.
 unsigned char 1 byte unsigned short 2 bytes unsigned int 4 bytes unsigned long 8 bytes
• Signed integers represent positive and negative integers. The most common encoding is two’s complement: the most significant bit is kept as negative; all other bits are unsigned. Can represent from $$-2^{n-1}$$ to $$2^{n-1} - 1$$.
• To negate, flip all the bits and add one.

## 5. Integers II

• Signed and unsigned integers are different interpretations of data; nothing about the data itself is changed.
• Literals and constants can be changed into unsigned by appending u to the end.

• The data type of a variable determines the data behavior.
• Type casting - conversion of data from one data type into another. Implicit cast - done automatically by the compiler to avoid problems.
• An explicit cast can be done with (data_type) expression.
• Casting does not change the data, it changes the representation.
• Zero extension - pads unsigned data with more zeros; sign extension - pads signed data with copies of the most significant bit
• When mixing signed and unsigned values, implicit casting to unsigned is done.
• Arithmetic on fixed-width binary numbers is modular; all bits after the end are dropped.
$x-y = x + (-y) = x + (\sim y + 1)$
• Arithmetic overflow - a calculation cannot be represented by the encoding scheme.
• Unsigned overview vs signed overflow
• Shift operators - shift a bit vector by a specified number of digits, which falls the cut-off bits to ‘fall off’ and be lost; the remaining bits are filled by $$\forall$$ 1s or 0s.
• left shift - x << n, zero padding
• logical right shift - x >> n, x unsigned, zero padding
• arithmetic right shift - x >> n, x signed, MSB padding
• Shifting can be interpreted as multiplying or dividing by powers of 2.
• Right shifting always rounds down (floor operation).

## 6. Floating Point I

• Scientific notation: we can express small and large numbers without writing large digits.
• Three different parts - sign, mantissa, exponent.
• Float: 1-bit sign, 8-bit exponent, 23-bit mantissa.
• Double: 1-bit sign, 11-bit exponent, 52-bit mantissa.
• Sign - 0 is positive, 1 is negative
• Exponent - biased notation, shifted by $$2^{w-1}-1$$ where $$w$$ is the width of the exponent field (0b011...11); convert to unsigned; this lets us represent the same number of postiive and negative exponents.
• To encode, add the bias
• To decode, subtract the bias
• Mantissa: must be of form 1.bbbb$$_2$$. Implicit leading one, but this is not stored.

## 7. Floating Point II

• Floating point allows for special cases, given as combinations to the exponent and mantissa fields.
• 0 exponent: denormalized number; all-1s exponent and all-0s mantissa: $$+/- \infty$$, all 1s-exponent and nonzero mantissa: NaN
• Denormalized number: uses an implicit leading 0. Allows for encoding smaller numbers near 0.
• If a number is too large, it results in overfow; the result is stored as infinity. If it is too small, it results in underflow.
• We have limited precision; we cannot represent numbers between representable numbers.
• The largest normalized exponent is 127.
• Arithmetic operaitons on infinity and NaN will work but result in bizarre ways.
• Rounding breaks down certain mathematical properties of floating point arithmetic.
• Casting an integral data tpe into a floating point data type changes the bit representation.

## 8. x86-64 Programming I

• Instruction Set Architecture - ISA. Parts of the processor design needed to write assembly code.
• ISA - state, instruction set, effect.
• Complex Instruction Set Computer vs Reduced Instruction Set Computer.
• x86-64 is far on the end of CISC; we will look at integral data and use AT&T syntax.
• Instructions are given with an instruction name followed by operands.
• Assembly instructions: data transfer, arithmetic and logic, control flow
• Size specifier - byte, word, long, quad (1, 2, 4, 8 bytes)
• Operands are three types - immediates (\$), registers (%), memory ().
• An immediate cannot be used as a destination operand
• You cannot use a memory to memory operation
• Register - location in CPU to store a small amount of data which can be accessed quickly. Registers are referred to by name. 16 registers; 8-byte register names.
• How to express a memory operand? 4 parts: D(Rb, Ri, S). Displacement - immediate or constant; base register - name of register whose value will be the base; index register - value will be scaled and added to the base; scale factor - scales the index register by a specified number - 1, 2, 4, 8.

## 9. x86-64 Programming II

• Load effective address - lea. Source operand must be a memory operand; destination operand must be a register operand. Allows us to manipulate addresses rather than dereferencing memory.
lea D(Rb,Ri,S), R  # stores Reg[Rb]+Reg[Ri]*S+D in Reg[R]

• Condition codes - status bits of the CPU which give information about the history of instruction executions. Flags are set implicitly and automatically by operations, but can also be done so explicitly - cmp and test. These only update condition codes and are never stored.
• jump and set - implement all control flow.
• Flags: Carry Flag (CF), Zero Flag (ZF), Sign Flag (SF), and Overflow Flag (OF).
• jump - jump program to a specified target if a condition is met.
• set - set the value of register dst (1-byte) to the value of the condition.

## 10. x86-64 Programming III

• Extension instructions - like mov, but the source operand is smaller than the destination operand. Therefore, values can be extended with zero extension (movz) or sign extension (movs).
• Conditionals in assembly are made of a condition code instructor and a conditional jump instruction.
• Operand to jump instruction - label. Symbolic representation of an instruction’s address.
• Loops can be freely introduced into assembly; if the name of the label is used as a target, the program will jump to the instruction.
• If/Else can be constructed by using labels and jump statements.
• Jump targets go to the beginning of the loop body to construct loops.

## 11. The Stack and Procedures

• Jump table - data structure used to branch to different parts of the program. Array of pointers - pointers to code blocks.
• Program counter %rip - holds the address of the next instruction.
• We need to update the program counter in addresses stored in the table with indirect jumps - jmp *Loc.
• The memory address space is arranged to optimize data usage. From low to high addresses:
• Instructions
• Literals
• Static data
• Dynamic data (heap)
• Stack
• Stack - takes up the highest useable addresses.
• End/top of stack is stored in the stack pointer %rsp.
• The stack pointer can be changed through subq and addq.
• push and pop will allocate and deallocate data from the stack.
• Calling conventions - rules to guarantee procedures and pass data and control.
• Return address - address of the caller’s next instruction to execute.
• call is a label instruction, will push the return address onto the stack and update the program counter to the address of the label.
• ret will pop the return address of the stack and update the program counter to that address.
• Return values are placed in %rax.
• We want to support instantiation of individual procedures to enable recursion.
• Stack frames - hold the local state of each procedure instantiation.
• LIFO stack - the caller’s stack frame cannot be deallocated until the calee’s stack frame is.

## 12. Procedures and Recursion

• Stack frame - has a return address marking the beginning of the stack frame.
• Arguments 7+ are part of the caller’s argument, put on the stack before call.
• x86-64 Register Saving Conventions - describe how we deal with register reuse.
• Registers are callee-saved or caller-saved.
• Callee-saved: callee’s responsibility to restore the old value before returning.
• Caller-saved: the caller must save the existing state before it passes control to the callee. The old value is restored after the callee returns.
• Recursion works without additional work due to the stack frame layout.

## 13. Executables, Arrays

• Compiler: translate a text file (bytes interpreted as characters) into an assembly text file.
• Assembler - convert assembly code into a binary object file.
• Object code: ‘incomplete’, we do not have addresses for labels yet.
• Linker - put together the object and static library files.
• Loader - take an executable file and start a running process from it.
• Arrays in assembly: the name is a placeholder for the starting address of the array.
• Array subscript notation is sugar for address dereferencing.
• C uses row-major ordering for multidimensional arrays.
• Multilevel arrays are created by adding additional levels of arrays of pointers to arrays.

## 14. Structs and Alignment

• A struct in C is a structured group of variables with various fields.
• Purpose of a struct definition is to define the size and layout of a struct.
• Fields are accessed using . or -> for pointers.
• typedef - allows you to create aliases to other data types, e.g. typedef unsigned int uint;.
• A primitive object is aligned if its address is a multiple of its size.
• Unused space between fields - internal fragmentation.
• External fragmentation: the overall size of a struct also must follow alignment requirements.

## 15. Buffer Overflow

• Buffer: a region of the memory (usually array) used for temporary data storage.
• Buffer overflow: writing data past the end of the buffer.
• Sometimes buffer overflow causes program execution to break or goes unnoticed.
• Stack smashing - write past the end of a local array in the stack. The buffer moves towards higher addresses. Stack smashing overwrites the return address in previous stack frames.
• If a user enters a string longer than can be held in the buffer, we will get buffer overflow.
• Code injection: can be used to take advantage of the fact that all data is stored in memory. Code injections: use buffer overflow to overwrite instructions and change the returna ddress to execdute the injected code.

## 16. Memory and Caches I

• IEC prefixes - refer to powers of 1024.
• Caches - memory with short access time, used to store frequently or recently used data, including instructions.
• Data is transferred in blocks - machine-specific fixed units of data transfer.
• The CPU will always check the caches first when accessing memroy.
• Cache hit - data is found in the cache. Cache miss - not found, fetch from real memory adn copy into cache.
• Principle of locality: programs tend to use data at addresses near those which have been used recently.
• Temporal locality - recently referenced items likely to be referenced again
• Spatial locality - items in nearby addresses likely to be refereced again
• Copy data into the cache on a cache miss to maximize cache hits
• Metrics: hit time (how long a cache hit takes), miss penalty (how long it takes to find a blcok fo data from memory, hit rate (fraction of memory accesses resulting in cache hits).
• Average memory access time (AMAT): Hit time $$+$$ Miss Rate $$\times$$ Miss Penalty
• Timing reported in clock cycles

## 17. Memory and Caches II

• The memory hierarchy: local data storage and remote/external data storage.
• Each level is a cache of the level below it - faster way to access a subset of available data.
• Blocks are formed from 0 to $$K - 1$$, $$K$$ to $$2K - 1$$, and so on.
• We can determine which block an address is in by taking the floor of $$\frac{A}{K}$$: this is the block number.
• $$A % K$$ is the block offset.
• Cache size - measures capacity of the cache. Defined as a multiple of block size.
• Where to place blocks on a cache miss?
• Direct-mapped cache placement - use a hash function, block number % (C/K) to determine where to put a block.

## 18. Memory and Caches III

• $$m$$-bit address: TIO. $$t$$ bits for the tag, $$s$$ bits for the index selection, $$k$$ bits for the byte offset selection.
• Widths: $$k = \log_2(K)$$, $$s = \log_2(C / K)$$, $$t = m - s - k$$.
• A direct-mapped cache always maps blocks to a single index. What if we alternate between different blocks which map to the same index? We lose temporal locality’s benefits.
• Associative cache - each block fits in a set of locations.
• $$E$$-way set associative cache: each block can be put in a set $$E$$ ways.
• Direct-mapped cache: 1-way
• Fully-associative cache: $$C/K$$ way
• Caches - there will always be data present. Is it program data or mystery data? Valid bit lets us tell the difference. Management bit - stored in the cache for each block.
• Cache line - block and management bits.
• Cold cache: no valid program data.
• Cache: number of sets $$S$$, associativity $$E$$, block size $$K$$. Cache size is the product of these three.
• Cache miss categorization:
• Compulsory misses: we have to have a cache miss if we access something for the first time.
• Conflict misses: more references map to the same set than we can allow to coexist.
• Capacity misses: we don’t have enough space to keep all the data.
• Larger block size $$\to$$ fewer compulsory misses
• Higher associativity $$\to$$ fewer conflict misses
• Larger cache size $$\to$$ fewer capacity misses

## 19. Cache Writes and Data Consistency

• Writes change data, not just reading it.
• Write-hit and write-miss
• Write-hit:
• Write-through cache writes the change into the block in the cache and in the level below
• Write-back cache writes the change only to the block in the cache but makes a note that it changed; dirty bit must be stored
• Write miss:
• Write allocate cache loads the block into cache before executing a write-hit
• No-write allocate cache skips the cache and sends the write to the level below
• Write-back, write-allocate: avoid going to the lower level as much as possible
• How to optimize code based on memory hierarchy?
• Spatial and temporal locality
• Cache blocking - block operations on a large data structure so chunks fit into the cache.

## 20. Processes I

• Control flow - your computer is executing multiple programs concurrently.
• Exceptional control flow - trasnfer control and react to external signals.
• Exception - transfer of control to the kernel in response to an event. An event handler will deal with the event.
• Three possible outcomes:
• Re-execute the instruction
• Execute the next instruction
• Abort the process
• Asynchronous Exceptions - interrupts, external to the processor
• Synchronous exceptions - traps, faults, aborts
• Processes - instance of a running program/executable.
• Logical control flow - each process ‘thinks’ it has exclusive use of the CPU
• Private address space - each process ‘thinks’ it has exclusive memory usage
• Program vs process
• Multiple processes run on the computer with few CPUs.
• Processes run concurrently if their instructions overlap in time.
• Context switching - pause the currently executing process, restore the process to another state
• Fork-exec model. Fork is a system call which duplicates a process; exec overlays the current data with a new instance. Starting a new process in linux is forking, then calling exec. Processes are assigned PIDs to keep track.

## 21. Processes II, Virtual Memory I

• exec*() - overlays current process data with new instances of the given program
• Stack, Heap, Data, Code, register values
• Where does the starting state of a new program come from? Data, code, heap, stack, registers
• A process ends/terminates by executing the return statement from main, calling the library function exit, or by aborting an exception handler.
• Reaped - the system resources are removed
• Process terminated but not reaped - zombie process
• Parent process - responsible for reaping a child. Implicity (parent terminates) or explicitly (parent invokes wait(pid)() system call
• An orphaned child is passed to a background process called init with a PID of 1.
• Virtual memory - process abstraction of a private address space. Hides the actual amount of RAM installed on the machine.
• Virtual vs physical address space.
• Swap space: a part of the disk to temporarily hold additional memory.

## 22. Virtual Memory II

• Virtual memory must be able to make efficient use of limited physical memory.
• Page: $$P = 2^p$$, page size denoted in bytes.
• Physical page numbers and virtual page numbers.
• Most processes do not need to use most of the virtual address space. Unused virtual pages never have space allocated.
• Physical memory as cache
• Page tables - maps VPN to PPN
• Page table entry - valid bit, dirty bit, access rights bits
• Memory protection - protect and share memory between processes when desired. Page tables (managed by OS) do this for us.
• Access rights - mimic Linux file access perms

## 23. Virtual Memory III

• Page hit - if the requested page does live in physical memory.
• Page fault - the requested page is not in physical mmeory (in swap space on the disk)
• Translation lookaside buffer (TLB) - hardware cache, reduce the number of memory accesses needed during address translation
• TLB hit - page entry table found, MMU can do address translation without ever accessing the page table.
• Virtual address: first, TLB lookup
• TLB hit: protection check
• Check cache: miss or hit
• TLB miss: check page table
• Page in memory: update TLB and do protection check
• Page not in memory: find in the swap space and do protection check

## 24. Memory Allocation I

• Dynamically allocated data - memory whose size can only be determined at runtime. vs. automatically allocated data (local variables tied to stack frames), statically allocated data (global variables).
• The heap is managed by a dynamic memory allocator.
• Heap is organized into heap blocks, which are either allocated or freed.
• Implicit allocator: programmer only responsible for allocations; explicit: also handles de-allocations.
• Dynamic memory allocators – want to maximize throughput and memory utilization, but this often comes at the cost of each other
• How to track free blocks? Implicit free list - uses arithmetic by using the size of each block. Explicit: travels free blocks using a linked list.
• void* malloc(size_t size) - will initialize at least size bytes of uninitalized memory
int* ptr = (int*) malloc(n * sizeof(int));

• To free memory, use void free(void* p). Must hold the same address as was returned by a malloc`.
• Heap fragmentation – internal, wasted space inside of heap blocks; external, wasted space between heap blocks
• Payload – the requested space.
• Additional space at the end doesn’t count as external fragmentation because you can add more to the heap.

## 25. Memory Allocation II

• First fit - search from the beginning and return the first large enough free block.
• Next fit - search starting from the last search and return the first free block (wrapping around the end)
• Best fit: search through the entire list and return the best free block.
• Steps to fulfill an allocation request:
1. Compute necessary block size
2. Search for a free block using the allocation strategy.
3. Compare the block size against the size of the block. If not equal, split off the excess and convert into a new free blcok.
4. Allocate the block and return the beginning of the payload.
• The allocator follows alignment procedures.
• Minimum block size
• You can free a block by flipping its allocatted flag.
• Coalescing: combining neighboring free blocks into a larger free block.
• Footer: a copy of the header but at the end of the block so it can be read by the next block.
• Boundary tags – header and footer.
• Explicit free list - uses a doubly linked data structure.

## 26. Memory Allocation III

• Garbage collection - automatic memory management employed by implicit dynamic memory allocators.
• When to free a piece of dynmaically allocated memory?
• Memory is a directed graph: each heap block is a node on the graph, each pointer is an edge, locations not in the heap are root nodes.
• Any unreachable heap node is not in use by a process and is garbage.
• Mark and sweep. Mark bit used to track freeing.
• How to deal with memory in C?