# CSCE 230: Computer Organization

## Lecture 1: Binary Numbers

• a number with only two possible symbols, 0 or 1, a bit
• we study binary numbers because computers use binary numbers
• computers use binary numbers because digital circutits only support two states, on and off
• we frequently speak of n-bit binary numbers, a word
• a byte is a part of a word, frequently 8-bits, but historically has been 6
• two types of binary numbers,
• unsigned
• positive numbers or 0
• a 4 bit unsigned binary number has up to 16 possibilities
• $$b_3b_2b_1b_0$$
• $$b_0$$ is the lsb, $$b_3$$ is the msb
• in decimal, it is $$b_0 2^0 + b_1 2^1 + b_2 2^2 + b_3 2^3$$
• an $$n$$ bit unsigned binary number has up to $$2^n - 1$$ possibilities, $$[0, 2^n - 1]$$
• Decimal conversion, use divide by two and remainders, first remainder is lsb
• Addition of two binary numbers:
• When sum is within the range $$[0, 2^n - 1]$$
• $$1 + 1 = 10$$ where $$0$$ is the one-bit sum and $$1$$ is the carry-out.
• When sum is out of the range $$[0, 2^n - 1]$$ (overflow)
• When possible, just add a bit.
• You loose information.
• To detect overflow, if the msb has a carry-out, there is overflow.
• Subtraction of two numbers:
• $$a - b = a + (- b)$$, becomes the addition of two signed binary numbers
• signed
• msb is the sign, 1 is negative, 0 is positive
• In all, the unsigned representation of the positive numbers is the same
• In signed magnitude and 1’s complement, there is negative zero
• signed magnitude
• change the sign, that’s it.
• $$[-(2^{n - 1} - 1), 2^{n - 1} - 1]$$
• 1’s complement
• Complement each bit of the unsigned postive number (flip each bit)
• $$[-(2^{n - 1} - 1), 2^{n - 1} - 1]$$
• 2’s complement
• Take the ones complement and add 1
• this has slightly greater range
• $$[-(2^{n - 1}), 2^{n - 1} - 1]$$
• Within range:
• Start at the LSB, summing as before
• Overflows:
• overflow if MSB carry-over and MSB-1 carry-over are not the same
• overflow if both have same sign, and the sign is different in the final result

## Lecture 2: More Binary Numbers

• Subtraction of signed binary numbers (2’s complement)
• $$a - b = a + (- b)$$
• first negate b by flipping all bits, and adding 1
• $$a - b = a + b^{\prime} + 1$$
• add as normal, be careful of overflow
• subtraction of unsigned binary numbers:
• see above.
• if the final overflow bit is different from the sign of both numbers, then there is overflow.
• sign extension: Add the correct number of “1”s before the MSB or the correct number of “0”s to the same position, depending on the sign bit.
• to extend an unsigned number, just and the correct number of “0”s before the MSB
• Hex!
• 0-15, as 0-9, A-F
• 1 hex digit is 1 nybble (4 bits)
• $$1111110 = FE$$

## Recitation 1: Welcome

• format
• announcements
• new material review
• common mistakes
• intro to lab
• read and start lab prior to coming to recitation!

### Binary Numbers

• 0b for binary numbers, nothing special for decimal, and 0x for Hex.
• Place-value review.
• binary is on $$2^n$$
• remember that index is alwasy 0-based

### Lab

• THE HORROR THAT IS ASSEMBLY

## Lecture 3: Assembly Language

• assembly is very low level
• made for a specific processor
• high level languags include java, c, c++, python and lisps
• can be executed on a variety of processors
• In 230, we do assembly for NIOS2, and for the expiremental, we also deal with x86
• there is an almost 1-to-1 relationship between an assembly language and an isa, conversion is supported by an assembler
• processor is connected to memory
• processor works on n-bits
• memory is a sequnce of bits, organized as words.
• n-bits = 32 bits
• 4 bytes per word.
• bytes: 0, 1, 2, 3, 4, …
• words: 0, 4, 8, 12, 16, …
• little endian versus big endian
• little-endian
• first word is 3, 2, 1, 0
• addresses start at the least significant bit
• big-endian
• first word is 0, 1, 2, 3
• addresses start from the most significant bit
• most archs use little endian, including NIOS and x86
• TCP/IP and many older computers use big-endian
• hton
• ntoh
• Word alignment:
• a word address has an aligned address if the address is a multiple of the number of bytes in a word
• for n=32: 0, 4, 8, 12, 16,…
• unaligned for n=32: 1, 2, 3, 5, …
• some archs don’t support unaligned word addresses
• Assembly Basics:
• java/c:
• int declarations

int r2 = 1;
int r3 = r2;
int r4 = r2 + r3;

• Assembly:

        .text
.global _start
_start:
movi r2, 1              ; Sets Register 2 to 1
movi r3, r2             ; Set Register 3 to the contents of register 2
add r4, r2, r3          ; Set Register 4 to r2 + r3
.data
.end
;; Registers 2-23 are general purpose
;; There are 32 registers, those that are not general purpose are special purpose
;; There are 6 control registers, used for I/O
;; PC is also a register, the program counter.  This points to the address of the next instruction to be executed

• A register stores one word of data
• register namse are case sensitive in NIOS2 assembly, all others are case-insensitive
• assembly follows the mnemonic operand* pattern
• two types of instructions, regular instructions (directly implemented in hardware) and psuedo instructions (implemented by regular insturctions (used to be called macro instructions))
• two types of instruction sets:
• Reduced – RISC
• small number of simple instructions
• instructions can fit in a single word
• ARM
• OpenRISC
• NIOS2
• POWER
• Complex – CISC
• large number of both simple and complex instructions
• instructions fit in one or more words
• x86(64)
• VAX
• NIOS2 Instructions
• arithmetic
• sub
• subi
• mul
• muli
• div
• divu
• divi
• diviu
• logical
• data copy
• control transfer
• unconditional branching
• jmp
• br
• label must be ±32 kb from current instruction
• conditional branching
• blt, bltu
• bgt, bgtu
• ble, bleu
• bge, bgeu
• beq
• bne
• misc
• shift
• rotate
• Remember to INF loop at the end of a program if you are unable to send a software signal to shut off
• ex:

_start:
movi r2, 1
movi r3, r2
movi r4, 0
bgtu r2, r3, _true
_false: movi r4, r3n
br _cont
_true:  movi r4, r2
_cont:  nop
br _cont

• ex:

_start: movi r2, 1
movi r3, 10
movi r4, 0
bleu r2, r3, _loop
_cont:  nop
br _cont


## Lecture 4: Assembly Continued

• types
• arithmetic
• branching
• logical
• and
• and/r,r,r
• andi/r,r,v
• andhi/r,r,v
• or
• or
• ori
• orhi
• xor
• xor
• xori
• xorhi
• data
• move
• mov
• movi
• movui
• movio – move label address to register
• assembler directives
• .egu
• .data
• .byte
• .hword
• .word
• .ascii
• .stop
• displacement addresses: V(rj), number(register) = v + rj
• number is 16 bit signed
• absolute mode v(r0)
• register indirect 0(rj)
• instructions
• ex:

X:      .byte 1

• can be addressed using absolute mode X(r0)
• address to register, and then register indirect
• ex:

ARRAY:  .byte 1,2,3,4

• movi r2, 0; ARRAY(r2)
• movi r2, 1; ARRAY(r2)
• movia r2, ARRAY; 0(r2)
• movia r2, ARRAY; 1(r2)
• ldw
• ldh
• ldb
• ldhu
• ldbu
• stw
• sth
• stb
• subroutines

## Lecture 5: More Assembly

• Exam on February 16, clases split afterward
• displacement mode – ldw r5, num(r6)
• register indirect mode – ldw r5, (r6)
• absolute mode – ldw r5, num(r0)
• ex:

#define N 10

int num[N] = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};

int i = 0;

.egu N 10
start:
movi r2, 0
loop:   ldb r3, NUM(r2)
ldw r4, SUM(r0)
stw r4, SUM(r0)
blt r2, N, loop
end:    br end
.data
NUM:    .byte 1, 2, 3, 4, 5, 6, 7, 8, 9, 10
.skip 2
SUM:    .word 0

.egu N 40
start:
movi r2, 0
loop:   ldw r3, NUM(r2)
ldw r4, SUM(r0)
stw r4, SUM(r0)
blt r2, N, loop
end:    br end
.data
NUM:    .word 1, 2, 3, 4, 5, 6, 7, 8, 9, 10
SUM:    .word 0

• Actual Subroutines
• instructions
• callr
• call
• ret
• On call, next address is put in the ra (return address) register, and then the subroutine is jumped to.
• At return, the return-address is jumped back to.
• ex (factorial):

start:  movi r2, 6
call factorial
end:    br end
factorial:
movi r3, 1
movi r4, 1
loop:   mul r3, r3, r4
ble r4, r4, loop
ret

factorial2:
movi r3, 1
mov r4, r2
loop2:  mul r3, r3, r4
subi r4, r4, 1
bne r4, r0, loop2
ret

• .stack, a lifo
• r27 is the stack pointer, sp
• subi, sp, sp, 4
• stw ri,(sp)
• remove top element
• ldw ri,(sp)
• to save registers in a subroutine
• pass parameters to a subroutine
• ex:

        movi r5,
subi sp, sp, 4
stw r5, (sp)
call factorial
ldw r5, (sp)
end:    br end
factorial:
ldw r3, 4(sp)
movi r4, 1
loop:   mul r4, r4, r3
subi r3, r3, 1
bne r3, 0, loop
subi sp, sp, 4
stw r4, (sp)
ret


## Lecture 6: Hardware, and the background of logic circuits

• a logic circuit implements a function of logic signals (logical variables) that take two possible values that take two possible values 0 and 1 corresponding to different voltage levels.
• or looks like an arrowhead with a curved back, $$f = x_1 + x_2$$
• and looks like a box with a leading half-circle, $$f = x_1 \cdot x_2$$
• not takes one, wedge with leading circle, $$f = \overline{x}$$
• synthesis of logic functions:
• truth table
• sum-of-products form
• minimize
• Ex: Bit multiplexer
• takes S, A, B, single output
• S is control Signal
• data are inputs a and b, each one bit
• if S = 0, a, else, B
• truth table

s a b out (f(s, a, b)) term
0 0 0 0
0 0 1 0
0 1 0 1 $$\overline{s} \cdot a \cdot \overline{b}$$
0 1 1 1 $$\overline{s} \cdot a \cdot b$$
1 0 0 0
1 0 1 1 $$s \cdot \overline{a} \cdot b$$
1 1 0 0
1 1 1 1 $$s \cdot a \cdot b$$
• Product Form terms
• $$f(s, a, b) = \overline{s} \cdot a \cdot \overline{b} + \overline{s} \cdot a \cdot b + s \cdot \overline{a} \cdot b + s \cdot a \cdot b$$
• not and or
• $$f = (\overline{S}a) + (Sb)$$
• minimize, using logic rules:
• Commutative: $$w + y = y + w$$, $$wy = yw$$
• Distributive: $$wy + wz = w(y + z)$$, $$w + yz = (w + y)(w + z)$$
• Associative: $$(w + y) + z = w + (y + z)$$, $$(wy)z = w(yz)$$
• Idempotent: $$w + w = w$$, $$ww = w$$
• Involution: $$\overline{\overline{w}} = w$$
• Complement: $$w + \overline{w} = 1$$, $$w\overline{w} = 0$$
• DeMorgan’s: $$\overline{w + y} = \overline{w}\cdot\overline{y}$$, $$\overline{wy} = \overline{w} + \overline{y}$$
• Ex:
• Truth table:

x1 x2 x3 f term
0 0 0 1 $$\overline{x_1}\overline{x_2}\overline{x_3}$$
0 0 1 1 $$\overline{x_1}\overline{x_2}x_3$$
0 1 0 1 $$\overline{x_1}x_2\overline{x_3}$$
0 1 1 0
1 0 0 1 $$x_1\overline{x_2}\overline{x_3}$$
1 0 1 1 $$x_1\overline{x_2}x_3$$
1 1 0 0
1 1 1 0
• Product of Terms:
• $$f = \overline{x_1}\overline{x_2}\overline{x_3} + \overline{x_1}\overline{x_2}x_3 + \overline{x_1}x_2\overline{x_3} + x_1\overline{x_2}\overline{x_3} + x_1\overline{x_2}x_3$$
• Minimized:
• $$f = \overline{x_1}\overline{x_2} + \overline{x_1}x_2\overline{x_3} + x_1\overline{x_2}$$
• $$f = \overline{x_2} + \overline{x_1}x_2\overline{x_3}$$
• $$f = \overline{x_2} + \overline{x_1}\cdot\overline{x_3}$$
• Other Logical Gates:
• xor, or with a line behind it
• nand, and with a not on the output
• nor, or with a not on the output
• xnor, xor with a not on the output
• Implementation:
• propagation delay: a very small amount of time between input change and output change, $$T_p$$, non-zero.
• PLA(Programable Logic Array): Appendix A.11.1
• FPGA(Field Programmable Gate Array): A form of programmable logic device, the most powerful and flexible of the main programmable logic devices.
• VHDL is the hardware description language used
• Quartus2 is the software used to write and develop VHDL, can minimize functions automagically
• Modelsim is modeling software to simulate logic circuits

## Recitation 3: Subroutines and Logic Design

• NIOS2 is little-endian
• .text always starts assembly
• .end is always at the end
• .global _start is the main entry point
• .data starts the data section
• Make sure to mention TA’s poor comunication skills

## Lecture 7: VHDL

• Damn bastardized child of Ada.
• example: $$f(s, a, b) = \overbar{s}a + sb$$

library ieee;
use ieee.std_logic_1164.all;
entity mux2inputbit is
part (
s, a, b : in std_logic;
result: out std_logic );
end mux2inputbit;

architecture implementation1 of mux2inputbit is
begin
result <= (not s and a) or (s and b);
end implementation1;

• use ieee and ieee.stdlogic1164.all for the stdlogic type
• entity defines an interface for a part, including the signals
• architecture defines the implementation
• case insensitive
• use quartus to:
• compile
• minimize
• rtl view before minimize
• technology map view after minimize
• Datatypes
• stdlogic: 9 possible values, only use 2: 0 and 1; others are Z, X, W, L, H, -, U
• stdlogicvector: an array of stdlogic; (n downto m); V : out stdlogicvector(3 downto 0); V <= “1000”; V(2 downto 1) <= “10”; V(0) <= ’1’
• Operators
• logic: and, or, not, xor, nand, nor, xnor
• concatenation operator: &

## Lecture 8: More VHDL

• Combinations circuits, output is a function of the current inputs
• multiplexor: see mux2inputb.vhd
• 2 data input, 1 bit wide.
• control input
• multiplexor: m data inputs ($$m = 2^k$$), $$\lg m$$ control inputs, n-bit-wide max
• for 4 data inputs, 1 bit wide, we need 2 control inputs, see mux4input1bit.vhd
• for 4 data inputs, 2 bit wide, we need 2 control inputs, see mux4input2bit.vhd
• Sequential circuits: output depends on both input and the current state
• SR Latch
• see srlatch.vhd
• Gated SR latch
• see gatedsrlatch.vhd
• D latch
• see dlatch.vhd
• D flip-flop
• D latch triggered only on rising edge of the clock.
• d latch who’s output p is connected to another d latch whos clock is notted (triggered on falling edge)
• to make above triggered on rising edge, not the clock before the first, and renot on second

## Lecture 9: VHDL Sequential Statements

• Normally al statements within an architecture are concurrent.
• process is concurrent, however the statements within it are sequential.
• select/vhdl conditionals are not allowed, use if or a case statement.

## Lecture 10 (First Day Split): Machine-Level Programming 1: X86 Basics

• History of processors and archs
• C, assembly, machine code
• assembly basics: registers, operands, move
• intro to x86-64

### History

• totally dominates the computer market
• evolutionary design
• compatible back to the 8086 from 1978
• Added more features as time goes on
• Is CISC
• many different instructions with many different formats
• only a small subset encountered within GNU/Linux programs
• Hard to match the performance of RISC
• But Intel has done it in terms of speed, but less so for power
• Milestones
• 8086, 1978
• 386, 1985
• Pentium 4F, 2004
• Core i7, 2008
• 64 bit means you can have more memory
• Multi-core lets you run more at the same time (parallelism), thanks to Moore’s law, we can have more cores on the same space
• Introduced multimedia operations
• AMD recruted DEC’s designers and built Opteron and developed x86-64
• Intel tried IA64, but ended up building EM64T, nearly identical to x86-64

### C, Assembly, Machine Code

• Architecture: the ISA. The parts of a processor design that one needs to understand to write assembly code, specification, registers
• Microarchitecture: implementation of the architecture, chache sizes, core frequencies
• ISAs include x86, Nios, MIPS, SPARC, Power
• Generally have Program Counter, Registers, Condition Codes, in memory, you have object code, program data, os data, the stack and possibly a heap

### Assembly Basics

• C (gcc -S) to assembly (as) to object (ld) to executable
• remember that the stack is important, thus there are push and pop instructions
• Integers are 1, 2 or 4 bytes, they are data values or untyped pointers (addresses)
• floating point data are 4, 8 or 10 bytes
• no aggregate like arrays or structures
• arith functions
• data transfer
• control flow
• variable length instructions are a PITA
• Stacks have %esp for the stacp pointer
• follows a similar pattern as NIOS
• control flow, call, return, various branching ops
• program counter is %eip
• stacks are important for languages that support recursion.
• code must be reentrant
• you must have a place to store all of the relavent data

## Lecture 11: More X86

• only 6 registers in x86
• caller saves – the caller saves the registers before calling
• eax – ax, ah, al – accumulator
• ecx – cx, ch, cl – counter
• edx – dx, dh, dl – data
• callee saves – the callee saves the registers before using
• ebx – bx, bh, bl – base
• esi – si – source index
• edi – di – destination index
• esp – sp – stack pointer
• ebp – bp – base pointer

### Moving data

• movl source, dest
• imm, reg/mem
• reg, reg/mem
• mem, reg
• operand types
• immediate – prefix with ’\$’, integer data of 1, 2 or 4 bytes
• register, one of the 8 registers
• memory (4 consecutive bytes a address given by register), (%eax) or other address modes
• Normal, (reg), the register specifies the address
• Displacement n(reg), reg + d is the address
• General n(reg,in,scale) reg + s*in + d is the address

## Lecture 12: Still More X86

• 64 bit has registers r8 to r15, and their half-word variants
• -l instructions hav -q instructions also for 8 byte words (quad words)
• with the registers, we have enough to pass most arguments in registers rather than on the stack
• arith operations take src, dest, where dest = dest op source
• be careful with arith, can be done with atypical functions
• condition codes are used to determine equality and the like, these are in cf, zf, sf, of registers
• cd is unsigned overflow
• zf if zero
• sf if negative
• of if signed overflow
• can be set with cmpl/cmpq
• cf if unsigned carry out
• zf if a = b
• sf if a - b < 0
• of if signed carry out
• Or with testl/testq
• based on a & b
• can be set with setX instructions
• Jumping go based on tthe status of various flags, named in the X of jX

## Lecture 13: C Programming, Part 1

• using gcc
• pointers are used for indirection
• ampersand and a variable name give it’s address
• * is used to deref a pointer
• malloc is used to allocate memory from the heap, returns the address of the allocated chunk
• takes an argument to specify the size to allocate
• done at run time

## Lecture 14: C Programming, Part 2

#include <stdio.h>
#include <stdlib.h>

#define SIZE 10

struct node {
int value;
struct node *next;
};

typedef struct node NODE;

void display_list(int, NODE *, int);

int main(void) {
int i;
for (i = 0 ; i < SIZE; i++) {
current = (NODE*) malloc(sizeof(NODE));
current->value = i*i;
current->next = NULL;
if (i == 0)
else
prior->next = current;
prior = current;
}

}

void display_list(int itr, NODE * mynode, int len)
{
int i;
NODE * current = mynode;
for(i = 0; i < itr; i++) {
printf("node[%d] = %d,%p\n", i % len, current->value, current->next);
current = current->next;
}
}

• argc is an argument count
• arg is an argument vector
#include <stdio.h>
#include <stdlib.h>

struct node {
int value;
struct node *next;
};

typedef struct node NODE;

void display_list(int, NODE *, int);

int main(int argc, char** argv) {
int i, size;

if(argc != 2)
{
fprintf(stderr, "usage: %s number\n", argv[0]);
exit(1);
}

size = atoi(argv[1]);

for (i = 0 ; i < size; i++) {
current = (NODE*) malloc(sizeof(NODE));
current->value = i*i;
current->next = NULL;
if (i == 0)
else
prior->next = current;
prior = current;
}

}

void display_list(int itr, NODE * mynode, int len)
{
int i;
NODE * current = mynode;
for(i = 0; i < itr; i++) {
printf("node[%d] = %d,%p\n", i % len, current->value, current->next);
current = current->next;
}
}

#include <stdio.h>
#include <stdlib.h>

struct node {
int value;
struct node *next;
};

typedef struct node NODE;

void display_list(int, NODE *, int);

int main(int argc, char** argv) {
int size = 0, temp;
FILE * fptr;

if(argc != 2)
{
fprintf(stderr, "usage: %s file\n", argv[0]);
exit(1);
}

if (!(fptr = fopen(argv[1], "r")))
{
exit(1);
}

while(fscanf(fptr,"%d", &temp) == 1)
{
current = (NODE*) malloc(sizeof(NODE));
current->value = temp;
current->next = NULL;
if (size == 0)
else
prior->next = current;
prior = current;
size++;
}

fclose(fptr);
}

void display_list(int itr, NODE * mynode, int len)
{
int i;
NODE * current = mynode;
for(i = 0; i < itr; i++) {
printf("node[%d] = %d,%p\n", i % len, current->value, current->next);
current = current->next;
}
}

#include <stdio.h>
#include <stdlib.h>

struct node {
int value;
struct node *next;
};

typedef struct node NODE;

void display_list(int, NODE *, int);
void free_list(int, NODE *);

int main(int argc, char** argv) {
int size = 0, temp;
FILE * fptr;

if(argc != 2)
{
fprintf(stderr, "usage: %s file\n", argv[0]);
exit(1);
}

if (!(fptr = fopen(argv[1], "r")))
{
exit(1);
}

while(fscanf(fptr,"%d", &temp) == 1)
{
current = (NODE*) malloc(sizeof(NODE));
current->value = temp;
current->next = NULL;
if (size == 0)
else
prior->next = current;
prior = current;
size++;
}

fclose(fptr);
}

void display_list(int itr, NODE * mynode, int len)
{
int i;
NODE * current = mynode;
for(i = 0; i < itr; i++) {
printf("node[%d] = %d,%p\n", i % len, current->value, current->next);
current = current->next;
}
}

void free_list(int size, NODE* mynode)
{
NODE *current = mynode, *next;
next = current->next;
current->next = NULL;
current = next;
while (current->next != NULL) {
next = current->next;
printf("Free %d,%p\n", current->value, current->next);
free(current);
current = next;
}
printf("Free %d,%p\n", current-value, current->next);
free(current);
}



## Lecture 15: C Programming, Part 3

• be careful of endian-ness and dangling references
• worms can run by itself, and can propagate a fully woring versionto other computers
• virus adds themselves tto other programs. Cannot run independently
• return oriented programming involves finding gadgets in the code
• at the end of each gadget a return is called

## Lecture 16: Accessing I/O devices in C

• memory mapped I/O
• nios timers: 6 16 bit timer registers
• provides information about status, and for control mechanism
• to is a timeout signal set to by the timer when the count is 0, can be reset by writing a 0 to it
• run is to 1 by the timer if counting
• start/stop are used to start/suspend the timer by setting the relevant bit to 1
• managed by simply setting the various registers/addresses to specific values
• Be careful of pointer types and thus type sizes
• Be careful of sign extension

## Lecture 17: Interrupt Handling

• interrupt handling requires all registers and context to be saved
• exceptions are a transfer of control away from the normal program. These are software exceptions or hardware interrupts
• threading allows for a simulation of multiple processes
• multithreading happens with scheduling techniques

## Lecture 18: Hardware IO with the DE1

• Final Project due the 20th
• Midterm on the 6th over intel assembly and C programming
• Use eclipse todeal with the mechanics
• must open, install interrupt handler, and can then read/write to the device
• use the HAL for the interface, can use keys 1-3 and ledg2-8

## Lecture 19: ABIs and Memory Layout

• an ABI specifies
• Memory Layout
• On linux, everything gets 4 gb, layed out in various ways

## Recitation Notes

• Buliding a vending machine

## Lecture 20: The NIOS2 Datapath

• document avaliable on line
• 3 types of instructions in NIOS
• R: rsrc1 (31:27), rsrc2 (26:22), rdst(21:17), opx (16:6), op code (5:0)
• I: rsrc (31:27), rdst (26:22), immediate operand (21:6), op code (5:0)
• involves dark voodo magic
• MIPS also has R, I, J and they are similar, but layed out slightly differently
• data paths are a tad confusing, but make sense if you examine them carefully
• this is von Neumann
• classical model
• Steps
• fetch
• decode
• read registers (if not call or j-type)
• execute (most use the ALU, but in different ways)
• write back to either registers or memory
• Must be completed in one cycle
• implications:
• constraint: need separate memories for instructions and data (Harvard architecture)
• choice: No caches are needed (to simplify design)
• choice: register file has 2 output ports (to speed up reading 2 operand registers) and one input port
• choice: programmable alu instead of separate functional units
• ALUs must be controlled, in many cases, this is done with a nybble to choose the function

## Lecture 21: Memory Hierarchy

• Overview
• Storage technologies and trends – pyramidal
• Locality of reference
• caching in the memory hierarchy
• Tech and Trends
• Registers
• within processors
• RAM
• traditionally packaged as a chip
• basic unit is a one-bit cell
• multiple chips form the memory
• static ram
• 4 or 6 transistor circuit, keeps value as long as it’s powered
• insensitive to noise and radiation
• faster and moe expensive than dram
• dynamic ram
• stored with capacitor and transistor for access
• value must be refreshed every 10-100ms
• more sensitive to disturbances
• slower and cheaper than sram
• dxw organization
• access is by row and then column
• double data-rate synchronous dram (ddr sdram)
• Non-volatile
• types
rom
programmed during production
prom
can be programmed once
eprom
can be bulk erased using uv or xray
eeprom
electronic erase capability
flash memory
eeproms with partial erase capability, wears out after about 100000 erasings
• used for:
• firmware programes
• solid state disks
• disk caches
• Disk Drives
• very mechanical
• each platter is double sided, and has a head for each side
• uses one of several interfaces
• organized in tracks (concentric circles) and sectors (chords of each track)
• aligned tracks form a cylinder
• recording density is bits/inch
• track density is tracks/inch
• areal density bits/sq.inch
• bytes/sector * sectors/track * tracks/surface * surfaces/platter * platters/disk
• access = seektime + rotationtime + transfertime
• logical blocks are maped to physical sectors by hardware/firmware
• SSDs
• are flash, take a long time to write
• but are very fast for reads
• generally have wear-leveling logic
• busses
• system bus
• memory bus
• io bridge interfaces memory bus and system bus
• speeds have become dramatically faster, while storage hasn’t caught up
• locality of reference – programs tend to use data and instructions with addresses near or equal to those they have used recently
• temporal – think loops, like a loop counter
• spatial locality – nearby addresses tend to be referenced close together in time

## Lecture 22: Pipelining

• steps
1. fetch (f)
4. memory access (m)
5. write (w)
• single-cycle
• takes single cycle for frcmw
• multi-cycle
• takes a cycle for each step of frcmw
• pipelining lets the process for a new instruction start at each pulse, effectively being both single cycle and multicycle simltaneously
• has hazards: situations that delay or discard an instruction in the pipeline, we say that the pipeline is ’stalled’, or that there is a ’bubble’ in the pipeline
structure hazards
multiple instructions using the same component at the same time
• deleteing memory
• when two instructions must read memory at the same time
• pipeline stalling, delay instructions by a clock cycle, staggering, simple, but inefficient
• component duplication for to code and normal memory
• single memory, duplicated i/o
• when changing pc address due to branching
data hazards
data dependency between two instructions
• when registers are accessed at the same time
• pipeline stalling, is very, very inefficient
• operand forwarding, don’t delay second instruction, but instead send the values of the relevant registers to the next instruction in the pipeline, much faster
• software handling, puts nops detween relevant operations
control hazard
control transfer instruction

## Lecture 23: Caching in the Memory Hierarchy

• levels:
• registers
• l1 cache
• l2 cache
• main memory
• secondary storage (disks, solid state)
• remote storage (tapes, distributed file systems, the web)
• caches provide a smaller, faster storage device to act as a staging area for subset of data in a larger, slower device
• these are used because of locality
• three types of misses
• cold (start up, always happens)
• conflict miss (maps two potential blocks to the same place)
• capacity miss (when active cache blocks is larger than the cache)
• small and sram based, generally 2 levels
• entries in caches have valid and tag
• cache addresses have 3 parts
block offset
4 bits
set index
3 bits
tag
remainder
• addresses have s, b, tag, and if valid bit is set, you can use that cache line
• where there is only one line per set, this is a direct mapped cache
• when no match, then the old line must be replaced
• on miss, grap from memory, cpu only accesses data from the cache
• set associative, two lines per set
• allows for two different group of info in the cache

## Lecture 24: More Caching

• review of last lecture
• E-way set assosiactive cashe lets you have E lines per set
• writes can have problems when multiple copies exist
• L1, L2, Ln
• Main memory
• disk
• other
• On write hit
write-through
write straight to memory
write-back
defer writing until line relpacement, needs a dirty bit
• On write-miss
write-allocate
load into cache, update line in cache
no-write-allocate
writes immediately to memory
• typicall
• write-through, write-allocate
• write-back, no-write-allocate (most common)
• metrics
• miss rate
• hit time
• miss penalty
• huge numbers game
• to write friendly code, make the common case go fast
• minimize the misses in inner loops
• repeated references to variables are good
• stride-1 reference patterns are good

### Virtual Memory

• Address Range may be up to a specific number, but only a certain portion is available
• virtual memory lets you swap pages of memory in and out of slower devices
• pages are the primary unit, and don’t always have to be in memory
• chunks of a process called pages, and chunks of memory are frames
• the OS maintains a page table for each process, contains frame location for each page, address consists of a page number and offset within the page

## Lecture 25: More Memory Management and Paging

• taking process and splitting memory up into bits that can be swapped in and out of memory, stored on disk temporarily if necessary
• this is done with virtual memory, locality and the like and managed by the operating system, sometimes with the help of an mmu
• each page can go in any free frame
• the page table is used to manage where each page is located, either in a frame or in the swap space
• then addresses have a logical address, the first n bits is a page number, the remaining is an offset
• the page number is replaced with the address start from the page table
• replacement policy
• deals with the selection of pages to replace and when new pages must be brought in
• the more elaborate the policy, the greater the hardware and software overhead to implement it
• policies
optimal
selects the page for which the next reference is the longest (may not be known ahead of time), produces three page faults after the frame allocation has been filled
least-recently-used
the one that hasn’t been used most recently is replaced, many faults can be caused. This is very difficult to ipmlement, as most implementations require a great deal of overhead
first-in-first-out
exactly as it sounds
• translation lookaside buffer – a type of cache, caching the page table to make access to memory easier
• page number and frame number pairs are cached
• virtual access is checked, if tlb has it, get physical address, if not, check page table and then get physical address, unless non-address value is returned
• page faults:
• disk to memory
• update page table
• update tlb