Main Study Topics

Introduction to Computer Architecture

Textbook
Chapter 1, Computer Abstraction and Technology
Appendix A, Compiler, Assembler, Linker and Program Execution
C Introduction, Pointer and Memory Address in C
Chapter 2, Instructions, Language of the Computer
Chapter 3, Arithmetic for Computers
Chapter 4, The Processor

Textbooks

Read at least 20 pages per week in average following the class schedule

Required: Computer Organization and Design RISC-V Edition: The Hardware/Software Interface, Amazon, Publisher website
Reference: Digital Design and Computer Architecture: 2nd Edition, by Sarah Harris and David Harris, Amazon, Publisher website.
Reference: C Programming Language, 2nd Edition, by Brian W. Kernighan, Dennis Ritchie, Dennis M. Ritchie Amazon.

For each lecture, the first 3-4 topics are the most important ones.

Chapter 1, Computer Abstraction and Technology

Lecture 01, Chapter 1.1 - 1.4, Introduction, great ideas, Moore’s law, abstraction, computer components, and program execution

Explain what is Moore's Law, how it has been impacting the computer industry and how it has been used for predicting computer performance starting from 1970s. Another question that can be answered by Moore's Law is: Why do computer industry and technology advance so fast? For example, it is common that a new generation of computer, CPU, smartphones are produced in every two years.
How the idea of abstraction is exploited in computer hardware/software? E.g. in the layers of high-level language, ISA and ABI, and hardware implementation, low-level details are hidden from the higher level.
Understand how a program (application software) written in high-level programming languages is being executed by the hardware. First, compilers translate programs written in high-level programming languages into machine instructions (assembly code). From the assembly codes, assemblers translate symbolic notation of the instructions to the binary. Operating system launches the program of the binary format onto the hardware to execute. The hardware execute the program instruction by instruction.
Inspect the assembly code (.s file and the objdump -D output) of swap and sum program. The code are generated using gcc commands with -s, -c and the objdump -D command for X86_64 ISA. If you can, compile and run the code by yourself from Linux command line.
What are the names of the two authors of the textbook and what award the two authors won in 2018?
List 3-5 great ideas in computer architectures and explain them.
Watch John Hennessy and David Patterson Turing Award Lecture at ISCA 2018 and read the article about why they think this is A New Golden Age for Computer Architecture.

Lecture 02 Basic C Programming and Compilation, Assembling, Linking and Program Execution

Understand the three major components of a computer? processor, memory and I/O interfaces. Understand how memory store data using binary format.
how a high-level program is compiled, linked and executed in a computer.

Lecture 03, Chapter 1.6 - 1.7, Performance and Power

Understand the formula to compute CPU time, CPU time = Clock cycles * Cycle time, CPU Time = Instruction Count * CPI * Cycle time. Know how to use the fomula to solve different performance related problems and perform quantitative analyslis of CPU performance. You should be able to solve the problem with book and note closed. A problem may ask you to calcuate one of the three, CPU time, Cycle time/frequency, required CPI.
Know how to compute CPI and CPU time under the condition that multiple classes of instructions should be taken into account.
Understand the relationship between cycle time and CPU frequency.
The core elements for computer chips is silicon.
Explain what is the response time (latency) and throughput (bandwidth)
Review and get familar with Linux and C programming (slides), and make sure you are able to access CSE Linux machines, edit, compile and run a program from Linux by following the hands-on exercise.
What are the factors that determines energy consumption and dynamic power of computers? It is important to know that power draw of a computer is linearly proportional to the frequency.

Lecture 04, Memory and Binary Systems

Variables of a program and memory address (pointers); the relationship between C array, pointers, and memory in C programming; array references are memory access;
Binary numbers to decimal number conversion

Lecture 05, Chapter 1.8 - 1.9

Power consumption is one of the major limiting factor for scaling CPU frequency according to the Moore's law. Thus, even according to the Moore's law, we can double the transistor density every two years, we were not able to increase the performance by twice every two years starting from around 2005.
From around 2005, the industry started shifting to multi-core and multiprocessor design in the main-stream computer and devices. Parallel programming multicore and multiprocessing system is more challenging than programming a serial program.
Do the the hands-on exercise.
SPEC CPU are the standard benchmarks the industry use to evaluate and compare CPU performance.
Amdahl's Law equation can conclude at least two things for designing a computer for a program: 1). There is an upperbound of overall performance improvement of program when we try to optimize and improve part of the program. 2). To optimize or improve the common case fast is a general guideline (common sense) when one want to improve the performance of a program.

Appendix A: Compiler, Assembler, Linker and Program Execution

Lecture 04, Chapter A.1 - A.4, and Hands-On

Review the three most important topics of Chapter 01 (Moore's Law, the idea of abstraction in computer hardware and software, analysis of CPU time using IC, CPI and Cycle time).
Assembly language can be written by human or generated by compiler. It is mostly used for mission-critical system. Assembly code are architecture-specific and not portable.
Today's computer are Von Neumann computer, which has CPU, memory and I/O.
Know that a for or while loop in high-level language is translated to assembly code using branch instructions. References to array elements or variables are translated to load or store instructions (References on the left of the = sign are translated to store, and array references on the right of the = sign are translated to load.
Know how to use perf and PAPI to collect cycles and instructions of a program.
Know the building and execution process of a program: preprocessing, compiling, assembly, linking, and load and execution. The role of assembler and linker, how to use gcc to compile and link source and object files into executables.

Lecture 05

Understand the four stages of compilation process of creating an executable
Understand the difference between static linking and dynamic linking
Know how to build an executable using both static and dynamic linking
Familar with linux commands of building and inspecting files (gcc, objdump, nm, and ldd).
Familar with how the memory works in computer. CPU request or send data from or to memory by sending an address and a signal indicating a read or write. Memory responds depending on whether it is a read or write request.

Lecture 06, C Introduction, Pointer and Memory

To become familar with fundamental C programming, including the basics of using macro and proprecessing for programming.
Understand that the value of a variable (regarlesss of its type, e.g. int a) is stored in memory and the memory address of that variable can be obtained using &a. The address of a variable can be stored in a pointer data type, e.g. int * a_ptr; a_ptr = &a;. Given the memor address of a variable, one can use * to reference the data (value) stored in that memory address, .e.g *a. It is important to understand the left-value and right-value concept.
Familar with C array and its relationship with pointer. E.g. for array int a[10], a is int * type and it is the base memory address of the array, which is also the memory address of the first element of the array (a[0]). The memory address of element i can be obtained using int *ptr = &a[i]. Given an array, know how to calcuate the address of each element of the array, which is (char*)a + sizeof(array_element_type) * i, or simply a+i since C compiler is able to scale the pointer arithmetic with the size of the data type of the array.
For C two-dimensional array, e.g. int A[M][N], know that C stores all the array elements in row major. Know how to calcuate memory address of any element of the array A[i][j], which is (char*)A + sizeof(int) * (i * N + j), or simply A+i*N+j.
Understand the Java's approach of storing 2-dimensional array, e.g. int A[M][N];. A[M] is an array of memory addresses; each address is the base address of an array int [N]. Thus comparing with C's approach of array storage, Java's approach needs M additional elements for storign the memory addresses.

Chapter 2, Instructions, Language of the Computer

Lecture 07, Chapter 2.1 - 2.5

Instruction Set Architecture (ISA): the interface between hardware and software. It specifies the instructions, their format and usage for programmer and compiler writer to program CPU and hardware. Hardware implement those instructions and provide support for software to use the hardware through the ISA interface. Know the philosophy of CISE and RISC, and their major difference and examples.
RISC-style (MIPS) arithmetic instructions (e.g. add, sub, addi) all have three operands, two of which are source operands, named rs and rt and one is destination operand named rd. add and sub are R-type instructions in the format of add|sub rd, rs, rt. R-type means that all the operands are from Registers. addi is an I-type instruction meaning one of the operands (rt) is an Immediate (constant), e.g. addi $10, $5, 40.
RISC-style (MIPS) memory access instructions are load and store instructions (lw, and sw for example). These two instructions have three operands as well; one is rt or rd register, the second is the register for the base address and the third one is the offset. You should be very famailar with using those instructions to transform high-level language statement such as A[12] = h + A[8]] to machine instructions.
2s-Complement Signed Integers representation: know how to create the binary representation in 2's complement format of both positive and negative numbers. Know how to extend a binary number in 2's complement format to a number that has more digits.
Understand how to encode an instruction in its binary format. MIPS instructions are all encoded using 32-bit words. Why in MIPS, we only need 5-bit fields for register filed in the instruction encoding? (because MIPS only has 32 registers)

Lecture 08, Chapter 2.6 - 2.9, Appendix A.4 - A.6

Three major classes of instructions in most RISC ISA: Arithmetic-logic instructions, Memory load/store instructions, and control transfer instructions.
Know major logic instructions, shift left or right, AND, OR, NOT. They are either R- or I-format requiring three operands.
Familar with the three instructions for conditional branch and unconditional jump (bne, beq and j), and know how to use them to convert high-level if-else and while statement using the three instructions.
Familar with the jalr and jr instructions for implementing function calls. Understand the steps of function calls.
Understand stack memory, activation record (stack frame, function frame) and how function calls are handled with regards to the handling of stack memory, parameter passing, return address and values.
Be aware that a string in C is referrenced using a char * variable which is the address of the first character of a string. E.g. for char * str = "Hello World\n", str is the address of character H. MIPS has lb/lh and sb/sh for loading and storing a byte or a halfword from and to memory.
It is important to be familar with writing pseduo assembly code using the instructions from the textbook.

Lecture 09, Chapter 2.10, 2.13, and 2.14

Understand how hardware calcualte the branch target address for conditional branch instructions and unconditional jump instructions. For bne and beq instructions, only 16-bit fields of an instruction word can be used for the branch target and the target address is calcuated as pc + offset * 4. It is also referred to as relative address. For unconditional jump, it use 26-bit of an instruction word to specify absolute address, thus the jump target address is just the number * 4. Understand the example in the textbook.
Understand the patterns for translating high-level if-else statement and for/while loop into assembly instructions. bne/beq and j are needed for the translation. For greater/less than comparison and branching, slt is needed to be combined with beq/bne.
Understand the sort/swap examples and able to explain the purpose of each instruction used in the code.
Familar with the C pointer and pointer arithmetic, and familar with their usage in comparison with arrary reference. know the differences of translated code from array references and pointer version.

Lecture 10, Chapter 2.16, 2.17, 2.20, and MARS MIPS Simulator

ARM ISA is widely used for embedded devices including smart phone and smart pad. ARM ISA was very similar to MIPS when it was created. It is RISC style ISA.
x86 ISA is a CISC style ISA and it has been evolved several generations from 8-bit to today's 64-bit ISA. Intel has a hardware layer to translate x86 CISC style instructions into RISC-style microoperations for hardware implementation.
There are three major classes of instructions that are essential for general purpose CPUs: arithmetic-logic instructions (R- or I-format), load/store memory instructions (I-format), and control transfor instructions (conditional branch and unconditional jump).
Get familar with MARS MPIS assemblyer and simulator.

Chapter 3, Arithmetic for Computers

Lecture 11, Chapter 3.1 - 3.4

Understand how computer does integer addition, substraction, multiplication and division?
Understand why overflow could happen for addition and substraction and the situation that overflow could happen when adding or subtracting two numbers depending on their sign and the sign of the result.
Understand the basic algorithms to do multiplication and division.
Know how to use MIPS instructions to do multiplication and divison. How LO and HI registers are used for multiplication and divison.

Lecture 12, Chapter 3.5

Able to create IEEE 754 formatted representation in single and double precision for decimal numbers.
Understand the three parts needed in the IEEE 754 FP format (sigh, fraction and exponent).
Able to calcuate the decimal numbers from an IEEE 754 reprenstation.
Understand how addition and multiplication are performed using IEEE 754 format.
Understand the assembly code that convert F to C degree.

Lecture 13, Chapter 3.6 - 3.8 and 3.10

Explain what is SIMD architecture and why it is posibble to do in a scalar architecture and ISA
Explain why image and audio processing are suited for SIMD processing
Know that Intel extends FP processing with three kinds of SIMD extensions, MMX, SSE/SSE2/SSE3/SSE4 and AVX

Chapter 4, The Processor

Lecture 14, Chapter 4.1 - 4.3

Explain each steps of instruction execution for each of the three classes of instructions (AL, LW/SW, beq/j)
Given a CPU datapath figure (Figure 4.2, 4.15, or 4.17 in the textbook), know what each component does (PC, two adders, IM, Registers, Mux, ALU, DM, Sign-extend), know what each line does and their width (datapath and control path), given an instruction, mark the lines that an instruction uses, and given a control signal, know what instructions use, e.g. RegWrite, MemRead, MemWrite, ALUSrc, etc

Lecture 15, Chapter 4.5

Explain what is pipeline and its benefits. Why does pipeline only improve overall throughtput, not per-instruction performance? What is the ideal speedup of the 5-stage pipeline?
What are pipeline hazards and what are the three pipeline hazards? Explain what is structural hazards and what is the genral solution for structural hazards?
Explain what is data hazards using a sequence of two or three instructions. What are the two solutions (interlocking and forwarding) for dealing with data hazards. How forwarding works for handling data hazards of two ALU instructions? Why even using forwarding, Load-Use data hazard still needs one bubble (stall cycle) to deal with data hazard?
Understand what is control hazards and why control hazards happen?

Lecture 16, Chapter 4.6

Using examples (e.g. executing a ADD/LW/SW instruction) to explain the purpose of pipeline registers when creating a pipeline CPU?
Explain the components and data path used in Figure 4.41 or 4.51 when executing an instruction in each of the pipeline stages.
Know the control signal used in Figure 4.51 when executing an instruction in each of the pipeline stages. Optional: know how to derive each of the control signals in Figure 4.51

Main Study Topics

Table of Contents

Textbooks

Read at least 20 pages per week in average following the class schedule

For each lecture, the first 3-4 topics are the most important ones.

Chapter 1, Computer Abstraction and Technology

Lecture 02 Basic C Programming and Compilation, Assembling, Linking and Program Execution

Lecture 05, Chapter 1.8 - 1.9

Lecture 04, Chapter A.1 - A.4, and Hands-On

Lecture 05

Lecture 07, Chapter 2.1 - 2.5

Lecture 08, Chapter 2.6 - 2.9, Appendix A.4 - A.6

Lecture 09, Chapter 2.10, 2.13, and 2.14

Lecture 10, Chapter 2.16, 2.17, 2.20, and MARS MIPS Simulator

Lecture 11, Chapter 3.1 - 3.4

Lecture 12, Chapter 3.5

Lecture 13, Chapter 3.6 - 3.8 and 3.10

Lecture 14, Chapter 4.1 - 4.3

Lecture 15, Chapter 4.5

Lecture 16, Chapter 4.6