CS 61B, Data Structures
Matt's Notes
Welcome! Here are some notes from our sections together!
Click on any link below to view.
Dis 10. Graphs.
April 3, 2018.
DFS Traversal
We can traverse a graph in DFS-fashion by following this pseudocode:

Use STACK as fringe.
Maintain SET of visited vertices.
Add start vertex to fringe.

while fringe not empty:
    Remove next from fringe.
    if not visited:
        Process removed node.
        for each neighbor:
            Add to fringe.
        Mark removed node as visited.

Just as before with trees, it is important to note that the order in which we add our neighbors to the fringe may alter our final result. Further, some variations of this pseudocode may check whether a node has been visited at a different point of the algorithm, and this may also lead to differing results.
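If you'd like to see this in Java, here is a minimal sketch of the same procedure (the representation and names here are just assumptions for illustration: the graph is an adjacency list, where adj.get(v) holds the neighbors of vertex v):

import java.util.*;

public class DFSTraversal {

    /** Iterative DFS over an adjacency-list graph; returns vertices in the
     *  order they are processed. Assumes adj.get(v) lists the neighbors of v. */
    public static List<Integer> dfs(List<List<Integer>> adj, int start) {
        List<Integer> processed = new ArrayList<>();
        Set<Integer> visited = new HashSet<>();     // SET of visited vertices
        Deque<Integer> fringe = new ArrayDeque<>(); // STACK as fringe
        fringe.push(start);                         // add start vertex to fringe
        while (!fringe.isEmpty()) {
            int v = fringe.pop();                   // remove next from fringe
            if (!visited.contains(v)) {
                processed.add(v);                   // process removed node
                for (int neighbor : adj.get(v)) {
                    fringe.push(neighbor);          // add each neighbor to fringe
                }
                visited.add(v);                     // mark removed node as visited
            }
        }
        return processed;
    }
}

Notice that the order in which neighbors come out of adj.get(v) (and the fact that a stack reverses that order) is exactly the kind of detail that can change the final answer.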

For the graph below, start at node 0 and execute the DFS-traversal algorithm. What is the list of nodes processed?


(Source: CSM 61BL Su17, WS 6, Q6.)

Dis 09. Traversals.
March 19 + 20, 2019.
Preorder, Inorder, Postorder
Pre-order, In-order, and Post-order are ways to describe when nodes in a binary tree are processed (an example of processing a node would be printing its value). A binary tree is a tree in which nodes can have up to 2 children each.

Pre-order means to process the current node before pre-order traversing down the left and right of the current node. In-order means to first in-order traverse the left of the current node, process the current node, and then in-order traverse down the right of the current node. Post-order means to post-order traverse down the left and right of the current node before finally processing the current node.

Visually, you can determine the order nodes are processed during pre-, in-, or post-order traversal by drawing a peg on the west, south, or east side of each node, respectively, and tracing a line around the tree diagram as shown below. The nodes are processed in the order the yellow line touches each red peg.

Pre-order has pegs on the west side of each node.
A, B, C, D, E.



In-order has pegs on the south side of each node.
B, A, D, C, E.



Post-order has pegs on the east side of each node.
B, D, E, C, A.



More formally, we can write this pseudocode to reflect the definitions of when pre-, in-, and post-order processing occurs:

binaryTreeTraversal(node):
    preorder(node)
    if left != null:
        binaryTreeTraversal(left)
    inorder(node)
    if right != null:
        binaryTreeTraversal(right)
    postorder(node)
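
In Java, the same idea might look like the sketch below (assuming a bare-bones Node class with value, left, and right fields; "processing" here just means printing the value, and in practice you would pick only one of the three methods):

class Node {
    int value;
    Node left, right;
}

public class Traversals {

    static void preorder(Node node) {
        if (node == null) { return; }
        System.out.println(node.value);  // process before either subtree
        preorder(node.left);
        preorder(node.right);
    }

    static void inorder(Node node) {
        if (node == null) { return; }
        inorder(node.left);
        System.out.println(node.value);  // process between the two subtrees
        inorder(node.right);
    }

    static void postorder(Node node) {
        if (node == null) { return; }
        postorder(node.left);
        postorder(node.right);
        System.out.println(node.value);  // process after both subtrees
    }
}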



DFS, BFS
The yellow path we traced here for pre/in/post-order is known as depth-first search traversal (DFS). This is because in all three variations, we visited nodes by venturing all the way down a path before back-tracking and venturing down another path (we go deep down into the tree first). Given the tree above, DFS traversal gives: 1 2 3 4 5 6 7.

On the other hand, breadth-first search traversal (BFS) involves visiting nodes level by level. Again, remember that pre/in/post-order does not make sense in the context of BFS because in all three of those, we visit nodes by following that yellow path, in DFS fashion. Given the tree above, BFS traversal gives: 1 2 4 5 3 6 7.

Before we continue, recall that a stack data structure is like a stack of pancakes. If new pancakes are added to my stack, I must eat the fresh pancakes on the top of the stack before I can eat the earlier ones which are sitting lower in the stack. In contrast, a queue is like a queue of people standing in line to buy Fish and Chips at the marina. Whoever got in line first will be the first one served and as new people jump in line, they must wait until everyone before them has been served before they have a chance to enjoy light, flaky fish steaks paired with luscious tartar sauce and a modest side of fries.

The pseudocode below describes the algorithm for DFS traversal. Note that just like we need an index/counter variable when we traverse an array, or a node pointer when we traverse a LinkedList, for tree traversal we can use certain data structures to help us keep track of where we are. This data structure is called the fringe. For DFS, we use a stack and follow this procedure:

Use STACK as fringe.
Add root to fringe.
while fringe not empty:
    Remove next from fringe.
    Add its children to fringe.
    Process removed node.
    

The procedure for performing BFS traversal is almost identical! The only difference is that we instead use a queue as our fringe:

Use QUEUE as fringe.
Add root to fringe.
while fringe not empty:
    Remove next from fringe.
    Add its children to fringe.
    Process removed node.
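
To make the stack-vs-queue difference concrete, here is one possible Java sketch (reusing the bare-bones Node class from the traversal sketch above; java.util.ArrayDeque can behave as either a stack or a queue, and that choice is the only thing separating DFS from BFS):

import java.util.ArrayDeque;
import java.util.Deque;

public class FringeTraversal {

    /** Prints each node as it is removed from the fringe. If useStack is true,
     *  the fringe behaves like a stack (DFS); otherwise like a queue (BFS). */
    static void traverse(Node root, boolean useStack) {
        Deque<Node> fringe = new ArrayDeque<>();
        fringe.add(root);                                     // add root to fringe
        while (!fringe.isEmpty()) {
            Node next = useStack ? fringe.removeLast()        // stack: most recently added
                                 : fringe.removeFirst();      // queue: least recently added
            if (next.left != null) { fringe.add(next.left); } // add its children to fringe
            if (next.right != null) { fringe.add(next.right); }
            System.out.println(next.value);                   // process removed node
        }
    }
}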
            

Dis 06. Disjoint Sets and Asymptotics.
February 26 + 27, 2019.
Disjoint Sets
Disjoint Sets are a bunch of sets which are disjoint (disjoint means that the sets have no elements in common). Initially, each item belongs to its own set; everybody starts out by themselves. Disjoint sets support a union operation, which merges two sets together, and a find operation, which tells us which set an item belongs to.

Each set can be represented as a tree: one representative serves as the root, and all the other members sit in the tree under it. And because we can represent the sets as trees, we can also represent them in an array: each child holds the index of its parent in the tree representation, while the root, which has no parent, stores the size of the tree under it instead (we store the size as a negative number to differentiate it from a parent index).

The Weighted Quick Union optimization helps reduce the depth of items in a set by unioning smaller subtrees into larger ones. The Path Compression optimization also helps with depth by reassigning each node's parent to be the root of the set/tree as nodes are traversed in the search for a particular member.
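
Here is one possible array-based sketch of these ideas (a rough illustration with made-up names, not the exact class from the projects; negative entries store tree sizes, as described above):

public class UnionFind {

    // parent[v] is the index of v's parent; a negative value means v is a root,
    // and -parent[v] is the size of its tree.
    private int[] parent;

    public UnionFind(int n) {
        parent = new int[n];
        for (int i = 0; i < n; i++) {
            parent[i] = -1;               // everybody starts out by themselves
        }
    }

    /** Returns the root of v's set, applying path compression along the way. */
    public int find(int v) {
        if (parent[v] < 0) {
            return v;
        }
        parent[v] = find(parent[v]);      // reassign v's parent to the root
        return parent[v];
    }

    /** Merges the sets containing a and b, using weighted quick union. */
    public void union(int a, int b) {
        int rootA = find(a);
        int rootB = find(b);
        if (rootA == rootB) {
            return;
        }
        if (parent[rootA] <= parent[rootB]) {   // rootA's tree is at least as large
            parent[rootA] += parent[rootB];     // sizes are negative, so this adds them
            parent[rootB] = rootA;              // smaller tree goes under the larger root
        } else {
            parent[rootB] += parent[rootA];
            parent[rootA] = rootB;
        }
    }
}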

Asymptotics
We talked about the formal definitions of the asymptotic bounds, and we made it our goal to gain intuition for how they work so that we don't have to set up a limit explicitly every single time. The summer CS61BL website has a great and thorough explanation of this for you: CS 61BL Lab 07

Further, we plotted some lines using an iPython notebook just so that we can see what some functions look like and why it would make sense to want the asymptotic definitions we have. You can see the notebook we were looking at here: iPython notebook

Some strategies for asymptotic analysis: Remember that your usage of Big-O, Big-Omega, or Big-Theta depends on the exact question being asked.

If you are simply asked for the runtime of the function and you find that you cannot give a tight bound, you should give an upper bound and possibly a lower bound as well if it is meaningful. These bounds would describe the range of runtimes for all possible cases of inputs.

If for the same code you are asked specifically for the best case or worst case runtime, then you would give the same magnitude answer as your lower or upper bound, respectively, but you would present it as a tight bound. This is because the best (or worst) case is one particular case (one particular input) and thus you would state that that one case is tightly bound by a particular function; it doesn't really make sense to say that one case is lower or upper bounded (it's vague to state that when you can provide a tight bound instead).
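
As a quick illustration of that distinction, consider a simple method like the one below (a made-up example). For the method overall, the most we can say is that the runtime is upper bounded by Big-O(N) and lower bounded by Big-Omega(1). Asked specifically for the best case, we would give a tight bound of Big-Theta(1); asked for the worst case, Big-Theta(N).

/** Returns true if target appears somewhere in arr. */
static boolean contains(int[] arr, int target) {
    for (int x : arr) {
        if (x == target) {
            return true;   // best case: the very first element matches, Theta(1)
        }
    }
    return false;          // worst case: we scan the entire array, Theta(N)
}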
Dis 05. Iterators and Iterables.
February 19 + 20, 2019.
The word "iterator" is a noun, while the word "iterable" is an adjective.

In Java, an Iterator thus represents a mechanism for dispensing one item at a time in some sequence, and it does so by providing a hasNext() method, which lets us know whether there is still another item for it to provide, and a next() method, which provides said item. (If next() is called when hasNext() is false, meaning that no such next item exists, then a NoSuchElementException should be thrown.)

On the other hand, an Iterable represents a sequence that can be iterated over. If you've ever been to an old-school deli, the red ticket dispensing machine might be analogous to an Iterator, since it provides one ticket at a time, while the spool of tickets represents the thing that is Iterable, since it is a series of the tickets we want to provide in order one at a time. Iterables must have an iterator() method, which is responsible for providing a fresh iterator for itself that starts from the beginning and that is independent of any other instances of iterators. For example, we might call iterator() twice to have two separate iterators if we wanted to have one spool of tickets for our meat counter, and another spool of tickets for our fish counter. Both spools would have tickets numbered from 1-1000, but each red ticket dispensing machine will do its job independent of what the other is doing.

The benefit of doing this is that if we have something that is Iterable, we can use it via an enhanced for-loop. Many Java data structures are iterable, and thus make it easy for you to iterate over the items in them, such as an ArrayList or a Deque. Of course, we need to know about Iterators because in order to make our own custom-defined classes Iterable, they must implement iterator(), and so we will have to define an Iterator as well; it's not enough to just have a spool of tickets, you have to define a mechanism for how you're going to hand out those tickets too.
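
As a concrete (made-up) example, here is a tiny class that plays the role of the spool, handing out ticket numbers 1 through max; each call to iterator() produces a fresh, independent dispenser:

import java.util.Iterator;
import java.util.NoSuchElementException;

/** An Iterable "spool" of ticket numbers 1 through max. */
public class TicketSpool implements Iterable<Integer> {

    private final int max;

    public TicketSpool(int max) {
        this.max = max;
    }

    @Override
    public Iterator<Integer> iterator() {
        // A fresh dispenser that starts from ticket 1, independent of any other.
        return new Iterator<Integer>() {
            private int nextTicket = 1;

            @Override
            public boolean hasNext() {
                return nextTicket <= max;
            }

            @Override
            public Integer next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                return nextTicket++;
            }
        };
    }
}

Because TicketSpool is Iterable, an enhanced for-loop like for (int ticket : new TicketSpool(5)) { ... } works on it directly.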
Dis 04. Inheritance.
February 12 + 13, 2019.
Click here to view slides from this discussion.
Dis 03. LinkedLists and Arrays.
February 5 + 6, 2019.
Our Story of LinkedLists
We started out with the raw IntList. This was a nice and simple data structure that we could use to store individual integers in a series of connected links.

Next came the Singly Linked List, which allowed us to encapsulate our raw IntList. The advantage of encapsulation is that it allows us to provide our users with methods for performing operations on our data structure so that they do not need to be knowledgeable about the underlying intricacies of our implementation in order to use it; they can simply refer to an API or list of our methods and their descriptions and use our data structure like a black box.

Following that, we introduced Sentinels into our Singly Linked List. A sentinel is a "dummy" node with an unimportant item that is always included in the list. The purpose of a sentinel is to remove the need for null checks, since those can make our code messy and it is easy to miss a case or two. Thus, instead of checking whether a reference is null, we can check whether we have arrived back at the sentinel.

Finally, we allowed ourselves to have references in both directions, so that we could access both the previous and the next node. This was the Doubly Linked List.
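
A minimal sketch of where the story ends up might look like this (a circular, sentinel-fronted Doubly Linked List; the names are just for illustration, not the exact class from the project):

public class DLList {

    private static class Node {
        int item;
        Node prev, next;

        Node(int item, Node prev, Node next) {
            this.item = item;
            this.prev = prev;
            this.next = next;
        }
    }

    // The sentinel's item is never used; it exists so prev and next are never null.
    private final Node sentinel;
    private int size;

    public DLList() {
        sentinel = new Node(0, null, null);
        sentinel.prev = sentinel;   // an empty list just points back at the sentinel
        sentinel.next = sentinel;
        size = 0;
    }

    /** Adds an item to the front; no null checks needed thanks to the sentinel. */
    public void addFirst(int item) {
        Node newNode = new Node(item, sentinel, sentinel.next);
        sentinel.next.prev = newNode;
        sentinel.next = newNode;
        size += 1;
    }
}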

Problem Solving Strategies
Whenever you are asked to write a significant amount of code by hand, first make a concrete example for yourself so that you understand what the problem is asking you to do, so that you get a feel for the operations that may be involved, and so that you have something to check against once you actually start writing code in a bit.

Next, break down the problem into functional parts or steps that you know how to do. If you are stuck, write out the parts of the code that are most obvious to you, even if you have to leave conditions or other things blank. For example, if we have to do an operation repeatedly, you know that there would have to be a loop or a recursive call, so you can set up the components for that first while you're still thinking (loop conditions, incrementations, base cases, etc.). Try to get the general situation working first, and then worry about the edge cases once you're done. Oftentimes, those can be addressed by adding a simple if case at the top of your code later.

Arrays vs LinkedLists
Finally, we informally made observations about the arrays and LinkedLists that we worked with today. We'll cover these in more detail later when we study runtime analysis. But for now, we saw that it was easier to get items out of an array, but that it was harder to "resize" one, since arrays are of fixed length and so resizing actually requires us to make a new array and copy everything over. LinkedLists, on the other hand, require that we traverse the list in order to access a particular item, but they only require us to switch around a few pointers to insert or delete items.
Dis 02. Static and IntLists.
January 29 + 30, 2019.
Architecture of a Class
Typical class definitions contain three sections of code.

First we have fields, which consist of class variables and instance variables that are useful for storing attributes, state, characteristics, data, etc. Class variables have the keyword static, and each one holds a single value that is shared among all object instances of the class. Instance variables are non-static declarations that contain unique values for every object instance made.

Next are the constructors, which are necessarily invoked when the user of your class wants to make a new object instance. The code inside constructors should be responsible for accepting some arguments, and storing those inputs into the fields.

Finally we have methods or functions. These may accept input, and may interact with and modify the fields before potentially returning some result.

Here is an example of these three components in action.
public class Waffle {

    // Fields
    private boolean isDelicious;
    private int numStrawberries;

    // Constructors
    public Waffle(boolean isDelicious, int numStrawberries) {
        this.isDelicious = isDelicious;
        this.numStrawberries = numStrawberries;
    }

    // Methods
    public boolean isDelicious() {
        return isDelicious;
    }

    public void addStrawberries(int num) {
        this.numStrawberries += num;
    }
}
Static Methods and Variables
Fields can be labeled as static (class variables), and methods can also be labeled as static.

For static fields, there is only one variable that is shared among all instances of the class, and thus consequently, all object instances see the same value for that field (again, because there is a single variable shared among all; there is not one variable per instance). To call upon a static field, we may use dot notation on the class name rather than on a particular instance. If you do have a particular instance name, you may also use that to call upon a static field.

For static methods, an object instance is not required for the function to be called. As a consequence, static methods cannot access non-static fields. If you'd like, you can call a static method on an object instance, but in that case Java will look up the class of the instance and call the method as we've just described.
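
Continuing with a stripped-down variation on the Waffle example (the static pieces here are just made up for illustration):

public class Waffle {

    // Class variable: one shared counter, so every Waffle sees the same value.
    private static int wafflesMade = 0;

    // Instance variable: each Waffle gets its own.
    private int numStrawberries;

    public Waffle(int numStrawberries) {
        this.numStrawberries = numStrawberries;
        wafflesMade += 1;
    }

    // Static method: callable without any particular Waffle, so it can only
    // touch static fields like wafflesMade, not numStrawberries.
    public static int totalWafflesMade() {
        return wafflesMade;
    }
}

Here Waffle.totalWafflesMade() can be called on the class name itself, with no instance in sight.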

IntLists
You'll often see statements such as L.rest.rest = L.rest.rest.rest;, but sometimes it seems like the way you follow the "rests" is different for the left and right sides of the equals sign. Why is that?

Think about what the equals sign means. For example, let's look at an assignment statement such as this: String var = "object";. What does the box and pointer diagram look like?



We can see that var corresponds to a variable box in the diagram, and that the right hand side of the assignment statement was a String object.

In the same way, the left side of the statement L.rest.rest = L.rest.rest.rest; is a variable box in the diagram, and the right hand side is an IntList object. Therefore, when we count the "rests" for the left hand side, we stop at the last "rest" variable box itself (we don't follow its arrow), since that is the box we are reassigning. When we count for the right hand side, we follow the final "rest" all the way to an object, and we assign the left hand side to reference it.

Walk through this simple example and make sure you understand how this statement works and why.
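
For instance, here is one concrete list you could trace (assuming the usual IntList with first and rest fields and a two-argument constructor):

IntList L = new IntList(1, new IntList(2, new IntList(3, new IntList(4, null))));

// Left side: follow two "rests" and stop at the rest variable box inside the node holding 2.
// Right side: follow three "rests" all the way to the IntList object holding 4.
L.rest.rest = L.rest.rest.rest;

// L is now 1 -> 2 -> 4, and the node holding 3 is no longer reachable from L.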



Destructive vs. Non-Destructive
When writing a destructive function, you want to destroy the original and so you should not be creating new objects. Thus, destructive functions do not use the new keyword.

On the other hand, non-destructive functions want to preserve the original and return a new version. Thus, non-destructive functions should use the new keyword to create fresh objects as the necessary modifications are applied.
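
For example, here are hypothetical destructive and non-destructive versions of a method that adds x to every item of an IntList (sketches, again assuming the usual first/rest IntList):

/** Destructive: modifies the original list in place; no new objects are created. */
public static void dincrList(IntList L, int x) {
    while (L != null) {
        L.first += x;
        L = L.rest;
    }
}

/** Non-destructive: leaves the original list untouched and returns a new one. */
public static IntList incrList(IntList L, int x) {
    if (L == null) {
        return null;
    }
    return new IntList(L.first + x, incrList(L.rest, x));
}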
Dis 01. Introduction to Java
January 22 + 23, 2019.
Java programs begin execution starting from a main method public static void main (String[] args) {...}.

Primitives vs. Non-Primitives
We keep track of data using variables. Variables come in two flavors: primitives and non-primitives. Primitives in Java include only the following: byte, short, int, long, float, double, boolean, char. All other kinds of variables are non-primitives. Variables representing primitives store the appropriate raw value, while variables representing non-primitives store references to the actual data.

The creators of Java made this design decision for the language due to efficiency justifications. Imagine that you're at an interview for your dream job. The manager really loves you and is almost ready to hire you, but first, they'd like to ensure that everything you've said about yourself is true. And so, the manager would like to verify with your best friend that you have indeed been truthful. But now the question is, would it be appropriate for you to clone your best friend so that the manager can chat with them? Probably not...your friend might not appreciate that very much. It makes much more sense to provide the manager with a reference to your friend, a phone number perhaps. This is why Java handles "complicated" (non-primitive) variables using references.

This goes hand-in-hand with what Professor Hug calls the Golden Rule of Equals. When we are drawing box and pointer diagrams, when a variable's contents are asked for, we provide whatever is inside that variable's corresponding box. This means that for primitives we copy the raw value, while for non-primitives we copy the arrow that we have drawn, such that our copied arrow points to the same destination. Given this distinction, if a function takes in both a primitive and a non-primitive as arguments, and it mangles both during the function call, when the function is over, which of the two (or both?) were actually changed? See the example below to find out:
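
Here is one version of such an example in code form (the names are made up, but the behavior is the point):

public class GoldenRule {

    static void mangle(int number, int[] array) {
        number = 99;       // reassigns the local copy of the raw value
        array[0] = 99;     // follows the copied arrow to the one shared array
    }

    public static void main(String[] args) {
        int x = 5;
        int[] a = {5};
        mangle(x, a);
        System.out.println(x);     // prints 5: the primitive was copied, so x is unchanged
        System.out.println(a[0]);  // prints 99: both arrows point to the same array object
    }
}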



Iteration and Recursion
Often times, we want to write code that does something over and over again. We have two techniques for doing this: iteration and recursion.

Iteration involves using a loop. In Java, one kind of loop is the while loop. To evaluate it, we follow these steps in order:
  1. Evaluate the loop condition. If it is false the loop is over and execution continues starting after the closing curly bracket of the loop.
  2. If it is true, then we start at the top of the loop's body of code and execute those lines in order until we reach its end.
  3. At this point, we go back to step 1 and check the condition again.
It is important to note that even if the condition becomes false in the middle of executing the loop body, the loop does not magically quit immediately. The loop can only quit in between executions of its body (unless there is a break statement, an Exception is thrown, etc.).

Another kind of loop is the for loop. This kind of loop was introduced because programmers noticed that loops typically contain the initialization of some counter variable, a check of some condition against that counter variable, and a modification of that counter variable as iterations occur. For loops simply bundle these three elements together in a visually convenient place. We evaluate for loops by following these steps in order:
  1. Initialize the counter variable.
  2. Evaluate the loop condition. If it is false the loop is over and execution continues starting after the closing curly bracket of the loop.
  3. If it is true, then we start at the top of the loop's body of code and execute those lines in order until we reach its end.
  4. Adjust the counter value according to the specified fashion. Usually an incrementation or decrementation.
  5. At this point, we go back to step 2 and check the condition again.
The other option we have is recursion. This occurs any time a function's body of code contains a statement in which the function calls upon itself again. To prevent infinite recursion, we want our recursive functions to contain base cases, which are if statements that provide direct results for the simplest situations; in all other cases, our recursive calls should aim to move our complicated situation closer and closer to one of our base cases.
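
For instance, a recursive method for summing the integers 1 through n (just an illustration) has a base case for the simplest input and a recursive call that moves closer to it:

/** Returns 1 + 2 + ... + n, for n >= 0. */
static int sum(int n) {
    if (n == 0) {
        return 0;           // base case: the simplest situation, answered directly
    }
    return n + sum(n - 1);  // recursive call on a situation one step closer to the base case
}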

With the classic Fibonacci exercise (#3 on the worksheet), we saw how there may be many ways to solve the same problem and to arrive at the correct answer. Therefore, simply solving a problem one way is cute and all, but in the real world, that doesn't mean you're done--we want to find the "best" solution. What does "best" mean? It means code that is not only correct, but also fast, memory efficient, and clear to other human readers. In other words, in this class we will learn to consider time efficiency, space efficiency, and style/documentation.