Question 3:

a) The lowest layer is: Reactive layer. The defining characteristics are: Constant time and Smooth.

b) The middle layer is: Execution Engine. The defining characteristics are: Bounded time, monitoring, map planner steps to reactive actions.

c) The top layer is: Deliberative layer. The defining characteristics are: Planner (arbitrary time) and abstract layer time.

Question 4

[http://www.cs.utah.edu/~hal/courses/2009S_AI/Walkthrough/GreedyBFS/]

Question 5

a) Having a heuristic value of zero everywhere is not optimistic because if the value is negative, the algorithm will overshoot its distance estimate.

b) Question by Andrew L:

…You ask us to find a bound on the new cost given a modified a* heuristic. We have figured out the minimal bound will be the same as the original minimum cost C*,but we cannot figure out an upper bound. I was going to put an upper bound of a*n (n=number of nodes on optimal path), but I don't think that's correct anymore…

Response from Will:

In normal A* search, f(n) = g(n) + h(n). We're modifying things to make f'(n) = g(n) + a x h(n). The optimum path cost is C*. When you use f'(n) instead of f(n) in what would otherwise be A*, then you get a new path. That path will have cost <= a x C*. You should be able to prove that. The proof is very similar to the proof of optimality of A*, but using f'(n).

As Chen and I nutted out (mostly Chen):

f(n) = g(n) + h(n) = C*

Therefore a(C*) = a(f(n)) (a = alpha)

Therefore f'(n) = g(n) + a(h(n)) < a(C*), assuming a > 1.

c) Speed up the search by relaxing the problemâ€” trade speed for optimality.

d) If q is on the optimal path, then the algorithm won't find the optimal path since it won't be visited. Unless the algorithm visits node q, it won't know whether or not its optimal.

e) Relax constraints of the problem. For example, an algorithm could ignore walls in a map as its finding a path or allow chips to be placed on top of one another in a packing problem.

Question 6

(a) (on A Table) (clear B) -> (on A B)

(b) (on C Table) (clear A)

(c) Any intermediate step â€” (on B C)

(d) Yes. (fill out)

(e)

puton(c,d) interferes with newtower(D)

puton(A,B) threatens newtower(B)

puton(C,D) interferes with newtower(D)

Tom:

A couple of us are having a study sesh and found that in part e) there is a threat between the initial state and the action puton(A,B) caused by On(D,A) in the initial state and Clear(A) in the preconditions in the action. We were wondering if this is counted as a threat even though it is the initial state involved in the threat?

Will:

That isn't a threat for multiple reasons.

i) On(D, A) and Clear(A) cannot threaten each other. For a threat there has to be direct negation. e.g. On(D, A) and ~On(D, A).

ii) A state doesn't threaten an action. A postcondition threatens a link. In the case you mention there is no link in the plan that can be threatened. Now, at some future time a new link may be introduced, and then there may be a threat, but there is no such threat yet in the given plan.

(f) Heuristic = # threats - # goals satisfied

Question 7

a) If the probability of going toward the goal is high, then the algorithm could potentially spend a lot of time getting stuck in obstacles. If the probability is low, then the algorithm will spend a lot of time exploring without going to the goal. The optimal probability depends on the search space.

Question 10

a)

$Q(s,a) \leftarrow \sum\limits_{s'} [ P(s' | s,a) R(s' | s,a) ] + \gamma \sum\limits_{s'} [ P(s' | s,a ) V(s') ]$

$V(s') = \text{argmax}~Q(s,a)$

b)

Start by Driving Dirty (DD), select action Park and result in Parked Dirty (PD)

Q(DD, P) <- 1*0 + 0.9 * [ 1 * 6 ] = 0.9 * 6 = 5.4

We are explicitly updating the maximum action for the state DD, so even though 5.4 is less than 6, we still update Q(DD, P) to be 5.4.

c)

deltaV(DD) = old Q - new Q = 6 - 5.4 = 0.6

Therefore change in value function is 0.6

d)

**STEP 1:**

Priority queue initially contains all actions that transitions to DD.

So, the priority queue is: *(P = Priority)*

{ P(DD,D)=0.6, P(DD,C)=0.6, P(PD,D)=0.6, P(PC,D)=0.6 }

0.6 is set to ALL of them because it is the maximum Q value for the state they all transition to.

All 4 transition to DD, which changed value by 0.6.

Since all values are the same, we need to use a tiebreaker. We use the order in the table to decide which one we pop. (PD,D) is highest up so it is selected.

Pull off (PD,D)

Q(PD,D) <- 1 * 0 + 0.9 * (1 * 5.4)

Q(PD,D) <- 4.86

**BUT** Drive is NOT the maximum action for Parked Dirty and 4.86 < 6 (the maximum Q value for Parked Dirty) so the change in the Q value is 0. However, the value for Q(PD,D) will change.

Since there is no change to the value function, we don't need to add any new pairs to the priority queue and are done for this step.

**STEP 2:**

We removed (PD,D) last time, so the new queue is:

{ P(DC,D) = 0.6, P(DD,C) = 0.6. P(DC,D) = 0.6 }

We need to tie-break again and the first in the table is (DC,D)

We pop (DC,D)

Q(DC,D) = [0.1 * 0 + 0.9 * 1] + 0.9*[0.1 * 5.4 + 0.9 * 6]

Q(DC,D) = 0.9 + 0.9*[5.94]

Q(DC,D) = 6.246

Now, although DC,D is NOT the maximal action for DC, 6.246 is greater than the maximal value for DC which is 6. Therefore, we DO update a Q value, and set Q(DC,D) to 6.246.

We now look at the change in the value function

| deltaV(DC) | = 6 - 6.246 = 0.246

This is not zero, so we can add some nodes to the priority queue.

We get all the state action pairs that transition INTO DC, and add them to the priority queue with a priority of 0.246

The priority queue is now:

{ P(DD, D) = 0.6, P(DD,C) =0.6, (DC,C) = 0.246, (PC,D) = 0.246, (DC,D) = 0.246 }

**STEP 3:**

We need to tiebreak, and pick (DD,D) which is the highest in the table between the two pairs with priority 0.6.

We calculate the Q update:

Q(DD,D) = 1*0 + 0.9[1 * 5.4] = 4.86

4.86 is less than the maximum Q value for all the Driving Dirty state action pairs, so the value function doesn't change.

e)

The resulting priority queue is:

{ P(DD,C) =0.6, P(DC,C) = 0.246, P(PC,D) = 0.246, P(DC,D) = 0.246 }