Figure 5From: Learning search polices from humans in a partially observable contextOverview of the decision loop. At the top, a strategy is chosen given an initial belief p(x0|z0) of the location of the end effector (initially through sampling the conditional). A speed is applied to the given direction based on the believed distance to the goal. This velocity is passed onwards to a low-level impedance controller which sends out the required torques. The resulting sensation, encoded through the multinomial distribution over the environment features, and actual displacement are sent back to update the belief.Back to article page