An example of reward-weighted regression. (From left to right) Subset of iterations during the refinement algorithm. At each iteration, a new Gaussian distribution, depicted by the green ellipse, is fit to the most promising augmented dataset (policy parameters and goal). A regression-based exploration strategy is then used in the augmented-space ζ to iteratively find better policies to achieve the goal. For the regression, we assume that we know the desired goal (blue vertical bar ζ
∗) but we do not know how to achieve it (namely, the input of the regression is the desired goal).