Figure 12
From: Learning by imitation with the STIFF-FLOP surgical robot: a biomimetic approach inspired by octopus movements
Convergence of the GMR search process. (a-f) A sample search in the augmented goal-policy space ζ, plotted for six selected self-refinement iterations. Only one dimension each of ζ_O and ζ_I is depicted, although ζ_O ∈ ℝ^36 and ζ_I ∈ ℝ^9. The gray dots show the initial set of iterations and the black dots show the newly estimated policy. The blue line represents the known task goal with the highest reward; the red cross on it marks the best policy achieved after convergence.