Despite recent advances in video-guided robotic imitation learning, many methods still rely on human experts to provide sparse rewards that indicate whether a robot has successfully completed a task. Enabling robots to evaluate autonomously whether their actions can complete complex, multi-stage tasks remains an open challenge. In this work, we address this challenge with an efficient few-shot robotic learning algorithm that centres on learning and evaluating from a third-person perspective. We develop a novel Siamese neural network-based action-state evaluation system, named 'Behavior-Outcome Dual Assessment' (BODA), and integrate it into our robotic imitation learning system to replace manual evaluation by human experts in multi-stage imitation learning and to improve learning efficiency. Specifically, a single video demonstration of a target task is divided into several stages. For each stage, BODA provides two Siamese neural network-based evaluation modules: one focuses on changes in the robot's actions, and the other on changes in the working environment. Together, the two modules assess the robot's completion of each stage from the perspective of both action and environment changes. BODA is then integrated into a model-based reinforcement learning framework to complete the imitation learning cycle. Extensive experiments demonstrate that BODA can automatically and accurately evaluate task completion status without human intervention. In contrast to conventional methods, BODA keeps the accumulation of errors within acceptable limits through stage-wise self-assessment.
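To make the stage-wise dual assessment concrete, the following Python sketch shows one way two Siamese-style modules could be combined per stage. It is an illustration under stated assumptions, not the implementation used in this work: the encoder `embed` (a single random-weight linear layer with `tanh`), the use of cosine similarity, the threshold of 0.9, the feature-vector inputs, and all function names are hypothetical choices made only for this example.

```python
import numpy as np

def embed(x, weights):
    """Shared encoder (assumed form): the same weights are applied to both
    inputs of a pair, which is the defining property of a Siamese network."""
    h = np.tanh(weights @ x)
    return h / (np.linalg.norm(h) + 1e-8)  # unit-normalise the embedding

def siamese_similarity(demo, robot, weights):
    """Cosine similarity between the demonstration and robot embeddings."""
    return float(embed(demo, weights) @ embed(robot, weights))

def assess_stage(demo_stage, robot_stage, w_action, w_env, threshold=0.9):
    """Dual assessment of one stage: one module scores action changes,
    the other scores working-environment changes. The stage passes only
    if both scores reach the (assumed) threshold."""
    action_score = siamese_similarity(
        demo_stage["action"], robot_stage["action"], w_action)
    env_score = siamese_similarity(
        demo_stage["env"], robot_stage["env"], w_env)
    passed = action_score >= threshold and env_score >= threshold
    return {"action": action_score, "env": env_score, "passed": passed}

def first_failed_stage(demo_stages, robot_stages, w_action, w_env,
                       threshold=0.9):
    """Evaluate stages in order and return the index of the first stage
    that fails, or None if every stage passes."""
    for i, (demo, robot) in enumerate(zip(demo_stages, robot_stages)):
        result = assess_stage(demo, robot, w_action, w_env, threshold)
        if not result["passed"]:
            return i
    return None
```

Checking stages in order and stopping at the first failure reflects the idea that a deviation is caught at the stage where it occurs rather than after the whole task. In a real system the encoders would be trained networks operating on video-derived features rather than fixed random projections.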