Artificial intelligence (AI) is progressively changing techniques of teaching and learning. In the past, the objective was to provide an intelligent tutoring system without intervention from a human teacher to enhance skills, control, knowledge construction, and intellectual engagement. This paper proposes a definition of AI focusing on enhancing the humanoid agent Nao’s learning capabilities and interactions. The aim is to increase Nao intelligence using big data by activating multisensory perceptions such as visual and auditory stimuli modules and speech-related stimuli, as well as being in various movements. The method is to develop a toolkit by enabling Arabic speech recognition and implementing the Haar algorithm for robust image recognition to improve the capabilities of Nao during interactions with a child in a mixed reality system using big data. The experiment design and testing processes were conducted by implementing an AI principle design, namely, the three-constituent principle. Four experiments were conducted to boost Nao’s intelligence level using 100 children, different environments (class, lab, home, and mixed reality Leap Motion Controller (LMC). An objective function and an operational time cost function are developed to improve Nao’s learning experience in different environments accomplishing the best results in 4.2 seconds for each number recognition. The experiments’ results showed an increase in Nao’s intelligence from 3 to 7 years old compared with a child’s intelligence in learning simple mathematics with the best communication using a kappa ratio value of 90.8%, having a corpus that exceeded 390,000 segments, and scoring 93% of success rate when activating both auditory and vision modules for the agent Nao. The developed toolkit uses Arabic speech recognition and the Haar algorithm in a mixed reality system using big data enabling Nao to achieve a 94% success learning rate at a distance of 0.09 m; when using LMC in mixed reality, the hand sign gestures recorded the highest accuracy of 98.50% using Haar algorithm. The work shows that the current work enabled Nao to gradually achieve a higher learning success rate as the environment changes and multisensory perception increases. This paper also proposes a cutting-edge research work direction for fostering child-robots education in real time.