Modular self-reconfigurable robots (MSRs) are used mostly in environments that are otherwise difficult to navigate and explore. In particular, the shape-changing ability of MSRs makes them more dexterous in such settings than fixed-body robots. However, when an MSR forms a new configuration, the locomotion pattern for that configuration is usually unknown to its constituent modules, and the main challenge for the modules is to learn how to move in that specific configuration within a reasonable amount of time. In this paper, we study the problem where an MSR needs to learn its movement pattern on the fly. To solve this problem, we propose a game-theoretic solution based on multi-agent reinforcement learning, with which the constituent modules learn, in a distributed manner, the best actions to perform in order to travel a greater distance in less time. We have implemented this approach in simulation on both the ModRED and Yamor MSR platforms. Results show that our approach achieves higher average speed (up to 7.86 times) for most of the tested configurations compared to an existing locomotion learning approach.
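The abstract does not specify the learning algorithm in detail; as a minimal illustrative sketch (not the paper's actual method), distributed locomotion learning with a shared reward can be framed as independent Q-learners, one per module, each updating its own action values from the distance the whole configuration covers per step. The module count, action set, and toy displacement model below are all hypothetical assumptions for illustration.

```python
import random

random.seed(0)

NUM_MODULES = 3        # hypothetical chain of 3 modules (assumption)
ACTIONS = [0, 1, 2]    # toy per-module action set, e.g. bend back / hold / bend forward
ALPHA, EPSILON = 0.1, 0.2
EPISODES = 2000

def displacement(joint_action):
    """Toy displacement model (an assumption, not the paper's simulator):
    the configuration moves farther when adjacent modules choose
    different actions, mimicking a caterpillar-like gait."""
    return sum(1.0 for a, b in zip(joint_action, joint_action[1:]) if a != b)

# One stateless Q-table per module: each module learns independently
# from the shared reward (distance covered in the current step).
q_tables = [[0.0 for _ in ACTIONS] for _ in range(NUM_MODULES)]

def choose(q):
    """Epsilon-greedy action selection over one module's Q-values."""
    if random.random() < EPSILON:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: q[a])

for _ in range(EPISODES):
    joint = [choose(q) for q in q_tables]   # each module picks its own action
    r = displacement(joint)                 # shared reward: distance this step
    for q, a in zip(q_tables, joint):       # independent Q-update per module
        q[a] += ALPHA * (r - q[a])

greedy = [max(ACTIONS, key=lambda a: q[a]) for q in q_tables]
print("learned joint action:", greedy)
print("distance per step:", displacement(greedy))
```

Because all modules receive the same reward, this is a fully cooperative game; independent learners tend toward joint actions (here, alternating gaits) that increase the common payoff, which mirrors the distributed, game-theoretic flavor of the approach described above.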