In the dynamic post-COVID market, pricing behavior on Mercari changed faster than the existing DeepFM-based price recommendation system could adjust, which made its suggestions less effective. Frequent retraining of the DeepFM model had clear limitations, and real-time training data pipelines would have required an extensive overhaul. I therefore proposed a more agile, adaptive approach and led its development, starting with a proof of concept: a multi-armed bandit (MAB) reinforcement learning model layered on top of the existing DeepFM framework to capture shifts in pricing behavior in real time. The hybrid model produced better, more personalized price suggestions, yielding a 6% increase in average Gross Merchandise Value (GMV), a 1.2% improvement in Sell Through Rate (STR), and faster selling speeds for listings. The initiative demonstrated that a traditional model can be augmented with reinforcement learning to improve responsiveness, and it established a scalable, efficient way to adapt to rapidly changing market conditions without major infrastructure changes.
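To make the layering idea concrete, here is a minimal sketch of a bandit sitting on top of a base price model. It is an illustration under stated assumptions, not the production design: the summary above does not say which bandit algorithm, arms, or reward signal were used. The sketch assumes Beta-Bernoulli Thompson sampling, arms that are hypothetical price multipliers applied to the base model's price, and a binary "listing sold" reward. The class name `PriceAdjustmentBandit`, the multiplier values, and the simulated sell probabilities are all made up for the example.

```python
import random


class PriceAdjustmentBandit:
    """Thompson sampling over hypothetical multipliers applied to a base price.

    Assumption: the base price would come from an upstream model such as
    DeepFM; here it is simply passed in as a number.
    """

    def __init__(self, multipliers, seed=None):
        self.multipliers = list(multipliers)
        # Beta(1, 1) prior per arm; alpha counts sales, beta counts non-sales.
        self.alpha = [1.0] * len(self.multipliers)
        self.beta = [1.0] * len(self.multipliers)
        self.rng = random.Random(seed)

    def select_arm(self):
        # Draw one sample from each arm's posterior and pick the largest.
        samples = [
            self.rng.betavariate(a, b) for a, b in zip(self.alpha, self.beta)
        ]
        return max(range(len(samples)), key=samples.__getitem__)

    def suggest_price(self, base_price):
        # Adjust the base model's price by the chosen multiplier.
        arm = self.select_arm()
        return arm, round(base_price * self.multipliers[arm], 2)

    def update(self, arm, sold):
        # Online update: no retraining of the base model is needed.
        if sold:
            self.alpha[arm] += 1.0
        else:
            self.beta[arm] += 1.0

    def posterior_mean(self, arm):
        return self.alpha[arm] / (self.alpha[arm] + self.beta[arm])


def simulate(rounds=2000, seed=0):
    """Run the bandit against a made-up market where the 1.1x arm sells best."""
    bandit = PriceAdjustmentBandit([0.9, 1.0, 1.1], seed=seed)
    env = random.Random(seed + 1)
    true_sell_prob = [0.3, 0.5, 0.7]  # invented values, for illustration only
    for _ in range(rounds):
        arm, _price = bandit.suggest_price(base_price=100.0)
        bandit.update(arm, env.random() < true_sell_prob[arm])
    return bandit


if __name__ == "__main__":
    b = simulate()
    for i, m in enumerate(b.multipliers):
        pulls = int(b.alpha[i] + b.beta[i] - 2)
        print(f"multiplier {m}: pulls={pulls}, est. sell rate={b.posterior_mean(i):.2f}")
```

The point of the sketch is the division of labor: the base model supplies a price, and the bandit learns a correction online from each outcome, so a shift in buyer behavior shows up in the posteriors after a modest number of observations. A real system would likely need a richer reward (for example one that accounts for the sale price as well as whether the item sold) and per-context arms, which this example leaves out.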