SMU Data Science Review


Pitch selection plays a crucial role in baseball and involves the interplay of pitchers, catchers, and batters. Teams have tried various methods to gain an advantage in it since the early days of the game. This research applies reinforcement learning to pitch-by-pitch Statcast data to improve batting strategies, building on previous statistical work (sabermetrics) to support better decisions about pitch selection and plate discipline. The dataset contains over 700,000 pitches for each full season and 200,000 pitches for the COVID-shortened 2020 season, and includes metrics such as pitch release point, velocity, and launch angle. The study examines player interactions and pitch behavior in search of patterns that could change how teams approach their offensive tactics. Because accurate pitch type classification is critical to the analysis, we reclassified pitch types using 15 distinct variables, ranging from release point coordinates to spin rates. These variables were normalized and reduced with UMAP to a 2D vector embedding for each pitch. This methodology refines pitch classification and supports a deeper understanding of player interactions and pitch behavior.
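
The normalize-then-embed step described in the abstract could be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' pipeline: the abstract does not say which normalization was used, so z-scoring is assumed here; the function names `standardize` and `embed_2d` and the three toy features are hypothetical; and the UMAP call relies on the third-party `umap-learn` package with its default settings apart from `n_components=2`.

```python
from statistics import mean, pstdev


def standardize(rows):
    """Z-score each column of a list of equal-length numeric rows.

    Assumption: the paper only says the variables were "normalized";
    z-scoring is one common choice, not necessarily the one used.
    """
    columns = list(zip(*rows))
    means = [mean(col) for col in columns]
    # Fall back to 1.0 so a constant column does not divide by zero.
    stds = [pstdev(col) or 1.0 for col in columns]
    return [
        [(value - m) / s for value, m, s in zip(row, means, stds)]
        for row in rows
    ]


def embed_2d(rows, random_state=0):
    """Project standardized rows to 2D with UMAP (needs umap-learn)."""
    import umap  # third-party: pip install umap-learn

    reducer = umap.UMAP(n_components=2, random_state=random_state)
    return reducer.fit_transform(standardize(rows))


# Hypothetical toy pitches: (release speed mph, spin rate rpm, release height ft).
pitches = [
    [95.0, 2300.0, 6.1],
    [85.0, 2500.0, 5.9],
    [78.0, 2700.0, 6.0],
    [96.0, 2250.0, 6.2],
]
scaled = standardize(pitches)
```

Only `standardize` is run on the toy rows above. `embed_2d` is meant for a full pitch table with all 15 variables, since UMAP builds a neighbor graph and needs far more rows than this to produce a meaningful embedding.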

Creative Commons License

This work is licensed under a Creative Commons Attribution-Noncommercial 4.0 License