END-TO-END MACHINE LEARNING PROJECT ON THE HITTERS DATASET
Given a dataset of baseball players' salaries, 1986 statistics, and career statistics, can a machine learning model be built to estimate a player's salary?
DATA SET STORY:
- This dataset was originally taken from the StatLib library at Carnegie Mellon University.
- This is part of the data that was used in the 1988 ASA Graphics Section Poster Session.
- The salary data were originally from Sports Illustrated, April 20, 1987.
- The 1986 and career statistics were obtained from The 1987 Baseball Encyclopedia Update published by Collier Books, Macmillan Publishing Company, New York.
The data frame contains 322 observations of major league players on the following 20 variables:
- AtBat: Number of times at bat in the 1986-1987 season
- Hits: Number of hits in the 1986-1987 season
- HmRun: Number of home runs in the 1986-1987 season
- Runs: Number of runs in the 1986-1987 season
- RBI: Number of runs batted in during the 1986-1987 season
- Walks: Number of walks in the 1986-1987 season
- Years: Number of years in the major leagues
- CAtBat: Number of times at bat during his career
- CHits: Number of hits during his career
- CHmRun: Number of home runs during his career
- CRuns: Number of runs during his career
- CRBI: Number of runs batted in during his career
- CWalks: Number of walks during his career
- League: A factor with levels A and N indicating player’s league at the end of 1986
- Division: A factor with levels E and W indicating player’s division at the end of 1986
- PutOuts: Number of put outs in the 1986-1987 season
- Assists: Number of assists in the 1986-1987 season
- Errors: Number of errors in the 1986-1987 season
- Salary: 1986-1987 annual salary on opening day in thousands of dollars
- NewLeague: A factor with levels A and N indicating player’s league at the beginning of 1987
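The variable list above can be turned into a small loading step before any modelling. The following is a minimal sketch, not the project's actual pipeline. It assumes the data is available as a CSV file whose header uses exactly the variable names listed above, and that a missing salary is stored as an empty field or the string NA. The names parse_row and load_hitters are hypothetical helpers introduced only for illustration.

```python
import csv
import io

# The 20 variables from the list above, grouped by how a model would treat them.
NUMERIC_COLUMNS = [
    "AtBat", "Hits", "HmRun", "Runs", "RBI", "Walks", "Years",
    "CAtBat", "CHits", "CHmRun", "CRuns", "CRBI", "CWalks",
    "PutOuts", "Assists", "Errors",
]
CATEGORICAL_LEVELS = {
    "League": ("A", "N"),
    "Division": ("E", "W"),
    "NewLeague": ("A", "N"),
}
TARGET = "Salary"


def parse_row(row):
    """Convert one CSV row (a dict of strings) into numeric features and a target."""
    features = {name: float(row[name]) for name in NUMERIC_COLUMNS}
    for name, levels in CATEGORICAL_LEVELS.items():
        value = row[name].strip()
        if value not in levels:
            raise ValueError(f"unexpected level {value!r} for {name}")
        # Encode each two-level factor as 0.0 / 1.0 (the second level maps to 1.0).
        features[name] = float(levels.index(value))
    raw_salary = row[TARGET].strip()
    # Assumption: a missing salary appears as an empty field or "NA".
    salary = None if raw_salary in ("", "NA") else float(raw_salary)
    return features, salary


def load_hitters(handle):
    """Read rows from an open CSV handle, keeping only rows with a known salary."""
    X, y = [], []
    for row in csv.DictReader(handle):
        features, salary = parse_row(row)
        if salary is not None:
            X.append(features)
            y.append(salary)
    return X, y
```

Calling load_hitters on an open file handle returns a list of feature dictionaries (19 numeric entries each) and a matching list of salaries in thousands of dollars. Rows without a salary are dropped here because they cannot be used as training targets. Whether to drop or impute them is a design choice this sketch does not settle.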