Tidy number dating
I'm with Hilary on this one: you should make sure your data is tidy. In episode 11 of Not So Standard Deviations, Hilary and Roger discussed their typical approaches. Before you do any plots, filtering, transformations, summary statistics, regressions... Without a tidy dataset, you'll be fighting your tools to get the result you need.
That's the essence of tidy data, the reason why it's worth considering what shape your data should be in. It's about setting yourself up for success so that the answers naturally flow from the data (just kidding, it's usually still difficult).
Let's assign that back into our DataFrame.
    date        variable
    2015-10-28  away_team    0.000000
                home_team    0.000000
    2015-10-29  away_team    0.333333
                home_team    0.000000
    2015-10-30  away_team    1.083333
    ...
What's the effect (in terms of probability to win) of being the home team? We need to create an indicator for whether the home team won.
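As a minimal sketch of that indicator, assuming one row per game with points columns named `home_points` and `away_points` (the frame, team names, and scores below are made up for illustration), something like this would do:

```python
import pandas as pd

# Hypothetical game-level data: one row per game. Column names and
# values are assumptions for illustration, not the real schedule.
games = pd.DataFrame({
    "date": pd.to_datetime(["2015-10-27", "2015-10-27", "2015-10-28"]),
    "away_team": ["Team A", "Team C", "Team E"],
    "away_points": [106, 95, 95],
    "home_team": ["Team B", "Team D", "Team F"],
    "home_points": [94, 97, 112],
})

# Boolean indicator: True when the home team outscored the visitors.
games["home_win"] = games["home_points"] > games["away_points"]

# The mean of a boolean column is the share of True values,
# i.e. the raw home-win rate in this toy sample.
home_win_rate = games["home_win"].mean()
print(games[["home_team", "away_team", "home_win"]])
print(home_win_rate)
```

The raw rate is only a starting point; it says nothing yet about team strength or rest, which is what a regression would try to control for.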
Each month slips in an extra row of mostly NaNs, the column names aren't too useful, and we have some dtypes to fix up.
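Here is one way that cleanup might look. This is a sketch on a made-up table: the raw column names, the date format, and the `thresh=4` cutoff are all assumptions about what the scraped data looks like, so adjust them to the real thing.

```python
import numpy as np
import pandas as pd

# Made-up raw table imitating a scraped schedule: unhelpful column
# names, a separator row of NaNs, and numbers stored as strings.
raw = pd.DataFrame({
    "Date": ["Tue, Oct 27, 2015", np.nan, "Wed, Oct 28, 2015"],
    "Visitor/Neutral": ["Team A", np.nan, "Team C"],
    "PTS": ["106", np.nan, "95"],
    "Home/Neutral": ["Team B", np.nan, "Team D"],
    "PTS.1": ["94", np.nan, "112"],
})

# Hypothetical mapping from the raw headers to friendlier names.
column_names = {
    "Date": "date",
    "Visitor/Neutral": "away_team",
    "PTS": "away_points",
    "Home/Neutral": "home_team",
    "PTS.1": "home_points",
}

games = (
    raw.rename(columns=column_names)
       # Keep only rows with at least 4 non-missing values,
       # which drops the mostly-NaN separator rows.
       .dropna(thresh=4)
       # Fix the dtypes: parse the dates, make the points integers.
       .assign(
           date=lambda df: pd.to_datetime(df["date"], format="%a, %b %d, %Y"),
           away_points=lambda df: df["away_points"].astype(int),
           home_points=lambda df: df["home_points"].astype(int),
       )
       .reset_index(drop=True)
)
print(games.dtypes)
```

Writing it as one method chain keeps the raw table untouched, so you can rerun the cleanup from scratch if an assumption turns out to be wrong.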
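The structure Wickham defines as tidy has the following attributes: each variable forms a column, each observation forms a row, and each type of observational unit forms a table. Through the following examples extracted from Wickham’s paper, we’ll wrangle messy datasets into the tidy format.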
The goal here is not to analyze the datasets but rather prepare them in a standardized way prior to the analysis.
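To make the idea concrete, here is a small sketch of going from a wide table to a long, tidy one with `pd.melt`. The `games` frame and its column names are invented for illustration. In the wide form a single row holds two observations (a home team playing and an away team playing); after melting, each team-game is its own row.

```python
import pandas as pd

# Hypothetical wide table: one row per game, two team columns.
games = pd.DataFrame({
    "game_id": [0, 1],
    "date": pd.to_datetime(["2015-10-27", "2015-10-28"]),
    "away_team": ["Team A", "Team C"],
    "home_team": ["Team B", "Team A"],
})

# Melt the two team columns into (variable, team) pairs so that
# every row is one observation: a team playing on a date.
tidy = (
    pd.melt(
        games,
        id_vars=["game_id", "date"],
        value_vars=["away_team", "home_team"],
        var_name="variable",
        value_name="team",
    )
    .sort_values(["game_id", "variable"])
    .reset_index(drop=True)
)
print(tidy)
```

The `variable` column remembers whether the team was home or away, so nothing is lost and the wide shape can be rebuilt later with a pivot.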
With a tidy dataset, it's relatively easy to do all of those.
Hadley Wickham kindly summarized what makes a dataset tidy. This Stack Overflow question asked about calculating the number of days of rest NBA teams have between games.
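Once the data is long (one row per team per game), the rest-days question becomes a grouped difference of dates. The sketch below uses a made-up `tidy` frame with assumed `date` and `team` columns, and it defines "rest" as the number of full days off between consecutive games, which is my reading of the question rather than something it spells out.

```python
import pandas as pd

# Hypothetical tidy data: one row per team per game.
tidy = pd.DataFrame({
    "date": pd.to_datetime([
        "2015-10-27", "2015-10-27",
        "2015-10-28", "2015-10-28",
        "2015-10-30", "2015-10-30",
    ]),
    "team": ["Team A", "Team B", "Team A", "Team C", "Team B", "Team C"],
})

# Sort so each team's games are in date order, then take the
# difference between consecutive game dates within each team.
tidy = tidy.sort_values(["team", "date"]).reset_index(drop=True)
days_between = tidy.groupby("team")["date"].diff().dt.days

# Back-to-back games are 1 day apart, which is 0 days of rest.
# A team's first game has no previous game, so its rest is NaN.
tidy["rest"] = days_between - 1
print(tidy)
```

Trying the same calculation on the wide table would mean tracking each team across two different columns, which is exactly the fight with your tools that tidy data avoids.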