Too many GitHub notifications, too little time. The obvious answer is to spend lots of time creating a hyper-personalised prediction engine that can tell me what I’m interested in, and to learn a whole bunch of stuff along the way. This is a tongue-in-cheek experiment, which resulted in the realisation that I’m pretty unpredictable.
See it in action (predictions for the chillu GitHub user).
The approach, in outline:
- Collect GitHub events from each repo the viewer has previously interacted with
- Score each issue and pull request based on the number of interactions (if any)
- Train a neural network regression learner on both categorical and continuous data
- Provide a prediction service for this user
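The scoring step above can be sketched roughly as follows. This is a minimal illustration, not the project's actual scoring code: the simplified event shape (`type`, `actor`, `repo`, `number`) and the weights in `EVENT_WEIGHTS` are assumptions made for the example.

```python
# Hypothetical per-event weights; the real scoring rules are not stated here.
EVENT_WEIGHTS = {
    "IssueCommentEvent": 1.0,
    "PullRequestReviewCommentEvent": 1.0,
    "IssuesEvent": 2.0,
    "PullRequestEvent": 2.0,
}


def score_items(events, user):
    """Return {(repo, number): score} based on the user's interactions.

    Each event is a dict with "type", "actor", "repo" and "number" keys
    (an assumed, simplified shape). Issues and pull requests the user
    never touched still appear, with a score of 0.
    """
    scores = {}
    for event in events:
        key = (event["repo"], event["number"])
        scores.setdefault(key, 0.0)
        if event["actor"] == user:
            # Unknown event types contribute nothing to the score.
            scores[key] += EVENT_WEIGHTS.get(event["type"], 0.0)
    return scores


sample_events = [
    {"type": "IssuesEvent", "actor": "alice", "repo": "org/app", "number": 1},
    {"type": "IssueCommentEvent", "actor": "chillu", "repo": "org/app", "number": 1},
    {"type": "IssueCommentEvent", "actor": "chillu", "repo": "org/app", "number": 1},
    {"type": "PullRequestEvent", "actor": "bob", "repo": "org/app", "number": 2},
]
print(score_items(sample_events, "chillu"))
```

The resulting score would serve as the regression target, with zero-scored items acting as the "not interested" examples.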
The input parameters are sourced from https://githubarchive.org, a ~6TB data set of every GitHub event ever created. The data is accessible via Google BigQuery. We’re only interested in events related to repositories that the user has previously interacted with. In my case, this reduced the training data set to about 20k rows.
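To illustrate the filtering idea, here is a small in-memory sketch. The real work is done by BigQuery queries; the helper names and the flat event dicts (`actor`, `repo`) below are assumptions for illustration only.

```python
def repos_touched_by(events, user):
    """Collect the names of repositories the user has generated events in."""
    return {event["repo"] for event in events if event["actor"] == user}


def relevant_events(events, user):
    """Keep every event (from any actor) in repos the user has touched."""
    repos = repos_touched_by(events, user)
    return [event for event in events if event["repo"] in repos]


# Hypothetical, heavily simplified archive rows.
archive_events = [
    {"actor": "chillu", "repo": "org/app", "type": "IssueCommentEvent"},
    {"actor": "alice", "repo": "org/app", "type": "PullRequestEvent"},
    {"actor": "bob", "repo": "other/lib", "type": "IssuesEvent"},
]

for event in relevant_events(archive_events, "chillu"):
    print(event["repo"], event["actor"], event["type"])
```

Events from other actors are kept on purpose: they are the candidates the user might or might not have reacted to.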
See notebook/learn.ipynb for the BigQuery queries run to retrieve the parameters.
See notebook/learn.ipynb for a (non-interactive) snapshot of the training process.