To all watchers of the libgdx repository: I’m terribly sorry, and I hope I didn’t interfere with your work in any way.
This is meant as a cautionary tale about using GitHub’s API on a repository with quite a few watchers (460 in this case).
Earlier this year we migrated our code from Google Code to GitHub. We didn’t have a good migration plan for the 1200 or so issues back then, so we kept them on Google Code. We now have about 1700 issues on the tracker.
Today I finally wanted to tackle the issue tracker migration, using a Python script I found on GitHub. The script requires you to specify the GitHub user account that owns the repository the issues will be migrated to. I did a dry run on a fork of the main repo using my own GitHub account, fixed a few problems in the script, and validated things to the best of my abilities. Things looked good.
Then I ran it on the main repository. Luckily I was watching our IRC channel. After about 4 minutes, people started to scream. They had each received 789 e-mails from GitHub. Every single issue I migrated, and every single comment on each issue, triggered an e-mail notification to all watchers of the main repository.
This wasn’t apparent to me during the dry runs, as I used my own GitHub account. The script posts all issues and comments with the user account I supplied, so naturally I did not get any notification mails.
I stopped the script after 130 issues (4 minutes) and immediately started sending out apologies, along with a mail to GitHub support, to which I haven’t received an answer yet. I sent roughly 300k mails through their servers in a matter of minutes. If I hadn’t been watching IRC, I’d have sent about 4 million mails to 460 people within an hour.
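For anyone who wants to check the numbers, here is a rough back-of-the-envelope calculation. It assumes every one of the 460 watchers got all 789 mails, and that the remaining issues would have produced notifications at the same rate as the first 130. The variable names are just for illustration, and the results land a bit above the round figures quoted in this post.

```python
# Figures taken from this post.
watchers = 460
mails_per_watcher = 789    # mails each watcher received before the script was stopped
issues_migrated = 130
total_issues = 1700
minutes_elapsed = 4

# Mails actually sent: one per watcher per issue or comment created.
mails_sent = watchers * mails_per_watcher  # 362,940

# Assumption: later issues carry about as many comments as the first 130.
events_per_issue = mails_per_watcher / issues_migrated
projected_per_watcher = events_per_issue * total_issues
projected_total = projected_per_watcher * watchers

# Assumption: the script keeps the same pace for the whole run.
projected_minutes = minutes_elapsed * total_issues / issues_migrated

print(f"mails sent so far:       {mails_sent:,}")
print(f"projected mails (total): {projected_total:,.0f}")
print(f"projected runtime (min): {projected_minutes:.0f}")
```

So the damage done was a few hundred thousand mails, and a full run would have been well over 4 million in under an hour.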
Let me assure you that I’m extremely sorry about this incident. I know that things like this can interrupt daily workflows quite a bit, even if getting rid of those mails is not a Herculean task. I’d be rather upset if a repo maintainer pulled something like this on me. Please accept my deepest apologies.
The lesson for GitHub API users: think hard about the implications of automating tasks through the GitHub API if your repository has more than a few watchers.
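One cheap way to act on that lesson is a pre-flight check that estimates the notification fan-out before a bulk script touches the real repository. The sketch below is hypothetical: `estimate_notifications`, `check_fanout` and the default limit are made-up names and values, it does not call the GitHub API, and it simply assumes one mail per watcher per created issue or comment, which is what happened in this incident.

```python
def estimate_notifications(watchers: int, events: int) -> int:
    """Estimate mails sent, assuming one mail per watcher per issue/comment."""
    return watchers * events


def check_fanout(watchers: int, events: int, limit: int = 1000) -> int:
    """Return the estimated mail count, or refuse to continue above `limit`.

    `limit` is an arbitrary illustrative threshold, not a GitHub setting.
    """
    total = estimate_notifications(watchers, events)
    if total > limit:
        raise RuntimeError(
            f"bulk operation would trigger about {total:,} notification mails "
            f"({events} events x {watchers} watchers); aborting"
        )
    return total


if __name__ == "__main__":
    # A dry run on a fork where nobody else is watching looks harmless ...
    print(check_fanout(watchers=0, events=789))
    # ... while the same run against the main repository is refused.
    try:
        check_fanout(watchers=460, events=789)
    except RuntimeError as err:
        print(err)
```

The watcher count and the number of issues and comments have to be supplied by hand here; the point is only that the multiplication is worth doing before the real run, not after.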
The lesson for GitHub/API designers: consider safeguarding against such issues in your API, in case other idiots like me pull off something similar in the future.