Today I revisited some scripts I last touched on December 5, 2011 for very very carefully archiving research data with checksums, an audit trail, and other very very careful things like that.
One of the requirements for this project is that the first phase of my processing needs to accept input data from a provider. Unfortunately, this input format has never been the same twice. Grr.
Upon receipt of the second variation on July 12, 2011 (six days after I started the project), I took the time to make the script somewhat configurable with an external file.
This was handy in November 2011 when I needed to do a similar set of work for a second research dataset. I put everything in a configuration file stored alongside the input data. Date format strings, headers, fields of interest, key/values for data types, etc. That meant I could share code between datasets as they emerged from the wild.
So last week, I got another set of input data. Yep, another unique format. I haven’t thought about this in over a year, and I have a terrible memory. Today, I got the input data parsed and validated in five minutes after editing a config file, because:
- I had one place to do customization
- I took steps to encourage code reuse
- I wrote good comments and gave myself a
All this despite knowing that I was probably the only one who would ever look at this again. And I have those dates because everything is in a Subversion repository. Did I mention that I wrote it in a language I don’t know very well?
Granted, this is a tiny little thing in the universe of computer things, but my point here is that it’s often worth doing the right thing for the next guy, even for small things, even if the next guy is you. Perhaps especially if it’s you.