Calling some code “research-grade” is a euphemism for “it’s a mess”. It happens because we generally expect that we’re going to write this code once, run our experiments, and then throw it away. But that’s not how research works: we’re often working out the exact hypothesis to be tested while simultaneously developing the techniques needed to verify it.
In pursuing a single hypothesis, we often have to develop and discard a series of sub-hypotheses and techniques before we find the full set that works. In that sense, we’re writing code to reach a target that constantly changes, sometimes even before we’ve reached it. Parts of the code become outdated, and the functions we are writing change what they need to do, sometimes even before a previous rewrite is complete.
It’s crucial that we can trust our code as we run experiments. A false-negative result at the wrong time will cost us time, and could cost us a paper, an entire discovery, or more. Testing is the only good way we have to believe what our code tells us.
When I was stuck with some incredibly complicated research code, I wrote a short guide to testing in research, and to getting neural networks to actually learn something. You can find it here. If you’d like to make changes, you can find an editable version online here. If you use it, I’d appreciate a link back!