Thursday, June 22, 2017

computation allocation

I occasionally need to manipulate a bunch of data, either to feed into another program or to pick out trends/issues, that is for whatever reason not easily reducible or has problems that aren't immediately obvious. And sometimes this data manipulation takes the form of tedious but relatively simple Excel crunching to simplify things.

I have a couple of options here:

1. I can offload the manipulation to an intern/low-level scientist, give them extensive directions, set them loose on a first iteration, and then look at what they have and have them refocus/redo a bunch of it. They will spend an exceptionally long time doing this, but they have a very low billing rate.

2. I can give the data to my data manipulation colleague, who will do some sort of macro/program-building magic. I will get the data back relatively quickly, but it's a bit of a black box, and I will need to go through her results and figure out what might have been missed or what didn't sort correctly. My colleague has a very high billing rate, and depending on how much massaging is required, it may take a while to get set up.

3. I can do all the crunching myself, even though it's tedious and surely there must be a better way to get what I need, because I need the data evaluated now, not when the cheap staff member or the expensive specialist is available.

If I have a forgiving schedule and the data set isn't ridiculously large (it can be conveyed in one spreadsheet file of less than, say, 4 MB), I go with #1. If the data set starts to get out of control, I either go with #2 entirely (less often) or use #2 to cut out the data I'm sure I don't need and focus on what I think I may need. But at crunch time, when it looks like it would only take me a day or so, it's all me; the sketch below spells this out.
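
For what it's worth, that triage boils down to something like the toy Python sketch below. The function name, the inputs, and the 4 MB cutoff are just my rough rules of thumb made explicit, not anything rigorous:

    def who_crunches(crunch_time, data_size_mb, intern_free, specialist_free):
        """Rough sketch of the triage above; all thresholds are rules of thumb."""
        if crunch_time:
            # Deadline pressure: nobody else can turn it around fast enough.
            return "me"
        if data_size_mb < 4 and intern_free:
            # Small enough for one spreadsheet: cheap labor, lots of iteration.
            return "intern"
        if specialist_free:
            # Unruly data set: the specialist, either for the whole job or
            # just to trim it down to the parts I actually need.
            return "specialist"
        # Nobody available, so it falls back to me regardless.
        return "me"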
