
July 18, 2011



Shawn Steiman

Aloha Daniel,

What fun stuff you're doing! Who are you working with in PR? I'd love to contribute brain power if you need it. It is really fantastic that you're amassing this data and I commend your efforts!

Your data is fun, but it is hard to accept without some R^2 values. There's some serious spread around the lines in those graphs, particularly in the ones split by variety. While the lines certainly trend upwards, they don't seem to be glorious correlations. Is that data you're willing to share now, or must we wait patiently for it?
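For readers who want to see what an R^2 value would add to a scatter plot with a trend line, here is a minimal sketch in Python. The numbers are made up purely for illustration and are not the project's data; the function name `r_squared` and the variable names are hypothetical.

```python
def r_squared(xs, ys):
    """R^2 of an ordinary least-squares line fitted to ys as a function of xs."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally long sequences with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        raise ValueError("xs and ys must each contain some variation")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Sum of squared residuals around the fitted line.
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    return 1 - ss_res / syy


# Made-up example values (NOT from the study): some predictor vs. a score.
predictor = [1, 2, 3, 4]
score = [1, 3, 2, 4]
print(round(r_squared(predictor, score), 2))
```

An upward-sloping line can sit on top of a low R^2, which is why the slope alone says little about how tight the relationship is.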

I'd love to discuss the cupping procedure with you. I think your correlations are promising but are confounded by the nature of the SCAA cupping form. If you're keen to discuss it, drop me a line!

Well done!
Shawn Steiman

Daniel Humphries

Hi Shawn,

Thanks for the kind words. Yes, I was wondering how many people out there would comment on that, or even notice it.

There's more data than just what I'm showing here, and we're not drawing strong conclusions from any of the stuff you see here, for precisely the reasons you point out. The variance is actually not as severe as it looks in these graphs, though it's still not very neat. The cupping scores are means of several different sessions/cuppers.

There are a couple of problems from a data-analysis point of view. One may be the cupping forms: though we didn't use the SCAA forms, we used something similar in concept. I'm interested to hear what you think in terms of alternatives. But the other problem, the main problem in my opinion, is that there are so many possible factors that influence flavor, literally dozens of potential variables, each of which we attempted to control for... but seriously, if you want to control for that many variables you will need a ton of data points.

And each one of these data points represents an actual person who collected the cherries on-site on a hillside in Puerto Rico. To get those cherries, he first had to call each farmer one by one, trying to catch them when they were available, explain to them why they should take time out from their busy schedules to participate in this project, find out when the cherries were going to be harvested, drive out there on the appropriate day, pick the cherries, get them back to the research station, and make sure they got processed immediately and according to the standards that we set. Frankly, I'm amazed that we got 63 samples. Even 30 seems ambitious when you think about it. But to get really strong data, we would need hundreds of coffees and dozens of cuppers, in my opinion.
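One rough way to put numbers on the "many variables, few samples" problem is adjusted R^2, which discounts a model's raw R^2 by the number of predictors it uses. The sketch below is only an illustration: the raw R^2 of 0.5 and the count of 24 predictors are assumed values, not results from this project. Only the sample size of 63 comes from the discussion above.

```python
def adjusted_r_squared(r2, n_samples, n_predictors):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    dof = n_samples - n_predictors - 1
    if dof <= 0:
        raise ValueError("need more samples than predictors plus one")
    return 1 - (1 - r2) * (n_samples - 1) / dof


# Assumed raw R^2 of 0.5 with the 63 samples mentioned above.
print(round(adjusted_r_squared(0.5, 63, 1), 3))   # one predictor
print(round(adjusted_r_squared(0.5, 63, 24), 3))  # two dozen predictors (assumed)
```

With the same raw fit, the adjusted value drops sharply as predictors are added, which is the arithmetic behind wanting hundreds of coffees before trying to separate dozens of factors.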

So our way around this is to basically look at as many categories as we can and try to see where trends are stronger than in other categories, and draw our conclusions based also on what we already know from experience. The data will all be available eventually, but I don't have a publish-friendly version right now.

As for who we are working with, the funding and on-ground support comes through the USDA in Puerto Rico, and is based out of the agricultural research station in Jayuya. My goal, actually, with all the stuff you see here, is to show that this kind of analysis is possible, to get our feet wet, so to speak, and to go back in the coming year and collect a lot more samples and correct some other little problems we have had. I also plan to do about double the amount of cupping (both instances and cuppers). And having the information we already have from this year's crop (correlations weak or strong), we'll be better able to zoom in on what we suspect are the important distinguishing factors.

Basically the way I see people doing things now is: they will cup many coffees from many regions of, say, Kenya. Then they will say, "Well, coffee from region xyz in Kenya is consistently better because of its citrus flavor [or whatever]. We believe it's because of the particular iron content in the soil there." Now those cuppers, if they are experienced, are probably right about this. But all they can do is point to their hunches.

My expectation for this project (and it's basically done) is that we're going to end up with some "hunches-plus". That is, we won't be able to definitively prove our conclusions (besides some very trivial points we could make about mold or something that's obvious anyway), but we will be able to support or not support (depending on the case) the way cuppers already talk about the coffee. I'd be very interested to do this again with a larger sample set, draw some instantaneous conclusions (day-of or day-after), and then set up the samples again the next day by the categories that the cuppers claim are important, to see whether their expectations really match up with ours.

Based on your CV, I bet you could give me some more insight beyond this. Feel free to drop me an email, and thanks for the comment and kind words!

