Here are some of the results I've compiled from our large project classifying soil types in Puerto Rico and investigating the influence of soil, climate, and varietal on cup quality.
These graphs and comments are part of a much larger report, with databases and maps, which will be released soon. This is just a little preview, which I hope you find interesting. The purpose of this project is to help farmers in Puerto Rico improve quality and prices. All the data collected, and all of the analysis, will be published and distributed to farmers (in Spanish and English), so that they can put our findings into action for their own benefit.
How did I arrive at these data points? The altitude and varietal information was provided by the USDA researchers in Puerto Rico. Each sample was collected individually, and sample collectors made a GPS recording on-site, standing next to the trees from which the cherries were taken. These readings include latitude, longitude, and altitude above sea level. The cupping scores are the result of a week of extensive blind cupping done by a team of Q-graders (myself included) in California, in May. Each sample was cupped multiple times, in random order, and the scores you see here are means taken from all the individual scores recorded.
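As a minimal sketch of how repeated blind-cupping scores can be collapsed into one mean per sample, the Python below groups individual scores by sample and averages them. The record layout (`sample_id`, `score` pairs), the function name `mean_scores`, and the example values are hypothetical placeholders, not the project's actual data or tooling.

```python
from collections import defaultdict
from statistics import mean


def mean_scores(records):
    """Average every individual cupping score recorded for each sample.

    `records` is an iterable of (sample_id, score) pairs, one per cup
    evaluated; a sample appears once for each session/cupper that scored it.
    (Hypothetical layout, for illustration only.)
    """
    by_sample = defaultdict(list)
    for sample_id, score in records:
        by_sample[sample_id].append(score)
    return {sample_id: mean(scores) for sample_id, scores in by_sample.items()}


if __name__ == "__main__":
    # Made-up scores, not results from this project.
    records = [("A", 81.0), ("A", 82.5), ("A", 81.5), ("B", 79.0), ("B", 80.0)]
    print(mean_scores(records))
```

Averaging across sessions and cuppers in this way smooths out individual cupper bias, but it also hides how much the individual scores disagreed, so it is worth keeping the raw scores alongside the means.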
The portion quoted here refers to other sections of the document which are not publicly available yet. Sorry, you'll just have to wait!
Altitude
There is a strong correlation between the quality of the samples and the altitude at which the coffee was grown. As one would expect, generally speaking, the higher the coffee was grown, the higher it scored on the cupping table. This is one of the strongest relationships we found in all the data.
While not all of the high-grown coffees were in the very upper echelon of cupping scores, most of them were. And the relationship is even stronger at the low end of the spectrum. All of the lowest scoring coffees belonged in the lower altitude categories.
The following graph illustrates the relationship between coffee quality and altitude, across all varietals and soil types:
This graph shows an average gain in quality of just over 2 points from the lowest altitudes to the highest.
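One way to turn a trend line like this into a single "average gain" figure is an ordinary least-squares fit of mean score against altitude, with the fitted slope multiplied across the altitude range. The sketch below assumes a straight-line trend; the report does not say how its trend lines were fitted, and the function names and the numbers in the usage example are hypothetical.

```python
def linear_fit(xs, ys):
    """Ordinary least-squares fit of ys = intercept + slope * xs."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two paired points")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("altitudes must not all be equal")
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


def gain_across_range(altitudes_ft, scores):
    """Score change the fitted line predicts from the lowest to the highest altitude."""
    slope, _ = linear_fit(altitudes_ft, scores)
    return slope * (max(altitudes_ft) - min(altitudes_ft))


if __name__ == "__main__":
    # Made-up altitudes (feet) and mean scores, for illustration only.
    altitudes = [1000, 1500, 2000, 2500, 3000]
    scores = [80.0, 80.5, 81.0, 81.5, 82.0]
    print(gain_across_range(altitudes, scores))
```

With these toy numbers the fitted slope is 0.001 points per foot, so the predicted gain over the 2000-foot range is 2 points. Real samples would scatter around the line rather than sit on it.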
Altitude by varietal
One of the most interesting findings in all of the data we collected and analyzed was that improvements due to altitude were much stronger when certain varietals were used. This leads us to the conclusion that producers at higher altitudes would benefit even more by switching to the preferred varietals.
Once again, the average improvement (shown above) for higher altitudes was just over 2 points.
The next graph shows what that improvement looked like when we limit the analysis to just limaní, fronton, and catimor (three low-scoring varietals):
We can observe the same general upward trend that we saw in the comprehensive data set. However, if we read the graph closely, we can see that the improvement is much less dramatic. In fact, there is barely 1 point of quality improvement over more than 2000 feet of altitude increase. Producers who plant these varietals (fronton, limaní, and catimor) at high altitude are not receiving the full benefit of their natural altitude advantage.
Let us now contrast this with the data from a different subset, using the high-scoring varietals pacas, bourbon, and caturra.
Once again, we see the expected increase in quality as altitude increases. However, in this case, the increase is far more dramatic. The low-altitude coffees score just above 80 points, but the high-altitude coffees are nearly at 84. That is nearly a 4-point gain in cup quality due to altitude, the kind of quality gain that tends to bring much higher prices in the specialty market.
Producers who are planting pacas, bourbon, and caturra at high altitude are getting a much better return on the natural advantage of high altitude.
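The subset comparison above can be sketched as: filter the samples by varietal group, fit each group separately, and report the fitted gain across that group's altitude range together with R², the share of score variance the straight line explains. A large gain with a low R² is a weaker result than the same gain with R² near 1. The `Sample` record layout, the function names, and the toy values are hypothetical; this is not the project's actual analysis code.

```python
from typing import NamedTuple


class Sample(NamedTuple):
    """One coffee sample (hypothetical record layout)."""
    varietal: str
    altitude_ft: float
    score: float  # mean cupping score for the sample


def fit_with_r2(xs, ys):
    """Least-squares line ys = intercept + slope * xs, plus its R^2."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two paired points")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    if sxx == 0:
        raise ValueError("altitudes must not all be equal")
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # For a one-predictor least-squares line, R^2 = Sxy^2 / (Sxx * Syy);
    # it is undefined (nan here) when every score is identical.
    r2 = sxy ** 2 / (sxx * syy) if syy > 0 else float("nan")
    return slope, intercept, r2


def group_gain(samples, varietals):
    """Fitted score gain across one varietal group's altitude range, and R^2."""
    subset = [s for s in samples if s.varietal in varietals]
    xs = [s.altitude_ft for s in subset]
    ys = [s.score for s in subset]
    slope, _, r2 = fit_with_r2(xs, ys)
    return slope * (max(xs) - min(xs)), r2
```

Calling `group_gain(samples, {"pacas", "bourbon", "caturra"})` and `group_gain(samples, {"limaní", "fronton", "catimor"})` on the same list gives the two gains to compare, each with its own R².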
Conclusion: As we saw in the first section, all coffee producers can expect an increase in quality by switching to higher-quality varietals. However, this switch is even more crucial for higher altitude producers. The higher the altitude, the more benefit producers can see from using these better varietals. Producers who plant lower quality varietals at higher altitudes are missing out on the huge benefit that altitude can provide.
More to come...
Aloha Daniel,
What fun stuff you're doing! Who are you working with in PR? I'd love to contribute brain power if you need it. It is really fantastic that you're amassing this data and I commend your efforts!
Your data is fun but it is hard to accept without some R². There's some serious spread going on in those graphs and the lines, particularly in the ones split by variety. While certainly trending upwards, they don't seem to be glorious correlations. Is that some data you're willing to share now, or must we wait patiently for it?
I'd love to discuss the cupping procedure with you. I think your correlations are promising but are confounded by the nature of the SCAA cupping form. If you're keen to discuss it, drop me a line!
Well done!
Shawn Steiman
Posted by: Shawn Steiman | July 22, 2011 at 10:13 PM
Hi Shawn,
Thanks for the kind words. Yes, I was wondering how many people out there would comment on that, or even notice it.
There's more data than just what I'm showing here, and we're not drawing strong conclusions from any of the stuff you see here, for precisely the reasons you point out. The variance is actually not as severe as it looks in these graphs, though it's still not very neat. The cupping scores are means of several different sessions/cuppers.
There are a couple of problems from a data-analysis point of view. One may be the cupping forms: though we didn't use the SCAA forms, we used something similar in concept. I'm interested to hear what you think in terms of alternatives. But the other problem, and the main one in my opinion, is that so many factors can influence flavor. There are literally dozens of potential variables, each of which we attempted to control for, but seriously, if you want to control for that many variables you need a ton of data points. And each one of these data points represents an actual person who collected the cherries on-site, on a hillside in Puerto Rico. To get those cherries, he first had to call each farmer one by one, trying to catch them when they were available; explain why they should take time out of their busy schedules to participate in this project; find out when the cherries were going to be harvested; drive out there on the appropriate day; pick the cherries; and get them back to the research station, making sure they were processed immediately and according to the standards we set. Frankly, I'm amazed that we got 63 samples. Even 30 seems ambitious when you think about it. But to get really strong data, we would need hundreds of coffees and dozens of cuppers, in my opinion.
So our way around this is basically to look at as many categories as we can, see where trends are stronger than in other categories, and base our conclusions partly on what we already know from experience. The data will all be available eventually, but I don't have a publish-friendly version right now.
As for who we are working with, the funding and on-the-ground support come through the USDA in Puerto Rico, and the work is based out of the agricultural research station in Jayuya. My goal with all the stuff you see here is actually to show that this kind of analysis is possible, to get our feet wet, so to speak, and then to go back in the coming year, collect a lot more samples, and correct some other little problems we have had. I also plan to do about double the amount of cupping (both instances and cuppers). And with the information we already have from this year's crop (correlations weak or strong), we'll be better able to zoom in on what we suspect are the important distinguishing factors.
Basically the way I see people doing things now is: they will cup many coffees from many regions of, say, Kenya. Then they will say, "Well, coffee from region xyz in Kenya is consistently better because of its citrus flavor [or whatever]. We believe it's because of the particular iron content in the soil there." Now those cuppers, if they are experienced, are probably right about this. But all they can do is point to their hunches.
My expectation for this project (and it's basically done) is that we're going to end up with some "hunches-plus". That is, we won't be able to prove our conclusions definitively (besides some very trivial points we could make about mold or something that's obvious anyway), but we will be able to support or not support (depending on the case) the way cuppers already talk about the coffee. I'd be very interested to do this again with a larger sample set: have the cuppers draw some immediate conclusions (day-of or day-after), then set up the samples again the next day grouped by the categories the cuppers claim are important, and see whether their expectations really match up with ours.
Based on your CV, I bet you could give me some more insight beyond this. Feel free to drop me an email, and thanks for the comment and kind words!
Posted by: Daniel Humphries | July 23, 2011 at 01:23 PM