Analyzing Classroom Images Locally with R and LLMs
We are almost done with our new book Computational Social Science Cookbook, a collaborative open-access book aimed at education researchers working with, or looking to expand their research toolkits with, computational methods. Among many other things, I led the chapter on using local (or cloud) LLMs to conduct methodological image analyses. The chapter covers something I think is underused in the field: analyzing classroom images with vision-language models in R.
The chapter is practical and self-contained, like the rest of the book, so researchers can follow the steps and repeat the analysis on their own datasets. If you have R and Ollama installed, you can follow along and run the code. But the method is worth a brief argument, because most researchers who haven’t tried image analysis are simply not aware of how easy it can be and how rich the results can be. This point is at the heart of the book: there are “things” that can be data beyond what we traditionally consider, and these can lead to new research ideas and questions. There are of course barriers, and they are mostly technical. Privacy is also a big concern, one that local models now largely solve.
Images are hard to analyze at scale, too (another reason we don’t consider them for research). Manual coding is slow, expensive, and hard to replicate. And in educational contexts specifically, school photos contain students. Sending them to a commercial API raises real privacy and IRB concerns that most researchers are not willing to navigate.
So how can this analysis be used in educational research? A researcher can record classroom interactions, and the resulting visuals can be used to answer questions about instructional format, spatial arrangement, and student engagement. This was the example used in the book. Although the example focuses on student engagement, it is up to researchers to imagine how this method can apply to any research question that can be answered with visual data.
For the chapter’s worked example, we used classroom photos pulled from Wikimedia Commons. I ran the workflow across three questions: instructional format, group composition, and student engagement. The results were accurate and easy to verify, and the workflow processed the images far faster than manual coding would allow. There are of course limitations. The model is not perfect, and the results are not a substitute for careful qualitative analysis. But they are a useful starting point, and a way to get more out of visual data than we have in the past.
The book also has a full section on using LLMs for qualitative data analysis directly, with a responsible use framework built around correctness, transparency, and reproducibility. More on that here.
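On the verification point above, one simple check is to hand-code a subset of images and compare those codes with the model’s. The sketch below is base R only and is not taken from the chapter; the `agreement()` helper, the column names, and every label in the toy data are hypothetical and made up for illustration.

```r
# Compare model codes with human codes for the images both have labelled.
# `by` is the shared image identifier, `var` the coded variable.
agreement <- function(model, human, by = "file", var = "format") {
  both <- merge(model, human, by = by, suffixes = c("_model", "_human"))
  m <- both[[paste0(var, "_model")]]
  h <- both[[paste0(var, "_human")]]
  list(
    n = nrow(both),                              # images coded by both
    percent_agreement = mean(m == h),            # share of identical codes
    disagreements = both[m != h, , drop = FALSE] # rows to inspect by hand
  )
}

# Toy data (invented labels, not results from the chapter)
model_codes <- data.frame(
  file   = sprintf("img_%02d.jpg", 1:6),
  format = c("lecture", "group work", "lecture",
             "individual", "group work", "lecture")
)
human_codes <- data.frame(
  file   = c("img_01.jpg", "img_03.jpg", "img_04.jpg", "img_06.jpg"),
  format = c("lecture", "group work", "individual", "lecture")
)

check <- agreement(model_codes, human_codes)
check$percent_agreement
check$disagreements
```

Simple percent agreement is the most basic option; a chance-corrected statistic may be more appropriate when some codes are much more common than others.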
These analyses were possible through the {kuzco} package, which wraps Ollama’s local model inference and exposes four functions: classification, object recognition, sentiment estimation, and a free-form custom prompt. About a dozen lines of code get you from a folder of classroom photos to structured, analyzable output. Shout out to Frank Hull, the author of the package, for making this so easy.
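To give a sense of what going from a folder of photos to structured output can look like, here is a minimal sketch. It is not the chapter’s code. The helper `code_images()`, the wrapper `classify_with_kuzco()`, the folder name `classroom_photos`, and the model name `qwen2.5vl` are all illustrative assumptions, and the {kuzco} argument names (`llm_model`, `image`, `backend`) follow the package README as best I can tell, so check `?llm_image_classification` for your installed version.

```r
# Apply a coding function to every image and stack the results.
# `coder` is any function that takes one image path and returns a
# one-row data frame; failures are recorded instead of stopping the run.
code_images <- function(paths, coder) {
  rows <- lapply(paths, function(p) {
    out <- tryCatch(
      as.data.frame(coder(p)),
      error = function(e) data.frame(error = conditionMessage(e))
    )
    out$file <- basename(p)
    out
  })
  # Give every row the same columns (NA where missing) before binding
  cols <- unique(unlist(lapply(rows, names)))
  rows <- lapply(rows, function(r) {
    for (m in setdiff(cols, names(r))) r[[m]] <- NA
    r[cols]
  })
  do.call(rbind, rows)
}

# Assumed {kuzco} call; requires Ollama running locally with a vision
# model already pulled. Argument names may differ by package version.
classify_with_kuzco <- function(path) {
  kuzco::llm_image_classification(
    llm_model = "qwen2.5vl",
    image     = path,
    backend   = "ollamar"
  )
}

# Usage (commented out because it needs a local model and a photo folder):
# paths <- list.files("classroom_photos", pattern = "\\.(jpg|jpeg|png)$",
#                     full.names = TRUE, ignore.case = TRUE)
# results <- code_images(paths, classify_with_kuzco)
```

Passing the coder in as a function keeps the loop independent of any one model call, so the same helper can be reused with a different {kuzco} function or tested with a stub before running a real model.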
Visual data has been treated as supplementary in educational research mostly because analyzing it was impractical; it was a niche. That is no longer true. If you work with classroom images, observational photos, or any kind of visual data (education or not), the chapter is a working starting point in R. The book is open access, so you can read it online for free, and the code is all available on GitHub. I hope this encourages more researchers to consider visual data in their work, and to use local LLMs to analyze it in a way that is ethical, practical, and insightful.
Wang, W., Akcaoglu, M., Rosenberg, J., & Kellogg, S. (2026). Computational Analysis of Educational Data: A Field Guide Using R.