libertysnake
.338 Win Mag
This is why you limit what you post online.
www.theguardian.com

The data scientist exposing US white supremacists: 'This is how you fight Nazis'
After surviving far right violence in Charlottesville, Emily Gorcenski has tracked the Proud Boys and other extremist groups

So, she's not just using pictures and doing searches based on those pictures. She's basically using vast amounts of third party data, from various sites, and performing analysis on it. In terms of a breakdown of what is going on here:
- Data is being pulled from websites, either by crawling them and scraping / archiving posts, or (more likely) by pulling posts via an API (Application Programming Interface) from websites like Facebook, Reddit, etc.
- Data is then transformed into a format that can be processed. It's probably piped into a holding place like Kafka / Amazon Kinesis before it's pulled into what's called an ETL (Extract, Transform, Load) pipeline. Normally that means you're taking the data in, removing garbage info you don't want, and renaming some fields so it's all uniform before loading it into a large database where it can be searched quickly.
- Next the data is processed. Normally you'll aggregate information and create some useful tables (basically spreadsheets) so you can identify bigger trends, and then use those bigger trends to limit what data you have to sort through for your answers. In this case she may be looking at where the word "MAGA" is used most often to narrow down which communities to search / analyze, or something similarly high level / superficial.
- Last is analysis. Sometimes data scientists do this directly, but more often they'll pass it to a machine learning platform (e.g., something like Amazon SageMaker / Datalab (Google) / Databricks). This could be looking for correlations / word usage. Something like "The people on a certain right wing website use this terminology. What individuals on Facebook have the highest incidence of using those same words, out of the target communities from the previous step?" And based on that, trying to narrow down identities by probability.
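To make the steps above a bit more concrete, here's a toy sketch in Python. To be clear, this is my own illustration and not anything from the article: the records, field names (`user`/`author`, `text`/`body`), communities, and placeholder words ("widget", "gizmo") are all made up, and a plain word-overlap ratio stands in for whatever machine learning would actually be used. It just shows the shape of the thing: clean up and normalize the records, count a keyword per community, then score users by how much of their vocabulary matches a reference word list.

```python
import re
from collections import Counter

# Hypothetical raw records, standing in for whatever a scraper or API returns.
# Field names are deliberately inconsistent to mimic multiple sources.
RAW_POSTS = [
    {"author": "u1", "community": "site_a", "body": "Widget gizmo widget!"},
    {"user": "u2", "community": "site_a", "text": "nothing to see here"},
    {"user": "u3", "community": "site_b", "text": "gizmo talk"},
    {"user": "u4", "community": "site_b", "text": "   "},  # garbage: empty post
    {"user": "u1", "community": "site_a", "text": "more widget chatter"},
]


def transform(raw):
    """The 'T' in ETL: drop empty posts, unify field names, tokenize text."""
    rows = []
    for record in raw:
        text = (record.get("text") or record.get("body") or "").strip()
        if not text:
            continue  # remove garbage records
        rows.append({
            "user": record.get("user") or record.get("author"),
            "community": record["community"],
            "tokens": re.findall(r"[a-z']+", text.lower()),
        })
    return rows


def keyword_counts_by_community(rows, keyword):
    """Aggregate step: how often does one keyword appear in each community?"""
    counts = Counter()
    for row in rows:
        counts[row["community"]] += row["tokens"].count(keyword)
    return counts


def overlap_scores(rows, reference_terms, communities):
    """Analysis step: per user, the fraction of their words that appear in a
    reference word list, restricted to the chosen communities."""
    hits, totals = Counter(), Counter()
    for row in rows:
        if row["community"] not in communities:
            continue
        totals[row["user"]] += len(row["tokens"])
        hits[row["user"]] += sum(t in reference_terms for t in row["tokens"])
    return {user: hits[user] / totals[user] for user in totals if totals[user]}


rows = transform(RAW_POSTS)
counts = keyword_counts_by_community(rows, "widget")
top_community = counts.most_common(1)[0][0]
scores = overlap_scores(rows, {"widget", "gizmo"}, {top_community})

print(counts)
print(sorted(scores.items(), key=lambda kv: kv[1], reverse=True))
```

Note that the score is only a ratio of matching words. It says nothing certain about who anyone is, which is exactly why this kind of probabilistic matching can point at the wrong person.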
This is why I'm absolutely against weaponizing data science for this purpose. It's 1984 type shit.