Daemon deploys production AI-based fuzzy matching to further scale YAGRO's data ingestion

Daemon develops rapid and successful POCs for four key use cases applying AI to YAGRO's data ingestion challenges, with a fuzzy matching use case deployed into production.
The client
YAGRO is a deeply data-driven company that aims to transform the food & farming industry through accessible, advanced data analytics. They provide a platform for farmers to store and manage all their farm records, gaining insights into sales, costs, chemical applications, anomalies, and efficiency monitoring for each harvest.
The challenge
Part of YAGRO's value proposition is that it adapts to its customers' data formats: YAGRO ingests the data its farmer users already hold and produces valuable insight from it. The current process is a user-friendly automated ingestion that uses AWS Glue to profile and transform the data as it is ingested, with some manual data checks by humans in the loop.
YAGRO has a goal to scale, but is constrained by the cost and complexity of manually handling new types of data, unstructured data, poorly formatted data, inconsistencies and anomalies. Although AWS Glue's data transform capabilities have proven to be great for their data wrangling needs, YAGRO sought to enhance their data ingestion practices through the application of Artificial Intelligence (AI) and Data Engineering best practices on AWS.
Daemon worked alongside YAGRO and identified the following challenge areas in YAGRO's current ingestion pipelines where the use of AI could potentially accelerate and automate their ingestion needs:
- Extraction of unstructured data from new documents with unfamiliar formats
- Reducing developer friction with each new data source onboarded using AI
- Leveraging AI to figure out when incoming data entries seem potentially problematic (anomaly detection) and flagging them to users
- Finding duplicate records in input data to flag potential data quality issues
- Fuzzy data matching of input data with similar or unknown names to ensure consistent data and enable accurate insights.
Our approach
Daemon assisted YAGRO in utilising AWS services, such as Glue and Bedrock, to develop scalable, accurate, and cost-effective AI-driven solutions. This enabled YAGRO to further scale data ingestion by reducing the need for manual intervention. Daemon then took the most successful and promising POC (fuzzy matching) and productionised it for YAGRO, integrating it into YAGRO's custom Data Ingestion Engine, tuning it on YAGRO's data, and making it ready for scale.
Discovery
The engagement began with Daemon collaborating with YAGRO over a discovery phase, which uncovered the above use cases and translated YAGRO's business and data ingestion needs into possible AI solutions. Daemon successfully evaluated each use case through a POC (Proof of Concept). Each use case was fundamentally different from the others and provided YAGRO insights for the next area of focus and continued use of AI in their data ingestion suite.
Intelligent Document Processing
Amazon Bedrock with Claude Sonnet was used to extract structured data from unstructured documents such as PDF invoices and spreadsheet files. Using few-shot prompting, data augmentation, Amazon Textract, and real-time LLM self-evaluation, we demonstrated that AI can be a reliable tool for automated data extraction. An additional automated evaluation step flags potential errors and complex cases, so human-in-the-loop review is needed only rarely and is focused on suspicious output.
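The evaluation step described above can be pictured as a routing gate that either accepts an extraction or flags it for a person. The following is a minimal sketch under stated assumptions, not the implementation built for YAGRO: the `ExtractedField` type, the per-field confidence score, the required field names and the 0.8 threshold are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ExtractedField:
    name: str
    value: str
    confidence: float  # hypothetical self-evaluation score in [0, 1]


# Assumed target schema and cut-off, chosen only for illustration.
REQUIRED_FIELDS = {"supplier", "invoice_date", "total"}
REVIEW_THRESHOLD = 0.8


def route_extraction(fields):
    """Return ("accept" or "review", reasons) for one extracted document."""
    reasons = []
    # A required field that is absent or blank always triggers review.
    present = {f.name for f in fields if f.value.strip()}
    for missing in sorted(REQUIRED_FIELDS - present):
        reasons.append(f"missing field: {missing}")
    # Any field the evaluator is unsure about also triggers review.
    for f in fields:
        if f.confidence < REVIEW_THRESHOLD:
            reasons.append(f"low confidence: {f.name}")
    return ("review" if reasons else "accept", reasons)
```

With a gate like this, only documents that come back as "review" would reach an analyst, and the list of reasons tells them where to look.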
Data Transformations
Using Amazon Bedrock and Claude Sonnet, Daemon created a proof-of-concept process that generates executable Python transformation scripts. These scripts convert unknown or complex data formats into the formats required by YAGRO's database structures, with the aim of further automating the processing of unrecognised tabular input data.
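To make the idea concrete, here is the kind of small transformation script such a process might emit for one unfamiliar file. This is an illustrative sketch only: the source column names, the target schema and the `transform` function are invented for the example and are not taken from YAGRO's platform.

```python
import csv
import io

# Hypothetical mapping from an unfamiliar export to an assumed target schema.
COLUMN_MAP = {
    "Fld Name": "field_name",
    "Crop": "crop",
    "Yld (t/ha)": "yield_t_per_ha",
}


def transform(raw_csv):
    """Rename columns, trim whitespace and parse the yield as a number."""
    reader = csv.DictReader(io.StringIO(raw_csv))
    rows = []
    for raw in reader:
        row = {target: raw[source].strip() for source, target in COLUMN_MAP.items()}
        row["yield_t_per_ha"] = float(row["yield_t_per_ha"])
        rows.append(row)
    return rows
```

Because the output is an ordinary script, it can be reviewed, tested and versioned like any other code before it touches production data.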
Anomaly detection
Flagging potential anomalies is essential for finding and addressing data quality problems, and for meeting YAGRO's specific requirements: proactively identifying critical anomalies such as chemical over-application, monitoring and optimising crop yield and growth factors, and detecting chemical mixture anomalies. Daemon first trialled AWS Glue Data Quality for real-time anomaly detection, comprehensively evaluated its performance, and subsequently produced a detailed proposal for implementing a custom ML solution.
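As a simple illustration of what flagging an over-application could look like, the sketch below uses a modified z-score based on the median absolute deviation. This is a swapped-in textbook technique, not how AWS Glue Data Quality works and not the custom ML solution Daemon proposed; the function name and the sample rates are hypothetical.

```python
from statistics import median


def flag_anomalies(rates, threshold=3.5):
    """Return the indices of values whose modified z-score exceeds threshold."""
    med = median(rates)
    # Median absolute deviation: a spread estimate that outliers barely move.
    mad = median(abs(r - med) for r in rates)
    if mad == 0:
        return []  # no spread to measure against, so nothing is flagged
    return [
        i
        for i, r in enumerate(rates)
        if abs(0.6745 * (r - med) / mad) > threshold
    ]


# Hypothetical application rates for one product across six fields.
application_rates = [1.0, 1.1, 0.9, 1.05, 0.95, 5.0]
flagged = flag_anomalies(application_rates)  # index 5 stands out
```

A median-based score is used here because a single extreme entry would inflate a mean and standard deviation and could hide itself.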
Duplicate finding
In order to flag potential duplicates, YAGRO implements sophisticated rules containing nuances that are difficult to model statistically. In addition to deterministic rules, some rules are written in plain English and used by a data analyst to search for possible duplicates manually.
Daemon first trialled traditional ML (Machine Learning) driven tools using AWS Lake Formation FindMatches, but found the quality of the results insufficient for YAGRO's use case. Daemon subsequently proposed and tested a fuzzy matching algorithm utilising an LLM via Amazon Bedrock. Using embeddings from Amazon Titan and additional prompting rules defined in plain English, Daemon created a system that identifies and flags duplicate records with high accuracy.
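The combination described above, a similarity search followed by plain-English rules, can be sketched as below. Everything here is an assumption made for illustration: `difflib.SequenceMatcher` stands in for embedding similarity, the 0.9 threshold is arbitrary, and `build_prompt` only assembles text and does not call any model.

```python
import re
from difflib import SequenceMatcher
from itertools import combinations


def normalise(text):
    """Lowercase, drop punctuation and collapse whitespace."""
    cleaned = re.sub(r"[^a-z0-9 ]", " ", text.lower())
    return " ".join(cleaned.split())


def candidate_pairs(records, threshold=0.9):
    """Return index pairs whose normalised text is similar enough to review."""
    pairs = []
    for (i, a), (j, b) in combinations(enumerate(records), 2):
        # Stand-in for comparing two embedding vectors.
        score = SequenceMatcher(None, normalise(a), normalise(b)).ratio()
        if score >= threshold:
            pairs.append((i, j))
    return pairs


def build_prompt(record_a, record_b, rules):
    """Assemble a hypothetical prompt carrying the plain-English rules."""
    rule_lines = "\n".join(f"- {rule}" for rule in rules)
    return (
        "Decide whether these two records are duplicates.\n"
        f"Rules:\n{rule_lines}\n"
        f"Record A: {record_a}\nRecord B: {record_b}\n"
        "Answer YES or NO."
    )
```

Filtering to candidate pairs first keeps the number of model calls small, since only pairs that already look alike are sent on with the rules.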
Fuzzy data matching
To draw insights from their own data and from broader market data, and to keep data consistent across uploads, farmers need incoming entities such as crops, varieties and products to be matched to canonical records in YAGRO's data platform. As with duplicate finding, flagging and fixing matching errors relies on a mixture of deterministic rules and manual intervention and communication by YAGRO's data analysts.
Daemon applied the approach that had succeeded for duplicate finding to match incoming data with its canonical entries in the database. The solution again works in two phases. The first phase uses embeddings; the second offers a choice between a special-purpose reranking model by Cohere (low cost) and a foundation LLM, Anthropic Claude Sonnet, which can understand more sophisticated business rules (high performance).
This solution was productionised for YAGRO and merged into YAGRO's Data Ingestion Engine, so that farmers can upload data with confidence, knowing it will be recognised and ingested into the correct standardised records within YAGRO, ready to provide reliable insights. YAGRO now owns and controls the full solution, which is deployed completely within YAGRO's systems and ready to apply to any similar use cases.
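The two-phase shape of the matching described in this section can be sketched as follows. This is a toy under stated assumptions rather than the production code: character trigram counts stand in for a real embedding model, the `rerank` argument is a placeholder for either the reranking model or the LLM, and the canonical names are invented.

```python
import math
from collections import Counter
from difflib import SequenceMatcher


def embed(text):
    """Toy stand-in for an embedding model: character trigram counts."""
    padded = f"  {text.lower()} "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def shortlist(query, canonical, k=3):
    """Phase one: keep the k canonical names closest to the query."""
    q = embed(query)
    ranked = sorted(canonical, key=lambda name: cosine(q, embed(name)), reverse=True)
    return ranked[:k]


def match(query, canonical, rerank, k=3):
    """Phase two: let a second scorer pick the best shortlisted name."""
    candidates = shortlist(query, canonical, k)
    return max(candidates, key=lambda name: rerank(query, name))


def simple_rerank(query, candidate):
    """Placeholder second-phase scorer based on string similarity."""
    return SequenceMatcher(None, query.lower(), candidate.lower()).ratio()


# Hypothetical canonical records and a misspelt incoming entity.
CANONICAL = ["Winter Wheat", "Winter Barley", "Spring Wheat", "Oilseed Rape"]
best = match("wintr wheat", CANONICAL, simple_rerank)
```

Keeping the second phase behind a plain function argument is one way to reflect the cost and performance choice described above, since the scorer can be swapped without touching the shortlist step.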
The outcome
Following the successful POCs, the identification of suitable AI methods for each use case, and the success of intelligent document processing, duplicate finding, and fuzzy matching, YAGRO now possesses:
- Compelling evidence for an AI-driven roadmap: With successful POCs queued up for intelligent document processing, duplicate finding and fuzzy matching, YAGRO has seen how AI could be built into their future.
- AI in production: YAGRO's fuzzy matching model for crops and varieties is now in production, with performance that exceeds human-labelled data.
- Saving valuable analyst time, and improving customer experience: This automation frees up YAGRO's analysts from manual data cleaning, resulting in a faster, more reliable data onboarding experience for customers.
YAGRO now has a clear roadmap for their AI strategy to implement these successful approaches. This will enable them to significantly scale their data ingestion capabilities by reducing manual intervention, seamlessly onboarding new data types and volumes and unlocking deeper insights for farmer customers. This provides YAGRO with a competitive advantage through cost efficiency, yield optimisation, and safeguarding regulatory compliance and brand reputation.
Testimonial from YAGRO
“Working with Daemon has significantly elevated our data processing capabilities. The production deployment of AI-driven fuzzy matching has transformed how we validate and structure incoming data. Data accuracy is absolutely critical to the value we deliver to our customers, this collaboration has taken us leaps and bounds forward. Improving reliability, scaling our operations, and reducing the administrative burden and cost of data processing.”
Alex Hann, Senior Enterprise Product Manager, YAGRO Connect
©2025 Daemon Solutions Ltd. | Company Number: 03442937 | VAT Number: 768365777 Paddington Clubhouse | Studio C, 21 Conduit Place, Paddington, London W2 1HS | United Kingdom