Building AI systems that organisations actually rely on
The AI landscape has transformed dramatically over the past year. Tools like Claude's computer use capabilities, ChatGPT's advanced data analysis, and emerging protocols like MCP (Model Context Protocol) have made AI-data integration remarkably accessible. Yet there's a crucial gap between quick prototypes and production systems that organisations can genuinely depend on. Building the Abena AI assistant for the Race Equality Foundation (REF) and Strengthening Families, Strengthening Communities team taught me exactly where that gap lies and what it takes to bridge it.
This project built on my experience creating AI systems in commercial environments, but presented entirely different challenges. While the loveholidays AI assistant operated within established enterprise infrastructure and well-funded technical environments, building for a non-profit required solving performance, security, and usability challenges with significant resource constraints. The journey revealed that while the barriers to AI experimentation continue to fall, creating reliable, secure, and genuinely useful AI systems still requires deep technical understanding alongside thoughtful design.
Why off-the-shelf solutions weren't sufficient
When I began working with REF, the obvious question was whether existing tools could meet their needs. ChatGPT with file uploads offered basic document interaction, but fell short in several critical areas. Non-profits handle sensitive community data requiring strict access controls. Their staff need rapid access to organisational knowledge while working within tight resource constraints. Most importantly, they needed AI that understood their specific context rather than providing generic responses.
The emerging MCP ecosystem promised easier integrations, yet still lacked the performance optimisation, security controls, and organisation-specific customisation that REF required. Generic semantic search struggles with organisational document structures. Basic conversation interfaces don't support the multi-threaded knowledge work that characterises non-profit operations. Simple file upload systems can't handle the hybrid search requirements needed for complex organisational knowledge retrieval.
These limitations weren't just technical inconveniences. They represented fundamental barriers to creating genuinely useful AI systems that could integrate into daily workflows and provide reliable value over time.
Understanding the design challenge
The design process began with deep research into how REF staff actually work. Grant applications require synthesising information from multiple document types while maintaining consistency with organisational messaging. Strategic planning needs both internal context and external intelligence. Daily operations benefit from rapid access to policy documents, previous research, and institutional knowledge.
These workflows revealed requirements that generic AI tools couldn't address. Staff needed conversation threads that persisted across sessions and could be shared with colleagues. Search needed to consider document metadata, organisational hierarchy, and temporal relevance alongside semantic similarity. The interface required progressive disclosure that could surface relevant information without overwhelming users with options.
Most critically, the system needed to feel trustworthy and reliable rather than experimental. Non-profit staff can't afford to waste time on unreliable tools or risk sensitive information through inadequate security controls.
Advanced retrieval architecture
Meeting these requirements demanded sophisticated technical implementation that went well beyond basic semantic search. I developed a hybrid retrieval system combining FAISS vector similarity with FTS5 full-text search, using Reciprocal Rank Fusion to combine results intelligently.
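In essence, Reciprocal Rank Fusion scores each document by where it sits in each ranked list, so anything that does well in either the vector or the keyword ranking rises to the top of the fused ordering. A minimal sketch of the idea (function names and constants are illustrative rather than lifted from the production code):

```python
from collections import defaultdict

def reciprocal_rank_fusion(result_lists, k=60):
    """Fuse ranked lists of document IDs from the vector and keyword searches.

    Each document earns 1 / (k + rank) for every list it appears in, so a
    document ranked highly by either retriever floats to the top.
    """
    scores = defaultdict(float)
    for results in result_lists:
        for rank, doc_id in enumerate(results, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# fused = reciprocal_rank_fusion([vector_doc_ids, keyword_doc_ids])
```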
The vector search uses optimised HNSW algorithms configured for the memory constraints of a 4GB server environment. Document chunking follows organisational structure rather than arbitrary token limits, preserving context that matters for grant writing and policy work. Metadata extraction captures document titles, modification dates, and organisational hierarchy to inform search ranking.
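The exact index parameters depend on the corpus and hardware, but FAISS exposes the relevant HNSW knobs directly. A sketch with illustrative values chosen with a low-memory server in mind:

```python
import faiss
import numpy as np

dim = 384                                # e.g. a small sentence-embedding size (illustrative)
index = faiss.IndexHNSWFlat(dim, 16)     # M=16: fewer graph links, smaller memory footprint
index.hnsw.efConstruction = 80           # build-time accuracy/speed trade-off
index.hnsw.efSearch = 32                 # query-time accuracy/speed trade-off

embeddings = np.random.rand(1000, dim).astype("float32")  # placeholder vectors
index.add(embeddings)
distances, ids = index.search(embeddings[:1], 10)          # top-10 nearest neighbours
```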
The full-text search implements proximity operators, term weighting, and phrase matching that semantic search alone misses. When someone searches for specific policy references or exact quotes needed for grant applications, keyword precision matters as much as conceptual similarity.
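SQLite's FTS5 module covers this well, assuming the SQLite build includes it. An illustrative query combining a proximity operator with weighted BM25 ranking (terms and weights are placeholders):

```python
import sqlite3

conn = sqlite3.connect("knowledge.db")
conn.execute("CREATE VIRTUAL TABLE IF NOT EXISTS docs USING fts5(title, body)")

# Find documents where 'safeguarding' appears within 10 tokens of 'policy',
# ranked by BM25 with the title column weighted more heavily than the body.
# FTS5's bm25() returns smaller (more negative) values for better matches,
# so ordering ascending puts the best results first.
rows = conn.execute(
    """
    SELECT rowid, title, bm25(docs, 10.0, 1.0) AS score
    FROM docs
    WHERE docs MATCH 'NEAR(safeguarding policy, 10)'
    ORDER BY score
    LIMIT 20
    """
).fetchall()
```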
Custom re-ranking algorithms then combine both approaches while considering organisation-specific signals: document recency for policy updates, title relevance for official documents, and source hierarchy for institutional knowledge. This creates search results that understand organisational context rather than just semantic similarity.
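The precise signals and weights are tuned to the organisation, but the shape of the re-ranking step looks roughly like this (the weights and field names below are illustrative):

```python
from datetime import datetime, timezone

def rerank(fused_docs, query_terms, now=None):
    """Adjust fused scores with organisational signals (illustrative weights).

    Each doc is a dict with 'score', 'title', 'modified' (datetime) and
    'source_rank' (e.g. board papers rank above meeting notes).
    """
    now = now or datetime.now(timezone.utc)
    rescored = []
    for doc in fused_docs:
        score = doc["score"]
        age_days = (now - doc["modified"]).days
        score += 0.2 / (1 + age_days / 365)                    # favour recently updated policies
        if any(t.lower() in doc["title"].lower() for t in query_terms):
            score += 0.3                                       # title matches suggest official documents
        score += 0.1 * doc["source_rank"]                      # respect the source hierarchy
        rescored.append((score, doc))
    return [doc for _, doc in sorted(rescored, key=lambda p: p[0], reverse=True)]
```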
Production deployment and performance optimisation
Moving from prototype to production required solving challenging performance and reliability constraints. The system needed to run efficiently on modest server resources while maintaining sub-five-second response times for complex queries.
Memory optimisation became crucial when working within 4GB RAM limits. I implemented dynamic context window sizing that adapts to available resources, efficient database connection management, and careful FAISS parameter tuning to balance accuracy with memory usage. Vector indices use optimised storage formats and smart caching to minimise memory footprint while preserving search quality.
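As one illustration of the approach, a retrieval context budget can be chosen from currently available memory; the thresholds below are illustrative and psutil is an assumed dependency rather than a detail from the project:

```python
import psutil

def choose_context_budget(max_tokens=8000, min_tokens=2000):
    """Pick a context budget for retrieved text based on available RAM.

    On a 4GB box shared with the web server and the vector index, shrinking
    the amount of retrieved text passed to the model is a cheap way to avoid
    swapping under load.
    """
    available_mb = psutil.virtual_memory().available / (1024 * 1024)
    if available_mb > 1500:
        return max_tokens
    if available_mb > 750:
        return max_tokens // 2
    return min_tokens
```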
The deployment stack includes Nginx reverse proxy configuration, SSL certificate management through Let's Encrypt, and systemd service management for reliable process supervision. Comprehensive logging captures performance metrics and error conditions for ongoing optimisation.
Error handling implements graceful degradation when system resources are constrained. Rather than failing completely, the system reduces context window size, simplifies search parameters, and provides fallback responses that maintain functionality under stress.
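Conceptually, the degradation ladder looks something like this sketch, where `search` and `generate` stand in for the retrieval and model calls and the budgets and fallback message are illustrative:

```python
def answer_with_degradation(query, search, generate, budgets=(8000, 4000, 2000)):
    """Try progressively cheaper configurations instead of failing outright."""
    for budget in budgets:
        try:
            context = search(query, max_context_tokens=budget)
            return generate(query, context)
        except MemoryError:
            continue   # shrink the context window and retry
        except TimeoutError:
            continue   # simplify the search parameters and retry
    return ("I couldn't complete a full search right now. "
            "Please try a narrower question or try again shortly.")
```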
Security and organisational integration
Non-profits require enterprise-grade security despite operating on startup budgets. I implemented domain-restricted Google OAuth that limits access to specific organisational domains while supporting seamless single sign-on integration.
Session management includes proper CSRF protection and secure cookie configuration. Database design enforces user isolation through foreign key constraints and access controls. Input validation prevents injection attacks while supporting natural language queries.
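The domain restriction itself is a small amount of code layered on top of Google's token verification. A sketch using the google-auth library, with a placeholder client ID and domain:

```python
from google.oauth2 import id_token
from google.auth.transport import requests as google_requests

ALLOWED_DOMAIN = "example.org.uk"    # placeholder for the organisation's Workspace domain
CLIENT_ID = "your-oauth-client-id"   # placeholder OAuth client ID

def verify_user(token: str) -> dict:
    """Verify a Google ID token and enforce the organisational domain.

    verify_oauth2_token raises ValueError if the token is invalid or expired;
    the 'hd' claim is present for Google Workspace accounts.
    """
    claims = id_token.verify_oauth2_token(token, google_requests.Request(), CLIENT_ID)
    domain = claims.get("hd") or claims.get("email", "").split("@")[-1]
    if domain != ALLOWED_DOMAIN:
        raise PermissionError("Sign-in is restricted to organisational accounts")
    return claims   # contains email, name, etc. for session creation
```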
The conversation management system maintains persistent multi-threaded discussions with proper user attribution and access controls. Staff can create, share, and collaborate on conversation threads while maintaining audit trails for sensitive discussions.
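An illustrative slice of the underlying schema (simplified; the real tables carry more metadata) shows how threads, messages, and users are tied together with foreign keys:

```python
import sqlite3

conn = sqlite3.connect("abena.db")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE IF NOT EXISTS users (
    id    INTEGER PRIMARY KEY,
    email TEXT UNIQUE NOT NULL
);
CREATE TABLE IF NOT EXISTS threads (
    id       INTEGER PRIMARY KEY,
    owner_id INTEGER NOT NULL REFERENCES users(id),
    title    TEXT,
    shared   INTEGER NOT NULL DEFAULT 0          -- 0 = private, 1 = visible to colleagues
);
CREATE TABLE IF NOT EXISTS messages (
    id        INTEGER PRIMARY KEY,
    thread_id INTEGER NOT NULL REFERENCES threads(id),
    author_id INTEGER NOT NULL REFERENCES users(id),
    role      TEXT NOT NULL,                     -- 'user' or 'assistant'
    content   TEXT NOT NULL,
    created   TEXT NOT NULL DEFAULT (datetime('now'))
);
""")
```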
File upload handling supports contextual document analysis while implementing proper validation and security scanning. Users can upload meeting notes or policy documents to enhance conversation context without compromising system security.
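The first line of defence is a simple allow-list and size check before a file ever reaches the document pipeline; the limits below are illustrative:

```python
from pathlib import Path

ALLOWED_EXTENSIONS = {".pdf", ".docx", ".txt", ".md"}   # illustrative allow-list
MAX_UPLOAD_BYTES = 10 * 1024 * 1024                     # illustrative 10 MB cap

def validate_upload(filename: str, data: bytes) -> str:
    """Reject unsupported or oversized files and strip any path components."""
    suffix = Path(filename).suffix.lower()
    if suffix not in ALLOWED_EXTENSIONS:
        raise ValueError(f"Unsupported file type: {suffix}")
    if len(data) > MAX_UPLOAD_BYTES:
        raise ValueError("File exceeds the upload size limit")
    return Path(filename).name   # safe filename for storage
```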
LangChain agent framework and tool orchestration
The conversational interface uses a sophisticated LangChain agent that orchestrates multiple specialised tools based on query analysis and organisational context. The InternalDocumentSearch tool handles organisational knowledge retrieval through the hybrid search system. WebSearch integration provides external intelligence when internal documents are insufficient.
Tool selection happens through careful prompt engineering that helps the agent understand when to search internal documents versus external sources, how to combine multiple information sources, and when to acknowledge limitations rather than providing uncertain responses.
The agent maintains conversation context across multiple turns while managing memory efficiently. Previous conversation history informs current responses without overwhelming the language model's context window. Source attribution helps users verify and expand on AI responses while building trust in the system's reliability.
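For orientation, here is a compact sketch using the classic LangChain agent API; newer LangChain releases favour different abstractions, and the model choice and tool bodies here are assumptions rather than details from the project:

```python
from langchain.agents import AgentType, Tool, initialize_agent
from langchain.memory import ConversationBufferWindowMemory
from langchain_openai import ChatOpenAI   # assumed model provider

def search_internal_documents(query: str) -> str:
    """Stand-in for the hybrid FAISS + FTS5 retrieval described above."""
    return "…internal search results…"

def search_web(query: str) -> str:
    """Stand-in for an external web search integration."""
    return "…web search results…"

tools = [
    Tool(
        name="InternalDocumentSearch",
        func=search_internal_documents,
        description="Search the organisation's own documents. Use this first for "
                    "questions about policies, grants, or internal research.",
    ),
    Tool(
        name="WebSearch",
        func=search_web,
        description="Search the web. Use only when internal documents are insufficient.",
    ),
]

# Keep only the most recent turns so history informs responses without
# overwhelming the model's context window.
memory = ConversationBufferWindowMemory(memory_key="chat_history", k=6, return_messages=True)

agent = initialize_agent(
    tools,
    ChatOpenAI(temperature=0),
    agent=AgentType.CHAT_CONVERSATIONAL_REACT_DESCRIPTION,
    memory=memory,
)

# reply = agent.run("Summarise our latest safeguarding policy and cite the source.")
```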
Interface design for organisational workflows
The web interface balances sophistication with accessibility, supporting both casual queries and complex research workflows. The conversation management system allows users to organise discussions by project, topic, or timeframe while maintaining searchable history.
Progressive disclosure reveals relevant information without overwhelming users. Search results include clear source attribution and direct links to original documents. Conversation threads support collaboration while maintaining individual workspace organisation.
Mobile optimisation ensures staff can access organisational knowledge while travelling or working remotely. Responsive design adapts to different screen sizes while preserving full functionality across devices.
Performance results and organisational impact
The production system consistently delivers sub-five-second response times for complex queries across hundreds of organisational documents. Search accuracy significantly exceeds what basic semantic search alone achieves, thanks to the hybrid retrieval approach and its use of organisational context.
Staff adoption exceeded expectations, with regular use across grant applications, policy research, and strategic planning workflows. The conversation threading system supports collaborative work while maintaining individual productivity gains.
System reliability has proven robust under real-world conditions, with graceful handling of resource constraints and comprehensive error recovery.
Technical lessons and transferable insights
Building Abena revealed several technical insights that apply broadly to organisational AI implementation. Performance optimisation requires understanding both algorithmic efficiency and real-world resource constraints. Security implementation needs enterprise-grade controls delivered through accessible interfaces.
Retrieval architecture benefits significantly from combining multiple search approaches rather than relying on semantic similarity alone. Organisational context provides crucial signals for search ranking and result presentation that generic systems miss.
Agent framework design requires careful attention to tool selection, context management, and error handling that goes beyond basic language model integration. Production deployment demands comprehensive infrastructure management alongside sophisticated AI capabilities.
Beyond technical implementation
The most important insight from this project relates to the relationship between technical sophistication and user adoption. Users don't engage with AI systems because they're technically impressive. They adopt tools that integrate naturally into existing workflows while providing reliable value over time.
This requires design thinking that considers both technical capabilities and organisational dynamics. The most sophisticated retrieval algorithms are worthless if the interface doesn't support how people actually work. The most secure systems fail if staff can't understand how to use them effectively.
Successful AI implementation demands both technical depth and design sensitivity: an understanding of how technology choices shape user experience and organisational adoption.
The evolving landscape and persistent challenges
While tools like MCP continue to lower barriers to AI experimentation, the challenges of production deployment, performance optimisation, and organisational integration remain complex. Connecting an AI model to Google Drive is now straightforward. Building systems that organisations can depend on still requires deep technical understanding.
The gap between prototype and production represents an opportunity for designers and developers who understand both AI capabilities and organisational needs. As AI becomes more accessible, competitive advantage shifts toward implementation quality, user experience design, and understanding how to create genuine value rather than impressive demonstrations.
Building Abena reinforced my conviction that the future belongs to AI implementations that prioritise human agency and organisational effectiveness over technical novelty. The most successful AI systems will be those that amplify human capabilities while respecting the complexity of real organisational contexts.
This project demonstrated that creating such systems requires both technical sophistication and design sensitivity. The infrastructure exists to build production-ready AI that genuinely serves organisational needs. The question is whether we have the patience and skill to implement it thoughtfully rather than rushing toward the next impressive demo.