A fintech startup’s experience highlights the importance of understanding LLMs’ capabilities and limitations for effective product design, emphasising the need for integration with reliable data sources and cost management strategies.
In the rapidly evolving landscape of artificial intelligence, large language models (LLMs) have emerged as powerful tools capable of generating human-like text and assisting in complex interactions. However, a nuanced understanding of how these models actually work often lags behind their adoption.
An illustrative case stems from a Series A fintech startup that spent six months building an AI-driven financial advisor chatbot. The feature demoed flawlessly to the board, which approved a $1 million investment to scale it. Yet shortly after launch, compliance flagged major issues: the chatbot confidently described services the company never offered, including cryptocurrency trading, which the startup explicitly did not provide. Investigation revealed multiple instances where the chatbot “invented” facts, describing fee structures and account types inaccurately. The unsettling conclusion was that the technology wasn’t malfunctioning; rather, the product team fundamentally misunderstood how the LLM worked.
At the heart of this misunderstanding lies the nature of LLMs themselves. Contrary to popular assumptions, these models do not store factual databases. Better understood as a sophisticated form of autocomplete, an LLM is trained on billions of pages of text to predict the next token (essentially a fragment of a word) through pattern recognition rather than knowledge verification. As the author of the original fintech case study explains, it is akin to someone who has read every cookbook but never cooked a meal: capable of reproducing recipes flawlessly but lacking any experiential verification.
This dynamic explains why an LLM-based chatbot could confidently generate plausible but incorrect information about cryptocurrency services. The model recognized common patterns across fintech content generally but had no mechanism to distinguish between the specific company’s offerings and industry-wide trends. The fundamental design of these models means they can generate answers that sound authoritative but are sometimes fabrications, a behaviour often termed “hallucination” but more accurately a result of probabilistic pattern matching.
The critical lesson for product architecture is clear: LLMs should never serve as the single source of truth for factual information. Instead, the recommended approach is to integrate LLMs as the conversational interface layer that interprets natural language and delivers outputs sourced from reliable databases or knowledge bases. For example, in financial product design, queries like “What is my portfolio allocation?” should trigger a database call for real data, with the LLM phrasing the response naturally to the user. This ensures accuracy and trustworthiness while harnessing the conversational benefits of LLMs.
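The pattern described above can be sketched in a few lines. This is a minimal illustration, not the startup’s actual system: the function names (`fetch_portfolio`, `llm_rephrase`) and the sample data are hypothetical stand-ins for a real database query and a real model API call.

```python
# Sketch of the "LLM as conversational interface layer" pattern.
# All names and values here are hypothetical placeholders.

def fetch_portfolio(user_id: str) -> dict:
    """Stand-in for a database call returning verified account data."""
    return {"stocks": 0.60, "bonds": 0.30, "cash": 0.10}

def llm_rephrase(facts: dict, question: str) -> str:
    """Stand-in for a model call that only rewords facts it is given."""
    alloc = ", ".join(f"{int(v * 100)}% {k}" for k, v in facts.items())
    return f"Your portfolio is currently allocated as {alloc}."

def answer(user_id: str, question: str) -> str:
    facts = fetch_portfolio(user_id)      # source of truth: the database
    return llm_rephrase(facts, question)  # the LLM only phrases, never invents

print(answer("u123", "What is my portfolio allocation?"))
```

The design point is the separation of duties: the database supplies every factual figure, and the model is never asked a question it could answer from training-data patterns alone.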
Further complexities arise from the economic and technical constraints of deploying LLMs at scale. These models process language in tokens rather than words, and every token, input or output, incurs cost. In the fintech example, the team initially overlooked how many tokens each interaction consumed, which led to a sudden sixfold increase in API expenses despite only moderate growth in message volume. Detailed system prompts, customer profiles, conversation histories, and context retrievals combined to send thousands of tokens per message. Through diligent optimisation (trimming system prompts, summarising customer profiles, and employing rolling conversation summaries) the startup cut costs by over 60%, proving that token budgeting is a vital product decision, not just an engineering exercise.
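A back-of-envelope model makes the budgeting exercise concrete. The prices and token counts below are illustrative assumptions, not the startup’s actual figures or any provider’s current rates.

```python
# Toy per-message cost model; all prices and token counts are assumed.

PRICE_PER_1K_INPUT = 0.01   # assumed $ per 1,000 input tokens
PRICE_PER_1K_OUTPUT = 0.03  # assumed $ per 1,000 output tokens

def message_cost(input_tokens: int, output_tokens: int) -> float:
    return (input_tokens / 1000) * PRICE_PER_1K_INPUT \
         + (output_tokens / 1000) * PRICE_PER_1K_OUTPUT

# Before optimisation: bloated prompt, profile, and history per message.
before = message_cost(input_tokens=7000, output_tokens=500)
# After trimming prompts and summarising history.
after = message_cost(input_tokens=1500, output_tokens=500)

print(f"before=${before:.3f} after=${after:.3f} saving={1 - after / before:.0%}")
```

Even with made-up numbers, the structure of the calculation shows why input-side bloat dominates: the fixed context resent with every message, not the reply, is usually the bigger lever.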
Context windows, another core LLM attribute, define the maximum information the model can process at once, measured again in tokens. While models like GPT-4 boast theoretically large context limits (up to 128,000 tokens), effective operational thresholds are lower because context input, system instructions, and response space all compete for token capacity. Exceeding roughly 60% of this capacity leads to degraded output quality, slower responses, and internal inconsistencies. Managing context windows requires redesigning features to chunk and summarise information, progressively disclosing data rather than overwhelming the model in one request.
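A simple guard can enforce the working threshold described above before a request is sent. This is a sketch under stated assumptions: the 60% figure comes from the article, while the characters-per-token heuristic, the reserve size, and the function names are illustrative choices, not a real tokenizer or API.

```python
# Sketch of a context-budget check before calling the model.
# The chars/4 heuristic is a rough estimate, not a real tokenizer.

CONTEXT_LIMIT = 128_000    # e.g. a large GPT-4-class context window
SAFE_FRACTION = 0.60       # stay below ~60% of capacity, per the article
RESPONSE_RESERVE = 2_000   # tokens held back for the model's reply

def estimate_tokens(text: str) -> int:
    return max(1, len(text) // 4)  # crude heuristic: ~4 characters per token

def fits_in_budget(system_prompt: str, history: str, query: str) -> bool:
    used = sum(map(estimate_tokens, (system_prompt, history, query)))
    return used + RESPONSE_RESERVE <= CONTEXT_LIMIT * SAFE_FRACTION

# If the check fails, summarise or chunk the history rather than
# sending it raw, as the article recommends.
```

In practice a production system would use the provider’s own tokenizer for the count, but the shape of the decision, budget the pieces, reserve room for the answer, and summarise when over, is the same.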
An ongoing challenge is the intrinsic probabilistic nature of LLMs, which means identical inputs can yield different outputs. Unlike deterministic software—where fixed inputs produce consistent results—LLMs select each next token from a probability distribution. This variability adds a natural conversational feel but can create compliance concerns in environments demanding consistency, such as legal or financial disclosures. Product teams must decide when variation is acceptable and when precision is paramount, employing temperature settings and post-processing to control output variance. Importantly, teams must educate stakeholders upfront that such variation is a feature of LLMs, not a fault, and that for mission-critical tasks requiring exact consistency, traditional deterministic software remains preferable.
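The role of temperature can be shown with a toy sampler. This is a didactic sketch, not how any production model exposes its internals: the example vocabulary and scores are invented, but the mechanism (scaling a distribution, then sampling from it) is the standard one temperature controls.

```python
# Toy next-token sampler: temperature=0 is greedy and deterministic;
# higher temperatures admit more variation. Vocabulary and scores invented.

import math
import random

def sample_next(logits: dict, temperature: float, rng: random.Random) -> str:
    if temperature == 0:
        return max(logits, key=logits.get)  # greedy: always the same choice
    scaled = {tok: math.exp(v / temperature) for tok, v in logits.items()}
    total = sum(scaled.values())
    r = rng.random() * total
    for tok, weight in scaled.items():
        r -= weight
        if r <= 0:
            return tok
    return tok  # numerical-edge fallback: last token

logits = {"fees": 2.0, "charges": 1.5, "rates": 0.5}
rng = random.Random(0)
print(sample_next(logits, temperature=0, rng=rng))  # always "fees"
print({sample_next(logits, 1.0, rng) for _ in range(20)})  # varies
```

This is why temperature 0 (or near it) is the usual setting for disclosures and other compliance-sensitive text, while higher values suit open-ended conversation.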
The collective understanding of these fundamental principles—LLMs as pattern recognisers, token cost awareness, context window constraints, and the probabilistic versus deterministic trade-offs—equips product managers to design better AI-powered products. It enables informed conversations with engineers and stakeholders, guiding decisions on when to implement LLMs and when to opt for conventional software solutions.
Beyond this specific fintech case, the broader AI industry recognises these challenges and adaptations as standard practice. Leading LLMs, from OpenAI’s GPT series to models like Meta’s Llama and Anthropic’s Claude, reflect ongoing efforts to balance accuracy, cost, and user experience. Innovations focus on scaling capabilities while managing such inherent limitations, supported by architectures combining LLMs with reliable data retrieval systems, cost-effective token usage, and careful feature design.
For PMs and teams venturing into AI-driven products, mastering these fundamentals is not just beneficial, it is essential. This ensures users receive experiences that are not only innovative and conversational but also accurate, trustworthy, and economically sustainable. The cautionary tale of the fintech startup stands as a stark reminder that cutting-edge technology demands deep understanding and thoughtful application to truly unlock its transformative potential.
Source: Noah Wire Services