ELECTE's Podcast: AI Frontiers

90% of SMEs do not need GPT-5

Fabio Lauria Episode 48


In this eye-opening episode, we challenge the prevailing belief that every business needs the latest AI advancements, like GPT-5. Discover why a staggering 90% of small and medium enterprises (SMEs) can thrive using models that are significantly smaller and more efficient. We’ll explore how industry giants like AT&T, IBM, and Ford are saving millions by opting for streamlined AI solutions that deliver results without the complexity. Key topics include the cost-effectiveness of smaller models, practical applications for SMEs, and the strategic advantages of adopting tailored AI solutions. We’ll also discuss how businesses can leverage these insights to enhance productivity and drive innovation without breaking the bank. Whether you're a business owner, an AI enthusiast, or simply curious about the future of technology, this episode is packed with actionable insights that could reshape your approach to AI. Don’t miss out—tune in now to learn how to optimize your AI strategy for success!
SPEAKER_00

GPT-5.2 outperforms human experts on 70.9% of professional tasks. It solves 40.3% of math problems that 60% of PhDs cannot solve. It has a context window of 400,000 tokens. Impressive, expensive, and completely useless for most European companies.

Two weeks ago, I was at CES in Las Vegas. Between keynotes and robot demos, one sentence went almost unnoticed: "Small fine-tuned language models achieve the same accuracy as large generalist models for enterprise applications, but they are superior in terms of cost and speed." This was not a minor talk. It was Andy Markus, chief data officer at AT&T, the company that handles millions of daily interactions and has just implemented 71 different AI solutions using mainly open-source models with 7 to 13 billion parameters instead of GPT-4 or Claude.

In November, I wrote a letter from 2028 arguing that in 2025 we were optimizing answers to the wrong questions. At CES, that prediction materialized before my eyes. No one was talking about AGI or superintelligence anymore. Everyone was talking about breakeven points, costs per token, and latency requirements. The change in tone was palpable, and what emerged completely overturns the dominant narrative about AI.

TL;DR: Frontier models (GPT-5.x) win in benchmarks but lose on the P&L. Fine-tuned 7B-13B models beat GPT-4 in specific enterprise tasks. The real competitive advantage is economic, not cognitive. 80 to 90% of queries can be handled by SLMs at 100x lower cost.

Fabio Lauria.

The numbers no one is looking at

While everyone is chasing the latest 500-billion-parameter model, Predibase analyzed 700-plus fine-tuning experiments and discovered something surprising: 7B-13B parameter models, once trained on specific data, beat GPT-4 on 85% of specialized enterprise tasks while costing 100 times less. Let me show you what that means in practice. Take the analysis of medical queries about diabetes.
A fine-tuned 7-billion-parameter model, Diabetica-7B, achieves 87.2% accuracy; GPT-4, between 70 and 75%. The small model wins by up to 17 percentage points and runs on a single $5,000 GPU instead of APIs that cost thousands per month.

Or look at code review. Llama 3 with 8B parameters and LoRA fine-tuning beats both Llama 70B and Nemotron 340B in bug severity classification. You got that right: a model 42 times smaller that outperforms the giant.

The pattern repeats everywhere. Biomedical QA: Llama 8B fine-tuned, CER 26.8% versus GPT-4 at 16.0%, plus 67% relative. Legal contract review: Mistral 7B fine-tuned gains 25 to 40% compared to the reference model. Mathematical reasoning: Phi-4 14B beats GPT-4o, 80% versus 77%.

Why does this happen? The answer is counterintuitive but fundamental. GPT-5.2 has been trained on virtually all text available on the internet. It knows a little about everything: medicine, law, coding in 50 languages, philosophy, art history. This is its strength and its limitation. When you ask GPT-5.2 to classify your customer support tickets, you are using 0.01% of its capacity. The remaining 99.99%, all that encyclopedic knowledge, is useless. But you're paying for it anyway. Every token costs money, every millisecond of processing costs money, every GB of GPU memory costs money. A specialized model that only knows how to handle your tickets costs less, runs faster, and, if trained on your data, works better because it understands the structure of your specific conversations.

Performance comparison: small versus frontier models

Task: fine-tuned SLM; comparison model; result
Diabetes medical queries: Diabetica-7B, 87.2%; GPT-4, 70 to 75%; SLM plus 17 points.
Code review severity: Llama 3 8B plus LoRA; Llama 70B and Nemotron 340B; SLM 42x smaller.
Mathematical reasoning: Phi-4 14B, 80%; GPT-4o, 77%; SLM plus 3 points.
Biomedical QA: Llama 8B, 26.8%; GPT-4, 16.0%; SLM plus 67% relative.

Let's do the math, because math matters
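As a rough companion to the cost math in this section, here is a minimal Python sketch that reruns the arithmetic with figures quoted in the episode: GPT-4o at $2.50 input and $10.00 output per million tokens, 1 million queries of about 800 tokens each, a self-hosted 7B model at $100 to $500 a month, and the fintech case going from $47,000 to $8,000 a month. The function names are illustrative and not from the episode. At the quoted blended rate the sketch gives about $5,000 a month for the API scenario, the same order of magnitude as the roughly $6,000 quoted; the percentage reduction is computed against the quoted $6,000.

```python
# Illustrative helpers; the names are hypothetical, the figures are those quoted in the episode.


def blended_price(input_usd, output_usd):
    """50/50 blend of input and output price per million tokens."""
    return (input_usd + output_usd) / 2


def monthly_api_cost(queries, tokens_per_query, usd_per_million_tokens):
    """Monthly spend for a given query volume at a per-million-token price."""
    return queries * tokens_per_query / 1_000_000 * usd_per_million_tokens


def reduction_pct(baseline_usd, alternative_usd):
    """Percentage saved by moving from the baseline to the alternative."""
    return (baseline_usd - alternative_usd) / baseline_usd * 100


gpt4o_blended = blended_price(2.50, 10.00)
api_monthly = monthly_api_cost(1_000_000, 800, gpt4o_blended)

# Self-hosted 7B range ($100 to $500 a month) against the quoted $6,000 a month.
reduction_low = reduction_pct(6_000, 500)
reduction_high = reduction_pct(6_000, 100)

# Fintech case: $47,000 a month on the API down to $8,000 a month with hybrid routing.
fintech_annual_savings = (47_000 - 8_000) * 12

print(f"GPT-4o blended price: ${gpt4o_blended:.2f} per million tokens")
print(f"API cost for 1M queries x 800 tokens: ${api_monthly:,.0f} a month")
print(f"Reduction with self-hosting: {reduction_low:.0f}% to {reduction_high:.0f}%")
print(f"Fintech annual savings: ${fintech_annual_savings:,}")
```

Swapping in your own query volume and token counts is the quickest way to see which side of the breakeven you are on.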
I asked the CFOs I spoke with at CES: how much do you really spend? The answers were enlightening. Current API pricing per million tokens: GPT-4o, $6.25 blended input and output; Claude Sonnet 4, $9.00; self-hosted 7B on an H100, $0.10 to $0.50.

It sounds technical, but let's translate that into actual monthly spending. Take a company that processes 1 million queries per month, about 800 tokens each, a typical customer support conversation. GPT-4o: $6,000 a month, or $72,000 a year. Self-hosted 7B: $100 to $500 a month, or $1,200 to $6,000 a year. Reduction: 92 to 98%.

API pricing per million tokens, January 2026

Model: input cost; output cost; blended 50/50
GPT-4o: $2.50; $10.00; $6.25.
Claude Sonnet 4: $3.00; $15.00; $9.00.
GPT-4o mini: $0.15; $0.60; $0.38.
Self-hosted 7B: $0.04; $0.14; $0.10 to $0.50.

That is the theory. A documented fintech case study showed a shift from $47,000 a month on GPT-4o mini to $8,000 a month with a hybrid approach: self-hosted models for routine queries, plus selective API use for complex cases. Payback in four months. After a year, they had saved $468,000.

And fine-tuning? It costs less than everyone thinks. QLoRA on a 7B model: $50 to $150 on consumer hardware, an RTX 4090, the GPU you probably already have in your office for 3D rendering. Production-quality LoRA on cloud GPUs: $500 to $3,000, one time. For comparison, a single month of GPT-4 API for moderate use costs $5,000 to $10,000. You're spending more in a month than it would cost you to specialize a model forever.

An MIT Wikipedia summarization project documented costs of $36,000 using the GPT-4 API versus $2,000 to $3,000 with a self-hosted fine-tuned Llama 7B. That's a [unclear] reduction. That's not a typo.

Breakeven points are lower than you think.
Under 100K tokens a day: stay on the API; self-hosting is not worth it.
2M-plus tokens a day: self-hosting becomes cost-effective. Breakeven: 6 to 12 months.
10M-plus tokens a day: self-hosting strongly preferred. Breakeven: 4 to 6 months.
Annual API expense over $500K:
A GPU cluster plus LoRA fine-tuning wins hands down. Most SMEs that think "we can't afford AI" actually can't afford not to investigate this avenue.

Breakeven analysis: when is self-hosting worthwhile?

Daily volume: recommendation; breakeven
Under 100K tokens a day: API; never.
2M-plus tokens a day: self-hosting, marginally worthwhile; 6 to 12 months.
10M-plus tokens a day: self-hosting; 4 to 6 months.
Yearly API expense over $500K: GPU cluster plus LoRA fine-tuning; clear winner.

Companies that have already understood this

When you talk to those who have implemented these systems in production, a clear pattern emerges. They are not waiting for GPT-6. They have stopped chasing frontier models and have started building competitive moats with specialized models.

AT&T is the most sophisticated example. Andy Markus told me clearly: "Open-source models are a lot cheaper than OpenAI. We now have the capability to spin up a foundational model for specific use cases such as coding." This is not philosophy, it is operational architecture. Llama 2 and Falcon handle routine, cost-optimized queries. Custom models are fine-tuned for network operations; they analyze 1.2 trillion alarms daily. Yes, trillion. Azure OpenAI is reserved only for complex reasoning that really requires GPT-4. The measurable result: code review reduced from hours to minutes, vulnerability remediation from days to seconds. Not improvement, but orders of magnitude.

I asked, "Why don't you use GPT-4 for everything, since it's the best?" The answer was straightforward: "Because we don't need the best. We need the most efficient for our specific use case."

IBM Granite has brought this approach to 28,000 employees globally using 8B and 13B parameter models. Gartner calls them "the company to beat" in enabling domain-specific language models. Westfield Insurance, not exactly a tech startup, has achieved an 80% reduction in the time it takes developers to understand legacy applications. How?
Granite 13B fine-tuned on their specific codebase. Not GPT-4, not Claude Opus: a 13B-parameter model trained on their code.

Ford, announced at CES while I was there, made an even more interesting choice. They use off-the-shelf large language models on Google Cloud, explicitly not frontier models. Their VP of software explained the reasoning to me: the vehicle-specific context and integration with sensors matter infinitely more than the raw capability of the model. "We prefer an average model that knows every specific F-150 to a genius that knows nothing about pickup trucks." The strategy: democratize AI starting with $30,000 vehicles. They can't do that with GPT-4 economics.

And then there's BloombergGPT: 50B parameters, so not tiny, but domain-specific. Trained on 363 billion tokens of proprietary financial data. It outperforms existing open models on financial tasks by large margins while matching general benchmarks. Training cost: $2.67 million. Seems like a lot until you compare it to what they would have spent on GPT-4 APIs in the first year of operation, estimated at over 10 [unclear]. CTO Shawn Edwards summed up the philosophy: "much higher performance out of the box than custom models for each application, at a faster time to market." They invested upfront so they wouldn't have to depend on external APIs. Three years later, the ROI is clear.

The decision-making model, because you need method, not enthusiasm

After seeing dozens of implementations, both successes and spectacular failures, a clear model emerges. It's not complicated, but it requires honesty about your real needs.

Use an SLM when (and this covers 80 to 90% of real cases):
Well-defined, domain-specific tasks: classification, extraction, routing, things you do hundreds of times a day in the same way.
Latency requirement below 100 ms: customer-facing, real-time decisions.
Daily volume over 2M tokens.
Here, API costs become prohibitive.
Privacy and compliance requirements (HIPAA, GDPR, PCI): your data cannot leave your perimeter.
You have training data: minimum 1,000 examples, ideally 2,000 to 6,000. Less than you think.
Budget below $5K a month: above this threshold, APIs will eat you alive.

Reserve an LLM for the 10 to 20% that really need it:
Multi-step, cross-domain reasoning: "analyze this legal contract, assess the tax impact, propose strategic alternatives."
Creative or open-ended generation: marketing content, brainstorming, cases without repeatable patterns.
Fewer than 1,000 examples for fine-tuning: you don't have enough data to train, so it's better to use APIs.
Prototype phase: use GPT-4o mini to validate that the problem is solvable, then specialize.
Volume below 100K tokens a day: so low that APIs cost less than development time.

Fine-tuning thresholds, based on 700-plus experiments

Examples: approach; note
Under 1,000 examples: use prompting or RAG; fine-tuning risks overfitting.
1,000 to 6,000 examples: LoRA fine-tuning; optimal point, plus 73 to 74% accuracy improvement.
10,000-plus examples: full fine-tuning; rarely needed for single use cases.

The data is clear. Fine-tuned GPT-3.5 achieves 73 to 74% greater accuracy than prompt engineering on code review. Format accuracy improves by 96% with only 50 to 100 examples. You don't need huge datasets. You need good datasets.

The winning pattern I've seen repeatedly: hybrid routing. Route 80 to 90% of predictable queries to a fine-tuned SLM. Reserve frontier models for the 10 to 20% that require complex reasoning. One documented case achieved $3,000 a month versus $937,500 in pure API costs: $11.2 million in annual savings, with a 4,900% ROI. Yes, 4,900%.

The million-dollar mistakes that I've seen made

Now for the uncomfortable part. The part that no one wants to talk about at conferences, but which emerges clearly when you talk informally with those who have been in the trenches.
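As an aside on the hybrid-routing pattern described above, the idea can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the architecture of any company mentioned: the task labels come from the SLM and LLM criteria in the episode, while the model names, the function names, and the per-query prices in the last example are hypothetical placeholders. The savings check uses the quoted $3,000 versus $937,500 a month.

```python
# Hypothetical sketch: model names and per-query prices are placeholders.

# Routine task types named in the episode as a good fit for a fine-tuned SLM.
ROUTINE_TASKS = {"classification", "extraction", "routing"}


def choose_model(task_type):
    """Send routine tasks to the small model, everything else to a frontier model."""
    return "fine-tuned-slm" if task_type in ROUTINE_TASKS else "frontier-llm"


def hybrid_monthly_cost(queries, slm_share, slm_usd_per_query, frontier_usd_per_query):
    """Expected monthly spend when slm_share of the queries go to the small model."""
    slm_cost = queries * slm_share * slm_usd_per_query
    frontier_cost = queries * (1 - slm_share) * frontier_usd_per_query
    return slm_cost + frontier_cost


def annual_savings(pure_api_monthly_usd, hybrid_monthly_usd):
    """Yearly difference between pure API spend and hybrid spend."""
    return (pure_api_monthly_usd - hybrid_monthly_usd) * 12


print(choose_model("classification"))
print(choose_model("contract-analysis"))

# Quoted case: $3,000 a month hybrid versus $937,500 a month pure API.
print(f"Annual savings: ${annual_savings(937_500, 3_000):,}")

# Made-up prices, only to show how the routing share drives the bill.
print(hybrid_monthly_cost(1_000_000, 0.9, 0.0002, 0.005))
```

In practice the routing decision would more likely rest on a classifier or a confidence score than on a fixed set of labels; the set is used here only to keep the sketch short.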
MIT has documented that 95% of enterprise AI pilots fail to achieve measurable impact on the P&L. 95%. It's not a technology problem. It's an approach problem.

The data that should make every SME think: purchasing from vendors, 67% success rate; internal development, 33% success rate. You are twice as likely to succeed by buying than by building. For SMEs with limited resources, this is critical. Not for ideological reasons, but for pure mathematics.

Costly failures are instructive.

Volkswagen Cariad: $7.5B in operating losses, 1,600 layoffs. The problem? A big-bang approach: attempting to build a unified AI-driven OS from scratch instead of iterating on specific use cases. They pursued a vision instead of solving concrete problems.

Zillow: $500 million-plus in losses, 25% workforce reduction. The AI pricing algorithm couldn't handle unstructured factors, the classic case of "we have data, but not the right data." They mistook data volume for data quality.

IBM Watson for Oncology: program discontinued after years and huge investments. The problem? Over-reliance on synthetic data instead of real clinical data. The AI worked beautifully on theoretical cases, but failed on real patients.

Taco Bell: voice AI deployed in 500-plus drive-throughs, viral failures, return to a hybrid system. The problem? Failure to test edge cases: accents, background noise, non-standard orders. They optimized for the happy path and forgot the long tail.

RAND identified five root causes through 65 interviews with experts. Great study, you should read it.
One, misunderstanding of the problem, the most common: "we want AI" instead of "we want to reduce customer support response times from four hours to 30 minutes."
Two, lack of necessary training data: they think they have it, but they have dirty Excel files, not labeled datasets.
Three, focus on technology instead of solution: "we need to use GPT-4" instead of "we need to solve X."
Four, inadequate infrastructure for data governance.
The data exists but is scattered across 15 different systems that don't talk to each other.
Five, problems too difficult for AI: some things cannot be solved with ML, but no one wants to tell the board.

The specific barriers for SMEs are different. 50% of SMEs report that employees lack AI skills (OECD). 82% of smaller SMEs cite "AI isn't applicable" as a reason for non-adoption. The truth? It's not that AI isn't applicable. It's that the gap between "GPT-4 can do magical things" and "how do I implement it in my 2010 ERP" is immense. And everyone is selling the magic; no one is selling the bridge.

The solution I've seen work: start microscopically small. One use case, one process, one realistic, measurable outcome. "Reduce lead classification time by 30%," not "transform the business with AI." Focus on high-volume, low-risk tasks, and strongly consider external partnerships. Given the 2x success rate, it's hard to justify internal development unless you're already a tech company.

Where the market is really going

The dominant theme at CES was physical AI: AI coming off screens and into robots, vehicles, edge devices. No longer agentic AI promising future autonomy, but AI that today welds components, drives vehicles, analyzes sensors in real time. Jensen Huang of NVIDIA stated it explicitly: "The ChatGPT moment for physical AI is here." But these embedded systems don't run GPT-4. They run 7B-13B models optimized for inference on edge chips, exactly what we're talking about.

Gartner's projections are clear. By 2027, organizations will use task-specific SLMs at 3x the volume of general-purpose LLMs. Over 50% of enterprise GenAI models will be domain-specific, up from 1% in 2024. This is not wishful thinking. It is already happening.

At CES, the hardware told the same story. AMD Ryzen AI 400: 60 NPU TOPS, enables on-device inference for 7B models. Intel Core Ultra Series 3: 122 total TOPS, first 18A process, with 15% better performance per watt. NVIDIA Jetson AGX
Orin: 275 TOPS for robotics and autonomous systems. AI is migrating from the cloud to the edge, not because it's cooler, but because it's cheaper and faster. A 7B model running on an ARM chip in your device responds in 30 ms and costs nothing after the hardware purchase. The GPT-4 API responds in 800 ms and charges you for every query.

The SLM market is projected to grow from $0.93B in 2025 to $5.45B by 2032, a CAGR of 28.7%. For context, the LLM market will also grow, but more slowly. The growth delta is in specialized models.

SME adoption data supports this thesis. 55% of small businesses now use AI, up from 39% in 2024, with 91% reporting revenue lift and 86% reporting improved margins. It's no longer just tech-savvy early adopters. It's hairdressers using AI for scheduling and auto repair shops using AI for preventive diagnostics. The adoption gap between large firms and SMBs has closed from 1.8x in February 2024 to near parity in August 2025. AI has become democratized not because GPT-4 has become accessible, but because SLMs have.

What this means for you, Monday morning

The concrete strategy that emerges from all these conversations:

Phase one, prototype (weeks one to four). Use GPT-4o mini or Claude Haiku to validate the use case. Invest $100 to $500 in API calls. Goal: confirm that the problem is solvable with AI and collect the first 1,000-plus real examples of correct input and output. Don't look for perfection, look for signal.

Phase two, specialization (weeks five to eight). Fine-tune an open-source 7B model (Llama 3.1 8B, Mistral 7B, or Phi-4) on your data. One-time cost: $500 to $3,000, depending on whether you do it in-house or outsource. Goal: performance superior to GPT-4 on your specific task at a fraction of the recurring cost.

Phase three, hybrid routing (weeks nine-plus). Deploy the fine-tuned model for 80 to 90% of routine queries. Keep a fallback to GPT-4o mini or Claude for complex edge cases. Monitor which ones.
Often you'll discover patterns you can reincorporate into subsequent fine-tuning. Recurring cost: $500 to $2,000 a month versus $5,000 to $30,000 a month for pure API, and it scales much better.

The uncomfortable truth

Companies that are winning don't use frontier models for competitive advantage. AT&T, IBM, Bloomberg: they're not waiting for GPT-6. They've built moats with specialized models deployed on proprietary data. The competitive moat is not access to GPT-5. It's domain expertise encoded in fine-tuned models that cost pennies per query and that your competitors can't replicate, because they don't have your data, your processes, your specific knowledge.

GPT-5.2 is impressive. But for 90% of SMEs, it's like buying a supercomputer to run Excel. Does it work? Yes. Is it the right choice? Rarely.

AI white paper for SMEs 2026: 20 to 25 pages of practical analysis, zero hype. What it includes: a complete decision-making model (when to use an SLM versus an LLM); 10 concrete use cases with ROI calculations; a complete CES 2026 analysis (hardware, chips, trends); 100 [unclear] mistakes I've seen people make (real cases); hardware combinations with the best price-performance; actionable strategies to implement immediately. Available in the coming weeks. Reply to this email for early access.

Fabio Lauria, CEO and founder, ELECTE.

The white paper will be released in two to three weeks. First wave limited to 200 people. If you want to see the real numbers that companies don't share publicly, reply to this email now.

Welcome to the ELECTE newsletter. This newsletter explores the fascinating world of artificial intelligence, explaining how it is transforming the way we live and work. We share engaging stories and surprising discoveries about AI, from the most creative applications to new emerging tools, right up to the impact these changes have on our daily lives. You don't need to be a tech expert. Through clear language and concrete examples, we transform complex concepts into compelling stories.
Whether you're interested in the latest AI discoveries, the most surprising innovations, or simply want to stay up to date on technology trends, this newsletter will guide you through the wonders of artificial intelligence. It's like having a curious and passionate guide who takes you on a weekly journey to discover the most interesting and unexpected developments in the world of AI, told in an engaging and accessible way. Sign up now to access the complete newsletter archive. Join a community of curious minds and explorers of the future. Subscribe now.
