Interview with Yury Shamrei, CEO, SumatoSoft

Connectively

Connectively connects subject-matter experts with top publishers to increase their exposure and create Q & A content.

This interview is with Yury Shamrei, CEO, SumatoSoft.

For readers meeting you for the first time, how do you introduce your expertise at the intersection of computer hardware, enterprise software, and AI?

I introduce myself as a translator between physical reality and digital decision-making. It’s the clearest way to describe what sits between a sensor on a machine and the decisions someone makes in a boardroom.

Most tech companies live in one layer: a great hardware sensor, a massive database, or a sharp AI model. Each is good at its piece, but the business value almost never lives inside any one piece. It lives in whether the three can talk to each other without falling apart at the joins, which is the part no one owns and everyone underestimates.

That’s what we do at SumatoSoft: we build the nervous system connecting them. We take the messy telemetry coming off hardware at the edge, move it through secure enterprise infrastructure, and clean it up until a model can actually rely on what it’s fed. If you ignore the physical limits of the hardware, the software fails. Feed a model dirty data, and it will not fail loudly — it will hand you confident answers you can’t trust, which is worse. My expertise is knowing where each layer betrays you, and wiring the seams so the whole thing holds.

What key experiences or decisions shaped your path to leading AI-driven initiatives as a CEO in the hardware industry?

I do not manufacture hardware. I build the nervous system that makes it intelligent.

The thing that actually set me on this path was not some breakthrough in AI. It was watching expensive industrial IoT projects fail for a reason no one wanted to name: the software people did not understand physical reality. Years ago, I spent time auditing enterprise deployments where, on paper, everything was right — the sensors worked, the cloud platform was genuinely well-built — and the whole thing still did not work, because the connection between the two was a mess. The hardware was speaking; the software could not really hear it.

That stuck with me, and it shaped where we chose to point SumatoSoft. The instinct in the industry is to tell a manufacturer to rip out their old machines and buy new smart ones. I came to think the real opportunity was the opposite — leaving the twenty-year-old hardware exactly where it is, and building the secure software that connects to it, makes sense of the messy telemetry coming off it, and feeds something clean enough that a model can act on it.

So my route into AI did not run through the models at all. It ran through the unglamorous work between the physical and the digital — the broken plumbing no one else wanted to touch. That is still the part of this I find most worth doing.

What first-principles thesis guides your decisions on where AI creates real, defensible value in a hardware-centric business?

The thesis is simple: compute is cheap, steel is expensive. Once you really sit with that, a lot of decisions about AI in the physical world get easier.

When we look at an AI investment for a hardware-heavy client, the question we lead with isn’t how good the model is — it’s what expensive physical thing the model lets them avoid buying. In the industrial world, the durable value of AI is narrow and specific: getting more life and more output from assets that cost a fortune and can’t easily be changed once they’re in the ground.

Take a manufacturer who wants 20% more throughput. The instinct is to buy ten more machines and expand the building — enormous capital, locked in for years. The other path is to point AI at the telemetry the existing machines already produce: optimize how work is routed through the floor, and catch part failures before they take a line down. The first move spends millions on steel and concrete. The second incurs a far smaller, mostly software cost to pull more out of steel you’ve already paid for. Even counting what it takes to build and run the system, the gap between those two numbers is not close.

That’s the part worth sitting with. The real moat isn’t the AI itself — it’s what the math underneath it enables a company to do: grow output without growing its physical footprint at the same rate. Once revenue stops being chained to how many machines you own and how big your building is, you can scale in a way a competitor still buying steel for every increment simply can’t match.

Can you walk us through one end-to-end workflow your team transformed with AI—what changed, what stayed manual, and which outcome mattered most?

In competitive bidding, the best technical response doesn’t always win. Often the first good one does — and that’s a more uncomfortable truth than most firms want to sit with.

A B2B manufacturing client of ours kept losing bids they were technically qualified to win. The bottleneck was time. A fifty-page RFP would arrive, and their senior architects — the expensive, in-demand ones — would lose the better part of a week to it: pulling out the requirements, hunting through past projects for what worked, and assembling a baseline response. Good, rigorous work. Slow. By the time their thorough proposal went out, a competitor’s adequate one had already been sitting on the client’s desk for days, quietly setting the terms everyone else got measured against.

So we built a system that does the slow first pass. It reads the incoming RFP, maps the requirements against their library of past successful bids, and assembles a baseline quote and technical roadmap — the draft the architect used to spend days building by hand. The architect still owns everything that matters: checking the logic, setting the pricing strategy, shaping the final proposal. That part stays human — and should.

But the lesson wasn’t “our architects got more efficient.” It was that we’d misunderstood what we were competing on. We thought it was the quality of the technical response. It turned out a large part of it was latency — being early enough to be the proposal every later bid gets compared to. Automating the draft didn’t just save senior people some hours. It moved the client from chronically late to consistently first, in a game where first is most of the battle.

If you were dropped into a legacy ERP/CRM/IoT stack tomorrow, what 90-day plan would you run to make the data AI-ready without stalling the business?

The most expensive mistake with legacy tech debt is deciding to rip it all out at once. The big-bang migration — freeze the old system, rebuild everything, switch over on some triumphant go-live date — is how you stall a business for two years and spend millions to arrive somewhere barely better than where you started.

Drop me into a tangle of legacy ERPs, CRMs, and aging IoT hardware tomorrow, and the strategy is the opposite of heroic. We don’t touch the production servers. We build a smarter brain next to them and let it take over by degrees. Here’s the 90-day version of how that runs.

  1. Days 1–30: Triage and map. The first thing I do is stop the team from trying to fix everything. No one cleans “the database” — that’s how you boil the ocean for a quarter and have nothing to show. We pick one bottleneck that’s costing real money: stockout prediction, say, or routing field techs. Then we map only the data that one problem actually needs, and audit the old hardware and software just enough to learn how that data behaves — how it’s formatted, how often it refreshes, where the dead ends are. The core systems don’t get touched. We’re studying them, not opening them up.

  2. Days 31–60: Build the layer beside it. We don’t try to make the legacy ERP smart. Push modern workloads onto a twenty-year-old system and it falls over. So instead we build a read-only pipeline out of it — pull the raw telemetry off the sensors and the history out of the CRM into a clean, modern data warehouse that sits entirely outside the old environment. All the tedious work — cleaning, normalizing, reformatting — happens out there, in the new space. The legacy system keeps doing the one thing it’s genuinely good at: recording transactions reliably, the way it has for years.

  3. Days 61–90: Run it in the shadows. Now the AI goes in, with no write access at all. We connect a narrow, purpose-built model to that clean layer and run it as a shadow pilot — it reads the data and produces its predictions right alongside the human team, but it physically cannot push anything back to the live hardware. For a month it just proves itself: are the predictions accurate, and is the money they’d save real? Only once that’s settled does the model get anywhere near a live workflow.

The reason the business never stalls is that the foundation was never paused or altered. We didn’t reach into the machine that’s keeping the lights on. We tapped the exhaust coming off it and made that useful instead — and by the time anything needs to change, we know how to do it safely.
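As a rough illustration of the shadow-pilot stage, the sketch below scores a model's predictions against what actually happened, without writing anything back to a live system. It assumes the stockout example from the plan; the `ShadowRecord` fields, the `cost_if_missed` figure, and the `evaluate_shadow_run` helper are hypothetical names chosen for illustration, not part of any SumatoSoft system.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class ShadowRecord:
    """One shadow-mode prediction paired with the later real outcome (hypothetical schema)."""
    item_id: str
    predicted_stockout: bool  # what the model said would happen
    actual_stockout: bool     # what the human-run process later observed
    cost_if_missed: float     # assumed dollar cost of an unanticipated stockout


def evaluate_shadow_run(records: Iterable[ShadowRecord]) -> dict:
    """Score shadow predictions. Read-only: nothing is sent to any live system."""
    records = list(records)
    true_pos = sum(1 for r in records if r.predicted_stockout and r.actual_stockout)
    false_pos = sum(1 for r in records if r.predicted_stockout and not r.actual_stockout)
    false_neg = sum(1 for r in records if not r.predicted_stockout and r.actual_stockout)

    precision = true_pos / (true_pos + false_pos) if (true_pos + false_pos) else 0.0
    recall = true_pos / (true_pos + false_neg) if (true_pos + false_neg) else 0.0
    # Money the model would have helped protect: stockouts it correctly flagged.
    avoidable_cost = sum(
        r.cost_if_missed for r in records if r.predicted_stockout and r.actual_stockout
    )
    return {"precision": precision, "recall": recall, "avoidable_cost": avoidable_cost}


sample = [
    ShadowRecord("A", True, True, 100.0),
    ShadowRecord("B", True, False, 50.0),
    ShadowRecord("C", False, True, 200.0),
    ShadowRecord("D", False, False, 0.0),
]
report = evaluate_shadow_run(sample)
```

The two questions from the plan map onto the output: precision and recall speak to whether the predictions are accurate, and the avoidable-cost total is a first estimate of whether the savings are real.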

As a certified Scrum Master leading cross-functional teams, how do you structure roles, rituals, and cadences so AI systems actually improve lean operations rather than add friction?

I came up as a Scrum Master before I ran a company, and the lesson that stuck wasn’t about Agile rituals. It was about what they actually cost. The most expensive resource a company has isn’t any individual’s time — it’s synchronized time: the hours when five senior people are all paying attention to the same thing in the same room at once. That’s the rarest, priciest thing you spend. Most companies spend it terribly.

So the way we use AI internally isn’t really about speed. It’s about protecting those hours — making sure that when we finally put expensive people in a room together, none of it is wasted on things that never needed the whole room.

The test I apply to any recurring meeting is blunt: how much of this is people arriving at a shared understanding of facts, versus people exercising judgment that genuinely needs all of them present? The first kind is almost always a waste of synchronized time. Pulling together history, assembling context, summarizing where things stand, and prepping the baseline everyone reacts to — that’s solitary work, or a machine’s work, and doing it live with five people watching is the most expensive way imaginable to gather information. AI does that prep now, before anyone walks in.

What’s left in the room is the part that actually justifies the room: the disagreement, the edge cases, the call that needs three people’s context colliding to get right. The meeting doesn’t disappear — it gets denser. People stop reading each other’s status updates and start doing the thing only a group can do.

That’s the whole philosophy, and it’s smaller than it sounds. Don’t use AI to eliminate the human gathering. Use it to make sure the human gathering is spent on the one thing that was worth gathering for.

What decision framework do you use to choose between rules, RAG, fine-tuning, SaaS, or on-device inference when building an AI feature into a tech product?

I tell my engineering teams to tune out the hype and make architecture decisions based on two simple questions instead: Where does the data live, and how fast do we need the answer? Almost everything else follows from those two.

Start with location and speed. If a client needs predictive maintenance on a factory floor where the internet drops every twenty minutes, the decision is already made — the model runs on the device, not in the cloud. People reach for a big cloud API by reflex, but when the connection is unreliable and a delayed answer is a safety problem, not just an annoyance, a small model running locally on the hardware wins, because it still works when the link is down and returns answers in milliseconds. You’re trading some raw capability for locality and reliability, and on a factory floor that’s the right trade every time. In a setting where the network is solid and latency isn’t an issue, the calculus flips back — which is the point. The environment decides, not the trend.

Then ask whether the knowledge changes. Building an enterprise search tool over internal policies that get rewritten weekly? Fine-tuning a model for that is a way to set money on fire — every price change or policy update means retraining. That’s the textbook case for RAG instead: a cheap base model querying a living database, so the moment the source updates, the answers do too; no retraining required.

That’s the whole framework, and it stays simple. You fine-tune to teach the model a permanent new skill. You use RAG when the information it needs keeps changing and has to stay current. And — the part people skip — you use plain old rules when the problem never needed AI in the first place. Knowing which of those three you’re in is most of the job. The teams that get it wrong almost always picked the technology first and then worked backward to justify it.
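One way to write that framework down is as a small decision function. This is a minimal sketch under stated assumptions: the four boolean inputs and the `choose_approach` name are hypothetical, and the ordering of the checks is one reading of the answer above rather than a fixed rule. In practice, where a model runs and how it gets its knowledge are separate axes; the sketch flattens them for brevity.

```python
from enum import Enum


class Approach(Enum):
    RULES = "plain rules"
    ON_DEVICE = "small on-device model"
    RAG = "retrieval-augmented generation"
    FINE_TUNE = "fine-tuning"


def choose_approach(
    needs_ai: bool,
    network_reliable: bool,
    latency_critical: bool,
    knowledge_changes_often: bool,
) -> Approach:
    """Pick an approach from the environment, not from the trend."""
    # The step people skip: some problems never needed AI at all.
    if not needs_ai:
        return Approach.RULES
    # Where does the data live, and how fast is the answer needed?
    # Assumption: either an unreliable link or a hard latency limit forces local inference.
    if not network_reliable or latency_critical:
        return Approach.ON_DEVICE
    # Does the knowledge change? Changing facts favor retrieval over retraining.
    if knowledge_changes_often:
        return Approach.RAG
    # A stable, permanent skill is the case for fine-tuning.
    return Approach.FINE_TUNE
```

For example, `choose_approach(True, False, True, False)` describes the factory floor with a flaky connection and returns `Approach.ON_DEVICE`, while a policy search tool over weekly-changing documents on a solid network returns `Approach.RAG`.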

Which metrics and safeguards do you rely on to prove ROI while keeping AI systems safe, auditable, and aligned with messy real-world processes?

The industry loves to measure AI by hours saved. I measure it by cash recovered — and the two are not the same conversation.

Hours saved is a soft number. It’s easy to claim, hard to verify, and it rarely survives contact with a CFO who wants to know what actually changed on the P&L. So when we put an AI system into a messy enterprise workflow, the metric we hold it to is financial and specific. Automate an invoice-auditing process, and success isn’t “the team works faster” — it’s the dollar value of billing errors the system catches that were quietly leaking out before. That’s a number you can put in front of a finance team and defend, because it shows up as money that stayed in the business instead of money that didn’t.

But a financial claim is only as trustworthy as your ability to audit it — and that’s where the architecture matters. A number you can’t trace is a number a CFO is right to ignore. So every recommendation the system produces has to be checkable back to its source: here’s the discrepancy, here’s the exact contract clause it’s based on, here’s the math. If you can’t show your work, you don’t really have an ROI claim. You have an assertion.

Which leads to a design rule we don’t bend on: the AI doesn’t get to act on its own. In a real business, a model should never have the authority to change a master record or release a payment by itself. Our models read the messy data, find the pattern, and hand a specific recommendation to a human, with the source clause attached, and a person makes the call. The machine finds the money. The human decides whether to act on it. Keeping those two separate isn’t caution for its own sake; it’s what makes the whole thing safe enough, and traceable enough, to trust the dollar figure at the end.
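The split between finding and deciding can be sketched in a few lines. Everything here is hypothetical and for illustration only: the `Recommendation` fields, the `audit_invoice` and `apply_decision` helpers, and the sample clause reference are assumptions, not a description of a real invoice-auditing product.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Recommendation:
    """A traceable finding: the discrepancy, its source, and the math behind it."""
    invoice_id: str
    billed: float
    contracted: float
    source_clause: str  # where a reviewer can check the contracted amount

    @property
    def discrepancy(self) -> float:
        return round(self.billed - self.contracted, 2)


def audit_invoice(
    invoice_id: str, billed: float, contracted: float, source_clause: str
) -> Optional[Recommendation]:
    """The machine's half: flag a mismatch, but change nothing."""
    if billed == contracted:
        return None
    return Recommendation(invoice_id, billed, contracted, source_clause)


def apply_decision(rec: Recommendation, approved_by: Optional[str]) -> dict:
    """The human's half: no named approver, no action."""
    if not approved_by:
        raise PermissionError("A recommendation cannot be applied without human approval.")
    return {
        "invoice_id": rec.invoice_id,
        "recovered": rec.discrepancy,
        "source_clause": rec.source_clause,
        "approved_by": approved_by,
    }


rec = audit_invoice("INV-1", billed=1200.0, contracted=1000.0, source_clause="Clause 4.2")
```

Because the record returned by `apply_decision` carries both the clause and the approver, the recovered amount stays checkable back to its source after the fact.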

Looking 12–24 months ahead, which edge or supply-chain AI capabilities do you expect to most change how you design, build, or support hardware products—and why?

Over the next couple of years, the biggest shift in our corner of the industry won’t be a smarter cloud. It’ll be industrial AI that no longer needs the cloud to make a decision.

Right now, most edge devices are barely more than sensors with a wire. They gather physical telemetry and wait — for a centralized server somewhere to crunch it and send back an instruction. In an office app, that round trip is invisible. On a factory floor, it’s a liability, because the lag between “something is going wrong” and “a server three states away notices” is the time you don’t have when the thing going wrong is physical. If an industrial drill picks up the acoustic signature of an imminent failure, it cannot wait on an internet connection to ask permission to stop. The decision has to happen on the chip, in the moment.

So the capability I’m watching isn’t a bigger model in a bigger data center; it’s small, specialized models running entirely on the device — not language models, but tight little inference models built to do one physical job: read the vibration, spot the anomaly, make the call, locally, with no connection required. The cloud doesn’t vanish: training, fleet-wide learning, and heavy analytics all still live there. What changes is that the cloud stops being a dependency for the decision itself. The intelligence moves down to where the physical event actually happens.
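To make the idea of a local decision concrete, here is a minimal sketch that uses a rolling z-score as a stand-in for a trained anomaly model. That substitution is an assumption made for brevity: a real deployment would likely use a compressed learned model, and the `VibrationMonitor` class, its window size, and its threshold are hypothetical values, not recommendations.

```python
from collections import deque
from statistics import mean, pstdev


class VibrationMonitor:
    """Decides locally whether to stop, using only recent readings held in memory."""

    def __init__(self, window: int = 50, threshold: float = 4.0, min_samples: int = 10):
        self.readings = deque(maxlen=window)  # bounded memory, suited to small devices
        self.threshold = threshold            # how many standard deviations counts as abnormal
        self.min_samples = min_samples        # baseline needed before any judgment is made

    def should_stop(self, reading: float) -> bool:
        """Return True if this reading is far outside the recent baseline."""
        if len(self.readings) >= self.min_samples:
            baseline = mean(self.readings)
            spread = pstdev(self.readings)
            if spread > 0 and abs(reading - baseline) / spread > self.threshold:
                # Anomalous readings are not added, so they cannot shift the baseline.
                return True
        self.readings.append(reading)
        return False


monitor = VibrationMonitor(window=20, threshold=4.0, min_samples=10)
normal = [monitor.should_stop(1.0 if i % 2 == 0 else 1.1) for i in range(20)]
spike = monitor.should_stop(5.0)
```

Nothing in the loop touches a network: the stop decision depends only on state the device already holds, which is the property the paragraph above describes.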

And that rewires what we engineer. The work stops being the construction of massive cloud pipelines and shifts toward compression — the genuinely hard problem of making a model small and efficient enough to run reliably on a twenty-dollar microprocessor with no internet at all, without it becoming useless as it shrinks. That tradeoff, squeezing real reliability onto tiny, constrained hardware, is the actual engineering frontier here, and it’s where I think the next few years of this work will be won.

The short version: we’re taking the intelligence out of the data center and putting it directly into the dirt.
