Interview with Ashish Dsa, CTO & Co-founder, Arbor

Connectively

Connectively connects subject-matter experts with top publishers to increase their exposure and create Q & A content.

Jun 18

• 10 min read

© Image Provided by Connectively

This interview is with Ashish Dsa, CTO & Co-founder, Arbor.

Ashish, you’re the CTO & Co-founder of Arbor in the computer software industry—how do you describe your current focus and the problems your team is solving with AI right now?

Right now my focus sits in three places.

The first is reliability at scale. It’s one thing to get an LLM to do something cool in a demo. It’s a totally different thing to get it to do that same thing correctly the ten-thousandth time on messy enterprise data with an auditable trail behind it. So a lot of what my team builds is the unglamorous plumbing: retrieval that actually pulls the right context (our RAG stack), an agent harness that keeps multi-agent systems from going off the rails, and evaluation pipelines so we know when something regresses before a customer does.

The second is voice. We’re doing real-time voice AI, and honestly that’s where a lot of the hardest engineering lives. Latency budgets are brutal. People expect a conversation to feel human, which means you’re juggling speech, reasoning, and retrieval all at once in under a second. When it works, it feels like magic. When it doesn’t, it feels broken. There’s not much middle ground, so we obsess over it.

The third is context engineering, which I think is the part most people underestimate. The model is rarely the bottleneck anymore. The real problem is getting the right information, in the right shape, in front of the model at the right moment. So we spend a ton of time on context engineering, prompt engineering, and fine-tuning models to a specific domain, because a generic model that knows everything is often worse than a focused one that deeply understands your operation.

The bigger problem we’re really solving is trust. Enterprise teams don’t want a chatbot; they want a system they can hand actual operational work to and not babysit. That bar is high, and it should be. Most AI breaks down somewhere between the impressive demo and the boring reality of production.

So that’s where my head is. We raised our seed to go after exactly this, and the early Fortune 500 traction tells me we’re pointed at a real problem. There’s a long way to go, but it’s the most interesting engineering work I’ve done in over a decade, and I genuinely don’t want to be working on anything else right now.

What experiences most shaped your path to CTO, especially the blend of full-stack engineering, data science, and cybersecurity that now guides how you build products?

Honestly, the thing that shaped me most wasn’t a job. It was a bug.

Years ago, early in my career, I shipped a feature that looked great in the demo and then quietly fell over in production. It turned out the data pipeline feeding it was making assumptions that were true in our test set and false in the real world. The model wasn’t wrong—the data underneath it was. I remember sitting there at about 2 a.m., realizing that the “AI” part was maybe 10% of the problem. The other 90% was plumbing, data quality, and the boring stuff nobody puts in a keynote.

That lesson basically became my whole approach.

So when people ask about the blend of full-stack, data science, and security, I don’t really see three things. I see one. At Meta, I learned what it actually takes to run software at scale, where a small inefficiency gets multiplied by millions of users and suddenly it’s a real problem. At TELUS and Hypersonix, I built ML systems that companies like Amazon and Taco Bell relied on for actual decisions, not science projects. That forces you to think differently. Because when a Fortune 500 client runs your model, “it works on my laptop” means nothing.

The data science part taught me humility, honestly. Models are confident even when they’re wrong. You have to build the guardrails, the evaluations, the context engineering around them so they fail loudly instead of quietly. That 2 a.m. bug taught me that the hard way.

The full-stack part is what lets me actually ship. I think a lot of AI folks can train a model but can’t get it in front of a user reliably. Being able to go from the database all the way up to the interface means I can see where things break before they break. And in enterprise, where we live, reliability is the whole product.

Security is the piece people underrate. When you’re handling real-time voice and sensitive enterprise data for big clients, security isn’t a feature you bolt on later. It’s a design constraint from day one. Where does the data live? Who can see it? What happens when the model gets something it shouldn’t? If you bolt that on at the end, you’ve already lost. I learned to bake it into the architecture from the start, and that mindset shapes basically every system we build at Arbor.

So if I’m being real about it, the path to CTO wasn’t some clean straight line. It was a bunch of moments where I watched something break and had to understand the full stack of why, from the silicon up to the human using it.

From your work taking voice AI and multi-agent LLM systems from demo to production, what single practice has most reliably turned messy conversations into structured signals that drive business decisions?

A while back, we had a Fortune 500 client whose support calls were basically a black hole. Thousands of conversations a week — all of it just vanishing into recordings that nobody listened to. They knew the gold was in there; they just couldn’t get to it.

So, if you ask me the single practice that’s mattered most, it’s this: define your schema before you touch a single model. Decide what a “structured signal” actually is, in business terms, before you build anything.

Sounds boring, I know. But hear me out.

The temptation is always to point an LLM at the mess and say “extract insights.” And you’ll get insights — beautiful, fluent, totally unusable ones. Because nobody upstream agreed on what an “escalation” is, or what counts as “churn risk,” or how you’d actually act on either. The model wasn’t the bottleneck. The definitions were.

What worked was sitting down with the people who’d actually use the output and forcing the hard questions first:

What decision does this drive?
What are the exact fields?
What values are allowed?

Then you make the model fill that schema — every conversation, the same shape every time.

And honestly, once you’ve got that contract, everything downstream gets easier. Evaluations become possible because you know what “right” looks like. You can measure drift. You can route the low-confidence ones to a human.

Messy conversations don’t become signal because the model is smart; they become signal because you decided, in advance, what “signal” means.

On the cloud side, having shipped on AWS, GCP, and Azure with data lakes and pipelines, what architecture choice has given you the biggest reliability-per-dollar gain for AI workloads?

Honestly, the biggest win came from a mistake. Early on at Arbor we ran our RAG and inference stack on always-on GPU instances because that’s what everyone said you do. Our cloud bill was brutal and our utilization was about 15%. We were basically paying rent on idle silicon.

So the architecture choice that actually moved the needle wasn’t a fancy one. It was decoupling compute from the workload pattern and going hard on autoscaling with spot (preemptible) instances behind a queue.

Here’s the thing about AI workloads: they’re spiky. Examples include:

Voice traffic that comes in waves
Batch fine-tuning jobs that are bursty
Embeddings backfills that are huge, then quiet

If you provision for peak, you’re wasting money about 90% of the time. If you provision for average, you fall over during spikes.

What we landed on: stateless inference workers on spot GPUs, a managed queue (SQS or Pub/Sub depending on the cloud) absorbing the bursts, and a separate small pool of on-demand instances as a reliability floor so we never fully drop requests when spot capacity vanishes. The data lake stayed on cheap object storage, decoupled from compute entirely.

That mix gave us the best reliability-per-dollar by far. We cut GPU spend roughly 60% and actually became more reliable, because the queue smooths out failures and the on-demand floor catches the spot evictions.

The lesson, basically: don’t optimize the model first. Optimize for the traffic shape. Match your compute lifecycle to how the work actually arrives, and the reliability and the savings show up together.

With your cybersecurity and ethical-hacking background, what is your go-to threat-modeling approach for voice AI systems to reduce risks like deepfakes, prompt injection, and data leakage?

I lean on a STRIDE-style decomposition adapted for the voice pipeline, because the attack surface in voice AI isn’t where most people think it is. The trick is to map threats to each stage:

Audio capture
ASR (speech-to-text)
The LLM/agent layer
Retrieval
TTS/voice output

For deepfakes and voice spoofing, I treat the audio input as fundamentally untrusted. That means:

Liveness detection and anti-spoofing classifiers on the incoming stream
Never letting voice alone authorize anything sensitive — voice is identification, not authentication
Use step-up verification (out-of-band, a code, a callback) for anything high-stakes

For prompt injection, the key insight is that transcribed speech is just untrusted user input, the same as text. I keep a hard separation between system instructions and anything the user (or a retrieved document) says. Specifically:

Treat ASR output and RAG context as data, never as instructions
Constrain what tools the agent can call
Gate dangerous tools behind explicit confirmation

For data leakage, I think in terms of data flow and blast radius. Mitigations include:

Minimizing what context the model sees
Scoping retrieval to the caller’s actual permissions
Redacting PII before it hits logs or third-party APIs
Monitoring the TTS output channel, since models can leak training or context data on the way out

The honest meta-point: most of this is the boring stuff — least privilege, treat all input as hostile, defense in depth, and assume any single control will fail. Threat modeling is mostly about being paranoid in an organized way.

Given the power and land constraints around AI infrastructure, what engineering tactic has delivered the best carbon and cost reduction for you without compromising latency or accuracy?

Honestly, the biggest unlock for us wasn’t some exotic hardware trick. It was being ruthless about not running the big model when we didn’t need to.

Here’s the thing. Early on at Arbor, we were routing basically every request through our largest model. It felt safe. But our GPU bills were brutal, and the carbon math was even worse. Then we actually looked at the traffic and realized something like 70% of requests were simple: classification, short lookups, and routine extraction. We were using a sledgehammer for thumbtacks.

So we built a semantic router up front. A small, cheap classifier looks at each incoming request and decides where it goes. Easy stuff hits a distilled, small model. The genuinely hard reasoning goes to the big one. We paired that with aggressive semantic caching, because in enterprise ops people ask the same things over and over. If we’ve already answered something close enough, we don’t burn a single token recomputing it.

That combo cut our compute roughly in half, which is real money and real watts. And here’s the part people don’t expect: latency actually got better, not worse, because most queries now skip the heavy model entirely. Accuracy held because we kept the hard cases on the strong model.

The lesson, basically, is that the cheapest and greenest token is the one you never compute. Smart routing and caching beat chasing efficiency inside one giant model every time. It’s not glamorous, but it works.

Before embedding AI into sensitive workflows, which governance or privacy control do you treat as absolutely non-negotiable in production?

If I had to pick one non-negotiable, it’s data lineage and access control at the context layer — knowing exactly what data the model can see, where it came from, and who’s allowed to retrieve it, enforced at query time, not after.

Here’s why I’d put this above everything else. Most AI failures in sensitive workflows aren’t the model “going rogue.” They’re plumbing problems: a RAG system that pulls a document the requesting user was never authorized to see; a fine-tuning run that quietly bakes PII into weights you can’t un-bake; an agent that logs a full prompt containing someone’s medical history to an observability dashboard. The model behaved fine. The data boundary leaked.

So in production I treat a few things as load-bearing:

Permission-aware retrieval. The RAG layer enforces the same access controls as the source system, per request. If a user can’t see a record in the source app, the model can’t either.
No sensitive data in training without an explicit, auditable decision. Weights are forever; you can’t issue a deletion request to a checkpoint.
Full audit trail. Every retrieval, prompt, and output is logged so you can answer “what did the model see and why” months later.

Guardrails on outputs matter too, but they’re a second line of defense. If the boundary holds, a bad output is embarrassing. If it doesn’t, it’s a breach.

To keep innovation disciplined across emerging tech (generative AI, edge hardware, blockchain), what single criterion or kill-switch do you use to decide whether to scale or stop an experiment?

One criterion: a pre-committed evidence threshold tied to a real decision, checked on a fixed date.

Before any experiment starts, write down the single metric that would prove the thesis, the target value it must reach, and the date you’ll check. The kill-switch is simple: did it clear the bar by the deadline? If not, stop. No renegotiating after the fact.

The discipline isn’t the metric itself; it’s that you commit to it before you fall in love with the work. Most failed experiments don’t die because the data was ambiguous. They die slowly because no one set the bar in advance, so every result looks like “almost there” and sunk cost keeps the lights on.

A few things make this work across generative AI, edge hardware, and blockchain alike:

The metric has to be a leading indicator of value, not activity. “Model accuracy improved” is activity. “A design partner would pay for this at current unit cost” is value. Pick the one closest to someone opening their wallet.
Tie it to a date, not a budget. Time-boxing forces a verdict; budget-boxing just lets you ask for more.
Make “stop” the default. The experiment has to earn its continuation by clearing the bar. Silence or ambiguity means stop, not “let’s give it another quarter.”

So, compressed to one line: every experiment gets a falsifiable success metric and a deadline set before it begins. Scaling requires clearing that bar; otherwise it’s killed. The number can be wrong and you can learn from it. Having no number is how disciplined innovation quietly turns into a graveyard of zombie projects.

Looking 12–24 months ahead, what is one concrete step operators should start today to prepare for the biggest shift you foresee at the intersection of AI integration, cybersecurity, cloud architecture, and sustainability?

A few months ago I watched a team spin up a slick internal AI assistant over a weekend. Real momentum. Then someone asked a simple question: Where is all this data actually living, and who can see it? Silence. That silence is the whole problem.

So if I had to name the biggest shift coming in the next 12 to 24 months, it’s this: AI agents will stop being something you query and start being something that acts. They’ll read your systems, call APIs, move data, and increasingly do it on their own. That collides head-on with security, cloud design, and energy use all at once. An agent with broad permissions is a new attack surface. It’s also a new cloud cost line and a real power draw nobody budgeted for.

The one concrete step? Start treating every AI agent as its own identity today. Give it a scoped service account, least-privilege access, and full audit logging before you give it anything useful to do. Most teams bolt this on after the agent is already wired into production. Do it first.

Why this one move? Because it quietly solves four problems at once. Security gets you containment and a paper trail. Cloud architecture gets you cleaner service boundaries. And sustainability rides along, because scoped agents only touch what they need, so you’re not burning compute and power on runaway calls.

Basically, you can’t govern what you can’t see. Give your agents identities now, while it’s still cheap to do.

Thanks for sharing your knowledge and expertise. Is there anything else you'd like to add?

Thank you for interviewing me.

Interview with Ashish Dsa, CTO & Co-founder, Arbor

Up Next