
The Challenges Of Building AI Apps


Image Credits: ChristianChan / Shutterstock

Mike Chalfen

Contributor

Mike Chalfen is a co-founder and partner at Mosaic Ventures.

Artificial intelligence (AI) has an intellectual lineage stretching back to the greats of computer theory: Turing and, ultimately, Babbage, inventor of the calculating machine. What we now see in London, where leading teams such as DeepMind are working on machine learning, is the movement from the realm of computer science to practical uses and business cases.

It’s not just Google, which bought the DeepMind team last year, or Facebook, with a 50-person AI lab, that see the possibilities. Roughly one-sixth of YC companies seemed to be using machine learning in the most recent cohort, while IBM has bet billions on the success of Watson, its Jeopardy question-answering supercomputer.

Thousands of companies are taking advantage of infrastructure to manage or extract insight from large datasets. They are all working to predict outcomes or recommend or execute actions based on analyses of programmatically available digitized data.

I’d like to share some of the challenges facing startups that are attempting to build AI-based applications for businesses, and how some companies are attempting to overcome those challenges. Selecting, perfecting and combining the algorithms themselves is only a small part of the thoughtful work done by the best entrepreneurs. Other important factors include:

  • Proprietary access to unique data that will form the basis for the training data set.

  • A clear view of how insight will be generated, with tightly coupled software that intelligently extracts meaning from the data or evaluates which data requires human classification.

  • If possible, a data model that can accommodate new data sources as they emerge.

  • A skilled team that can write or adapt publicly available algorithms, select the right algorithm for the desired result and combine algorithms as needed to optimize the result.
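One way to picture software that "evaluates which data requires human classification" is a simple confidence cut-off: predictions the model is sure about are accepted automatically, and the rest are queued for a person. The following is a minimal sketch under that assumption, not a description of any particular company's system; the `Prediction` record, the `route_for_review` function and the 0.8 threshold are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Prediction:
    item_id: str
    label: str
    confidence: float  # model's probability for its chosen label, 0.0 to 1.0


def route_for_review(predictions, threshold=0.8):
    """Split predictions into auto-accepted and human-review queues.

    `threshold` is a hypothetical cut-off; a real system would tune it
    against the cost of human labelling and the cost of errors.
    """
    auto_accepted, needs_human = [], []
    for p in predictions:
        if p.confidence >= threshold:
            auto_accepted.append(p)
        else:
            needs_human.append(p)
    # Least confident items first, so reviewers see the hardest cases early.
    needs_human.sort(key=lambda p: p.confidence)
    return auto_accepted, needs_human


# Illustrative data only.
predictions = [
    Prediction("doc-1", "invoice", 0.97),
    Prediction("doc-2", "contract", 0.55),
    Prediction("doc-3", "invoice", 0.81),
    Prediction("doc-4", "receipt", 0.42),
]
auto_accepted, needs_human = route_for_review(predictions)
print([p.item_id for p in auto_accepted])  # ['doc-1', 'doc-3']
print([p.item_id for p in needs_human])    # ['doc-4', 'doc-2']
```

The labels a person supplies for the low-confidence queue can then be added to the training set, which is where proprietary data and the insight process reinforce each other.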

A couple of years ago, data analysis of any kind was labelled as “data science.” Today, the label AI is widely applied, sometimes carelessly. So let’s first consider what is described as AI.

Current commercial applications are “narrow” or “weak” forms of AI. This means the machine specializes in one area, and cannot infer from first principles as humans can (“general AI”). Narrow AI is based on well-understood techniques being exploited commercially for the first time. What is considered AI can quickly become a well-understood data science technique.

A closely related approach is “deep learning,” whereby data inputs are not pre-described. Rather, models learn about data (and data structures), then, using multiple layers of non-linear feedback, learn important features of the data and even self-refine.

While the technique has been around for more than 20 years, wider access to compute power is finally a match for the data intensity it requires. London-based startup Improbable is an exciting example of a company using vast compute power and deep learning to simulate complex environments, ranging from open-ended game worlds to cities.

Still, many startups we meet claim to incorporate machine learning (ML) into their technologies. For most of these companies, when we scratch beneath the surface, ML is not a truly important element of the product. In some cases it is just a veneer to make a project seem cutting-edge. In others it is real, but just table stakes, so does not offer a technical barrier to competitors; rather, it enables startups to offer an increasingly accurate and efficient service for customers.

For instance, some startups will use commodity code, of which there are many significant open-source libraries. An interesting open-source project for distributed stream and batch data processing, Apache Flink, has collated a library of publicly available ML algorithms that will scale to large data sets.

Amazon launched machine learning as a service in April, and startups such as MetaMind are aiming to offer AI as a service to developers, an extension of the more crowded market of predictive analytics as a service. The reality is that most algorithms are well-known, and AI learning techniques will commoditize fast.

So companies building products using narrow AI need to be very thoughtful about how they build and improve their products or services.

The Moat: Training Data

Training data is at the heart of building distinctive narrow-AI-based products. Startups need to find sources of structured data that can help them build the best possible models. In this case, “best” means the data set is large enough to learn from, and varied enough that it may help a range of customers rather than only one, and the resulting machine can seamlessly improve processes or decision-making.

Machine learning theory states that with unlimited data, we could expect all algorithms to produce similar-quality results. So startups will only resist commoditization if they have access to a unique data set and extend their early lead by continuously learning how to improve their algorithms based on end-user interactions. The most famous example is Google’s use of clickstream data as a private source of training data to improve search-ranking results.

As we have touched on before, startups sometimes confuse revenue traction with value creation. Choosing projects that yield short-term revenue based on easily available data sets is unlikely to yield a differentiated, valuable application.

One example is Digital Genius, a London- and New York-based startup focused on automating customer service conversations. The founder bootstrapped the company for its first couple of years. Admirable though that is, the initial technology and commercial choices were not scalable. The first version of its technology was very flexible, but it needed to be highly customized for each customer. Also, initial demand was for lower-value applications in marketing services. The combination was not attractive to venture investors at that point.

However, the company may well have found its way. First, the team created a platform it can reuse for many different text-based AI applications, whereas it had started with a tool set. Second, it has found a high-value focus in automating text conversations. Importantly, the algorithm is based (amongst other data quarries) on analyses of huge repositories of real call-center transcripts. This may now yield a replicable product that can be the foundation of a large business.

A Technology-Driven Process To Extract Insight And Meaning From The Data Set

Having access to useful data sets is only a start: A system needs to extract accurate metadata from the data, and use it as an input to improve the machine’s accuracy.

We find that the best AI-driven startups are focused on increasing both the throughput and the refinement and accuracy of their algorithms. That takes iteration and time — and a lot of data — to get right.

One example is Unbabel, a Lisbon- and San Francisco-based startup focused on AI-augmented translation. In order to deliver, the company must create scalable methods for translators to annotate, refine and reject machine translations. The workflow software that Unbabel’s translators use to assess translation accuracy is strikingly granular. Rather than giving a simple yes/no/maybe, the translator judges 15-20 measures of accuracy and also suggests an alternative. Accuracy in this case can also include brand suitability for Unbabel’s commercial customers. The machine then uses this feedback to self-improve.

That is an intelligent and well-managed approach to model improvement. It solves for quality and scale, rather than just efficiency, and acknowledges that the machine is a work in progress and not yet ready for full automation of translation tasks.
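To make the idea of granular feedback concrete, here is a minimal sketch of how a reviewer's judgement might be recorded and fed back as training data. It is an illustration only, not Unbabel's actual software: the `TranslationFeedback` record, the measure names, the 0.0 to 1.0 scoring scale and the 0.9 acceptance threshold are all assumptions.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class TranslationFeedback:
    machine_output: str
    scores: dict  # measure name -> score from 0.0 (poor) to 1.0 (good)
    suggested_alternative: str = ""


def summarize(feedback, accept_at=0.9):
    """Return the overall score, the weakest measure and a decision."""
    overall = mean(list(feedback.scores.values()))
    weakest = min(feedback.scores, key=feedback.scores.get)
    decision = "accept" if overall >= accept_at else "revise"
    return overall, weakest, decision


def training_pair(feedback, decision):
    """A rejected output plus the human correction becomes a training example."""
    if decision == "revise" and feedback.suggested_alternative:
        return (feedback.machine_output, feedback.suggested_alternative)
    return None


# Illustrative data only; four measures stand in for the 15-20 described above.
fb = TranslationFeedback(
    machine_output="Your order ships tomorrow.",
    scores={"fluency": 1.0, "terminology": 0.5, "brand_tone": 0.75, "grammar": 0.75},
    suggested_alternative="Your order will ship tomorrow.",
)
overall, weakest, decision = summarize(fb)
print(overall, weakest, decision)  # 0.75 terminology revise
print(training_pair(fb, decision))
```

Recording per-measure scores rather than a single verdict shows where the machine is weak, not just that it fell short, which is what allows targeted improvement.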

That iterative combination of training data and machine accuracy is the heart of what many startups are working through.

How Do You Make It All Work?

A lot of commentary on AI-based applications makes building them sound straightforward. Yet AI itself is rarely sufficient. As with many disruptive software opportunities, startups using AI need to be competent in multiple spheres and make the product or service easy to use.

Even once the right algorithms are chosen, a good data set is identified and a process to improve and scale ML is hardened, startups are often just at the starting line. Some challenges (and often ones that are worthy of venture capital funding) require innovation on multiple fronts. Even for narrowly focused startups, engineering challenges are rarely one-dimensional.

IT operations-focused startup Moogsoft is a good example (full disclosure: I am an angel investor). Phil Tee, the founder and CEO of Moog, is a fifth-time founder, and as the founding CTO of Micromuse was responsible for the dominant incumbent in network management. His goal was to work out how to process millions of different event data points so that IT operations could be evaluated across the full stack.

He saw that he would need to build a machine that was model-free so that it could make sense of new data sources on the fly, as operations evolved. This required the technical chops to build relevant algorithms that together could cope with untagged data. Phil then went further and broke additional ground by predicting faults — all whilst tuning the machine for processing at scale and in real time.

The team also needed to have the understanding of enterprise use cases so that the software was effective in reducing time to resolve and troubleshoot tickets, and in delivering transparency to the affected organizations. This combination is not trivial.

Many of the potential applications of AI that get us excited fall into this category of startup management and engineering challenges that are not going to be straightforward to solve. Examples include automated code generation, QA or optimization platforms; automated risk and lending decisions in the financial supply chain; automated legal documentation and contract analytics; and automated visual assessments such as health checks or insurance claims adjustments.

What’s The Right Team?

Assembling the right team is a challenge. Supply of graduates from the world’s best computational linguistics, machine learning and data science programs cannot meet demand. Google and Facebook are building teams and acquiring startups with critical mass, offering recruits the chance to work on both general and narrow AI problems with enormous resources at their disposal. Their pay scales make it difficult for smaller startups to recruit. Startup CEOs who are aiming to recruit the best in their specific fields have to recruit globally.

Most importantly, startups must offer recruits an exciting problem to solve in order to attract a world-class team. At least, as we’ve shown, the valuable problems also tend to be the harder ones. Mere efficiency gains are not attractive enough as a mission to attract the best. And once the ML team is assembled, as the Moog example shows, wider skills are needed to turn a working machine into a commercially viable product.

AI, predictive analytics and data science-driven startups are only going to grow in size and in importance. Navigating how to build them is not straightforward.

If you are working on a very ambitious project in this field and have identified unique or proprietary training data, a product and a business model that can capitalize on the insights from the data, and a well-rounded team to go to market, please get in touch. We would love to learn more.
