SELECT database

Everything is data: every letter written, word spoken, every call, image, video, and log file; every sports record, every flight completed, every seabed mapping, every video captured, every ultrasound, every seismometer measurement and every piece of music. You get the idea. It’s all data.

Mark Stevenson
October 27, 2023

A (not very) recent tweet by Garry Tan caught my eye: ‘GPT wrapper as a pejorative is like calling all SaaS companies “SQL database wrappers”.’

It’s fair to call most SaaS companies and apps database wrappers. Taking a look at my iPhone, I’ve got several apps. I’ve got banking apps, which are fundamentally ledgers, which are simply a database, plus a few other things like credit risk modelling and open banking APIs. Then I’ve got map apps: these are a little bit more complex, and will have SQL, NoSQL, graph and geospatial databases. I’ve got travel and flight booking apps. These connect to hotels, flights, hire cars, travel insurance and attractions via APIs and store the data in relational and graph databases. I’ve got email apps: database wrappers. I don’t have all of Instagram, TikTok and Snapchat, but they centre around images and videos and are database wrappers. Finally, the super apps (or the “everything apps”) such as China’s WeChat and Alipay and South East Asia’s Grab all centre around data, and yes, you guessed it, are “database wrappers.”

If you think I’m oversimplifying, you’re probably right, but I’m also not wrong. Whenever you interact with one of these apps, whether to make a purchase, transfer money, plan a route, or send an email, you transfer data and create, update, or perhaps remove a record from a database. It’s fundamental to technology and the modern economy. Yes, in many ways modern software is a database wrapper. Garry Tan is correct: “database wrapper” isn’t and can’t be a pejorative, because what it describes is incredibly powerful.
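To make the point concrete, here is a minimal sketch of a banking-style “database wrapper” in Python using the standard library’s sqlite3 module. The `Ledger` class, the `accounts` table and the account names are all hypothetical, invented for illustration; a real bank’s ledger is far more involved. Each user action maps to a create, read, update or delete on a record.

```python
import sqlite3


class Ledger:
    """A toy 'database wrapper': each app action maps to one or two SQL statements."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self.conn = conn
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS accounts ("
            "name TEXT PRIMARY KEY, balance INTEGER NOT NULL)"
        )

    def open_account(self, name: str, balance: int = 0) -> None:
        # Create: "sign up" is an INSERT.
        with self.conn:
            self.conn.execute(
                "INSERT INTO accounts (name, balance) VALUES (?, ?)", (name, balance)
            )

    def balance(self, name: str):
        # Read: "check my balance" is a SELECT. Returns None if no such account.
        row = self.conn.execute(
            "SELECT balance FROM accounts WHERE name = ?", (name,)
        ).fetchone()
        return None if row is None else row[0]

    def transfer(self, sender: str, receiver: str, amount: int) -> None:
        # Update: "send money" is two UPDATEs in one transaction.
        # The connection context manager commits on success and rolls back on error.
        with self.conn:
            self.conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, sender),
            )
            self.conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, receiver),
            )

    def close_account(self, name: str) -> None:
        # Delete: "close my account" is a DELETE.
        with self.conn:
            self.conn.execute("DELETE FROM accounts WHERE name = ?", (name,))


# Hypothetical usage with an in-memory database and made-up accounts.
ledger = Ledger(sqlite3.connect(":memory:"))
ledger.open_account("alice", 100)
ledger.open_account("bob", 20)
ledger.transfer("alice", "bob", 30)
```

Everything the user sees (a balance screen, a payment confirmation) sits on top of these four operations. The sketch deliberately skips overdraft checks, authentication and auditing, which is where much of a real product’s work lies.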

Even with products like Amazon Alexa, you can view them as a database wrapper or perhaps a machine learning model wrapper. Actually, I’ve probably done the Amazon engineers and scientists a disservice here: it’s an automatic speech recognition, natural language understanding, text to speech, deep learning, transfer learning wrapper. But that’s a bit of a mouthful.

If SaaS companies are database wrappers, then many technology companies are machine learning wrappers, and the newly VC-minted AI companies are LLM wrappers. Recent improvements in model capability help explain the proliferation of LLM wrappers.

Source: Our World In Data

There’s nothing wrong with wrapping. On the contrary, there’s plenty right. The fact is this: some of the very best ideas are disconcertingly simple. Often, good technology bundles or slightly tweaks an existing idea into a better product.

Those who dismiss an idea or product as a “GPT wrapper” are missing the point. After all, technology works on the basis of abstraction and modularity. It’s the basis of the first GUI, the basis for web browsers from Mosaic onwards, and the basis for software such as Siri. A “GPT wrapper” is potentially the next big idea. A fair note of caution to GPT wrappers is that they potentially lack barriers to entry, but that’s the case with many early stage startups. We could even go further; instead of a “GPT wrapper”, let’s call it a “probability distribution wrapper”.

In fact, I will go one step further. Products are data. Companies are data. They always have been data; it’s just that we’re getting better at it.

If companies are data, then it is perfectly understandable why the number and success of SaaS and technology companies have exploded in the last two decades. The volume of data created has exploded too. Neither shows signs of abating. As Marc Andreessen said, “software is eating the world.” In 2011, he said, “We believe that many of the prominent new Internet companies are building real, high-growth, high-margin, highly defensible businesses.” His prediction turned out to be prescient.

Source: Statista

The growth in data has led to improvements in data collection, storage and processing capabilities. On top of that, we’re doing increasingly useful things with all the data we’ve collected: ERP systems, demand forecasting and stock replenishment systems, recommendation engines, immersive gaming, self-driving cars, high frequency trading, new financial products and facial recognition technology. Everything is data: every letter written, word spoken, every call, image, video, and log file; every sports record, every flight completed, every seabed mapping, every video captured, every ultrasound, every seismometer measurement and every piece of music. You get the idea. It’s all data. Because of this, data science, machine learning and artificial intelligence will continue to grow. We still collect only a tiny proportion of all the data it’s possible to collect, and of the data we do collect, we transform only a small proportion into something useful.

Supporting this industry is a whole host of companies and tooling. You’ve got traditional relational databases such as PostgreSQL and MySQL; tools for streaming data like Apache Kafka and Amazon Kinesis; data warehouses like Google BigQuery and Snowflake; machine learning platforms like AWS SageMaker and Databricks; data orchestration and workflow tools such as Airflow; experimentation platforms like Optimizely; and open source frameworks like PyTorch and scikit-learn. That’s to say nothing of all the in-house, custom-built tooling, especially in technology firms. Yet, despite this, the sector is still ripe for innovation and disruption. Many thought the relational database had little innovation left to give, but that has clearly not been the case.

What is perhaps interesting and telling, though, is that in the 1980s and 1990s, the people who managed relational databases were undoubtedly a back-office function, sat in a cubicle, well away from where the “real business” happened. But now, machine learning and databases have never been more critical to successful technology companies. Now, databases are the “real business”. Now, these geeks are doing staggeringly sophisticated things. Never bet against the nerds. That’s why a few years ago, Rakuten’s CEO said he wanted all of his 17,000 employees to be able to code. And it’s why Marc Andreessen said, “Find the smartest technologist in the company and make them CEO.”

I propose one way of comparing companies: analyse the proportion of employees in data, engineering or technology based roles. The higher, the better. We could also compare companies by how many statistical experiments they perform each year (normalised by a measure of size). Jeff Bezos said: “Our success at Amazon is a function of how many experiments we do per year, per month, per week, per day.”
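As a rough sketch, the two measures above could be computed like this in Python. The `Company` class and every figure are made up for illustration, and using headcount as the “measure of size” is my assumption; revenue or customer numbers would be equally defensible.

```python
from dataclasses import dataclass


@dataclass
class Company:
    name: str
    employees: int
    technical_employees: int  # data, engineering or technology based roles
    experiments_per_year: int

    def technical_share(self) -> float:
        # Proportion of employees in technical roles.
        return self.technical_employees / self.employees

    def experiments_per_employee(self) -> float:
        # Experiments per year, normalised by headcount (an assumed size measure).
        return self.experiments_per_year / self.employees


# Entirely hypothetical companies and numbers.
big_co = Company("BigCo", employees=1000, technical_employees=400, experiments_per_year=500)
small_co = Company("SmallCo", employees=200, technical_employees=50, experiments_per_year=300)

for company in (big_co, small_co):
    print(
        f"{company.name}: technical share {company.technical_share():.0%}, "
        f"{company.experiments_per_employee():.1f} experiments per employee per year"
    )
```

In this invented example the two measures disagree: BigCo has the higher share of technical staff, while SmallCo runs more experiments per head. That is a reminder that neither number should be read in isolation.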

I could easily be wrong, but as I sit here in 2023 and look forward to 2033 (yikes), I expect future innovations to continue the current trend: they’ll happen at the intersection of engineering (both software and hardware) and data. As with any prediction, we should be humble, but I’m confident about this. Why? Because from the difference engine to the punch card, to the Turing Machine and the search engine, the combination of engineering and data has pushed the boundary of what is possible forward.

The humble “database wrapper” is still at Day 1.
