Struggling to make sense of the world post-ChatGPT? So are the rest of us. Let’s explore the basics of artificial intelligence in a more humane way.
Are you feeling overwhelmed, swept away, or inundated by the volume of news around artificial intelligence in the last 9 months? Have you felt concerned, frightened, or just confused? You’re not alone.
And here’s the secret: even the world’s most accomplished technologists are struggling to keep up with the incredible pace of progress in the machine learning and artificial intelligence space right now. Not all will admit it, of course, but at every (well ventilated) dinner party I’ve been to or backchannel conversation I’ve had since the latest ChatGPT drop, there’s a healthy amount of acknowledgement that none of us alone can truly comprehend the scope of what’s going on in real time.
So what does this mean for you, fellow bystander of the science fiction era? First of all, please don’t internalize your feelings as bad, or your lack of knowledge as a sign that you’re behind the times. Right now, it is (almost) impossible to be ahead of the times.
Talking about every single piece of breaking news would consume too much time, and quite frankly much of it becomes irrelevant quickly. The best way to cope is to build a strong understanding of the fundamental concepts that drive many of the technologies the press refers to collectively as “artificial intelligence”. This will help you make more informed judgements when reading the next breathlessly excited marketing pitch, or the next time a ChatGPT-powered chatbot tries to convince you that the year is actually 2022.
In this article, we’ll cover:
- What is AI, anyway?
- Where the intelligence starts: Machine Learning
- An AI’s Favorite Subjects: Different models for different problems
- Putting it all together
Once you’re armed with the basics, we can move on to future posts where we get more into the details about how your personal interactions with AI may unfold, like the definition of “scraping”, how AI can end up being wrong, and how you can build your AI literacy to keep yourself safe when using new technology.
It’s a tough time, and there are genuine threats to people’s livelihoods out there. But it’s tough to fight without all the information your opponents will have. Let’s fix that — starting with the basics, and in future articles getting into the creative implications.
What is AI, Anyway?
If you talk to someone “in the business” and ask them about artificial intelligence, you may get a somewhat salty reply. That’s not about you. The reality is that the term “artificial intelligence” has become overused and far too broadly applied, ironically at a time when we’re really not QUITE reaching the metrics of what many purists would consider “intelligence”.
As evidence of how broadly the term is applied, searching the web for a definition yields a wide variety of results. IBM’s take:
“Artificial intelligence leverages computers and machines to mimic the problem-solving and decision-making capabilities of the human mind.”
OK, problem solving and decision making. Sure. But then the Oxford Dictionary takes things in a much more specific direction:
“The theory and development of computer systems able to perform tasks that normally require human intelligence, such as visual perception, speech recognition, decision-making, and translation between languages.”
Decision making is still there, but now we’ve got speech recognition, visual perception, and translation too? What gives? In the interest of breaking this down in a more understandable way, let’s think about AI like this:
Traditional programming tells a device or system EXACTLY how to behave in every scenario. If an unexpected scenario occurs, it will likely throw an error.
When we build apps and experiences using artificial intelligence, we are often either solving for highly complex problems or problems where we know there will be unexpected scenarios down the line. Rather than giving our system all the answers, we are training it to expect the unexpected — giving it heuristics (rules) for how to act when it encounters different types of scenarios.
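To make the contrast concrete, here is a minimal Python sketch. The music player, its commands, and the function names are all invented for illustration, and the hint words are typed in by hand here, whereas a real AI system would learn them from examples. The first function behaves like traditional programming and fails on anything unexpected; the second makes a best guess instead.

```python
# Hypothetical music player commands, invented for illustration.

def traditional_player(command: str) -> str:
    """Traditional programming: every accepted input is spelled out in advance."""
    exact_rules = {
        "play music": "PLAY",
        "stop music": "STOP",
    }
    if command not in exact_rules:
        # An unexpected scenario: the program has no answer, so it fails.
        raise ValueError(f"Unknown command: {command!r}")
    return exact_rules[command]


def heuristic_player(command: str) -> str:
    """Heuristic approach: score each action by matching hint words, then pick the best guess."""
    hints = {
        "PLAY": {"play", "start", "resume"},
        "STOP": {"stop", "pause", "quiet"},
    }
    words = set(command.lower().split())
    scores = {action: len(words & vocab) for action, vocab in hints.items()}
    best = max(scores, key=scores.get)
    # With no evidence at all, asking again is safer than guessing.
    return best if scores[best] > 0 else "ASK_TO_REPEAT"


print(heuristic_player("could you pause the song"))  # prints STOP
```

Passing the same phrase to traditional_player raises a ValueError, because that exact wording was never written into its rules.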
Where the intelligence starts: Machine Learning
This “training” often involves giving the system MANY examples (thousands+) of inputs and expected behaviors, like what a person says and what the system should do in response. From there, we ask the system to predict the correct response for new inputs based on past outcomes. The process of taking a large dataset and generating a procedural (i.e., app) behavioral model from it is typically called “machine learning”, because our “machine” is learning by looking at the dataset. Cue the old ’80s anti-drug commercial. These systems are learning by watching us, in a way.
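Here is a toy sketch of that idea in Python. The six example pairs, the behavior names, and the word-counting approach are assumptions chosen for readability; real systems use thousands of examples and far more capable statistical models. The point is only that nobody writes a rule for the new sentence: the prediction comes from the examples.

```python
from collections import Counter, defaultdict

# Hypothetical training data: (what a person says, what the system should do).
EXAMPLES = [
    ("turn on the lights", "LIGHTS_ON"),
    ("lights on please", "LIGHTS_ON"),
    ("switch the lamp on", "LIGHTS_ON"),
    ("turn off the lights", "LIGHTS_OFF"),
    ("lights off please", "LIGHTS_OFF"),
    ("switch the lamp off", "LIGHTS_OFF"),
]


def train(examples):
    """Count how often each word appears alongside each expected behavior."""
    word_counts = defaultdict(Counter)
    for utterance, behavior in examples:
        word_counts[behavior].update(utterance.lower().split())
    return word_counts


def predict(word_counts, utterance):
    """Predict the behavior whose past examples share the most words with the input."""
    words = utterance.lower().split()
    scores = {
        behavior: sum(counts[word] for word in words)
        for behavior, counts in word_counts.items()
    }
    return max(scores, key=scores.get)


model = train(EXAMPLES)
# This exact sentence never appears in the training data.
print(predict(model, "could you turn the lamp off"))  # prints LIGHTS_OFF
```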
In many cases, operating at this scale means a developer may not be able to reliably predict what their system will do in a specific scenario until they’ve observed their solution working many, many times.
In the interest of completeness, here’s what Oxford had to say about the definition of machine learning — a definition that calls out statistics and the lack of specific instructions:
“The use and development of computer systems that are able to learn and adapt without following explicit instructions, by using algorithms and statistical models to analyze and draw inferences from patterns in data.”
When we start talking about digital systems (which can operate at great scale and influence) operating in unexpected and sometimes unpredictable ways, that’s where you get the great unease and uncertainty that’s permeating the public discourse right now. And it’s entirely fair.
We should always question the systems that define how we live, and AI-powered systems will absolutely be shaping the human experience from here on out. Improving our digital literacy will help us all keep the creators of these systems honest and accountable.
There are three subtypes of machine learning:
- Supervised, where a human labels data (“this is a picture of a hot dog”) and the machine learns to predict the label for new data based on old data
- Unsupervised, where the machine is not given labels and must draw inferences from clusters within large datasets
- Reinforcement learning, where an algorithm is trained using trial and error, reinforcing positive behavior and discouraging negative behavior.
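The three subtypes above can be sketched side by side in a few lines of Python. Everything here is a deliberately tiny stand-in: the animal weights, the reward values, and the function names are invented, and the reinforcement example assumes each action always earns the same fixed reward, which real environments rarely do.

```python
import random

# Supervised: a human supplied the labels; predict the label of the closest known example.
LABELED = [(4.0, "cat"), (4.5, "cat"), (30.0, "dog"), (32.0, "dog")]  # (weight in kg, label)

def predict_label(weight):
    nearest = min(LABELED, key=lambda pair: abs(pair[0] - weight))
    return nearest[1]

# Unsupervised: no labels at all; split numbers into two clusters (a tiny 1-D k-means).
def two_clusters(values, rounds=10):
    low, high = min(values), max(values)  # starting guesses for the two cluster centers
    near_low, near_high = list(values), []
    for _ in range(rounds):
        near_low = [v for v in values if abs(v - low) <= abs(v - high)]
        near_high = [v for v in values if abs(v - low) > abs(v - high)]
        if near_low:
            low = sum(near_low) / len(near_low)
        if near_high:
            high = sum(near_high) / len(near_high)
    return near_low, near_high

# Reinforcement: try actions, observe rewards, and favor whatever has worked best.
def learn_best_action(rewards, steps=200, explore=0.1, seed=0):
    rng = random.Random(seed)
    actions = list(rewards)
    totals = {a: 0.0 for a in actions}
    counts = {a: 0 for a in actions}

    def average(action):
        return totals[action] / counts[action] if counts[action] else 0.0

    for step in range(steps):
        if step < len(actions):
            action = actions[step]              # try every action once first
        elif rng.random() < explore:
            action = rng.choice(actions)        # occasionally explore
        else:
            action = max(actions, key=average)  # otherwise repeat the best so far
        totals[action] += rewards[action]       # positive rewards reinforce, negative discourage
        counts[action] += 1
    return max(actions, key=average)


print(predict_label(5.0))                                          # prints cat
print(two_clusters([4.0, 4.5, 3.8, 30.0, 32.0, 29.0]))
print(learn_best_action({"sit": 1.0, "bark": -1.0, "roll over": 0.2}))  # prints sit
```

Note that only the supervised example ever sees the words "cat" and "dog". The clustering function discovers two groups without knowing what they are, which is part of why its conclusions can be harder to audit.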
The unsupervised form of machine learning is the most vulnerable to systemic bias — where bias in a dataset creeps into the recommendations that come from an algorithm, but because we haven’t supervised how the machine came to these conclusions it is hard to detect the issue. For example, content moderation algorithms can unintentionally reinforce societal biases when they penalize folks who are the subject of hateful harassment, or boost hateful content because it looks engaging.
Microsoft’s “Tay” Twitter bot, released briefly in 2016, is an example of unsupervised learning gone haywire: the bot was allowed to run on its own and begin making inferences on appropriate responses based on what it saw on Twitter without intervention. Within a day, the bot became toxic and had to be shut down. (I won’t repost the offending Tweets here, but the IEEE article is an excellent summary.)
“Using this technique, engineers at Microsoft trained Tay’s algorithm on a dataset of anonymized public data along with some pre-written material provided by professional comedians to give it a basic grasp of language. The plan was to release Tay online, then let the bot discover patterns of language through its interactions, which she would emulate in subsequent conversations. Eventually, her programmers hoped, Tay would sound just like the Internet.” — Oscar Schwartz, IEEE Spectrum, Nov 2019
An AI’s Favorite Subjects: Different models for different problems
Thanks to Hollywood, when folks outside the tech industry hear “AI” they often think of a “singularity”: a single AI capable of solving more problems than a human can, as depicted by HAL or SkyNet. But we’re very far away from that cinematic vision in practice.
The systems we call “artificial intelligence” in today’s parlance are typically purpose built for a specific type of problem solving, often tied to a specific type of human sense or form of expression.
It may be helpful to think about this like choosing a college major. Sure, digital systems are capable of almost anything in theory, just like humans. But college students typically declare a major, which means they are shaped by training and exposure to unique experiences in support of that major, leading them to become specialists in a given field.
Statistical models trained through machine learning are like students who declared a major and studied ONLY that major, very hard, becoming sought-after specialists in their fields.
While there are many forms of machine learning and applied artificial intelligence, we’ll cover four of the most popular here: Natural Language Processing and Understanding, Large Language Models, generative content models, and human identification models.
Natural Language Processing / Understanding (NLP / NLU)
One of the earliest forms of mainstream artificial intelligence, natural language processing and natural language understanding models (NLUs) are used to help systems understand humans when they express their desired actions in the form of language instead of clicking on buttons or gesturing with a mouse or their hands.
Natural language processing (NLP) models examine text, typically human-generated, and make sense of it based on grammatical rules and knowledge assembled in specific domains and languages. Essentially, transcription.
Have you ever done one of those challenges where people give you a prompt and you use autocomplete to finish it in your SMS chat or Google Search? There are even games built on this mechanic, where you try to guess what the predicted next word will be based on a prompt. That guessing behavior is essentially how an NLP model figures out what you’re saying. Based on the words it understood from you, it’s trying to guess the most likely next word in the sentence, even if the recording got a little muddled, you left out a word, or your words came out of order. Those guesses are informed by the tens of thousands of other utterances the system has heard or seen before.
A key word here is “guess.” Like all contemporary AI systems, natural language understanding systems are typically generating a series of best guesses based on statistical probability.
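To see what a “best guess based on statistical probability” looks like, here is a small Python sketch. The five-sentence corpus is invented, and counting word pairs is a big simplification of what production systems do, but the output has the same character: a ranked list of guesses with probabilities, never a certainty.

```python
from collections import Counter, defaultdict

# A tiny invented corpus standing in for the many utterances a real system has seen.
CORPUS = [
    "turn off the lights",
    "turn off the television",
    "turn on the lights",
    "turn up the volume",
    "turn off the lights please",
]


def count_next_words(sentences):
    """For every word, count which words followed it in the corpus."""
    following = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.split()
        for current, nxt in zip(words, words[1:]):
            following[current][nxt] += 1
    return following


def guess_next(following, word):
    """Return (next word, probability) guesses, most likely first."""
    counts = following.get(word, Counter())
    total = sum(counts.values())
    return [(nxt, count / total) for nxt, count in counts.most_common()]


model = count_next_words(CORPUS)
for nxt, probability in guess_next(model, "turn"):
    print(f"{nxt}: {probability:.0%}")
# prints:
# off: 60%
# on: 20%
# up: 20%
```

Even the top guess here is wrong 40% of the time, and a word the system has never seen (try guess_next(model, "dim")) produces no guesses at all.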
Alexa is not always right. Bing Chat and Bard can be blatantly wrong. They are almost never “sure”, even if they act that way.
We’ll talk later about how you can cope with that reality as a customer using AI-powered systems.
Natural language understanding (NLU) models take this a step further: they seek to link an interpretation of a person’s words to the intent behind them, and then take action based on that intent. This is the difference between transcribing your request to turn off the lights and actually turning them off.
Speech-based NLU models like those that power Alexa are trained via supervision on tens of thousands of pairs of “utterances” — recordings of humans making a request with a transcription of the word for word digital equivalent of that request — along with the expected system behavior for those requests, in most cases.
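The split between transcribing a request and acting on it can be sketched in Python as three small steps. The intent names, keyword sets, and pretend smart home below are all hypothetical, and a real NLU model would learn the link between words and intents from training pairs instead of a hand-written keyword table.

```python
# Hypothetical smart-home state and intents, invented for illustration.
home = {"lights": "on"}

INTENT_KEYWORDS = {
    "lights_off": {"off", "dark"},
    "lights_on": {"on", "bright"},
}


def transcribe(recognized_words):
    """NLP step (greatly simplified): clean up the recognized words into text."""
    return " ".join(word.lower().strip(",.!?") for word in recognized_words)


def understand(text):
    """NLU step: map the text to the most likely intent, or None if nothing matches."""
    words = set(text.split())
    scores = {intent: len(words & keys) for intent, keys in INTENT_KEYWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


def act(intent):
    """Carry out the intent by changing the state of the pretend home."""
    if intent == "lights_off":
        home["lights"] = "off"
    elif intent == "lights_on":
        home["lights"] = "on"


text = transcribe(["Please", "turn", "off", "the", "lights."])
act(understand(text))
print(text, "->", home["lights"])  # prints please turn off the lights -> off
```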
NLP and NLU systems are a critical form of AI, and are often used to power other forms of AI like generative content models, since they need to understand what kind of content you want to make.
Large Language Models (LLM)
Large Language Models (LLMs) deal in the art of human expression via language. They are trained with massive datasets (hundreds of thousands of samples or more) of human writing, which enables them to decide upon an appropriate response in real time based on a novel customer request. Unlike natural language understanding models, which usually stop at understanding grammar, structure, and core word intent, LLMs go far beyond in their ability to generate new types of written content: from summaries of factual content to novel narrative content.
Language is complex, and the ability to answer logically about specific subjects harder still, so the most popular LLMs like ChatGPT, Bing Chat, and Google’s Bard have been trained on vast datasets composed of a large subset of the Internet. The written word tends to be more verbose than the spoken word, so for now these systems tend to be separate from their voice-controlled cousins. Due to the vast size of these datasets and the typically unsupervised learning model, one term you may hear thrown about with regard to LLMs is “stochastic parrots”.
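As a rough intuition for generating text from patterns in writing samples, here is a toy Python generator. It is emphatically not how ChatGPT or any real LLM is built (those use large neural networks, and this uses a lookup table over an invented four-sentence training text), but it shows how plausible-looking sentences can come out of repeatedly sampling a likely next word.

```python
import random
from collections import defaultdict

# A tiny invented "training set"; real LLMs learn from vastly more text
# and use neural networks rather than a lookup table like this one.
TRAINING_TEXT = (
    "the cat sat on the mat . the dog sat on the rug . "
    "the cat chased the dog . the dog chased the ball ."
)


def build_table(text):
    """Record every word that was observed to follow each word."""
    table = defaultdict(list)
    words = text.split()
    for current, nxt in zip(words, words[1:]):
        table[current].append(nxt)
    return table


def generate(table, start, length=8, seed=None):
    """Generate text by repeatedly sampling a next word seen after the current one."""
    rng = random.Random(seed)
    output = [start]
    for _ in range(length):
        candidates = table.get(output[-1])
        if not candidates:
            break  # nothing ever followed this word in training
        output.append(rng.choice(candidates))
    return " ".join(output)


table = build_table(TRAINING_TEXT)
print(generate(table, "the", seed=1))
```

Change the seed and you get a different sentence, some of which (such as a cat sitting on a dog) never appeared in the training text. The generator produces them without any notion of whether they are true.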
“[A stochastic parrot is a] large language model that is good at generating convincing language, but does not actually understand the meaning of the language it is processing.” — Wikipedia
This term was coined by Emily M. Bender in an influential paper written with Timnit Gebru, Angelina McMillan-Major, and Margaret Mitchell: “On the Dangers of Stochastic Parrots: Can Language Models be Too Big?”. Originally penned in collaboration with Google’s Ethical AI group, this paper resulted in the forced departure of group leaders Timnit Gebru and Margaret Mitchell (followed by others in protest) due to Google’s reluctance to publish the article, seemingly as it might paint their work in a negative light. The concerns are many:
- When models are so large that we can’t fully understand where their data comes from, how can we trust the data they generate?
- How can we be sure the model and its data are accurate and unbiased?
- And what does it say about a company that claims to be pursuing ethical AI that they would rather decimate marginalized leaders on their Ethical AI team than face open discussion of their own methods?
These concerns may look specific to Google in this instance, but they apply to all large language models, and they are unsolved problems.
Generative Content Models
The previous two examples have focused on text-based communication, but there are other ways a model can be trained and expressed. In the last year or so we’ve seen huge leaps in the ability of systems to generate novel visual content based on human prompts. You describe what you want the system to generate, and you will get an attempt at that work in near-real time. Often, you can get very specific about stylistic direction, era, composition, and more.
I won’t include a visual example here because the legal status of generative visual content is complicated: current US Copyright Office guidance holds that purely AI-generated art is not eligible for copyright protection.
Part of the controversy around generative content models is the nature of the datasets. More so than with text, sampled visual and audio data tends to heavily influence the style of the final results. In this way, a generative content model can plagiarize or even steal the artistic work of a human creator, most often without credit. The current crop of generative content models like Midjourney has not provided any proof of payment to creators for the use of training data, so it is assumed that any modern stylistic data outside the public domain may be a copyright infringement that is exploiting human creators and taking work from them.
Generative content models are at the center of the current AI vs. labor debate for this very reason. Generative content doesn’t have to be visual: it can be audio, 3D models, prose, even computer code! A wide range of industries is potentially affected. Generative models learn by “watching you”, to borrow a meme from an old ’80s anti-drug commercial. But while humans who watch you typically can’t copy you directly, AI models are more sophisticated and are capable of mimicking styles to the point of plagiarism at times. This is especially problematic when these models were trained on artists’ work without compensating the original artists. We’ll talk more about these nuances in a future article, as there is a LOT to unpack.
Human Identification Models
This isn’t necessarily a single AI system so much as an aggregate of systems, but to most of us it functions as a single unit. A common emerging use of AI is identification of humans in group or singular settings. Some examples:
- Delta Air Lines check-in counters
- Global Entry kiosks
- Proprietary in-store and venue security systems
- Government monitoring systems
- FaceID on Apple devices
When using AI to identify a specific human amongst many others, it’s generally a computer vision problem. When you’re using it as a personal identification system, you’ll likely “train” your own model by taking multiple pictures from multiple angles, as seen on Apple devices. In other cases where you are being differentiated amongst other people in a large set, it becomes more complicated. There is often a combination of training from government datasets (passport photos, driver’s license photos, surveillance, biometrics), public datasets (social media), and criteria to help narrow the pool of people (current travelers, people of interest, etc.).
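One common way to frame this kind of matching, sketched here in Python under heavy assumptions, is to reduce each face to a list of numbers and compare lists. The three-number “signatures”, the traveler names, the flying_today flag, and the 0.95 threshold are all invented; real systems derive much longer vectors from photos using a computer vision model, and nothing here reflects how any specific airline or government system works.

```python
import math

# Invented 3-number "face signatures"; the values are placeholders, not real biometrics.
ENROLLED = {
    "traveler_a": {"signature": [0.9, 0.1, 0.3], "flying_today": True},
    "traveler_b": {"signature": [0.2, 0.8, 0.5], "flying_today": True},
    "traveler_c": {"signature": [0.88, 0.12, 0.33], "flying_today": False},
}


def cosine_similarity(a, b):
    """Score how closely two non-zero signatures point in the same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def identify(signature, threshold=0.95):
    """Return the best match among today's travelers, or None if no one is close enough."""
    # Narrow the pool first: only people expected to be here are candidates.
    pool = {name: person for name, person in ENROLLED.items() if person["flying_today"]}
    best_name, best_score = None, threshold
    for name, person in pool.items():
        score = cosine_similarity(signature, person["signature"])
        if score >= best_score:
            best_name, best_score = name, score
    return best_name


print(identify([0.85, 0.15, 0.3]))  # prints traveler_a
print(identify([0.0, 0.0, 1.0]))    # prints None
```

Narrowing the pool matters: traveler_c has a signature nearly identical to traveler_a's, so without the flying_today filter the two would be easy to confuse.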
Putting it all together
Once you start learning about the different types of machine learning and artificial intelligence, you start to realize that your favorite services and apps aren’t really big brained monoliths after all. They’re made up of multiple smaller services and capabilities that combine to give us the sense of a coordinated system. And these are just a few examples of the types of “artificial intelligence” or machine learning that are possible out there — it’s an expansive, exciting, and sometimes overwhelming space.
But one thing to know is that these systems need data to learn. That’s why data privacy remains such a hot topic. Your data (your social media posts, how and when you use your devices, where you go, what you eat, who you’re friends with) can be used to train machine learning models for both positive and negative outcomes. In some cases, that information might be used to predict potential health problems and propose early interventions. In other cases, that data might be used to determine whether you are more likely to commit a crime, and potentially lead to discrimination against you as a result. Data is powerful, and it can be misused.
It is always reasonable to question how a company is using your data, and why. And it’s reasonable to ask what value you’re getting in exchange for sharing your data: rest assured, the company getting your data is getting value from you.
This is all fine and good academically, but what does it mean to actually live with these systems? How is data being gathered? Can we trust all AI-powered systems? If not, how do we determine which AI-powered systems are acting in our best interest?
In our next posts, we will explore these concepts more deeply. We’ll dig into concepts like “scraping” and why that’s causing API lockdowns, we’ll talk about the potential harms caused by the current generation of AI-powered systems, and we’ll talk about techniques you can use to protect yourself from the worst of the harms. Digital literacy is your best defense for living in a science fiction world.
Got questions for future posts? Pop them in the comments. I’d love to hear what’s on your mind.
If you’d like to go deeper on this topic before our next post, a few options:
- “We Read the Paper that Forced Timnit Gebru Out Of Google — Here’s What it Says” (MIT Technology Review) A deeper analysis of the “stochastic parrots” paper, which exposes some of the challenges facing LLMs / large language models.
- Design Beyond Devices: Creating Multimodal, Cross-Device Experiences: I included a whole chapter on core AI concepts (pre-GPT4) in my book, along with design ethics frameworks that help those working with AI assess the potential harms of their work and mitigations before getting too deep in the process.
- Understanding the Impact of AI on Kids and the Future (Brad Bartlett): A good high-level overview, plus some impact analysis.
- A Conversational Design Primer (Cheryl Platz): For learning more about the basics behind natural language understanding models.
- Better Together: Guidelines for designing human-AI interactions (Ruth Kikin-Gil): Ruth has been working on the intersection of AI and humanity for a long time at Microsoft, and this is just one of many talks she’s given on the subject.
- A-Z Guide to the Types of Machine Learning Problems (Badr Salah): A solid, if a bit dense, breakdown for those who want details.
Cheryl Platz is a world renowned user experience designer, best selling author, professional actress and speaker, accomplished video game developer, and owner of design education company Ideaplatz, LLC. Her book Design Beyond Devices: Creating Multimodal, Cross-Device Experiences is available from Rosenfeld Media or your favorite online bookseller. You can keep up with her on Twitter, Bluesky, or Mastodon.