How People Actually Search in AI Mode Now: Planning, Multimodal, and Multi-Step Queries (2026)

The Shift in How People Search

For two decades, search behavior was stable enough that most SEO advice could take it for granted. People typed a few keywords, scanned a page of blue links, clicked one or two, and either found what they needed or refined the query. Content strategy was built around that pattern: target a keyword, answer it cleanly, rank for it, win the click. AI Mode broke that pattern, and it broke it fast enough that a lot of content built for the old behavior is now aimed at a question nobody is asking the way it used to be asked.

The change is not subtle. People in AI Mode write longer queries, describe their full intent up front, and then keep going. A search that once ended at the first answer now continues into a back-and-forth, with each follow-up building on the last. People are also no longer limited to typing. A growing share of searches start with a photo or a spoken question rather than text in a box. The result is a surface where a single search is closer to a conversation than a lookup, and that has direct consequences for what content needs to do to show up in it.

This piece walks through what the behavior data actually shows, then turns it into practical decisions about content. The short version: cover whole planning journeys instead of single questions, answer the likely follow-ups on one page or a tight cluster, structure everything so an AI can parse it across a multi-turn session, and make your pages friendly to image and voice input. If you want the broader strategic frame for the surface itself, our AI Mode SEO playbook sets it up, and this article goes deeper on the behavior side.

The Numbers Behind the Behavior

Scale first, because it sets the stakes. Google AI Mode has surpassed one billion monthly users, with queries more than doubling every quarter since launch. That is not an experiment running quietly in a corner of search. It is a primary surface, used by enough people often enough that ignoring how they behave on it is no longer a defensible position for anyone who cares about being found.

1B+ monthly users on Google AI Mode, with queries more than doubling every quarter since launch

3x: AI Mode searches are about three times as long as traditional searches

40% month-over-month growth in follow-up queries in the US

1 in 6 AI Mode searches in the US are already multimodal: voice, image, or video input

Each of those numbers points at a different change. The three-times-longer queries show that people are describing intent fully instead of keyword-fishing. The 40 percent month-over-month growth in follow-up queries shows that searches are becoming sessions. The one-in-six multimodal figure shows that the input itself is shifting away from typed text. Taken together, they describe a search surface that behaves nothing like the one most websites were optimized for, and the gap between the two is where the opportunity sits.
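To get a feel for what those growth rates imply, it can help to compound them. The sketch below is plain arithmetic under one loud assumption: that each reported rate holds constant for a full year, which the figures themselves do not claim. The function name is illustrative.

```python
def compound(rate_per_period: float, periods: int) -> float:
    """Return the total growth multiple after compounding a per-period rate."""
    return (1 + rate_per_period) ** periods

# Assumption: the reported rates stay constant for a whole year.
follow_up_multiple = compound(0.40, 12)  # 40 percent month over month
query_multiple = compound(1.00, 4)       # doubling each quarter (a floor, since the claim is "more than doubling")

print(f"Follow-up queries after 12 months: about {follow_up_multiple:.1f}x")
print(f"Total queries after 4 quarters: at least {query_multiple:.0f}x")
```

Rates like these rarely persist for long, so treat the output as a sense of pace rather than a forecast. The point is that even a few months at this speed changes the mix of queries your content is answering.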

It is worth being precise about what these figures do and do not tell you. They describe how people are using AI Mode, not how Google ranks or cites sources inside it. But behavior and optimization are linked. If people are asking longer, multi-step, multimodal questions, then content that only answers a single short query in plain text is structurally mismatched to the demand. The rest of this article is about closing that mismatch.

The Rise of Planning Queries

The single clearest behavior signal is planning. Planning-related AI Mode queries have grown 80 percent faster than AI Mode queries overall in the past six months. People use AI Mode to build travel itineraries, schedule fitness routines, plan dinner parties, and manage household budgets. These are not lookups. They are projects, and people are now turning to search to help them work through the whole thing rather than to fetch one fact along the way.

Think about what a planning query actually contains. Someone building a five-day trip is not asking one question. They are asking about destinations, then routes between them, then where to stay, then what to do on a rainy day, then how much it all costs, then how to adjust when one piece does not fit. A single keyword-targeted page answers maybe one slice of that. The person doing the planning needs the whole arc, and they are increasingly comfortable getting it from an AI that can hold the context across every step.

The reframe. Stop thinking in keywords and start thinking in journeys. The unit of demand in AI Mode is a planning task with many connected steps, not a single query with a single answer.

For content, this is the most important shift to internalize. The pages that get pulled into planning sessions are the ones that cover the journey, not just the entry point to it. That does not mean every page has to be a ten-thousand-word monolith. It means the questions that naturally follow your topic should be answerable somewhere you control, ideally on the same page or in a closely linked cluster. We come back to exactly how to structure that later, but the mindset comes first: you are serving a project, not a query.

Longer, Multi-Step Sessions

The length and follow-up data reinforce the planning story. AI Mode searches are about three times as long as traditional searches, and follow-up queries are increasing 40 percent month over month in the US. People are not just asking bigger questions; they are staying in the conversation and asking the next one, and the one after that, without starting over. A session that begins with one intent often ends several turns later having covered ground the person did not even know to ask about when they started.

This changes what it means to win a search. In the old model, winning was ranking for the query and getting the click. In a multi-step session, the question is whether your content can supply useful, relevant material at several points across the conversation, not just at the opening turn. An AI working through a session is pulling from sources at each step. The site that shows up at turn one and then has nothing relevant for turns two through six is contributing to a small fraction of the answer the user actually receives.

There is a strategic distinction here that is easy to miss. AI Mode and AI Overviews are not the same surface and do not reward the same content in the same way, which is why we treat them as two separate strategies. AI Overviews tend to summarize an answer to a single query. AI Mode runs the conversation. Optimizing for the conversation means thinking about the whole flow of likely questions, which is a different planning exercise than optimizing for one summarized answer.

The practical implication is that comprehensiveness and structure matter more than they used to, and isolated thin pages matter less. A page that answers one narrow question well but sits alone, with no connection to the questions on either side of it, is poorly positioned for a surface where the question on either side is exactly what gets asked next. Coverage and connection are now ranking-adjacent qualities, not just nice-to-haves.

The Multimodal Shift

The other large change is in the input itself. More than one in six AI Mode searches in the US are already multimodal, meaning voice, image, or video input rather than typed text. And it is accelerating: searches with image input are among AI Mode's fastest-growing query types, increasing more than 40 percent month over month since launch. People are pointing their camera at a thing and asking about it, or speaking a question, far more than they were even a few months ago.

Image input in particular changes the optimization problem. When someone searches by photo, the AI has to understand what is in the image and then find content that matches it. Your text keywords are not the primary signal in that moment. What helps is whether your pages give clear, machine-readable context about the visuals on them: descriptive alt text that actually describes the image, real captions that explain what is shown and why it matters, and surrounding copy that names the entities in the picture. Pages that treat images as decoration are invisible to this kind of search. Pages that describe their images well become findable through it.
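A quick way to find images that are being treated as decoration is to audit a page's HTML for missing or thin alt text. The following is a minimal sketch using Python's standard-library `html.parser`. The four-word threshold, the class name, and the sample markup are all assumptions for illustration, not a rule from any search engine.

```python
from html.parser import HTMLParser

class AltTextAuditor(HTMLParser):
    """Collect <img> tags whose alt text is missing or too short to be descriptive."""

    def __init__(self, min_words: int = 4):
        super().__init__()
        self.min_words = min_words  # assumed threshold; tune it for your content
        self.flagged = []           # list of (src, reason) tuples

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attributes = dict(attrs)
        src = attributes.get("src", "(no src)")
        alt = (attributes.get("alt") or "").strip()
        if not alt:
            self.flagged.append((src, "missing alt text"))
        elif len(alt.split()) < self.min_words:
            self.flagged.append((src, "alt text too short"))

def audit_alt_text(html: str, min_words: int = 4):
    """Return (src, reason) pairs for images that need better alt text."""
    auditor = AltTextAuditor(min_words)
    auditor.feed(html)
    return auditor.flagged

# Hypothetical page fragment for illustration.
sample = """
<img src="kitchen.jpg" alt="kitchen">
<img src="IMG_0042.jpg">
<img src="island.jpg" alt="White quartz kitchen island with three oak bar stools">
"""

for src, reason in audit_alt_text(sample):
    print(f"{src}: {reason}")
```

Word count is only a rough proxy for descriptiveness, and purely decorative images can legitimately carry an empty alt attribute, so treat the flagged list as a review queue rather than a list of errors.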

Voice input pushes in a related direction. Spoken queries tend to be longer and more conversational than typed ones, which compounds the length trend already in the data. Content that reads naturally, answers questions in plain language, and uses the phrasing real people speak is better matched to voice input than copy written to hit an exact-match keyword. We cover the specifics of preparing for spoken search in our guide on how to optimize for voice search, and the image side in detail in our image SEO guide.

Cover Whole Journeys, Not Single Questions

Here is the first concrete content move. Map the journey behind your topic and make sure you cover it. Take a planning task your audience actually has and write out the sequence of questions a person works through from start to finish. For a topic like planning a home renovation, that might run from setting a budget, to choosing what to prioritize, to finding contractors, to sequencing the work, to handling things that go wrong. Each of those is a turn someone might take in an AI Mode session. Your job is to make sure your content has a real answer at as many of those turns as possible.

This does not mean stuffing everything onto one page regardless of fit. It means deciding deliberately whether a journey belongs on a single comprehensive page or across a tight cluster of linked pages, and then building it that way. A contained journey with closely related steps often works best as one thorough page that an AI can read top to bottom. A journey where each stage is substantial enough to stand alone works better as a cluster, as long as the internal links between the pages make the relationships unmistakable. Either way, the test is the same: can an AI assemble the whole arc from content you control?
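One way to run that test on paper is to write the journey down as data and look for the holes. The sketch below assumes a hypothetical journey map and a hypothetical internal-link table that you would fill in by hand or from a crawl. It reports steps with no page behind them and consecutive covered steps whose pages do not link to each other.

```python
# Hypothetical journey map: each step of a planning task mapped to the URL
# that answers it (None means nothing on your site answers it yet).
journey = {
    "set a budget": "/renovation/budget",
    "choose priorities": "/renovation/budget",
    "find contractors": "/renovation/contractors",
    "sequence the work": None,
    "handle problems": None,
}

# Hypothetical internal links: page -> set of pages it links to.
links = {
    "/renovation/budget": set(),
    "/renovation/contractors": {"/renovation/budget"},
}

def coverage_gaps(journey):
    """Return the journey steps that no page answers."""
    return [step for step, url in journey.items() if url is None]

def missing_links(journey, links):
    """Return (from_page, to_page) pairs where one covered step follows another
    on a different page and the earlier page does not link to the later one."""
    pages = [url for url in journey.values() if url is not None]
    missing = []
    for current, nxt in zip(pages, pages[1:]):
        if current != nxt and nxt not in links.get(current, set()):
            if (current, nxt) not in missing:
                missing.append((current, nxt))
    return missing

print("Unanswered steps:", coverage_gaps(journey))
print("Missing forward links:", missing_links(journey, links))
```

In this made-up example, two steps have no answer at all, and the budget page never points forward to the contractors page. Both are gaps in the arc, whether you fix them by extending one page or by adding to the cluster.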

The structure has to be legible to a machine, not just to a careful human reader. Clear headings that name the question each section answers, defined entities, logical ordering, and explicit relationships between sections all help an AI pull the right piece at the right turn. This is the same comprehensiveness and structure thinking that makes a page citable in AI Overviews, which we break down in how to optimize for AI Overviews. The difference in AI Mode is that you are optimizing for a sequence of pulls across a session, not one pull for one summary.

Answer the Likely Follow-Ups

With follow-up queries growing 40 percent month over month, the follow-up is no longer an edge case. It is the main event. So a useful discipline is to take every primary question your content answers and ask: what does the person almost certainly ask next? Then make sure that next answer is right there, on the page or one tight link away. If your content answers question one and then leaves the person to go find question two somewhere else, you have handed the rest of the session to another source.

This is a different exercise from traditional keyword expansion. You are not gathering semantically related terms to sprinkle into copy. You are reconstructing the conversation. If someone asks how to do something, the follow-ups are usually how long it takes, what it costs, what can go wrong, and what to do instead if the first option does not fit. Anticipating those and answering them in order is what keeps your content in the session past the opening turn. The pages that do this well read like a knowledgeable person walking someone through a decision, because that is functionally what an AI Mode session is.

Comprehensive, well-structured content that anticipates the follow-ups is exactly what an AI can pull through a multi-turn session. The mechanics of why structured content gets selected, including the role of clear schema and explicit relationships, are worth understanding in depth, and our piece on structured data for AI search and citations covers the technical side. When the structure and the follow-up coverage line up, you stop being a single answer and start being a source the AI returns to.
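As one illustration of explicit structure, follow-up questions and answers can be expressed as schema.org `FAQPage` markup. The sketch below builds that JSON-LD from a list of question and answer pairs. The `FAQPage`, `Question`, and `Answer` types are real schema.org vocabulary, but the helper name and the sample answers are hypothetical, and nothing here assumes any particular AI surface reads or rewards this markup.

```python
import json

def faq_jsonld(pairs):
    """Build a schema.org FAQPage object from (question, answer) pairs."""
    return {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in pairs
        ],
    }

# Hypothetical follow-ups in the order a person is likely to ask them.
follow_ups = [
    ("How long does it take?", "Placeholder answer about timing."),
    ("What does it cost?", "Placeholder answer about cost."),
    ("What can go wrong?", "Placeholder answer about risks."),
]

print(json.dumps(faq_jsonld(follow_ups), indent=2))
```

The printed JSON would typically sit inside a `<script type="application/ld+json">` element on the page that visibly contains the same questions and answers.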

Build Image-Friendly, Multimodal-Ready Pages

Because image input is one of AI Mode's fastest-growing query types, growing more than 40 percent month over month since launch, the visual layer of your pages has gone from cosmetic to functional. Every meaningful image on a page is a potential entry point for a multimodal search, but only if the page gives an AI enough to understand it. That means alt text that genuinely describes what the image shows rather than repeating a keyword, captions that add context a person and a machine can both use, and surrounding copy that names what is in the frame.

Treat your most important images the way you treat your most important paragraphs. If a product, a place, a diagram, or a step in a process is something people might photograph and ask about, the page that shows it should describe it precisely. File names, alt text, captions, and nearby text all contribute signals that help an AI connect a real-world photo to your content. Pages that get this right are positioned to be the answer when someone points their camera at the thing you cover, which is a query type that barely existed not long ago and is now growing faster than almost anything else.

The same logic extends to how clearly your content names entities in general. AI systems work in terms of entities and relationships, so a page that is explicit about what it is describing, in both its text and its visual context, is easier to surface across voice and image input alike. Our image SEO guide goes through the specifics of alt text and captions, and the broader entity thinking carries through everything we do on the AIO optimization side.

Optimize to Be Cited Across a Session

Put the pieces together and the goal becomes clear: you are not optimizing to be the answer to one question; you are optimizing to be a source the AI pulls from repeatedly across a session. That is a meaningful change in how to think about success. A page can rank first for a query and still contribute almost nothing to the final answer a user receives if the rest of their session goes to other sources. Conversely, content that supplies useful material at several turns can shape the whole answer even without owning the top spot on any single one.

This is why measurement has to change alongside content. Tracking rank for individual keywords tells you less and less about how visible you actually are in AI Mode, because the surface does not work in single ranked results. The more honest metric is your share of model: how often AI systems surface or cite your brand across the prompts that matter to you. We make the full case for this in share of model and AI visibility measurement, and it is the lens we use to judge whether content is winning on conversational surfaces.
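Share of model, as defined above, reduces to a simple ratio once you have a log of which brands appeared for each tracked prompt. The sketch below assumes you have already collected that log, by hand or with a tool. The `PromptResult` record, the prompts, and the brand names are all hypothetical.

```python
from dataclasses import dataclass

@dataclass
class PromptResult:
    prompt: str
    cited_brands: list[str]  # brands surfaced or cited in the AI answer

def share_of_model(results, brand):
    """Return the fraction of tracked prompts whose answer surfaced or cited the brand."""
    if not results:
        return 0.0
    target = brand.lower()
    hits = sum(
        1 for result in results
        if target in (name.lower() for name in result.cited_brands)
    )
    return hits / len(results)

# Hypothetical observations for four prompts along one planning journey.
results = [
    PromptResult("how to budget a kitchen renovation", ["ExampleCo", "OtherBrand"]),
    PromptResult("how to find a renovation contractor", ["OtherBrand"]),
    PromptResult("what order to renovate rooms in", ["exampleco"]),
    PromptResult("what to do when a renovation runs over budget", []),
]

print(f"Share of model: {share_of_model(results, 'ExampleCo'):.0%}")
```

Grouping the prompts by journey rather than by keyword keeps the number tied to sessions, which is the unit this surface works in.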

Being cited across a session is also the practical bridge between AI Mode and the wider world of conversational answers. The same qualities that make content pullable through a Google AI Mode conversation (comprehensiveness, structure, clear entities, and follow-up coverage) are the ones that make it citable in other AI assistants. If you want the cross-engine view of how to earn those citations, our guide on LLM visibility and getting cited connects the AI Mode behavior here to the broader citation game.

Where to Start

If you want one place to begin, pick a single high-value planning topic in your space and rebuild it for how people actually search now. Map the full journey behind it. List the follow-up questions in the order people ask them. Decide whether the journey is one comprehensive page or a tight cluster, and structure it so an AI can read the whole arc. Describe your images so they can be understood, not just displayed. Then check whether your content has a real answer at every turn a session is likely to take, and fill the gaps. That one exercise, done well, teaches you more than any amount of keyword auditing.

Keep the behavior data in front of you while you do it. A billion monthly users, queries three times as long, follow-ups growing 40 percent month over month, and one in six searches already multimodal: that is the demand your content is being measured against now. Content built for short typed keywords and single answers is competing for a behavior that is shrinking as a share of search, while planning, multi-step, and multimodal queries grow. The teams that adjust early get to define the answers people receive before the rest of their market notices the surface changed.

This is the exact work we do for clients. Our content strategy service maps the journeys and follow-ups that matter in your space, and our AIO optimization service structures the content so AI systems pull it through whole sessions and cite it across engines. If you would rather have a team that already builds for this behavior handle it, start with a conversation about your goals through our optimization consultation.

Frequently Asked Questions

How is searching in AI Mode different from a traditional Google search?

AI Mode searches are about three times as long as traditional searches, and people use them conversationally rather than in short keyword bursts. Instead of typing two or three words and scanning a page of links, people describe a full intent, then ask follow-up questions in the same session. Follow-up queries are increasing 40 percent month over month in the US, which means a single search is now often a multi-turn conversation rather than a one-off lookup.

What are planning queries in AI Mode?

Planning queries are searches where someone uses AI Mode to build something out step by step rather than look up a single fact. People use it to build travel itineraries, schedule fitness routines, plan dinner parties, and manage household budgets. These queries have grown 80 percent faster than AI Mode queries overall in the past six months, making them one of the clearest signals of how search behavior is changing. Our AI Mode SEO playbook covers how to structure content for these journeys.

How big is AI Mode now?

Google AI Mode has surpassed one billion monthly users, with queries more than doubling every quarter since launch. That scale, combined with how different the query behavior is, means AI Mode is no longer an experiment on the side of search. It is a primary surface that content needs to be built for, not an afterthought layered on top of traditional SEO.

What does multimodal search mean for my content?

More than one in six AI Mode searches in the US are already multimodal, meaning voice, image, or video input rather than typed text, and searches with image input are increasing more than 40 percent month over month. For content, this raises the value of image-friendly pages with descriptive alt text, real captions, and clear visual context, because those signals help AI systems understand and surface your images when someone searches by photo rather than by keyword. See our image SEO guide for the specifics.

Should I write one long page or a cluster of pages for a planning topic?

Both can work, and the right choice depends on the topic. The principle is that the likely follow-up questions for a journey should be answerable on one page or across a tight, well-linked cluster, so an AI can pull a complete picture without leaving your content. A single comprehensive page works when the journey is contained. A tight cluster works when each stage is substantial enough to deserve its own page, as long as the internal links make the relationship between them obvious.

How do I measure whether my content is being cited in AI Mode?

Track your share of model, which is how often AI systems surface or cite your brand across the prompts that matter to you, rather than only tracking rank for individual keywords. Because AI Mode sessions are multi-turn, the goal is to be the source pulled across a conversation, not just the answer to one question. Measuring presence across a set of related prompts gives a far more honest picture of AI visibility than a single ranking position does.
