Neither NLP nor SEO is new and unexplored territory. In a way, both of them deal with the intricacies of the linguistic properties of the human mind and its interaction with technology.
In other words, both tend to try and make sense of the way we humans express ourselves and return some output based on our linguistic input.
Search engines take user queries and attempt to make sense of them in order to understand the intent behind the query. And the optimization of content for search engines is, among other things, making sure the algorithms recognize your content as the correct answer to a user’s query.
But NLP has been around for far longer than SEO. Ever since the computer was invented, there has been the problem of making it understand human language and complex semantics.
Only recently have the two become interconnected. Google realized the potential that breakthroughs in natural language processing hold for serving users the right content. The tech giant also realized that NLP is the key to staying afloat in the current technological climate.
This article will explain the fundamentals of NLP and how search algorithms use it. Moreover, it will cover the practical implications of natural language processing for content writers. Let’s dive into the expansive subject that’s NLP and its relation to modern-day SEO.
What Is NLP?
A quick Google search asking “what is NLP?” can be quite misleading. Most results will offer mindfulness training courses, courtesy of various self-help gurus and neuro-linguistic programming. But if you Google natural language processing, you’ll get a far more impressive set of results.
That’s the NLP we’re discussing here today. Natural language processing is more than some pseudoscientific hodgepodge loosely based on hypnotherapy. Unfortunately, there’s still no known way in which meditation can help with SEO, so if that’s what you’re here for, you’ll be sorely disappointed.
No, the NLP we’re talking about is the science of processing and analyzing large amounts of natural language data. Natural language processing is a subfield of linguistics, computer science, and artificial intelligence. It’s concerned with programming computers to deal with the abundance of natural language data.
At first, NLP was used for machine translation of languages, but its uses soon exceeded simple translation. Today, NLP encompasses activities such as text and speech processing, syntactic and morphological analyses, and semantics.
In other words, search engines are finally grasping the context of what’s being said, rather than just keywords and phrases. It should be much clearer now how potent NLP techniques are for a field like SEO, which tries so hard to understand its users.
But NLP is far too broad a subject to simply be folded into another, equally complex field like SEO. So how exactly does natural language search work, and how does it fit into existing SEO practices?
In October 2019, Google announced a new update to its algorithm known as BERT. Once it fully rolled out three months later, BERT left a notable impact, affecting about 10% of all search queries.
Google’s core updates usually do leave a lasting effect on how we approach SEO. They increase the volatility of page rankings for a short time until SEO experts and webmasters adjust to the latest changes. But the core update of January 2020 and the introduction of BERT were far more significant than that.
BERT, or Bidirectional Encoder Representations from Transformers, taps into the potential of NLP for the purposes of empowering Google’s search engine.
The key term here is “bidirectional.” It’s what makes the latest core update to the search engine so powerful and almost human-like. BERT analyzes the query not just as a single piece of information, but as a part of a greater whole.
SEO experts are used to extracting keywords and coming up with their variations. Instead of just focusing on keywords themselves, BERT looks at both the words that come before and after the keyword, hence “bidirectional.”
What that means is that BERT can derive the context of a query by analyzing it in its entirety. It takes the full content of the query into consideration. But that’s not the end of it: it’s also capable of learning from the data after it evaluates meaning from context.
And since BERT is capable of understanding the meaning and sentiment of queries, it’s having a profound effect on featured snippets and their accuracy. With more context and understanding (courtesy of NLP), you should see more accurate and informative featured snippets.
How BERT and NLP Go Hand in Hand
If you Google NLP in relation to SEO, you won’t find a single article that doesn’t touch on BERT. That’s because the two are indeed inseparable.
BERT consists of two major components:

- Data (pre-trained models)
- A natural language processing methodology

Data refers to pre-trained models in this case. They’re huge sets of data for BERT to analyze using its processing methodology. Without the methodology, the datasets are largely useless.
That’s where NLP comes in. It’s at the core of BERT, allowing it to do what it does. NLP is the engine driving BERT’s methodology.
Together, they have the power to reshape how we do and think about SEO. Why was the reshaping necessary, however?
What Brought About This Change to the Algorithm?
Google’s search engine algorithms were already quite efficient as they were. They’d gotten extremely accurate at recognizing keywords and phrases, as well as understanding user queries.
We can all attest to how precise Google’s SERPs are and how rarely we have to stray from the first page to find what we’re looking for. It seems like Google has built a sufficiently large database of user queries and has enough data to predict what we want.
So, why BERT and why now?
The answer to that question lies in the next evolution of how people communicate with Google. And it is, indeed, a form of communication, as more and more users are performing voice searches.
What voice searches mean is that we now query Google using spoken language rather than strings of keywords. And when we communicate in everyday vernacular English, we tend to structure our questions differently than we would when typing into a search box.
What all this means is that there’s an increase in the number of long-tail keywords in use. And Google doesn’t have as excellent a grip on those as it does on shorter keywords and phrases. There’s very little in Google’s historical records to match the sheer volume of long-tail keywords now in use.

Not only is there not enough data on long-tail queries, but the data that does exist is also highly inaccurate. The reason is that we, the users, aren’t so precise with spoken language. We tend to describe things loosely and ascribe meanings to words, while at the same time being very ambiguous.
Voice searches are revolutionizing the way we use Google to find solutions to various problems. And the search engines have to keep up with all the latest trends or risk losing valued customers.
Google knows this. After all, its business model revolves around providing users with exactly what they want. Recognizing user intent and responding to it with correct content is at the core of what search engines do.
But now it faced a situation where it was questionable whether it could respond properly. There just wasn’t enough historical data to anticipate the intent behind a query. So changes had to be made.
Google needed a way to understand spoken language and all the intricacies of context. And so, they made BERT. It’s Google’s way of staying on top of things and providing the search quality they’re so well-known for.
To sum up, it was long-tail keywords in voice searches that sparked the need for adjustment. Big data is still relevant, and just as important as it was. Keywords aren’t going anywhere, either. But now more than ever, it’s crucial to understand the context and the sentiment of both the content and the query in order to match the two.
That’s a problem that the well-established field of NLP can take care of.
How Is NLP Changing the SEO Landscape?
NLP is here to help us organize all the information we have at our disposal. It aims to achieve that by changing the way we understand queries as a whole.
The way we used to do SEO is to determine what the keywords are and then vary them enough throughout the text. With NLP in mind, it’s time to move from targeting keywords to targeting entire topics.
Using related keywords will not be the only objective anymore. Instead, the focus should be on discovering and including semantically related phrases. Thanks to NLP, we can make that semantic connection between different phrases. And that’s a crucial technological leap given that more and more queries have an increasingly conversational tone.
It’s time to explain how exactly NLP understands context and what qualities it introduces.
Sentiment in NLP
You might have noticed that we mentioned emotion and sentiment on a couple of occasions. That’s because NLP is capable of judging the undertone of the content and ranking it on that basis.
The sentiment can be negative, neutral, or positive, depending on the words you use. If you tend to write positive adjectives such as “great,” “advanced,” “innovative,” “creative,” or “useful,” you’re sending positive signals. It means you’re emphasizing the positive aspects of the thing you’re describing.

Google now notices such signals and determines the sentiment of the content based on them. The same goes for any content that uses adjectives such as “poor,” “underwhelming,” “aggressive,” and other words with a negative connotation.
Then you have nouns and pronouns, which usually bear no special meaning. They’re neutral in that regard and don’t influence the sentiment. But a combination of positive and negative signals can lead to a neutral sentiment mark for your content.
To determine sentiment, Google uses a scale that ranges from -1.0 to 1.0. The scale rates the sentiment of your article as follows:
- Negative: -1.0 to -0.25
- Neutral: -0.25 to 0.25
- Positive: 0.25 to 1.0.
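To make the scale concrete, here’s a minimal sketch that maps a sentiment score to one of the three labels above. The thresholds come straight from the list; the function name is our own, and the exact boundary scores (-0.25 and 0.25) are treated as neutral here, which the list leaves ambiguous:

```python
def sentiment_label(score: float) -> str:
    """Map a sentiment score (-1.0 to 1.0) to a label using the
    ranges described above. Boundary values fall into 'neutral'."""
    if score < -0.25:
        return "negative"
    if score <= 0.25:
        return "neutral"
    return "positive"

print(sentiment_label(-0.6))  # negative
print(sentiment_label(0.1))   # neutral
print(sentiment_label(0.8))   # positive
```

In practice, the score would come from a sentiment analysis service such as Google’s Natural Language API rather than being supplied by hand.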
The sentiment can have a notable impact on your web page’s ranking in SERPs, especially when it’s competing against positive content. If Google rates the sentiment of your article as negative and it’s facing positive content on page 1, it’s highly unlikely that Google will consider your page relevant.
Entity in NLP
Entities are the next big change brought on by NLP, and they’re at the center of the update.
An entity is a noun or a pronoun (or even an entire phrase) that can be identified, classified, and categorized. Entities are the way in which NLP will enable us to organize all the information on the internet.
An entity can be anything really. It can be a proper noun such as the name of a person, or a kind of consumer goods. It can also be a location, a business, an event, and more.
Linguistic AI has never been better at named entity recognition and named entity disambiguation. That’s what drives all the advances in natural language processing and why Google decided to implement BERT.
Ambiguity used to be a huge problem for search engines. People are well-equipped for dealing with ambiguous meanings on a daily basis — computers, less so. But now, with entities and other NLP tools, the search engine is able to establish a connection between different entities.
To use entities even more efficiently, NLP employs two additional metrics: category and salience.
Category and Salience Metrics
A category is a straightforward metric that deals with entities on a macro level. Truth be told, SEO experts are already aware of what categories are and use them on a daily basis.
In Google NLP, you’ll notice that category simply shows a generalization of what an entity is. It doesn’t necessarily have to be a broad category. Here are a couple of examples from Google’s Natural Language API:
- Arts & Entertainment/Entertainment Industry/Film & TV Industry
- Computers & Electronics/Programming/Java (Programming Language)
- Online Communities/Online Goodies/Clip Art & Animated GIFs
- Real Estate/Real Estate Listings/Residential Rentals
- Sports/Winter Sports/Skiing & Snowboarding
- Travel/Tourist Destinations/Regional Parks & Gardens.
As you can see, categories consist of various subcategories to help the search engines understand the content better.
Salience, on the other hand, shows how relevant an entity is to the topic. It’s a measure of the importance of a single entity in relation to the text. Salience score ranges from 0.0 to 1.0. The higher the score, the more relevant the entity is to the subject of the page.
As an example, in an article about web hosting, the word “server” is going to be far more relevant than “support.”
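As a quick illustration of how salience ranks entities, here’s a sketch using hypothetical scores for that web hosting article. In practice, the names and scores would come from an entity analysis service such as Google’s Natural Language API; the numbers below are made up:

```python
# Hypothetical salience scores for an article about web hosting;
# real scores would come from an entity analysis API.
entities = [
    {"name": "server", "salience": 0.42},
    {"name": "web hosting", "salience": 0.31},
    {"name": "support", "salience": 0.04},
]

# Rank entities by how relevant they are to the page's topic:
# the higher the salience, the more central the entity.
ranked = sorted(entities, key=lambda e: e["salience"], reverse=True)
for e in ranked:
    print(f'{e["name"]}: {e["salience"]:.2f}')
```

Sorting by salience like this gives you a quick picture of which concepts the algorithm considers central to your page, and whether that matches your intended topic.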
NLP relies on advanced syntactic analysis, or parsing, to draw the dictionary meaning from the text. Syntactic analysis places great emphasis on proper punctuation, since NLP uses the rules of formal language to extract meaning.

But syntactic analysis does more than just parse strings of symbols that conform to formal grammar rules. It also checks for the meaningfulness of the words, another trait of an intelligent linguistic AI.
So far, syntactic and entity analyses support several languages, including:
- Chinese (Simplified)
- Chinese (Traditional)
- Portuguese (Brazilian & Continental)
Using NLP to Your Advantage
What Google’s trying to say with this latest change to the algorithm is the same thing that it always highlighted: Write content for users, not search engines.
With NLP, writing for real people using real language while avoiding jargon is more crucial than ever. Now that the search engine can dig into the context and assess the sentiment of the text, it’s paramount to stay relevant and on topic.
The more keywords you try to stuff, the more you water down the topic of the page. By focusing on real visitors, giving them content in real language, and being concise, you ensure both the reader and the machine will understand what you’re saying.
According to Google, there’s nothing that you should do to adapt to BERT — if you’re used to writing quality content, nothing will change for you. If you attempt to use black hat tactics, the search engine will pick up on it faster than ever and punish you for it. Keyword stuffing is now a thing of the past.
But that doesn’t mean you can’t use Google’s Natural Language API to improve your content further. Here’s what you can do.
Improve Internal Link Building
NLP and entity extraction algorithms can help you detect what important entities you might be missing in your content.
By looking at a list of extracted entities, you can figure out what you’ve neglected to explain to your reader. If you feel like it’s something your reader would be interested in, why not give it to them in the article? That’s better than leaving them uninformed, at which point they’ll certainly leave your website to look for additional info in a knowledge graph or on Wikipedia.
You can then see what content your website is missing, provide it, and strengthen your internal link building game.
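The gap-finding step above boils down to a set difference. Here’s a minimal sketch with hypothetical data: the extracted entities would come from running an entity analysis API over your (or a competitor’s) content, and the covered topics from your own site inventory:

```python
# Hypothetical inputs: entities an NLP API extracted from content
# in your niche, vs. topics your site already covers with a page.
extracted_entities = {"shared hosting", "vps", "ssl certificate", "uptime"}
covered_topics = {"shared hosting", "uptime"}

# Entities with no matching page: candidates for new content
# and new internal links.
missing = sorted(extracted_entities - covered_topics)
print(missing)  # ['ssl certificate', 'vps']
```

Each entity in `missing` is a potential new article, and once written, a natural internal linking target from the pages that mention it.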
Recommend More Content
Semantic annotation in natural language processing creates data that allows you to predict what the user would like to read next.
Using the semantic annotations and metadata, we can make better machine learning models to make such predictions.
The more content you recommend, the longer the user’s dwell time on the website. NLP can help you keep the user engaged.
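One simple way to turn those annotations into recommendations is to compare the entity sets of your articles and suggest the closest match. This sketch uses Jaccard similarity over hypothetical entity sets; a production system would use richer semantic metadata and a trained model, as the paragraph above suggests:

```python
def jaccard(a: set, b: set) -> float:
    """Overlap between two entity sets (0.0 = unrelated, 1.0 = identical)."""
    return len(a & b) / len(a | b) if a | b else 0.0

# Hypothetical entity sets per article, standing in for real
# semantic annotations produced by an NLP pipeline.
articles = {
    "choosing-a-host": {"web hosting", "uptime", "support"},
    "what-is-a-cdn": {"cdn", "latency", "caching"},
    "shared-vs-vps": {"web hosting", "vps", "server"},
}

# Entities of the article the user is currently reading.
current = {"web hosting", "server", "vps"}

# Recommend the article whose entities overlap the most.
best = max(articles, key=lambda slug: jaccard(current, articles[slug]))
print(best)  # shared-vs-vps
```

Even this crude overlap measure tends to surface topically related pages, which is exactly the “read next” behavior that keeps dwell time up.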
Redirect Users to the Right Page
A 404 error doesn’t have to be the end of the story for your visitors.
Thanks to NLP’s recognition of synonyms and de-referencing, you can intercept a user’s query and redirect them to the correct page.
For example, using de-referencing, you can intercept a user’s “NLP” query and redirect them to a “natural language processing” page instead.
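At its simplest, that de-referencing step is a synonym map from queries to canonical pages. Here’s a minimal sketch; the map entries and URL paths are made up for illustration:

```python
# Hypothetical synonym map: queries and their aliases de-referenced
# to the canonical page that should serve them.
REDIRECTS = {
    "nlp": "/natural-language-processing",
    "natural language processing": "/natural-language-processing",
    "bert update": "/google-bert",
}

def resolve(query: str) -> str:
    """Return the canonical page for a query, or a fallback page."""
    return REDIRECTS.get(query.strip().lower(), "/not-found")

print(resolve("NLP"))  # /natural-language-processing
```

A real implementation would sit in your site search or 404 handler, using NLP-derived synonyms instead of a hand-written map, but the lookup logic is the same.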
The Bottom Line
NLP has the potential to revolutionize the way search engines understand the content.
It’s the next giant leap for SEO, and the experts who realize that now will be ahead of the curve tomorrow.
The best thing about it is that you can focus solely on your readers: use everyday language, write short sentences, and punctuate properly.
Pay more attention to what your readers want, and the search engines are certain to pick up on it.
After all, they’re nearing human levels of comprehension, so you might as well treat them as such.