How Is NLP Changing The Way We Do On-Page SEO in 2020?
In October 2019, Google announced the official BERT Algorithm release.
According to Google's statement, a fully rolled out update affected 10% of all search queries.
Three months later, at the beginning of 2020, we already had the first significant Google core update of the year, called the January 2020 Core Update.
Every year, there are hundreds or thousands of updates, but this particular update's pre-announcement before the actual rollout was quite distinctive.
The very perfunctory tweet didn't contain detailed information and referred to a blog post about Google's core updates from August 1st, 2019.
Recently, we saw perceptible activity in SERPs, and SEOs are still figuring out what it really means for their work.
When new changes happen, our SEO process needs refinement as well.
BERT drew a lot of attention, as its impact feels palpable to us all. The rise of quality content, context, and natural language processing (NLP) is undeniable.
But what hides behind all those buzzwords? How can we put it all into action?
I was so fascinated with the concept of BERT and NLP that I spent long hours checking tools, analyzing datasets, and testing different solutions. And I wrapped all my findings into this one article so you know what it means for you, your website, and your clients.
Spoiler alert: there is a way to use NLP to enhance your SEO and to optimize your website for the new reality. I also included some case studies at the end of this article.
Let’s dig in and start with BERT, as it’s the key to understanding the NLP concept and the latest changes in Google.
Why Do BERT and NLP Appear in One Sentence So Often?
To better understand NLP, let’s crack the mysterious "BERT" code first. In October 2019, Google announced a new update that allows bots to understand context and search intent significantly better.
According to Google:
“These improvements are oriented around improving language understanding, particularly for more natural language/conversational queries, as BERT is able to help Search better understand the nuance and context of words in Searches and better match those queries with helpful results.”
Now let’s look into the more advanced stuff.
BERT stands for Bidirectional Encoder Representations from Transformers. In my opinion, the first part, “Bidirectional”, is crucial to comprehending NLP methodology, especially since we can encounter at least two explanations of "bidirectional" that make sense.
The first definition states that “bidirectional” refers to the two directions of a process.
In other words, Google figures out the meaning of a word or phrase using both the preceding and the following content.
The other definition refers to learning ability. BERT not only evaluates the meaning from a context (first direction) but demonstrates learning ability as well (second direction). It is our understanding that the process is completely unsupervised.
Regardless of which interpretation we like, both explanations are justifiable and help to understand the NLP process.
BERT and NLP
BERT contains two major components: data (pre-trained models) and methodology (defined way to learn and use those models).
Models are essentially just sets of data but you need to have a way to process these datasets. Without the method or process to understand and correctly interpret this data, the datasets are useless.
BERT is an essential part of NLP.
If you want to find out more about the history of NLP, Sean Shuter published an in-depth article that covers this matter comprehensively. Thanks to that, I could jump directly to the practical aspect of websites’ optimization.
How do BERT and NLP impact SEO?
In one word: SIGNIFICANTLY!
Since Google’s algorithm uses NLP, it impacts on-page and off-page SEO.
But to be more precise, NLP changes the way we understand queries as a whole, and each word separately. Besides that, Google is capable of assessing the sentiment of selected entities from the website’s content.
BERT’s impact on the featured snippet
Here at Surfer, we have the privilege of working with loads of data. I decided to take advantage of the data I have access to and analyzed a decent chunk of it to verify the BERT Update myself.
As Google stated, they changed the way they understand search queries; therefore, the update should produce some meaningful changes in that area.
I selected around five thousand English phrases and compared SERPs from before and after BERT.
The results were clear and confirmed that BERT was pretty impactful in this respect.
After the update, the number of queries with a featured snippet grew by 5.2%!
How Does NLP Enhance Search Quality?
According to Google's blog, 15% of search queries are being used for the first time. People are using more and more long-tail searches to find answers to their questions, especially with the rise of voice search.
This means that sometimes the algorithm doesn't have enough historical data to anticipate the intent behind the search term, so it may have difficulty satisfying the user and delivering relevant results.
A key to on-point results is understanding the language, regardless of whether we refer to spoken (voice search) or written language.
NLP is a way to improve capabilities in this area. Let me quote the statement from Pandu Nayak's article, which we consider to be one of the most trustworthy sources of information:
"With the latest advancements from our research team in the science of language understanding--made possible by machine learning-- we're making a significant improvement to how we understand queries, representing the biggest leap forward in the past five years, and one of the biggest leaps forward in the history of Search."
Context Is Everything
For some more sophisticated and conversational queries, it's hard to assess the meaning of prepositions or stop-words.
Many queries contain high-frequency words like "and", "to", "in". There are also a lot of words with multiple meanings like "get", "go", etc. Let's look at an example.
The expression "go" by itself can be meaningless, but take a look at what happens when it is placed in different contexts.
Here are some more examples from Google’s blog:
What is sentiment in NLP?
Sentiment is the undertone of the content. Like emotion, sentiment can be positive, negative, or neutral.
Positive sentiment means that the topic is described favorably. Such content usually contains positive words like “great”, “guru”, “hero”, “outstanding”, etc. The sentiment is considered positive if its value falls between 0.25 and 1.0.
Negative sentiment suggests the use of detrimental statements in the content. As you have probably guessed, those pages use words like “hate”, “weak”, “stubborn”, “boring”, “danger”, etc. Negative sentiment falls in the range of -1.0 to -0.25.
Finally, a neutral piece of content can contain both positive and negative signals, and the resulting value falls in the neutral score range, which is -0.25 to 0.25.
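The three score ranges above can be written down as a small helper. This is only a sketch: the function name is made up for illustration, and because the ranges as stated touch at 0.25 and -0.25, I assume those boundary values belong to the positive and negative ranges respectively.

```python
def sentiment_label(score: float) -> str:
    """Map a sentiment score in [-1.0, 1.0] to positive / negative / neutral."""
    if not -1.0 <= score <= 1.0:
        raise ValueError(f"score must be within [-1.0, 1.0], got {score}")
    # Assumption: exactly 0.25 counts as positive and exactly -0.25 as negative,
    # since the ranges described in the article share those boundary values.
    if score >= 0.25:
        return "positive"
    if score <= -0.25:
        return "negative"
    return "neutral"


for value in (0.8, 0.1, -0.6):
    print(value, sentiment_label(value))
```

The same helper can be applied to the score of the whole page or to the score of each subsection.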
It's important to know that Google's algorithm calculates the sentiment value not only for the entire subpage but for each subsection of the content.
Does sentiment affect position in SERP? Read the recent study about sentiment in SEO.
What Is an Entity in NLP?
An entity is a word or phrase that represents an object which can be identified, classified, and categorized.
Examples of objects are persons, consumer goods, events, numbers, organizations, etc. NLP's job is to select and evaluate entities from your content.
Since Google distinguishes those entities, the search engine is capable of utilizing obtained information in order to satisfy the user and provide better search results.
There are two additional metrics that are important—salience and category.
Regarding the category, there is not much to explain. Thanks to NLP, Google is capable of assigning the content to a corresponding category, such as /Internet & Telecom/Mobile & Wireless, in the following example.
What Is Salience in NLP?
Salience in NLP represents the entity's importance in the text. It ranges from 0.0 to 1.0. The higher the salience value, the more important and relevant the entity is for the subject of the page.
For example, the word “morning” may be more important than “evening” when we talk about breakfast.
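To make the idea concrete, here is a minimal sketch that ranks entities by salience and drops the ones below a threshold. The `Entity` type, the function name, the threshold, and the salience values are all invented for illustration; they are not real output from any API.

```python
from typing import NamedTuple


class Entity(NamedTuple):
    name: str
    salience: float  # 0.0 (irrelevant) to 1.0 (central to the text)


def rank_by_salience(entities, min_salience=0.0):
    """Return entities at or above min_salience, most salient first."""
    kept = [e for e in entities if e.salience >= min_salience]
    return sorted(kept, key=lambda e: e.salience, reverse=True)


# Made-up values for a page about breakfast.
entities = [
    Entity("evening", 0.05),
    Entity("breakfast", 0.62),
    Entity("morning", 0.21),
]

top = rank_by_salience(entities, min_salience=0.1)
print([e.name for e in top])  # ['breakfast', 'morning']
```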
Using Google's Natural Language API demo
The first step to enhance the SEO process from a practical point of view is to utilize available tools. The most important is Google's Natural Language API demo, which allows you to examine any text, for free.
For the record, we should be aware of a few limitations. For example, it doesn’t support every language. There is a list of languages supported by NLP.
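If pasting text into the demo by hand becomes tedious, the same kind of analysis can be requested from the Cloud Natural Language REST API. The sketch below uses only the Python standard library and assumes you have enabled the API in a Google Cloud project and created an API key; the function names are mine, and the network call is shown but not executed here.

```python
import json
import urllib.request

# Entity analysis endpoint of the Cloud Natural Language API (v1).
API_URL = "https://language.googleapis.com/v1/documents:analyzeEntities"


def build_request_body(text: str) -> dict:
    """Build the JSON body for a plain-text entity analysis request."""
    return {
        "document": {"type": "PLAIN_TEXT", "content": text},
        "encodingType": "UTF8",
    }


def analyze_entities(text: str, api_key: str) -> dict:
    """Send the text to the API and return the parsed JSON response.

    Assumption: authentication with an API key passed as the `key` query
    parameter; other setups (service accounts) need a different approach.
    """
    request = urllib.request.Request(
        f"{API_URL}?key={api_key}",
        data=json.dumps(build_request_body(text)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


print(json.dumps(build_request_body("Breakfast is best in the morning."), indent=2))
```

Calling `analyze_entities(text, api_key)` with a valid key should return a dictionary whose `entities` list carries the name, type, and salience of each entity.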
Optimizing websites for BERT
After Danny Sullivan’s tweet, it's no wonder that many articles I've read about BERT and NLP raise the same objection: there is nothing you can do to optimize your website for NLP.
During one of Google's more recent webinars, John Mueller was asked what kind of SEO work we could do regarding the BERT update.
He explained that:
"... the queries are not really something that you can influence that much as an SEO (...)the text on the page is something that you can influence. Our recommendation there is essentially to write naturally..."
He also mentioned special attributes we need to watch out for, and pointed out that it can be helpful to
"...match the query that someone is asking with the specific page...".
As a writer, you need to write in a way that humans are able to understand. Focus on natural language, not keyword stuffing.
In my opinion, there are a bunch of steps you can take to boost your SEO results and I’m going to walk you through them now.
Keyword Research: Use Google Search Console To Find Lost Rankings
As Google understands each query much better after the BERT update, there were noticeable fluctuations in SERPs.
It's important to diagnose which types of queries gained traffic and to pick out the ones that recorded a loss in organic search traffic.
Some pages may have shown a loss in organic traffic. In many cases, Google's algorithm expects a different type of content than before. Thus, your content doesn't seem to be relevant anymore. But decreases in traffic are not necessarily a signal that your website is no longer interesting for people who use this particular query.
To find out which keywords suffered from the drops after the update, we will use Google Search Console.
Select the keywords that dropped in rankings and compare your content to the current competitors. Maybe it needs to be rewritten, or maybe the topic's coverage isn't complementary. Maybe your content is too long, too thin, or lacks the proper entities?
You can detect keywords that you should take care of right away thanks to GSC. Here is how to approach that in two steps:
Open GSC, click the date filter, and compare October 2019 to November 2019, as BERT hit at the end of October.
Pick impressions and sort by difference. This way you will see the queries that suffered the most from BERT. Switch between Queries and Pages to find out if a single page or a set of pages got hit.
Now you have a prioritized list of keywords and pages to take care of first.
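If you prefer to work outside the GSC interface, the same "sort by difference" step can be reproduced on exported data. The sketch below assumes you already have impressions per query for each month as plain dictionaries; the queries and numbers are invented for illustration.

```python
def impression_drops(before: dict, after: dict) -> list:
    """Return (query, before, after, difference) rows, biggest loss first."""
    rows = []
    for query in set(before) | set(after):
        b = before.get(query, 0)  # a query missing from a month counts as 0
        a = after.get(query, 0)
        rows.append((query, b, a, a - b))
    # Most negative difference first; the query name breaks ties deterministically.
    return sorted(rows, key=lambda row: (row[3], row[0]))


# Hypothetical impressions per query (not real data).
october = {"best dress shirts": 1200, "slim fit shirt guide": 800, "how to iron a shirt": 450}
november = {"best dress shirts": 700, "slim fit shirt guide": 820, "shirt collar types": 150}

for query, b, a, diff in impression_drops(october, november):
    print(f"{query:25} {b:>6} {a:>6} {diff:>+6}")
```

Running the same function on per-page data instead of per-query data shows whether a single page or a set of pages got hit.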
Restructuring the website
Search algorithm expert from Bertey, Dawn Anderson, commented on the update:
"There will still be lots of work for us to do since we need to emphasize the importance, utilize clear structures, help to turn unstructured data into semi-structured data, utilize cues on content light pages (e.g. image-heavy but not text-heavy eCommerce pages) using such things as internal linking."
As she points out, in some cases, the internal linking and website structure will play a significant role in the process of understanding the content.
This means that if your website is poorly written or doesn't have a clear structure, it may get lost on the 2nd page or further. I would consider "the structure" to cover the whole domain as much as a single subpage.
Optimizing the website's structure means:
- Taking care of internal linking,
- Unifying internal anchor text,
- Using comprehensive navigation,
- In some cases, implementing breadcrumbs.
Optimizing the single article entails:
- Implementing proper schema,
- Eliminating keyword stuffing,
- Improving the header structure,
- Providing the sources of data and the author,
- Taking care of topic coverage by comparing against the best competitors,
- Including proper entities related to the topic,
- Upholding the preferred sentiment not only for the entire content but for each entity as well.
Internal and external backlinks
Thanks to NLP, Google's algorithm evaluates the context of internal and external links. Link structure and placement are now more important than ever before.
It's no mystery that both internal and external backlinks are among the most substantial SEO factors. BERT can be used to redefine a backlink profile as well as a website's internal architecture.
A link placed in the right context has a much higher value than a randomly placed one. So your link, internal or external, will bring you more juice if it’s:
- Placed on a website with a niche connected to yours
- Placed on a page that has a similar context
- Placed in a paragraph that is logically related to the content of your page
Hunting for snippets
Along with the update, there are new opportunities for those who adjust their SEO process to the new reality. It's a chance to take advantage and win many featured snippets by providing the best answers to frequently asked questions in every niche.
Competitors' Analysis—SEO Process with NLP Factors
After analyzing any text written in a supported language with Google’s NLP API, you will get all the metrics described above, among others:
- Salience score
Step 1: Select Your Main Keyword
This isn't rocket science. Choose the primary keyword you'd like to rank for.
Step 2: Select Your Competitors
It may seem obvious, but many SEO experts think every website listed in the top ten is a page they compete with. This kind of approach may dilute your dataset, so I strongly recommend treating this step seriously.
To select them properly you need to:
- Define your content type. Even if you’re going for a “research” search intent, you need to know if you’re writing a blog post, creating a video, or a landing page.
- Exclude pages that serve different intent.
- Exclude outliers: pages that are much longer or much shorter than other ones.
- Exclude pages that rank because of their authority and backlink profile
Here is an in-depth article about picking organic competitors for SEO analysis.
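The outlier rule from the list above can be automated once you know the word count of each candidate page. The article doesn't define "much longer or much shorter", so this sketch assumes a simple rule of my own: keep pages whose length is between half and double the median. The URLs and word counts are invented.

```python
from statistics import median


def exclude_length_outliers(word_counts: dict, low: float = 0.5, high: float = 2.0) -> dict:
    """Keep pages whose word count lies within [low, high] times the median.

    Assumption: the 0.5x / 2x bounds are arbitrary defaults, not a rule
    taken from the article; tune them to your niche.
    """
    if not word_counts:
        return {}
    mid = median(word_counts.values())
    return {url: n for url, n in word_counts.items() if low * mid <= n <= high * mid}


# Hypothetical competitor pages and their word counts.
pages = {
    "site-a.example/guide": 1800,
    "site-b.example/post": 2100,
    "site-c.example/article": 1950,
    "site-d.example/stub": 300,
    "site-e.example/mega-guide": 9000,
}

print(sorted(exclude_length_outliers(pages)))
```

A median is used rather than a mean so that one extremely long page doesn't shift the reference point.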
Step 3: Create a document to gather the data
To collect data in a comprehensive form, I’d recommend using a spreadsheet, text document, or whatever works for you best. Here is one of my templates for NLP competitors' analysis. It’s downloadable, so you can use it for your own purposes.
Step 4: Compare Your Content Against Competitors
Especially in the case of one-sided SERPs, make sure you use the preferred sentiment, since using the opposite one can act as a handbrake on your SEO efforts.
An easy example: if all the top search results for a product are positive reviews and you create a negative review, that might negatively impact your rankings. Google has taken the historical data of the more relevant sites, and apparently it favors positive reviews over negative ones.
Even though Google experiments with the sentiment and, in many cases, it fluctuates over time, it can be highly challenging to get into the top ten with an unexpected undertone. Similarly, sometimes it is impossible to take one of the highest organic positions in Google with content that is short on entities.
Comparing yourself with direct competitors allows you to fill the content gap, estimate the user's intent behind the search query, and make the content more comprehensive and complementary.
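The content gap mentioned above can be approximated with a simple set comparison: which entities do most competitors mention that your page doesn't? This is a rough sketch under my own assumptions; the function name, the 50% threshold, and the entity lists are hypothetical.

```python
from collections import Counter


def missing_entities(own, competitors, min_share=0.5):
    """Entities used by at least min_share of competitors but absent from own page."""
    counts = Counter()
    for entities in competitors:
        # Count each entity once per competitor, ignoring case.
        counts.update({e.lower() for e in entities})
    own_lower = {e.lower() for e in own}
    threshold = min_share * len(competitors)
    return sorted(e for e, c in counts.items() if c >= threshold and e not in own_lower)


# Hypothetical entity lists extracted from three competitor pages.
competitors = [
    {"dress shirt", "collar", "slim fit"},
    {"dress shirt", "collar"},
    {"collar", "tailor"},
]
own_page = {"Dress Shirt"}

print(missing_entities(own_page, competitors))  # ['collar']
```

Raising `min_share` keeps only the entities that nearly every competitor uses; lowering it surfaces rarer ideas worth considering.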
How Well Does Google Assess Sentiment?
As much as I love the entities' selection, the sentiment valuation leaves room for improvement.
I analyzed a bunch of keywords to find a one-sided negative SERP.
It was tougher than I expected.
Although in most cases there are positive and neutral sentiments, Google is very reluctant to assign the negative sentiment to websites.
On the other hand, we have to keep in mind that BERT is just one of the pre-trained models. We don't have any confirmation that it's the same version that is used by Google's algorithm. I assume that Google doesn't have much interest in sharing the newest version of their pre-trained models with us. After all, why share the best-performing resources?
Let's take a look at an example that shows sentiment analysis for very negative content.
BERT alternatives for sentiment analysis
As I mentioned previously, BERT is just one of the NLP models. Moreover, Google isn't the only company that develops NLP techniques. A couple of BERT's alternatives are:
- Watson (IBM)
- OpenAI’s GPT-2
IBM Watson sentiment analysis
I decided to give IBM's Watson a shot. After a few tests, it turned out to be pretty amazing: it performs much better than Google, and it has gained some popularity in SEO communities. I observed that Watson recognizes negative statements much better than Google.
This time, I tested SERPs as a whole to find a correlation in the sentiment value.
Possible Technical Difficulties for Less Technically Advanced Users
Sending API requests directly from the Macbook terminal requires basic technical knowledge.
The request must be preceded by generating your unique API key, etc. In return, we get the data in a particular order.
Here is the fragment of the response for the same piece of content about the dangerous cities that I used for Google API example.
Technical knowledge isn't necessary for data interpretation. For Watson, the same piece of content is strongly negative, whereas Google’s BERT sees it as neutral or even positive.
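For readers who would rather script the request than type it into a terminal, here is a minimal sketch against the Watson Natural Language Understanding REST API using only the standard library. The service URL and API key come from your own IBM Cloud instance; the version date and the function names are assumptions on my part, and the request is built but not sent here.

```python
import base64
import json
import urllib.request


def build_watson_request(service_url: str, api_key: str, text: str,
                         version: str = "2019-07-12") -> urllib.request.Request:
    """Build a POST request asking Watson NLU for document sentiment."""
    body = {"text": text, "features": {"sentiment": {}}}
    # Watson uses HTTP Basic auth with the literal user name "apikey".
    token = base64.b64encode(f"apikey:{api_key}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        f"{service_url}/v1/analyze?version={version}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "Authorization": f"Basic {token}"},
        method="POST",
    )


def document_sentiment(response: dict) -> tuple:
    """Extract (label, score) from a parsed Watson NLU response."""
    doc = response["sentiment"]["document"]
    return doc["label"], doc["score"]


# Shape of the relevant part of a response (values invented for illustration).
sample = {"sentiment": {"document": {"score": -0.8, "label": "negative"}}}
print(document_sentiment(sample))  # ('negative', -0.8)
```

To actually send it, pass the request to `urllib.request.urlopen` and parse the body with `json.load`.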
Now let's compare the top 50 on the chart for the keyword "most dangerous cities in the world 2019" in the United States. The chart below presents the NLP analysis according to Google and to IBM Watson Sentiment.
In Surfer, we decided to use both Google (for True Density entities and entities sentiment) and IBM Watson (for the general sentiment). The NLP Analysis will be released soon.
As you see, there is a sentiment correlation in the analysis conducted by Watson. At the same time, there is no correlation in Google's NLP chart.
Correlation is not causation, and it is always good to remind ourselves of that, but Watson’s negative sentiment assessment looks much more plausible.
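If you want to check such a correlation on your own SERP data, a Pearson coefficient between ranking position and sentiment score is one simple option. This is my choice of measure, not necessarily the one behind the charts, and the numbers below are invented purely to show the calculation.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length with at least 2 items")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Hypothetical ranking positions and sentiment scores (not real measurements).
positions = [1, 2, 3, 4, 5, 6]
scores = [-0.7, -0.6, -0.65, -0.4, -0.3, -0.2]

print(round(pearson(positions, scores), 2))
```

A value near 1 or -1 suggests a strong linear relationship, while a value near 0 suggests none; either way it says nothing about causation.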
Note that I compared a single piece of content with an averaged sentiment score. I know that some of you disagree with this kind of approach, because:
- IBM's Watson may use a completely different methodology than Google's NLP.
- It's more important to put entities in a proper vicinity than keep a whole article in particular sentiment.
Even though both statements shed some new light on content, there is no evidence that Watson's sentiment assessment is or isn't credible. I have tested many queries and Watson looks much more reliable.
Using entities in the right sentiment neighborhood seems to be fundamental when it comes to NLP. But for less demanding queries, I would advise you to write naturally for humans and not pay attention to such details. However, a granular entity sentiment report can be very helpful with the most competitive queries.
Watson's NLP can be applied in many different use cases—IBM generates a meaningful income out of providing API access, and there is no reason to hide a well-performing solution.
NLP and January 2020 SEO Case Study
As I said at the beginning, we have conducted a bunch of tests around NLP. One of them was held during the January 2020 Core Update.
You all know that the best tests are done on real-life use cases. I asked my teammate, Michał Suski, to encourage a few of Surfer’s users to create the NLP audit. Then he implemented entity-related tweaks on their pages and the results were pretty interesting.
1. Colin Ma, Digital Entrepreneur & SEO, website: Nimble Made
The analyzed article was already ranking quite well (primary terms on the bottom of the first page), so I didn't want to do a huge overhaul of the page.
I added almost all of the terms Surfer NLP Analysis suggested, but I added a little less for the first page update. For example, Surfer suggested adding the word “dress” 5 times and I added it just 2 or 3 times.
I didn’t want to change the content too much since, in this case, it was clearly doing quite well already. Overall, the additional entities allowed me to add in some extra keywords by allowing me to see product names, and names of other items related to the article.
I was impressed by the large boost I got from the NLP optimization as I moved up 5-6 spots on page 1 within 2 days.
2. John Pinedo, CEO of Freedom Bound Business
The Surfer team wasn’t lying when they said the new NLP-enabled Surfer beta would be a game-changer.
I had a keyword stuck at position 5-6 for months! A week after running an audit (NLP entities enabled) my target keyword, along with the secondary keywords, moved up in the SERP.
Aside from the better rankings, I really like the new NLP column that's included in the audit. It shows examples of how competitors use phrases/words without having to go to their page which just makes optimizing for true density a lot easier!
3. Matt Diggity, diggitymarketing.com
A small teaser of ranking boost on one of the blogposts from Matt Diggity. Pure NLP optimization using Content Editor.
NLP is a very complex concept. As an SEO specialist, you don't need to be a machine learning expert to use the available tools to enhance your SEO methodology. Let the tools do the hard work for you. But at the same time, having some basic knowledge, confirmed by SEO tests and by observing SERP fluctuations, will help you keep up with the industry.
I'm curious about your test results and position changes caused by the BERT Update. Have you already included NLP components in your analysis? Are you using NLP analysis when conducting deep SEO audits? This is one of those situations where early adopters are going to reap the rewards in the SERPs.
Natural language processing (#NLP) is changing #SEO. How to optimize your website for success?
👉 Restructure your website and pages
👉 Analyze the sentiment
👉 Add relevant entities you're missing
Here's how: https://t.co/6RapSUmbvx
— Surfer (@surfer_seo) January 21, 2020