React, SEO and rendering

Why you should serve your React app's content as static HTML, either by server-side rendering with Next.js or by generating the site with Gatsby 👉

The problem

If you are building single page applications with React you will probably run into a problem: search engines and social media bots struggle to see your content.

If this React app is a company or government website with a lot of traffic, you will quickly get feedback that people cannot find the site. For a business this means lost sales because customers cannot find the products you offer, and for governments this means citizens cannot find and access public services. This is the worst case scenario and has been a problem with React for many years.

In brief, the problem is caused by rendering your pages with javascript.

To understand the issue you need to understand how a website is rendered in a browser and what a bot from Google, Bing, Facebook or Twitter can actually see. Ultimately this is about where the data for your website is stored, how the browser retrieves it and how it is turned into the page you see. This is called rendering. The word stems from French and means to make, and in computing it describes how a data model is converted into something else, computer graphics for example.

See my previous post to understand how search engines work: How search engines see

As it turns out, search engines and other bots on the web are used to the old way of doing things, so search engine optimization becomes much harder for a website built with Create React App. The solution is to deliver the content in a static format, and preferably to make the entire website static.

We need to help the bots understand the website we are building, and we do that by delivering the content as static information. In this post I’ll explain the options for doing this with React and how each of them addresses the issue. I should add that this can also make sites more usable for people who browse with javascript turned off.

If you want the quick summary then skip to the end for my recommendations, but I reckon some people want to understand the different options and how they work.

For now, let’s try to understand the ways a website can be built and rendered. There are three: static site generation, server-side rendering and client-side rendering.

As it happens, Gatsby is a framework for static site generation. Next.js is a framework for server-side rendering, and Create React App is a method to build client-side rendered apps. All three are made for React.

Let’s start with the traditional website which has all its data stored in HTML.

Static site generation

How does this work?

In a traditional website we write our content directly in HTML files, or use a tool that generates them for us. Think of writing your own blog post purely in HTML.

There are also tools that turn text into HTML: simple CMS tools and frameworks such as Hugo and Gatsby, which create pages from markdown. How does this happen?

In this method we request the data while we are generating our webpages. In other words, we build the site at build time. We use Gatsby (or Hugo in my case) to retrieve our data stored elsewhere and generate the site so it is ready to be published on the web. Simply put, we are generating our webpages in advance of the user visiting the page. This is also called prerendering.

This produces our HTML files, and modern tools like Hugo and Gatsby also give us minified styling and script files in the same process. This is intended to make our webpages small so they load faster in a browser. People appreciate when webpages load fast and search engines will reward faster sites.
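To make this concrete, here is a minimal sketch of what build-time page generation can look like in Gatsby. It assumes the markdown transformer plugin is configured and that every post has a slug field; the template path is made up for the example.

// gatsby-node.js — this runs at build time, not in the visitor's browser.
// A sketch assuming gatsby-transformer-remark is set up and each markdown
// node has a slug field; the template path is hypothetical.
exports.createPages = async ({ graphql, actions }) => {
  const { createPage } = actions;

  // Fetch the content from wherever it lives (markdown files, a headless CMS, ...)
  const result = await graphql(`
    {
      allMarkdownRemark {
        nodes {
          fields {
            slug
          }
        }
      }
    }
  `);

  // Generate one static page per piece of content, before anyone visits the site
  result.data.allMarkdownRemark.nodes.forEach((node) => {
    createPage({
      path: node.fields.slug,
      component: require.resolve("./src/templates/blog-post.js"),
      context: { slug: node.fields.slug },
    });
  });
};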

When someone visits our site their browser receives the generated HTML file along with styling and scripts. This requires minimal rendering within the browser, which we tend to call the client-side.

Client-side rendering

In contrast with static site generation, client-side rendering gives the browser both the task of rendering the page and the task of putting the data into the HTML template.

When someone visits your site, the browser will request the HTML, CSS and javascript files from your server, then read and execute the javascript so React can retrieve the missing data and render the site. This means the browser handles the task of fetching the data through the API call. It lets you store the data for your website somewhere else, in a database or a headless CMS for example.
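As a sketch of what this looks like in code, here is a component in the style of a Create React App page that fetches its own data after the page has loaded. The /api/products endpoint is made up for the example; the point is that none of this content is in the HTML the server sends.

// A minimal sketch of client-side data fetching. The /api/products endpoint
// is hypothetical; the data only appears after the browser has executed
// the javascript, so it is invisible to anything that doesn't.
import { useEffect, useState } from "react";

function ProductList() {
  const [products, setProducts] = useState([]);

  useEffect(() => {
    // Runs in the browser after the initial render
    fetch("/api/products")
      .then((response) => response.json())
      .then(setProducts);
  }, []);

  return (
    <ul>
      {products.map((product) => (
        <li key={product.id}>{product.name}</li>
      ))}
    </ul>
  );
}

export default ProductList;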

Why is this a problem?

Simply put, most crawlers and bots are not browsers and do not execute javascript. What looks fine in your browser to a human is a different matter for machines.

This means that when Bing sends its crawler to your website, it won’t find any text to make sense of, only javascript and whatever static HTML tags you left on the page. Usually it’s just javascript. That doesn’t make any sense to Bing, so it’s hard for it to know how to rank the website for a particular topic or search query. If you’re concerned this is happening to your site, try requesting the page with curl and looking at the HTML that comes back. For example:

curl https://www.google.com

You can also compare results from the Google Mobile Friendliness Tool and the Bing Webmaster Tools to see the difference in each crawler.

Likewise, Facebook and Twitter have the same problem. When someone links to your website on those services, they won’t be able to display the page title or a preview of the content, which looks unprofessional. This can harm your reputation and we don’t want that.

Google will most likely understand your site, but that’s not guaranteed either. You usually have to submit it using Google Search Console or by pinging Google with your sitemap.

What if you have a lot of content and a lot of editors? That’s when using a server becomes really relevant.

Server-side rendering

Server-side rendering solves this problem a bit differently.

First of all, we have a live server hosting our site and the content. When someone visits the site, their browser sends a request to our server for the HTML, CSS and javascript files. Our server then calls our content API to retrieve the data and builds the page for the visitor. The browser can then do things the old-fashioned way with the generated HTML, CSS and javascript files and show the page to the user.
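In Next.js this is what getServerSideProps is for. Below is a minimal sketch; the content API URL and the product data are made up for the example.

// pages/products.js — a minimal Next.js sketch. getServerSideProps runs on
// the server for every request, so browsers and bots alike receive HTML that
// already contains the content. The API URL is hypothetical.
export async function getServerSideProps() {
  const response = await fetch("https://cms.example.com/api/products");
  const products = await response.json();

  return { props: { products } };
}

export default function Products({ products }) {
  return (
    <ul>
      {products.map((product) => (
        <li key={product.id}>{product.name}</li>
      ))}
    </ul>
  );
}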

In this case it doesn’t matter whether we use a headless CMS with React or a fully fledged CMS application.

It also means that the bots visiting the webpage now get all that nice content served as static information. It can be downloaded, parsed and interpreted intelligently, so they can rank your site based on the topic they think it covers and the related search queries. This is just as good as static site generation as far as bots are concerned. The added benefit is that multiple editors can write and publish content immediately.

Facebook and Twitter will also be able to retrieve the page title and data, so they can show something meaningful when people link to your site on their services.
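If the page is rendered on the server, adding the tags that Facebook and Twitter look for is straightforward. Here is a sketch using the Head component in Next.js; the title, description and image URL are placeholders.

// pages/article.js — a sketch of the Open Graph and Twitter tags that social
// networks read when someone shares a link. All values are placeholders.
import Head from "next/head";

export default function Article() {
  return (
    <>
      <Head>
        <title>Example article</title>
        <meta property="og:title" content="Example article" />
        <meta property="og:description" content="A short summary of the article." />
        <meta property="og:image" content="https://example.com/preview.png" />
        <meta name="twitter:card" content="summary_large_image" />
      </Head>
      <article>The article body goes here.</article>
    </>
  );
}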

Examples of switching to and from server-side rendering

Is there any proof that this matters? Yes, and there’s lots of it.

Some sites that made the change to serving their content as static information have seen results.

The Spectrum Chat site switched to server-side rendering and saw immediate improvements in traffic from search engines.

The natural explanation here is that the search engines were finally able to crawl and understand the content of all their pages. That means they were able to discover pages they hadn’t known about previously and then understood the content well enough to rank it.

There are also examples of sites switching to client-side rendering and losing traffic as a result.

Consequences for small and large websites

You might not care about the issue, but companies, charities, NGOs and government organisations absolutely must address it.

A company will lose sales from this, possibly to its competition. A government can’t perform its duty to the public if citizens can’t find information about their civil rights and the services they are entitled to. Any organisation that, like an NGO, depends on public fundraising and awareness will likewise be hampered by poor search engine optimization.

This is just as harmful as intentionally concealing a website from search engine crawlers. There is a real risk of doing a lot of damage if the service is important to people.

In my experience this is especially true for websites with hundreds or thousands of pages. The more content there is, the harder it is to get it all added to the search engines and keep them updated on the changes you make.

Why? Search engine crawlers will visit your site more than once per week and in some cases more than once per day.

However, they won’t necessarily discover your new, moved, changed and removed pages. This is because search engines have a lot of websites to visit each day, so they impose a limit on how much time they will spend on your site. This is sometimes called a crawl budget.

If you are also switching the method of building and rendering your site then you need a way to inform search engines about the changes. If you’re moving pages you need to set up redirects. Giving crawlers an overview with a sitemap is also a must-have when your site contains thousands of pages; the crawler can use it to notice changes in the site’s information architecture. Crawlers also travel through your website by following links to other pages, so make sure your links are in order.
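If the new site runs on Next.js, redirects for moved pages can be declared in the configuration file. A minimal sketch, with hypothetical old and new paths:

// next.config.js — a minimal sketch of permanent redirects in Next.js.
// The old and new paths are hypothetical.
module.exports = {
  async redirects() {
    return [
      {
        source: "/old-blog/:slug",
        destination: "/blog/:slug",
        permanent: true, // tells browsers and crawlers the move is permanent
      },
    ];
  },
};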

When to use static site generation and server-side rendering?

Let’s assume you’ve set your mind on using React.

If you haven’t and this is just a personal blog then I recommend Hugo. It’s a simple framework for building a blog for personal needs and it can also work for a company website.

For those who really want to use React, consider these questions:

Do you need to promote this website? Should people be able to find it easily?

Do you want to publish a lot of new content frequently?

You can also mix different rendering models so your page is partially server-side rendered and partially client-side rendered.
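In Next.js, for example, a page can be generated in advance and still fetch some data in the browser. A minimal sketch, where the description is prerendered at build time and a hypothetical /api/stock endpoint is queried client-side after load:

// pages/product.js — a sketch of mixing rendering models in Next.js.
// The description is prerendered at build time; the stock level is fetched
// in the browser from a hypothetical /api/stock endpoint after the page loads.
import { useEffect, useState } from "react";

export async function getStaticProps() {
  // Runs at build time, so this content is in the HTML that crawlers see
  return { props: { description: "A sturdy example product." } };
}

export default function Product({ description }) {
  const [stock, setStock] = useState(null);

  useEffect(() => {
    // Runs only in the browser, after the static page has been delivered
    fetch("/api/stock")
      .then((response) => response.json())
      .then((data) => setStock(data.count));
  }, []);

  return (
    <main>
      <p>{description}</p>
      <p>{stock === null ? "Checking stock…" : `${stock} in stock`}</p>
    </main>
  );
}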

The reason the amount of content matters is that static site generation tools like Gatsby can be slow when you are building thousands of pages. There are ways to optimize this.

Hugo handles lots of pages with ease. My site with a handful of pages is built in 17 milliseconds.

The benefit of picking a static site generation solution over server-side rendering is that it is often cheaper. Running multiple servers continuously has a noticeable cost.

These questions will help you plan your site anyway, so you really ought to figure this out before you pick a framework. Basically, you need to know the objective of the website, what kind of information you are producing, how much of it there is and how many editors will work on it.

How many users can publish to the site, and does new content need to be published regularly? How fast must content be published to achieve your goal for the website?

It is not impossible to rebuild a site after learning the answer to these questions but you will gain so much from knowing the answers in advance. Solving issues related to publishing, SEO, usability and performance becomes much easier.

To summarise, if there are many users and you need to publish their content immediately then server-side rendering is probably the best option. This is typical of a website using a CMS or community software that allows many editors to publish their own posts immediately on the site. In this case try Next.js.

If your data doesn’t change regularly then it is sufficient to publish it as static content and generate your pages before publishing. In this case I recommend trying Gatsby. This works well for personal blogs.

If you already built your app using Create React App and don’t want to invest time in making the entire site static, then you can at least manage the meta tags and page description with React Helmet. This is not a solution for the whole site though. I noticed that even the site promoting Create React App has switched to using Gatsby, so that’s food for thought.
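Here is a minimal sketch of what that looks like with React Helmet; the title and description are placeholders. Keep in mind that these tags are still set by javascript at runtime, so only bots that execute javascript will see them.

// A sketch of managing the document head with react-helmet in a
// Create React App page. The title and description are placeholders,
// and the tags are still injected by javascript in the browser.
import { Helmet } from "react-helmet";

function AboutPage() {
  return (
    <div>
      <Helmet>
        <title>About us</title>
        <meta name="description" content="What we do and how to reach us." />
      </Helmet>
      <h1>About us</h1>
    </div>
  );
}

export default AboutPage;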

Note also that this is not a problem for pages that are only available to logged-in users. Just make sure your landing page shows up as intended.

Do not expect this to work with Create React App on its own. Just because Googlebot can understand javascript, it’s not reasonable to expect that other services will do the same. Executing javascript comes at a significant cost for those companies, and it’s not necessarily the future of the web either.

I personally think this is a big oversight by the Facebook team behind React, and it’s unfortunate that developers need to learn additional frameworks to build a site that can be rendered by search engines and social media sites. On the plus side static sites are making a comeback, but the cost of learning this was significant and avoidable.

This not only puts more pressure on developers, it also limits the adoption of newer technologies when people realise that a single page application adds a lot of complexity that never existed in traditional websites. Every added burden that increases the chance of making a mistake makes people more careful and distrustful, even at the expense of necessary change. That in turn makes companies hesitate to invest in their websites and fail to achieve their goals.

I hope this post will help you realise there are options to address these problems and achieve your goals on the web.

I should add that listening to advice from the Google webmaster community will not suffice, since they will not help you make your site visible in any search engine other than their own. This is unfortunate, but it means the value of their advice is declining. What developers need to know about SEO now is that the majority of crawlers on the web still don’t execute javascript, and we should act accordingly. The Google Developer guide Rendering on the Web is worth a read though.