When we teach professionals how to use the Google Search Console (GSC) data in BigQuery (as you can do with our online course), we notice that the prospect of blending this data with other sources usually seems to be the most exciting one.
In this article, we present 7 ideas of data blending that could be useful to get more insights from BigQuery but also to develop new analysis for different purposes.
We will be focusing on the Google Search Console data from the bulk export.
About data blending
This article doesn’t aim at explaining how data blending works. But we need to make sure you fully understand the concept.
Data blending is the practice of mixing data from multiple sources (or reports) to create a new dataset based on what we call a common key. A common key is basically the same information available in different tables/datasets. Thanks to them, you can basically tell BigQuery to create one row from multiple ones if the field you choose as a common key has a value which is the same in those multiple rows.
There are multiple ways to blend data with SQL and BigQuery. And when it comes to the Google Search Console data, the common key is usually the URL, or the query. But obviously, it all depends on what you want to do.
From crawl data to machine learning APIs
Let’s now see what you could do if you blend your GSC data.
Note: All the ideas provided here can, of course, be used together.
Crawls
If you use Screaming Frog or another crawler of that type, you probably noticed that they usually offer the possibility to connect to the Google Search Console using the API. This is a great feature that you can recreate in BigQuery with more granular and controlled data.
To do so, we usually export the Internal report from Screaming Frog, then import it (after having cleaned the columns) as a table in BigQuery, and then we blend this data with the GSC URL impression table.
The output is a mix of performance data from the GSC and the crawl. You can detect 404s with clicks or impressions, you can analyze the meta data of pages with poor CTR, and more!
SEMrush
The Google Search Console data reflects your property’s performances. Blending it with a source such as SEMrush can be very useful to cross this data with general information.
Having the search volume, as well as the difficulty score or the SERP features gives more context on a website's performance for a given keyword.
To go ahead with this idea, you also need to import some data in BigQuery (you could also build a script connected to your tool’s API and regularly feed BigQuery with the data you need).
Google Analytics 4
Blending GSC and Google Analytics 4 (GA4) data is not a silver bullet. No, you won’t be able to know which keywords generate the most conversions (“key events” in GA4), for instance. However, you can find a few interesting common keys within these two data sources, the main one being the URL.
This third idea allows you to mix performance and behavioral data. The good news is that GA4 also provides a free and automatic bulk export to BigQuery. Unfortunately, the data schema used for this export is quite complex to use and you have to be well familiarized with GA4 data to use it properly.
A piece of advice: Use segmented data from GA4 to only work with Google organic data (source: google / medium: organic).
NL and ML APIs
A big advantage of BigQuery is that it is part of Google Cloud and its suite of solutions. You don’t necessarily have to use Google’s solutions for the example we are going to discuss here, but Google Cloud has Machine Learning (ML) and Natural Language (NL) APIs that you can use in your projects.
With these types of APIs, you can do a lot: keywords categorization, sentiment analysis, entities identification and more.
You can use the APIs directly in BigQuery, or build a script. Keep in mind that those APIs are not free and can be quite costly for big datasets.
Google Ads
Another super interesting idea is data blending with Google Ads.
There are many potential applications for this scenario, but one that we find very interesting is blending the Search terms report from Google Ads (which provides very granular performance data for most queries you showed your ads on) and the one from GSC, using the query as a common key. This will allow you to deep dive into Google organic performances vs. Google paid results. This can also help you identify important keywords (from Google Ads, conversion wise) and directly see how you are performing with Google organic at the moment.
Here too, you can either import Google Ads data in BigQuery or use a script. You can even automatically export your Google Ads data in BigQuery.
Server logs
As you may know, server logs provide in-depth information about bot behaviors on a website and its pages which might be very important to better understand how search engines (Google in this case) analyze and index your content.
By blending your GSC data with server logs, you can, for instance, identify pages with no performance data and check if they are visited or not by bots, how frequently and more.
In this case as well, you will need to import your server logs data in BigQuery.
CRM and other internal data sources
Lastly, keep in mind that you can use any sort of data that you have in hand and that can be useful when blended with the GSC data. A CRM, if you use one, can be a great source of data. You can also use your own CSV or databases for page categorization and more.
The idea, just like for the other examples, is to provide more context to the GSC data and to address more granular questions for which you don’t have answers without data blending.
Comments