
What can you get from the Google Search Console Bulk export


We recently reviewed how to access Google Search Console (GSC) data in BigQuery once the bulk export is in place. Let’s now take a look at the information this feature allows us to access.


As you will see in this article, the Google Search Console bulk export provides more data than any other GSC data source.


This is one advantage offered by the feature, but it is not the only one: the bulk export is also a free way to get your raw GSC data into BigQuery, a tool that offers powerful ways to query and blend your data.

What’s unique about the GSC bulk export

Before going further, keep in mind that the data sent by Google Search Console (GSC) to BigQuery through the built-in bulk export has its own particularities:

  • No retroactivity (the data starts flowing the day you set up the bulk export).

  • More granularity (you get more columns, and therefore more dimensions, than in the UI or even the API).

  • No sampling.

3 tables with unique dimensions and metrics

Let’s take a look at the 3 tables the bulk export automatically creates within your Google Cloud project and what information they gather.
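If you want to confirm the export is in place, you can list these tables directly from BigQuery. Here is a minimal sketch, assuming the default searchconsole dataset created by the export (my-project is a placeholder for your own project ID):

    -- List the tables created by the GSC bulk export.
    -- Assumes the default `searchconsole` dataset; replace `my-project` with your project ID.
    SELECT table_name
    FROM `my-project.searchconsole.INFORMATION_SCHEMA.TABLES`
    ORDER BY table_name;

You should see searchdata_site_impression, searchdata_url_impression, and ExportLog, which we will cover in turn.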

Property level aggregated data

The searchdata_site_impression table provides data aggregated at the property level, which is why the information it contains is relatively limited.


Here are the fields you get:

  • data_date: The day for the row (Pacific Time).

  • site_url: The property identifier (sc-domain:example.com for a domain property, the full prefix URL for a URL-prefix property).

  • query: The keyword.

  • is_anonymized_query: Whether the row relates to an anonymized query (returns a boolean).

  • country: ISO code of the country where the user was located.

  • search_type: Returns web, image, video, news, discover or googleNews depending on where the search was done.

  • device: The device type.

  • impressions: The number of impressions for the row.

  • clicks: The number of clicks for the row.

  • sum_top_position: “The sum of the topmost position of the site in the search results for each impression in that table row, where zero is the top position in the results.” You can use it to calculate the average position: SUM(sum_top_position) / SUM(impressions) + 1 (see the query sketch below).
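To make the formula concrete, here is a minimal sketch that computes the average position per query over the last 28 days. It assumes the default searchconsole dataset; my-project is a placeholder:

    -- Average position per query, last 28 days.
    -- The +1 converts the zero-based position to the 1-based position shown in the UI.
    SELECT
      query,
      SUM(clicks) AS clicks,
      SUM(impressions) AS impressions,
      SUM(sum_top_position) / SUM(impressions) + 1 AS avg_position
    FROM `my-project.searchconsole.searchdata_site_impression`
    WHERE data_date >= DATE_SUB(CURRENT_DATE('America/Los_Angeles'), INTERVAL 28 DAY)
    GROUP BY query
    ORDER BY clicks DESC
    LIMIT 25;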


As you can see, you pretty much get the same information you would from the UI or the API. The one field you only get here is is_anonymized_query, which is unique to the bulk export.
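Since that field is the novelty here, one natural use is to measure how much of your traffic comes from anonymized queries. A minimal sketch, with the same placeholder project ID:

    -- Daily share of impressions attributed to anonymized queries.
    SELECT
      data_date,
      SUM(IF(is_anonymized_query, impressions, 0)) / SUM(impressions) AS anonymized_share
    FROM `my-project.searchconsole.searchdata_site_impression`
    GROUP BY data_date
    ORDER BY data_date DESC;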

Page level aggregated data



This second table, searchdata_url_impression, aggregates data at the URL level and is the most interesting one. As you will see, it is very granular and offers a lot of detail. It is also the one you will rely on to build insightful analyses.


Here are the fields you get:

  • data_date: Same as for searchdata_site_impression, the day for the row (Pacific Time).

  • site_url: Same as for searchdata_site_impression, the identifier of your property.

  • url: The URL for which the data was recorded.

  • query: Same as for searchdata_site_impression, the keyword.

  • is_anonymized_query: Same as for searchdata_site_impression, this field returns a boolean to indicate if the query was anonymized or not.

  • is_anonymized_discover: This is similar to is_anonymized_query but for Discover.

  • country: Same as for searchdata_site_impression, it returns the ISO code for the row country.

  • search_type: Same as for searchdata_site_impression, where the search was conducted.

  • device: Same as for searchdata_site_impression, it returns the device type.

  • is_[search_appearance_type]: You get multiple columns, such as is_amp_top_stories, is_job_listing, and is_job_details, each returning a boolean that flags a search appearance type.

  • impressions: Same as for searchdata_site_impression, the number of impressions.

  • clicks: Same as for searchdata_site_impression, the number of clicks.

  • sum_position: This works just like sum_top_position: a zero-based number indicating the topmost position of this URL in the search results for the query. You use it the same way to calculate the average position (see the query sketch below).


As you can see, you get a lot of granularity. And the huge advantage is that, with this row-level data in BigQuery, you can apply any segmentation you want and get every metric for each segment.
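For example, here is a minimal sketch that computes clicks, impressions, and average position per URL and device, a segmentation that is tedious to get out of the UI. Same assumptions as before (default searchconsole dataset, placeholder project ID):

    -- Per-URL performance segmented by device.
    SELECT
      url,
      device,
      SUM(clicks) AS clicks,
      SUM(impressions) AS impressions,
      SUM(sum_position) / SUM(impressions) + 1 AS avg_position
    FROM `my-project.searchconsole.searchdata_url_impression`
    GROUP BY url, device
    HAVING SUM(impressions) >= 100  -- ignore low-volume rows
    ORDER BY clicks DESC
    LIMIT 25;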

Export logs

This last table, ExportLog, is purely functional: it helps you keep track of your automatic exports, when they took place, and whether they ran into issues. The information it contains is therefore quite limited.


Here are the fields you get:

  • agenda: It currently only returns “SEARCHDATA”.

  • namespace: The table name the data was exported to.

  • data_date: The data date, Pacific time zone.

  • epoch_version: It returns an integer indicating how many times GSC exported data for the date in question. It is zero-based, so “0” means there was only one export. If GSC had to adjust the data (because of anomalies), the number goes up: “1” means two exports took place, “2” means three, and so on.

  • publish_time: When the data was exported (time zone: PT).
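In practice, this table is handy for spotting days where GSC re-sent the data. A minimal sketch, with the same placeholder project ID:

    -- Find days where GSC exported the data more than once (epoch_version > 0).
    SELECT data_date, namespace, epoch_version, publish_time
    FROM `my-project.searchconsole.ExportLog`
    WHERE epoch_version > 0
    ORDER BY publish_time DESC;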


If you want more details, feel free to check the official documentation.

