Google BigQuery is arguably the most popular and powerful serverless data warehouses in the market. However, as powerful as it already is, improving BigQuery performance is constantly on the minds of data & analytics teams.
And while Google and others have given great tips for optimizing performance, in this post you’re going to learn how to take your BigQuery performance & productivity to the next level with superQuery.
Through our IDE for BigQuery, superQuery enables data & analytics teams to use BigQuery at peak performance.
However, the definition of “peak performance” varies, depending on your role.
- If you’re a data scientist or analyst, that might mean shortening your workflows in BQ to do more work in less time.
- In contrast, if you’re a Head of Data / Analytics / BI, peak performance could mean optimizing BigQuery resource usage across your team. Or having more fine-grain controls for managing costs.
Whatever your role is, if you want to improve the way you and your team use BigQuery, continue reading.
Let’s get started
BigQuery Performance Tip #1:
Work on multiple queries with Tabs
When working on several queries for a work assignment, you have several options:
Option #1: Go back and forth between queries, saving one before switching to work on another.
However, in this case you’ll lose the query results of your first query to execute the second.
Option #2: Open multiple Chrome tabs running BigQuery.
Yet, the more Chrome tabs open, the harder it becomes to find the query you’re looking for.
Query tabs in superQuery address the drawbacks of both options.
Query tabs allow you to work across multiple queries in a single window, but there’s much more beyond that:
- The SQL you write persists over time. If you close your session or refresh the page, they’ll still be there when you return. No more needing to remember to hit that “Save Query” button!
- Is one query taking 50 seconds to execute? Work on a different query in another tab while it runs.
- Run queries in parallel or, as you would in Airflow, in a sequence with one query building off the previous one’s result. You can even schedule these in a single scheduler!
- Each tab has its own results history, so you won’t need to re-execute queries to get previous results.
BigQuery Performance Tip #2:
Categorize your queries with Boards
So now that your queries don’t disappear automatically, how do you organize all of those query tabs?
The answer is by using Boards in superQuery.
Rather than having one list containing 50+ unrelated Saved Queries, split up your queries into Boards. As result, queries will be easier to find and you’ll work with more clarity.
BigQuery Performance Tip #3:
Write dynamic queries with variables
How much time do you spend repeatedly making small changes to your SQL? Or using CTRL-F + Replace All to change values that appear multiple times throughout a lengthy query?
When you find yourself in these situations, then it’s time to insert variables into your SQL.
Variables (sometimes called “parameters“) are simply placeholders for values. They make your queries dynamic and more readable. Plus, they allow you to perform quick changes to your query without modifying the SQL itself.
The best part? In superQuery, you can create variables without using any SQL!
Let’s see how.
How to set up variables in superQuery
- Select your variable’s data type.
- Give your variable a name.
- Assign it an initial value.
Then, insert variables into your query by substituting the literal value with @your_variable_name, as shown below.
Variables can represent DATE, DATETIME, or STRING data types.
Outside of WHERE clauses, they can also be used to substitute column, project, dataset, or table values — even partially, which is great if your table names end with a date like “daily_sales20191130”.
As seen above, using variables helps improve your BigQuery performance by enabling you to make quick edits to your queries. Less time using CTRL-F + Replace All means a more productive analyst.
BigQuery Performance Tip #4:
Explore your query results in one click
Sometimes viewing your query results in table format isn’t enough. It doesn’t tell the whole story.
Furthermore, you don’t want to go through exporting results and loading them into another tool.
You want a visualization of your results, and you want it now.
In superQuery, that’s achievable with one click of the Chart icon in your results grid.
Other times, you may want to quickly combine the results of all your queries into one dashboard.
For that, click on the Visualize toggle, as shown below. This toggle visualizes your entire Board, into shareable dashboards.
BigQuery Performance Tip #5:
Version-control your queries automatically
Have you ever written what you thought was a one-time query, only to have to recreate it from memory a few weeks later?
Or have you ever found yourself needing a previous version of a query you wrote?
With a query library — a repository for saving, updating, and tracking changes in your queries — you’ll never find yourself troubled with these situations again.
However as your query count piles, it can get tedious manually managing your query library.
But by connecting your GitHub repository to superQuery, your query library will automatically update itself.
How does it work?
Every time you execute a query and there is a change in the SQL, it will automatically get committed to your selected GitHub repository.
BigQuery Performance Tip #6:
Optimize your queries at scale
Improving query performance is always top of mind when you’re managing a team of BigQuery users.
So what can you do at scale to optimize BigQuery performance?
The answer lies in smarter caching.
Before we dive into that, let’s quickly examine the default caching you have in BigQuery.
In BigQuery, temporary, cached results tables are maintained per-user, per-project. Cached results are also best-effort (the opposite being assured), meaning that while typical cache lifetime is 24 hours, cached results may be invalidated sooner.
If you are repeatedly executing the same queries over a 24-hour period, then this cache might help you out.
But overall, you’re pretty limited to the scenarios this cache is usable.
- What if you could pull cached results from a query your colleague recently ran?
- Or pull partially cached results from partitions which were previously scanned?
- Finally, what if you could do both of the above, but also via your BI tool?
That’s what’s possible with superQuery’s optimization engine.
Enhanced Caching with superQuery
Utilizing Enhanced Caching, superQuery breaks down queries into parts and reconstructs them for maximum efficiency. This minimizes your data scanned without affecting results.
No longer will your team pay for repeat runs of the same query. Or repeat scans of the same partition.
This is especially beneficial in BI tools where the same dashboards — made up of many queries themselves — are repeatedly refreshed by different users.
The result? Query performance improves across your team, which translates into a much lower monthly bill — or reduced slots usage if you’re on a fixed price plan.
See how to set up Enhanced Caching for your team here.
BigQuery Performance Tip #7:
Control your team's BQ costs with query policies.
You can educate your team on optimizing BigQuery performance, but it doesn’t necessarily mean that they will write queries efficiently. There will be times where someone will forget to take data volume into consideration.
For these situations, setting up Query policies stops unnecessary scanning of data in its tracks.
Query policies specify a limit on data scanned or query cost for a given interval of time.
They’re amazing for effortlessly managing your team and can be customized according to your team’s usage habits.
Policies are applied either on a per-user basis or across the entire team you’ve set up in superQuery.
With policies you can block users from querying when limits are reached, or get notified.
Explore your BigQuery data in Dark Mode
Some prefer Dark Mode when working at night. Others for accessibility reasons. Or, you might simply like the aesthetics.
Whatever the reason, Dark Mode is just a click away in superQuery!
Besides, is it *really* an IDE if there isn’t a Dark Mode?