Google BigQuery is arguably the most popular and powerful serverless data warehouses in the market. However, as powerful as it already is, improving BigQuery performance is constantly on the minds of data & analytics teams.
And while Google and others have given great tips for optimizing performance, in this post you’re going to learn how to take your BigQuery performance & productivity to the next level with superQuery.
Through our IDE for BigQuery, superQuery enables data & analytics teams to use BigQuery at peak performance.
However, the definition of “peak performance” varies, depending on your role.
- If you’re a data scientist or analyst, that might mean shortening your workflows in BQ to do more work in less time.
- In contrast, if you’re a Head of Data / Analytics / BI, peak performance could mean optimizing BigQuery resource usage across your team. Or having more fine-grain controls for managing costs.
Whatever your role is, if you want to improve the way you and your team use BigQuery, continue reading.
Let’s get started
BigQuery Performance Tip #1:
Work on multiple queries with Tabs
When working on several queries for a work assignment, you have several options:
Option #1: Go back and forth between queries, saving one before switching to work on another.
However, in this case you’ll lose the query results of your first query to execute the second.
Option #2: Open multiple Chrome tabs running BigQuery.
Yet, the more Chrome tabs open, the harder it becomes to find the query you’re looking for.
Query tabs in superQuery address the drawbacks of both options.
Query tabs allow you to work across multiple queries in a single window, but there’s much more beyond that:
- The SQL you write persists over time. If you close your session or refresh the page, they’ll still be there when you return. No more needing to remember to hit that “Save Query” button!
- Is one query taking 50 seconds to execute? Work on a different query in another tab while it runs.
- Run queries in parallel or, as you would in Airflow, in a sequence with one query building off the previous one’s result. You can even schedule these in a single scheduler!
- Each tab has its own results history, so you won’t need to re-execute queries to get previous results.
BigQuery Performance Tip #2:
Categorize your queries with Boards
So now that your queries don’t disappear automatically, how do you organize all of those query tabs?
The answer is by using Boards in superQuery.
Rather than having one list containing 50+ unrelated Saved Queries, split up your queries into Boards. As result, queries will be easier to find and you’ll work with more clarity.
BigQuery Performance Tip #3:
Write dynamic queries with variables
How much time do you spend repeatedly making small changes to your SQL? Or using CTRL-F + Replace All to change values that appear multiple times throughout a lengthy query?
When you find yourself in these situations, then it’s time to insert variables into your SQL.
Variables (sometimes called “parameters“) are simply placeholders for values. They make your queries dynamic and more readable. Plus, they allow you to perform quick changes to your query without modifying the SQL itself.
The best part? In superQuery, you can create variables without using any SQL!
Let’s see how.
How to set up variables in superQuery
- Select your variable’s data type.
- Give your variable a name.
- Assign it an initial value.
Then, insert variables into your query by substituting the literal value with @your_variable_name, as shown below.
Variables can represent DATE, DATETIME, or STRING data types.
Outside of WHERE clauses, they can also be used to substitute column, project, dataset, or table values — even partially, which is great if your table names end with a date like “daily_sales20191130”.
As seen above, using variables helps improve your BigQuery performance by enabling you to make quick edits to your queries. Less time using CTRL-F + Replace All means a more productive analyst.
BigQuery Performance Tip #4:
Explore your query results in one click
Sometimes viewing your query results in table format isn’t enough. It doesn’t tell the whole story.
Furthermore, you don’t want to go through exporting results and loading them into another tool.
You want a visualization of your results, and you want it now.
In superQuery, that’s achievable with one click of the Chart icon in your results grid.
Other times, you may want to quickly combine the results of all your queries into one dashboard.
For that, click on the Visualize toggle, as shown below. This toggle visualizes your entire Board, into shareable dashboards.
BigQuery Performance Tip #5:
Version-control your queries automatically
Have you ever written what you thought was a one-time query, only to have to recreate it from memory a few weeks later?
Or have you ever found yourself needing a previous version of a query you wrote?
With a query library — a repository for saving, updating, and tracking changes in your queries — you’ll never find yourself troubled with these situations again.
However as your query count piles, it can get tedious manually managing your query library.
But by connecting your GitHub repository to superQuery, your query library will automatically update itself.
How does it work?
Every time you execute a query and there is a change in the SQL, it will automatically get committed to your selected GitHub repository.
Explore your BigQuery data in Dark Mode
Some prefer Dark Mode when working at night. Others for accessibility reasons. Or, you might simply like the aesthetics.
Whatever the reason, Dark Mode is just a click away in superQuery!
Besides, is it *really* an IDE if there isn’t a Dark Mode?