Share on facebook
Share on linkedin
Share on twitter
Share on reddit

7 ways to boost BigQuery performance with superQuery

Google BigQuery is arguably the most popular and powerful serverless data warehouses in the market. However, as powerful as it already is, improving BigQuery performance is constantly on the minds of data & analytics teams.

And while Google and others have given great tips for optimizing performance, in this post you’re going to learn how to take your BigQuery performance & productivity to the next level with superQuery.

Through our IDE for BigQuery, superQuery enables data & analytics teams to use BigQuery at peak performance.

However, the definition of “peak performance” varies, depending on your role.

  • If you’re a data scientist or analyst, that might mean shortening your workflows in BQ to do more work in less time.

     

  • In contrast, if you’re a Head of Data / Analytics / BI, peak performance could mean optimizing BigQuery resource usage across your team. Or having more fine-grain controls for managing costs.

Whatever your role is, if you want to improve the way you and your team use BigQuery, continue reading.

Let’s get started

BigQuery Performance Tip #1:

Work on multiple queries with Tabs

When working on several queries for a work assignment, you have several options:

Option #1: Go back and forth between queries, saving one before switching to work on another. 

However, in this case you’ll lose the query results of your first query to execute the second.

Option #2: Open multiple Chrome tabs running BigQuery. 

Yet, the more Chrome tabs open, the harder it becomes to find the query you’re looking for. 

Query tabs in superQuery address the drawbacks of both options.

Improve BigQuery Performance with Query Tabs

Query tabs allow you to work across multiple queries in a single window, but there’s much more beyond that:

  • The SQL you write persists over time. If you close your session or refresh the page, they’ll still be there when you return. No more needing to remember to hit that “Save Query” button!
  • Is one query taking 50 seconds to execute? Work on a different query in another tab while it runs.
  • Run queries in parallel or, as you would in Airflow, in a sequence with one query building off the previous one’s result. You can even schedule these in a single scheduler!
  • Each tab has its own results history, so you won’t need to re-execute queries to get previous results.
Query tab results
Results history of one tab

BigQuery Performance Tip #2:

Categorize your queries with Boards​

So now that your queries don’t disappear automatically, how do you organize all of those query tabs?

The answer is by using Boards in superQuery.

Boards are like folders where you store and manage your queries. They’re Saved Queries on steroids, allowing you to categorize your queries by department, data project, or however you’d like!

Rather than having one list containing 50+ unrelated Saved Queries, split up your queries into Boards. As result, queries will be easier to find and you’ll work with more clarity.

Diagram showing how to organize your BigQuery queries using superQuery

BigQuery Performance Tip #3:

Write dynamic queries with variables​

How much time do you spend repeatedly making small changes to your SQL? Or using CTRL-F + Replace All to change values that appear multiple times throughout a lengthy query?

When you find yourself in these situations, then it’s time to insert variables into your SQL.

Variables (sometimes called “parameters“) are simply placeholders for values. They make your queries dynamic and more readable. Plus, they allow you to perform quick changes to your query without modifying the SQL itself.

The best part? In superQuery, you can create variables without using any SQL

Let’s see how.

 

How to set up variables in superQuery

Rather than writing a DECLARE and SET clause, you just need to perform three quick steps to create a variable:
 
  1. Select your variable’s data type.
  2. Give your variable a name.
  3. Assign it an initial value.
Create Variables in superQuery
Creating a String variable for product category called "category" with a value of "Apparel"

Then, insert variables into your query by substituting the literal value with @your_variable_name, as shown below.

Inserting Variables SQ
How to insert your variable into a query

Changing variable values is also done without SQL — no SET clause necessary. Just edit the value in the text box below your variable’s name.

Change Variable Value SQ
Change variable values with zero SQL

Variables can represent DATE, DATETIME, or STRING data types. 

Outside of WHERE clauses, they can also be used to substitute column, project, dataset, or table values — even partially, which is great if your table names end with a date like “daily_sales20191130”.

Variables for Dates
Using variables for Date values
Using variables to substitute table values
Using variables to substitute table values

As seen above, using variables helps improve your BigQuery performance by enabling you to make quick edits to your queries. Less time using CTRL-F + Replace All means a more productive analyst.

BigQuery Performance Tip #4:

Explore your query results in one click

Sometimes viewing your query results in table format isn’t enough. It doesn’t tell the whole story.

Furthermore, you don’t want to go through exporting results and loading them into another tool.

You want a visualization of your results, and you want it now.

In superQuery, that’s achievable with one click of the Chart icon in your results grid.

Visualize query result with SuperQuery
Instantly visualize your query result

Other times, you may want to quickly combine the results of all your queries into one dashboard.

For that, click on the Visualize toggle, as shown below. This toggle visualizes your entire Board, into shareable dashboards.

Create dashboards SQ
Transform your queries into a dashboard

BigQuery Performance Tip #5:

Version-control your queries automatically

Have you ever written what you thought was a one-time query, only to have to recreate it from memory a few weeks later? 

Or have you ever found yourself needing a previous version of a query you wrote?

With a query library — a repository for saving, updating, and tracking changes in your queries — you’ll never find yourself troubled with these situations again.

However as your query count piles, it can get tedious manually managing your query library.

But by connecting your GitHub repository to superQuery, your query library will automatically update itself.

How does it work? 

Every time you execute a query and there is a change in the SQL, it will automatically get committed to your selected GitHub repository. 

Version History for a query with the superQuery integration for Github

 

Your queries are catalogued under a hierarchy of Boards > Query Tabs, making your library easy to navigate (as long as you give your boards & tabs good names).

BigQuery Performance Tip #6:

Optimize your queries at scale

Improving query performance is always top of mind when you’re managing a team of BigQuery users.

Popular tactics include optimizing query computation, managing query outputs, or avoiding SQL “anti-patterns”. However, these are difficult to enforce at scale.

So what can you do at scale to optimize BigQuery performance? 

The answer lies in smarter caching. 

Before we dive into that, let’s quickly examine the default caching you have in BigQuery.

 

BigQuery Caching

 

In BigQuery, temporary, cached results tables are maintained per-user, per-project. Cached results are also best-effort (the opposite being assured), meaning that while typical cache lifetime is 24 hours, cached results may be invalidated sooner.

If you are repeatedly executing the same queries over a 24-hour period, then this cache might help you out.

But overall, you’re pretty limited to the scenarios this cache is usable.

 

  • What if you could pull cached results from a query your colleague recently ran?

  • Or pull partially cached results from partitions which were previously scanned?

  • Finally, what if you could do both of the above, but also via your BI tool? 

 

That’s what’s possible with superQuery’s optimization engine.

 

Enhanced Caching with superQuery 

Utilizing Enhanced Caching, superQuery breaks down queries into parts and reconstructs them for maximum efficiency. This minimizes your data scanned without affecting results.

No longer will your team pay for repeat runs of the same query. Or repeat scans of the same partition. 

This is especially beneficial in BI tools where the same dashboards — made up of many queries themselves — are repeatedly refreshed by different users. 

The result? Query performance improves across your team, which translates into a much lower monthly bill — or reduced slots usage if you’re on a fixed price plan.

See how to set up Enhanced Caching for your team here.

BigQuery Performance Tip #7:

Control your team's BQ costs with query policies.

You can educate your team on optimizing BigQuery performance, but it doesn’t necessarily mean that they will write queries efficiently. There will be times where someone will forget to take data volume into consideration.

For these situations, setting up Query policies stops unnecessary scanning of data in its tracks.

Query policies specify a limit on data scanned or query cost for a given interval of time. 

They’re amazing for effortlessly managing your team and can be customized according to your team’s usage habits.

Policies are applied either on a per-user basis or across the entire team you’ve set up in superQuery. 

With policies you can block users from querying when limits are reached, or get notified. 

Policy limiting each user to $75 in queries per hour.

Bonus Tip:

Explore your BigQuery data in Dark Mode

Some prefer Dark Mode when working at night. Others for accessibility reasons. Or, you might simply like the aesthetics.

Whatever the reason, Dark Mode is just a click away in superQuery!

Besides, is it *really* an IDE if there isn’t a Dark Mode?

 

Explore BigQuery Data in Dark Mode
  • © 2019 superQuery. All rights reserved.

We use 🍪 to ensure you get the best experience.