It’s been a year since Google announced some new exciting features for G Suite Enterprise and it included a little gem for those companies that want to get the most out of their data: Gmail logs integration in BigQuery. Google stated this on February 1st 2017:
Today, we are making it easier to analyze Gmail logs with a preconfigured BigQuery integration so administrators can run sophisticated, high-performing custom queries, analyze their data and build custom dashboards.
That definitely sounds great! At Gmail Meter, we’re always researching and discussing about ways to bring you new powerful statistics about your email, so this is rather interesting! We’ve been doing tests for a while. Let’s see:
[warning: tech stuff ahead]
How it works:
- Gmail logs are stored in realtime into a preconfigured BigQuery integration with the following schema. (NOTE: the logs start after you check the option in GSuite Enterprise Administrator, read more at this link)
- BigQuery organizes the data in daily tables.
- You can run queries directly on the BigQuery console or write your own app. There are some libraries out there for different languages.
- Queries should be made using SQL Standard, although Google uses their own SQL (“Legacy”) at their examples.
Aaaaand, that’s all… In fact, that’s all the info you can find on the internet about this. For real. No one seems to have published any info about this, or shared any code, apart from these two posts in this blog (thanks Mannetje!) – Link + Link
Querying – getting real stats!
We ran a query over the whole domain, for received emails. Each action that occurs when Gmail is receiving an email, logs a single entry in BigQuery, so we must filter a little bit.
/* received emails */ (message_info.destination.service == 'gmail-ui' AND message_info.source.service == 'smtp-inbound') /* we only need the final entry + not spam */ AND message_info.action_type == 3 AND message_info.is_spam == FALSE
Wildcards are permitted here so we’re checking all the daily tables for the last 7 days.
Voilá! We got 4713 emails, each one with extensive data. We turned some of this data into a nice email report for one single day, and used some processing to throw reply time right into the mix. Also, we generated the report for the whole domain and also for one user. Useful, isn’t it?
At our tests, minimum response time was above 1s, reaching peaks of 5-6s or higher. This is due to Google’s BigQuery inner processes (it kind of escalates with your needs)
That might not seem impressive at first, but think of a really big company (+2000 employees) and 1 year data of email. That’s a lot of emails. Well, it still should take a few seconds for most queries! Now we’re talking. Google BigQuery is scalable, flexible and fast even for GBs of data.
Throughout our research we discovered some stuff that wasn’t well documented at the Schema, and we also found some data limitations (IE: it doesn’t log thread ID)
There’s also a lot of data available in the logs about deliverability, server spam rules, and email behavior.
Conclusions on integrating Gmail logs into BigQuery
Google released a year ago a feature for G Suite Enterprise that allow administrators to automatically store Gmail logs in BigQuery.
It seems that big companies are already using this feature, and BigQuery is already a established product, so the future for this product is definitely exciting.
Good news: if your company is using G Suite Enterprise already, or is thinking about upgrading, you’re just one check away from using Big Data techniques to analyze and get stats for your email usage data using a cloud-powered database. It’s powerful, fast and cheap.
Not so good news: there’s almost no info publicly available about how to build apps around this, and Google documentation is incomplete. Intensive research and development would be needed.
Are you interested? Any technical details to share or ask about BigQuery? Let’s talk further at firstname.lastname@example.org