Streaming vs. batched log-level data

By Kate Dye
Wednesday, September 2nd 2020

Log- or event-level data are huge. Like, terabytes an hour huge. That might lead you to believe that speedy delivery isn’t valuable, or even possible.

However, many of the advanced use cases that lead companies to log-level data benefit from data that arrives as soon as possible after the event occurs.

In our last post, “What is log-level data”, we discussed the difference between batched and streamed log-level data. Logs are often processed in batches, even when they aren’t being aggregated. Alternatively, the data can be streamed in real time, meaning each event is processed as it arrives and sent directly to the destination database.


Batch processing

  • Data is collected over time.
  • Once data is collected, it’s sent for processing.
  • Batch processing is slow and is meant for data that isn’t time-sensitive.

Stream processing

  • Data streams continuously.
  • Data is processed piece-by-piece.
  • Stream processing is fast and is meant for information that’s needed quickly.

Which one is better?

In adtech, log-level data arrives as a never-ending stream of events. To process it in batches, you need to store it, close off each batch at some cutoff point, and process what you’ve collected. Then you move on to the next batch, and you still have to worry about aggregating results across multiple batches. Streaming, by contrast, handles never-ending data seamlessly, without the intermittent starting, stopping, and re-aggregating.
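To make the difference concrete, here’s a minimal Python sketch. It assumes each log line has already been reduced to a hypothetical (campaign_id, impressions) pair, and the function names are ours, purely for illustration. The batch version buffers events, closes each batch, and merges per-batch results into overall totals; the streaming version updates running totals as every event arrives.

```python
from collections import Counter
from typing import Iterable, Iterator

# Hypothetical event shape: (campaign_id, impressions) stands in for a parsed log line.
Event = tuple[str, int]


def summarize(batch: list[Event]) -> Counter:
    """Aggregate a single closed batch on its own."""
    totals: Counter = Counter()
    for campaign_id, impressions in batch:
        totals[campaign_id] += impressions
    return totals


def batched_totals(events: Iterable[Event], batch_size: int) -> Iterator[dict[str, int]]:
    """Batch style: buffer events, process each closed batch, merge into overall totals."""
    overall: Counter = Counter()
    buffer: list[Event] = []
    for event in events:
        buffer.append(event)
        if len(buffer) == batch_size:  # cutoff reached: close this batch
            overall.update(summarize(buffer))  # aggregate across batches
            buffer.clear()
            yield dict(overall)  # results only become visible once per batch
    if buffer:  # a finite input may end with a partial batch
        overall.update(summarize(buffer))
        yield dict(overall)


def streaming_totals(events: Iterable[Event]) -> Iterator[dict[str, int]]:
    """Streaming style: update running totals as each event arrives."""
    totals: Counter = Counter()
    for campaign_id, impressions in events:
        totals[campaign_id] += impressions
        yield dict(totals)  # a fresh view after every single event


events = [("a", 3), ("b", 2), ("a", 1), ("c", 5), ("b", 4)]
batch_views = list(batched_totals(events, batch_size=2))
stream_views = list(streaming_totals(events))
```

Both approaches end with the same totals. The difference is when you get to see them: the batch version produces an updated view only when a batch closes, while the streaming version has one after the very first event. A real pipeline would more likely close batches on a schedule (hourly, daily) than by event count; a count just keeps the sketch short.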

Batch processing works well in situations where you don’t need real-time analytics results and is often used when dealing with data sources from legacy systems. 

Stream processing is key to turning big data into fast data: you can feed analytics tools as soon as an event occurs and get real-time utility and insights.

Fraud detection is a good example. With streamed data, you can spot the anomalies that signal fraud as they happen, then stop the fraud before it causes damage.
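As a rough illustration, here’s a minimal Python sketch of one simple streaming anomaly rule: flag a traffic source the moment it produces more than a set number of clicks within a sliding time window. The (timestamp, source) event shape, the function name, and the thresholds are hypothetical, and real fraud detection uses far richer signals; the point is only that the check runs on each event as it arrives rather than waiting for a batch.

```python
from collections import defaultdict, deque
from typing import Iterable, Iterator


def flag_click_bursts(
    clicks: Iterable[tuple[float, str]],
    window_seconds: float = 60.0,  # hypothetical sliding-window length
    max_clicks: int = 3,           # hypothetical per-source threshold
) -> Iterator[tuple[float, str]]:
    """Yield (timestamp, source) as soon as a source exceeds max_clicks in the window.

    Assumes clicks arrive in timestamp order, as they would on a stream.
    """
    recent: dict[str, deque] = defaultdict(deque)
    for ts, source in clicks:
        window = recent[source]
        window.append(ts)
        # Drop this source's clicks that have fallen out of the sliding window.
        while window and ts - window[0] > window_seconds:
            window.popleft()
        if len(window) > max_clicks:
            yield ts, source  # flagged immediately, not at the end of a batch


# Example traffic using documentation-range IP addresses as sources.
clicks = [
    (0, "203.0.113.5"),
    (1, "198.51.100.7"),
    (2, "203.0.113.5"),
    (3, "203.0.113.5"),
    (4, "203.0.113.5"),
    (100, "203.0.113.5"),
]
alerts = list(flag_click_bursts(clicks))
```

Here the fourth click from 203.0.113.5 within a minute is flagged the instant it arrives, while its later, isolated click at t=100 is not. With a daily batch, the same burst would surface only after the day’s data was processed.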

Is data freshness important?

When choosing a technology, consider how quickly data loses its usefulness the longer you have to wait for it, that is, as its “freshness” or recency decreases.

Are you looking for near-real-time performance indicators on creatives and campaign strategies so you can make in-flight adjustments? Or are you reviewing everything after the campaign has delivered its spend? If you’re aiming for in-flight adjustments, you want data as fast as possible.

Even in-flight, knowing that one campaign strategy outperforms the others is most useful early on, and much less so days later, once spend has already been allocated evenly.

Figure: Log-level data freshness and usefulness

In some cases, such as catching campaigns delivering in non-brand-safe environments or preventing bad ads, stale data can lose its usefulness much more quickly.
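One way to reason about this drop-off is a toy model in which data loses a fixed fraction of its value over a set period. The Python sketch below assumes simple exponential decay with a per-use-case “half-life”; both the model and every number in it are hypothetical, chosen only to show how the same delivery delay can matter very differently across use cases. They aren’t measurements.

```python
def freshness_utility(delay_hours: float, half_life_hours: float) -> float:
    """Toy model: fraction of a data point's value left after delay_hours,
    assuming (hypothetically) that its value halves every half_life_hours."""
    if delay_hours < 0 or half_life_hours <= 0:
        raise ValueError("delay must be >= 0 and half-life must be > 0")
    return 0.5 ** (delay_hours / half_life_hours)


# Hypothetical delivery delays: minutes for streamed data, a day for daily batches.
DELAYS_HOURS = {"streamed": 0.1, "daily batch": 24.0}

# Hypothetical half-lives: strategy insights age slowly, brand-safety signals age fast.
HALF_LIVES_HOURS = {"strategy optimization": 72.0, "brand safety": 2.0}

for use_case, half_life in HALF_LIVES_HOURS.items():
    for mode, delay in DELAYS_HOURS.items():
        value = freshness_utility(delay, half_life)
        print(f"{use_case:<22} {mode:<12} {value:.2f}")
```

With these made-up parameters, a daily batch still keeps most of its value for strategy optimization but almost none for brand safety, while streamed data stays close to full value in both cases.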

There are certainly times when batched or delayed data doesn’t reduce its utility. If you aren’t planning to change strategy mid-campaign, and you process and bill monthly, streamed data offers no real benefit, and even daily batches can seem like a nice-to-have.

With the demise of third-party cookies on the horizon, there are also a number of ways to use log-level data to optimize campaigns and manage attribution. Here, streamed data would have a considerable advantage.

Check out our solutions in action

Get in touch to schedule a live demo of our platforms with one of our dedicated experts.


If you’re interested in learning more about how we can help your business, reach out to us!