Using Druid to monitor notification yield

The situation:

We send around 90 million notifications per day.

We wanted a real-time system that sends emails to clients based on the number of notifications actually sent out (a few notifications fail because users unsubscribe or because of errors while customizing the notification).

The challenge:

We use an event-driven microservice architecture to schedule and send out notifications. All the microservices are connected via Kafka (a message buffer).

Writing a program that listens to Kafka, aggregates and counts the messages received, and then splits them by engagement, client, and message status would have been a nightmare to develop from scratch (not to mention the real-time part)!
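To give a feel for the hand-rolled route, here is a minimal sketch of just the counting step. The field names (`engagement`, `client`, `status`) and the sample payloads are hypothetical, not our real message schema, and the messages are plain JSON strings standing in for whatever a Kafka consumer would hand over.

```python
import json
from collections import Counter


def count_notifications(raw_messages):
    """Count messages per (engagement, client, status) combination."""
    counts = Counter()
    for raw in raw_messages:
        event = json.loads(raw)
        # Hypothetical field names; a real schema would differ.
        key = (event["engagement"], event["client"], event["status"])
        counts[key] += 1
    return counts


# Hypothetical payloads standing in for messages read off a Kafka topic.
sample_messages = [
    '{"engagement": "welcome", "client": "acme", "status": "sent"}',
    '{"engagement": "welcome", "client": "acme", "status": "sent"}',
    '{"engagement": "welcome", "client": "acme", "status": "failed"}',
]

counts = count_notifications(sample_messages)
```

The counting itself is the easy part. What this sketch leaves out (time windows, restarts without losing counts, multiple partitions, serving the results to many concurrent queries) is where the nightmare would have started.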

The solution:

This is where Druid proved itself really useful. It was the perfect fit.

I chose Druid because:

1. It could ingest data from pre-existing Kafka streams natively.

2. It could return the results of a large number of simultaneous queries (complex group-bys, limits, and filters), all in a matter of milliseconds.

3. The data was available in real time, as soon as the notifications were sent.
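For the first point, Kafka ingestion in Druid is driven by a JSON "supervisor spec". The sketch below builds one in Python. It is an illustration, not our production spec: the topic, broker address, datasource name, and column names are all assumptions, and the layout follows recent Druid releases (older versions used a different shape).

```python
import json


def build_kafka_supervisor_spec(topic, bootstrap_servers, datasource):
    """Build a minimal Druid Kafka supervisor spec as a plain dict."""
    return {
        "type": "kafka",
        "spec": {
            "dataSchema": {
                "dataSource": datasource,
                # Assumes each message carries an ISO-8601 "timestamp" field.
                "timestampSpec": {"column": "timestamp", "format": "iso"},
                # Hypothetical dimension names for notification events.
                "dimensionsSpec": {
                    "dimensions": [
                        "engagement",
                        "client",
                        "status",
                        "device_id",
                        "error_message",
                    ]
                },
                "granularitySpec": {
                    "type": "uniform",
                    "segmentGranularity": "HOUR",
                    "queryGranularity": "NONE",
                    "rollup": False,  # keep one row per notification
                },
            },
            "ioConfig": {
                "topic": topic,
                "inputFormat": {"type": "json"},
                "consumerProperties": {"bootstrap.servers": bootstrap_servers},
                "useEarliestOffset": False,
            },
            "tuningConfig": {"type": "kafka"},
        },
    }


spec = build_kafka_supervisor_spec("notifications", "localhost:9092", "notifications")
spec_json = json.dumps(spec, indent=2)
```

A spec like this is typically submitted by POSTing the JSON to the `/druid/indexer/v1/supervisor` endpoint, after which Druid keeps reading from the topic on its own.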

We had a pre-existing Vertica DB, but it had trouble scaling up to real-time.

We queried the data from Druid to generate the email report, and it worked like a charm.
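As a rough idea of what such a report query can look like, here is a sketch that uses Druid's SQL-over-HTTP endpoint (`/druid/v2/sql`). The datasource name, column names, and the `"sent"` status value are assumptions made for illustration, and `COUNT(*)` only gives the number of notifications if the data was ingested without rollup.

```python
import json
import urllib.request

# Hypothetical datasource and columns; counts the last day of notifications.
YIELD_QUERY = """
SELECT "client", "status", COUNT(*) AS "notifications"
FROM "notifications"
WHERE __time >= CURRENT_TIMESTAMP - INTERVAL '1' DAY
GROUP BY "client", "status"
"""


def run_druid_sql(base_url, query):
    """POST a SQL query to Druid and return the rows as a list of dicts."""
    request = urllib.request.Request(
        base_url.rstrip("/") + "/druid/v2/sql",
        data=json.dumps({"query": query}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())


def yield_by_client(rows, sent_status="sent"):
    """Return the fraction of notifications actually sent, per client."""
    totals, sent = {}, {}
    for row in rows:
        client = row["client"]
        totals[client] = totals.get(client, 0) + row["notifications"]
        if row["status"] == sent_status:
            sent[client] = sent.get(client, 0) + row["notifications"]
    return {client: sent.get(client, 0) / totals[client] for client in totals}
```

With a running cluster, `yield_by_client(run_druid_sql(base_url, YIELD_QUERY))` would give the numbers that go into each client's email, where `base_url` is the address of your Druid router or broker.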

The fact that there were other open-source solutions that allowed us to query Druid to create visualizations was just icing on the cake!

To connect to Druid and visualize the data, we chose Metabase: https://www.metabase.com/

Metabase allowed us to query all the notification data in a user-friendly and intuitive manner. We can slice and dice the data, filter it, group it, you name it!

This also allowed us to filter notifications by device ID, get the exact error message, and pinpoint the microservice that failed, all without any complex awk, grep, or sed through the logs.

The folks in Ops loved it! I personally liked it because it stopped all the tapping on the shoulder whenever the notification yield was low. You could just use Druid to analyze the logs. Now I could finally listen to my favorite death metal in peace.
