Journey of Designing and Implementing Funnel Analytics

What is Funnel Analysis

Funnels are foundational-level insights for all digital businesses today.

Funnel analysis is a method used to visualise the flow of visitors across a set of your website pages or events. Much like a physical funnel, it narrows toward the end; So the volume of visitors at the top is larger than the volume of visitors at the bottom.

Eg: consider a music e-commerce site whose ultimate goal is to get users who visit their site to make a purchase. The steps required in purchase flow on the site would be; searching for a song, adding the song to cart, navigate to checkout screen and completing the purchase by making payment. So you would be adding each of these steps at each levels and build your funnel for understanding at what level users are getting dropped.

500 users had searched for a particular song but only 275 of them added the song to their cart, And only 125 users navigated to checkout page and finally only 100 users who made this purchase.

This visualisation gives clear insights to see where is the drop in the funnel happens and where we need to focus more to get higher conversion rate.

What are the Requirements given ?

  1. Introduce Funnel analysis feature in our existing data analytics tool.
  2. User should be able to add any steps/events to build funnel for a flow and the funnel should be built dynamically without pre aggregation.
  3. User can skip intermediate events and build funnel. — If it takes following steps/events in the order to purchase a song “login to the site”, “search for a song”, “select a song from the search list”, “add the song to cart”, “navigate to checkout page”, “add payments method”, “authenticate the payment”, “complete the purchase”, then user should be able to build funnel with all of these events or with some of the selected events say “search for a song”, “add the song to cart” and “complete the purchase”.
  4. Order to be maintained — when user builds the funnel all the defined steps/events must have taken place in the same order in the actual user session.
  5. Real time funnel with 6 months of data: Approximate daily volume of events is 4 billion events. So when you build a funnel for six months then approximately the system need to search and aggregate 720 billions of events in real time.
  6. User should be able to add additional filters to target a specific user segment: eg; users “who use Firefox browser”

What are the Challenges ?

  • Dynamic Funnel Querying in Real Time— A design that can query over 700+ billions of events with in few seconds, Pre-aggregation is not possible as the user can choose any steps/events while building the funnel.
  • Data Storage — Persist billions of events in the same order in which it was flowed to our system, And search events and aggregate the number of users who had taken the defined steps/events in their session.

Picking the Correct Data Store

Initial approach was to fit this use-case into our existing data store ‘Druid’, Druid is a real-time analytics database designed for fast slice-and-dice analytics (“OLAP” queries) on large data sets. The approach in mind was to store all events of a particular user in sequence in one column and then use RegEx to query from druid based on the events user wants to build funnel.

But this can not be a scalable solution as we are using RegEx to query and aggregating over billions of events.

The second approach was to use a Graph data base to store user events. Events as vertex and userIds of users who take the path as edges.

With this approach it is possible to query and aggregate edges that matches the events (vertex) defined for the funnel. But our requirement is to skip the intermediate events(vertex) and query. This query needs full table scan as we are skipping intermediate vertex, hence the solution can not scale.

Third approach was to use Elastic Search for this use case. The idea was to store all events of a user in a user session in one single document and query elastic to get the aggregated summery.

Elastic Search Approach in Detail

Elastic Index

Create an index to store documents, each document contains single user session details including all events in the session. Below is the elastic mapping for the index.

{
"mappings": {
"doc": {
"dynamic": "false",
"properties": {
"user_browser": {
"type": "keyword"
},
"user_country": {
"type": "keyword"
},
"user_os": {
"type": "keyword"
},
"ordered_session_events": {
"type": "text",
"analyzer": "whitespace"
},
"time": {
"type": "date",
"format": "strict_date_time"
},
"user_id": {
"type": "keyword"
}
}
}
}
}

Instead of storing the event name in elastic we are storing the event name’s hash in elastic. In this case “ordered_session_events” stores all the event’s hash in the same order events are flowed into our system, this is a white space analysed field hence each event’s hash is separated by a white space. An example elastic document is below.

{
"_index": "funnel",
"_type": "doc",
"_id": "u1234_session1",
"_source": {
"time": "2020-11-16T13:00:00.000-08:00",
"user_os": "ubuntu",
"user_country" : "india",
"user_browser" : "chrome",
"user_id" : "u1234",
"ordered_session_events": "7a1a5232528283d2 2b51b66d7863eeb4 85a68b39b3a99e41 7a1a5232528283d2 05501492fbe0c316 f5da72c7003c39f1 05501492fbe0c316 f5da72c7003c39f1 2b6e4272129223w8 e3322503efw1d428 f5da72c7003c39f1"
}

Elastic Query

Elastic Span Near query along with aggregation does the magic in querying and aggregating for our use case. Span Near query matches the documents which satisfies the events order in the query. We should also define the maximum event distance to be considered as “slop”. More on Span Near query can be read in below link.

Below is the sample query for the querying and aggregating number of users who had searched for a song, then the song was added to the their cart and then completing the purchase.

Assuming
hash(search for a song) = 2b51b66d7863eeb4
hash(add song to the cart) = 05501492fbe0c316
hash(complete the purchase) = e3322503efw1d428

{
"size": 0,
"timeout": "1m",
"query": {
"bool": {
"filter": [
{
"range": {
"time": {
"from": "2020-11-01T00:00:00.000Z",
"to": "2020-11-05T23:59:59.999Z",
"include_lower": true,
"include_upper": true,
"format": "yyyy-MM-dd'T'HH:mm:ss.SSS'Z'",
"boost": 1
}
}
},
{
"span_near": {
"clauses": [
{
"span_term": {
"ordered_session_events": {
"value": "2b51b66d7863eeb4",
"boost": 1
}
}
},
{
"span_term": {
"ordered_session_events": {
"value": "05501492fbe0c316",
"boost": 1
}
}
},
{
"span_term": {
"ordered_session_events": {
"value": "e3322503efw1d428",
"boost": 1
}
}
}
],
"slop": 500,
"in_order": true,
"boost": 1
}
}
],
"adjust_pure_negative": true,
"boost": 1
}
},
"aggregations": {
"count": {
"cardinality": {
"field": "user_id"
}
}
}
}

Query Result

Below result is the aggregated count of users which matches above query. Cardinality aggregation on the “user_id” field gives the distinct count of users who experienced all the three events as mentioned in the span near query.

{
"_shards": {
"total": 3,
"successful": 3,
"skipped": 0,
"failed": 0
},
"timed_out": false,
"took": 835,
"hits": {
"total": 56,
"max_score": 0.0
},
"aggregations": {
"count": {
"value": 43
}
}
}

In order to get the count of distinct users at level one of the funnel (users who searched for a song) we need to fire additional queries with only one span near clause in the span query.

Conclusion

Arriving to Elastic Search for this use case took some time, Spent days in Graph database to store the data in different structure so that the query is fast enough by avoiding complete table scan. But that did not work as expected.

Today we are ingesting about 4 billion events per day to elastic and we have data of six months in elastic. The query takes only few seconds to respond with the results while querying.

Every technology has some limitations according to the architecture, Choosing and finding the correct technology is important to ensure the solution is a scalable one.

Architect at PayPal