In the world of databases, MongoDB is widely used for its scalability and flexibility. The Aggregation Pipeline is one of the features that makes MongoDB so powerful. Knowing how it works will boost your data skills whether you're new to MongoDB or an experienced developer. In this blog we are going to dive in and discover how to make the most of MongoDB's Aggregation Pipeline! This blog is inspired by Hitesh Choudhary's playlist on YouTube. You can access it from here.

Prerequisites

  • This blog assumes that you already have MongoDB set up on your local machine.
  • You know basic MongoDB commands and are familiar with MongoDB tools.

Introduction:

The Aggregation Pipeline is a framework for performing data transformation and aggregation tasks within MongoDB. We can think of it as a series of stages through which documents pass, where each stage applies a specific operation to the data.

Why is the Aggregation Pipeline Needed?

Querying with 'find()' and 'findOne()' is suitable for retrieving individual documents or subsets of the data. When more complex analysis is needed, such as grouping, counting, or reshaping documents, these methods fall short, and this is where the Aggregation Pipeline comes into the picture.

Key Concepts of the Aggregation Pipeline

  1. Stages: The pipeline consists of multiple stages, each performing a specific operation on the data. These stages can range from simple filtering and sorting to complex aggregation and grouping.
  2. Operators: MongoDB provides a rich set of operators that can be used within each stage to manipulate and transform data. These operators encompass a wide range of functionalities, including arithmetic operations, array manipulation, string manipulation, and more.
  3. Pipeline Execution: The stages in the pipeline are executed sequentially, with the output of one stage serving as the input for the next. This allows for a modular and flexible approach to data processing, where multiple operations can be chained together to achieve desired results.
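
To make the "output of one stage is the input of the next" idea concrete, here is a minimal sketch in plain JavaScript. It is only a toy model and not how MongoDB is implemented: `runPipeline`, the three stage functions, and the `people` array are all made up for illustration.

```javascript
// Toy model: a stage is a function that takes documents and returns documents.
// runPipeline feeds the output of each stage into the next one, in order.
const runPipeline = (docs, stages) =>
  stages.reduce((current, stage) => stage(current), docs);

// Hypothetical stages built from ordinary array operations.
const onlyAdults = (docs) => docs.filter((d) => d.age >= 18); // filtering
const sortByAge = (docs) => [...docs].sort((a, b) => a.age - b.age); // sorting
const namesOnly = (docs) => docs.map((d) => ({ name: d.name })); // reshaping

// Made-up input documents.
const people = [
  { name: "Ana", age: 31 },
  { name: "Ben", age: 15 },
  { name: "Cy", age: 22 },
];

const result = runPipeline(people, [onlyAdults, sortByAge, namesOnly]);
console.log(result); // [ { name: 'Cy' }, { name: 'Ana' } ]
```

Because every stage has the same shape (documents in, documents out), stages can be added, removed, or reordered without touching the others, which is the modularity described above.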

In this blog, let's learn the MongoDB Aggregation Pipeline in a practical way rather than the old-fashioned way. We will look at use cases of the Aggregation Pipeline in various scenarios that come up while working on projects. We will learn by answering questions, each of which introduces a concept needed to get started with the MongoDB Aggregation Pipeline.

To keep things simple, let's work with some sample data. You can download the data from here:

 {
      "index": NumberInt(0),
      "name": "Aurelia Gonzales",
      "isActive": false,
      "registered": ISODate("2015-02-11T04:22:39+0000"),
      "age": NumberInt(20),
      "gender": "female",
      "eyeColor": "green",
      "favoriteFruit": "banana",
      "company": {
        "title": "YURTURE",
        "email": "aureliagonzales@yurture.com",
        "phone": "+1 (940) 501-3963",
        "location": {
          "country": "USA",
          "address": "694 Hewes Street"
        }
      },
      "tags": [
        "enim",
        "id",
        "velit",
        "ad",
        "consequat"
      ]
    },

The above is the format of the data we will use in this blog. We have a large number of similar documents stored in a collection in the MongoDB database. Now we can start learning about the actual use cases of the MongoDB Aggregation Pipeline.

Question 1. How many active Users are in the dataset?

  • This is an easy one: we can calculate the number of active users by counting the users whose isActive field is true.
  • This can be done with the following pipeline:
[
  {
    $match: {
      isActive: true
    }
  },
  {
    $count: 'Active Users'
  }
]

Let's break down the above query

{
  $match: {
    isActive: true
  }
},

This is the first stage of the aggregation pipeline, the '$match' stage. It filters the documents in the collection based on the specified conditions. Here it keeps only the documents whose isActive field is true and discards the rest, so after this stage only the active users remain. Now the only job left is to count the users that made it through the first stage, and we can do that in the next stage.

{ 
  $count: 'Active Users' 
}

This is the second stage of the aggregation pipeline. It counts the total number of users that have passed through the first stage. The string "Active Users" is just a label for the count result: it becomes the name of the field in the output document that holds the count value.

Together, these two stages form an aggregation pipeline that filters the users based on the 'isActive' field and counts the number of users that meet the specified condition.
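
If you want to see the shape of the result without a running database, the two stages can be imitated in plain JavaScript. This is only a sketch under stated assumptions: `matchStage` and `countStage` are hypothetical helpers that mimic a simple equality '$match' and '$count', and the `users` array is made-up data in a trimmed-down version of the sample format.

```javascript
// Imitation of $match for simple equality filters only:
// keep documents whose fields equal every value in the filter.
const matchStage = (filter) => (docs) =>
  docs.filter((doc) =>
    Object.entries(filter).every(([key, value]) => doc[key] === value)
  );

// Imitation of $count: emit one document whose single field holds the count.
// When no documents reach $count, MongoDB emits no document, so we return [].
const countStage = (fieldName) => (docs) =>
  docs.length === 0 ? [] : [{ [fieldName]: docs.length }];

// Made-up documents; only the fields this pipeline needs are included.
const users = [
  { name: "Aurelia Gonzales", isActive: false },
  { name: "User B", isActive: true },
  { name: "User C", isActive: true },
];

// Run the stages in order, feeding each stage's output into the next.
const stages = [matchStage({ isActive: true }), countStage("Active Users")];
const output = stages.reduce((docs, stage) => stage(docs), users);

console.log(output); // [ { 'Active Users': 2 } ]
```

Note that the result is a list containing a single document, and the label passed to the count stage shows up as that document's field name.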

This was just the beginning of what we are going to learn in this series of blogs. We will dive deeper into the MongoDB Aggregation Pipeline in the next part. Till then, stay tuned! Bye

The link to the second part will be added here once it is released.