Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
219 views
in Technique[技术] by (71.8m points)

Fill day gaps of two-dimensional timeseries data in MongoDB with aggregate

I have a collection of two-dimensional timeseries data as follows:

[
  { 
    "value" : 9, 
    "timestamp" : "2020-12-30T02:06:33.000+0000", 
    "recipeId" : 15
  },
  { 
    "value" : 2, 
    "timestamp" : "2020-12-30T12:04:23.000+0000", 
    "recipeId" : 102
  },
  { 
    "value" : 5, 
    "timestamp" : "2020-12-30T15:09:23.000+0000", 
    "recipeId" : 102
  },
  ...
]

The records have a recipeId which is the first level of grouping I'm looking for. All values for a day of a recipe should be summed up. I want an array of timeseries per recipeId. I need the missing days to be filled with a 0. I want this construct to be created for a provided start and end date range.

Some like this for date range of 2020-12-29 to 2020-12-31:

[
 [
  { 
    "sum" : 0, 
    "timestamp" : "2020-12-29", 
    "recipeId" : 15
  },
  { 
    "sum" : 9, 
    "timestamp" : "2020-12-30", 
    "recipeId" : 15
  },
  { 
    "sum" : 0, 
    "timestamp" : "2020-12-31", 
    "recipeId" : 15
  },
  ...
 ],
 [
  { 
    "sum" : 0, 
    "timestamp" : "2020-12-29", 
    "recipeId" : 0
  },
  { 
    "sum" : 7, 
    "timestamp" : "2020-12-30", 
    "recipeId" : 102
  },
  { 
    "sum" : 0, 
    "timestamp" : "2020-12-31", 
    "recipeId" : 102
  },
  ...
 ]
]

This is what I currently have and it's only partially solving my requirements. I can't manage to get the last few stages right:

[
  {
    "$match": {
      "timestamp": {
        "$gte": "2020-12-29T00:00:00.000Z",
        "$lte": "2020-12-31T00:00:00.000Z"
      }
    }
  },
  {
    "$addFields": {
      "timestamp": {
        "$dateFromParts": {
          "year": { "$year": "$timestamp" },
          "month": { "$month": "$timestamp" },
          "day": { "$dayOfMonth": "$timestamp" }
        }
      },
      "dateRange": {
        "$map": {
          "input": {
            "$range": [
              0,
              {
                "$trunc": {
                  "$divide": [
                    {
                      "$subtract": [
                        "2020-12-31T00:00:00.000Z",
                        "2020-12-29T00:00:00.000Z"
                      ]
                    },
                    1000
                  ]
                }
              },
              86400
            ]
          },
          "in": {
            "$add": [
              "2020-12-29T00:00:00.000Z",
              { "$multiply": ["$$this", 1000] }
            ]
          }
        }
      }
    }
  },
  { "$unwind": "$dateRange" },
  {
    "$group": {
      "_id": { "date": "$dateRange", "recipeId": "$recipeId" },
      "count": {
        "$sum": { "$cond": [{ "$eq": ["$dateRange", "$timestamp"] }, 1, 0] }
      }
    }
  },
  {
    "$group": {
      "_id": "$_id.date",
      "total": { "$sum": "$count" },
      "byRecipeId": {
        "$push": {
          "k": { "$toString": "$_id.recipeId" },
          "v": { "$sum": "$count" }
        }
      }
    }
  },
  { "$sort": { "_id": 1 } },
  {
    "$project": {
      "_id": 0,
      "timestamp": "$_id",
      "total": "$total",
      "byRecipeId": {
        "$arrayToObject": {
          "$filter": { "input": "$byRecipeId", "cond": "$$this.v" }
        }
      }
    }
  }
]

which results in:

[
    {
        "timestamp": "2020-12-29T00:00:00.000Z",
        "total": 21,
        "byRecipeId": {}
    },
    {
        "timestamp": "2020-12-30T00:00:00.000Z",
        "total": 0,
        "byRecipeId": {
            "15": 9,
            "102": 7
        }
    },
    {
        "timestamp": "2020-12-31T00:00:00.000Z",
        "total": 0,
        "byRecipeId": {}
    }
]

I'm open to alternative solution of course. For examples I came across this post: https://medium.com/@alexandro.ramr777/fill-missing-values-using-mongodb-aggregation-framework-f011114e83e0 but it doesn't deal with multi-dimensions.

question from:https://stackoverflow.com/questions/65856702/fill-day-gaps-of-two-dimensional-timeseries-data-in-mongodb-with-aggregate

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Answer

0 votes
by (71.8m points)

You could use the $redcue function. This code fills the gabs of Minutes for current day. Should be easy to adapt it to give missing Days.

{
   $addFields: {
      data: {
         $reduce: {
            input: { $range: [0, 24 * 60] },
            initialValue: [],
            in: {
               $let: {
                  vars: { 
                     ts: { 
                        $add: [
                           moment().startOf('day').toDate(), 
                           { $multiply: ["$$this", 1000 * 60] }
                        ] 
                     } 
                  },
                  in: {
                     $concatArrays: [
                        "$$value",
                        [{
                           $cond: {
                              if: { $in: ["$$ts", "$data.timestamp"] },
                              then: { 
                                 $first: { 
                                    $filter: { 
                                       input: "$data", 
                                       cond: { $eq: ["$$this.timestamp", "$$ts"] } 
                                    } 
                                 } 
                              },
                              else: { timestamp: "$$ts", total: 0 }
                           }
                        }]
                     ]
                  }
               }
            }
         }
      }
   }
}

In my opinion, $reduce is more elegant than $map, however based on my experience the performance is much worse with $reduce.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...