Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share

python - Predicting purchase probability based on prior orders?

Let's assume we have the following dataframe:

import pandas as pd

merged = pd.DataFrame({'week' : [0, 0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2],
                       'shopper' : [0, 0, 0, 1, 1, 0, 1, 1, 2, 0, 2, 2],
                       'product' : [63, 80, 91, 42, 77, 55, 77, 95, 77, 98, 202, 225],
                       'price' : [543, 644, 770, 620, 560, 354, 525, 667, 525, 654, 783, 662],
                       'discount' : [0, 0, 10, 12, 0, 30, 10, 0, 0, 5, 0, 0]
})
print(merged)

    week  shopper  product  price  discount
0      0        0       63    543         0
1      0        0       80    644         0
2      0        0       91    770        10
3      0        1       42    620        12
4      0        1       77    560         0
5      1        0       55    354        30
6      1        1       77    525        10
7      1        1       95    667         0
8      1        2       77    525         0
9      2        0       98    654         5
10     2        2      202    783         0
11     2        2      225    662         0

Can you think of a way to estimate the probability that each shopper will buy each product in week 3? I am looking for an end result that looks somewhat like this:


    week  shopper  product     y
0      3        0       55  0.32
1      3        0       63  0.66
2      3        0       80  0.77
3      3        0       91  0.54
4      3        0       98  0.23
5      3        1       42  0.24
6      3        1       77  0.51
7      3        1       95  0.40
8      3        2       77  0.12
9      3        2      202  0.53
10     3        2      225  0.39

I've thought of using the number of times a customer-product combination has appeared in the past, or the amount of time between orders, to forecast the probability that it reoccurs next week, but I don't know how to implement that.

I would be very thankful for any help!

question from:https://stackoverflow.com/questions/65541166/predicting-purchase-probability-based-on-prior-orders


1 Answer


This is not an easy task. The accuracy depends on the number of past observations, and a dataset as small as the one you shared will not give accurate results. However, the code below might give you an idea. As you guessed, you need to find relationships between the products and make use of them. Below, I first compute the average price paid, to see how much the shopper generally tends to spend.

# Mask of rows for week 0 and shopper 0 (1 where both conditions hold, else 0)
idx_0_0 = np.multiply(merged['week'] == 0,1) * np.multiply(merged['shopper'] == 0,1)
averaged_paid_price_0_0 = np.average(merged['price'][idx_0_0 == 1])
# Same for week 1, shopper 0
idx_0_0 = np.multiply(merged['week'] == 1,1) * np.multiply(merged['shopper'] == 0,1)
averaged_paid_price_1_0 = np.average(merged['price'][idx_0_0 == 1])
# Same for week 2, shopper 0
idx_0_0 = np.multiply(merged['week'] == 2,1) * np.multiply(merged['shopper'] == 0,1)
averaged_paid_price_2_0 = np.average(merged['price'][idx_0_0 == 1])

# Average of the three weekly averages for shopper 0
total_paid_average_0 = (averaged_paid_price_2_0 + averaged_paid_price_1_0 + averaged_paid_price_0_0)/3

Then I divide each product's price by total_paid_average_0, as below:

merged_price_points_0 = merged['price'] / total_paid_average_0

This basically assigns each product a score ("points") relative to what the shopper usually pays.
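The per-week averages above can also be computed for every shopper at once with groupby. This is only a sketch of the same idea, not a replacement for the code in this answer; the names weekly_mean and total_paid_average are illustrative. One difference to be aware of: a week in which a shopper bought nothing is simply skipped here instead of contributing an empty average.

```python
import pandas as pd

merged = pd.DataFrame({'week' : [0, 0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2],
                       'shopper' : [0, 0, 0, 1, 1, 0, 1, 1, 2, 0, 2, 2],
                       'product' : [63, 80, 91, 42, 77, 55, 77, 95, 77, 98, 202, 225],
                       'price' : [543, 644, 770, 620, 560, 354, 525, 667, 525, 654, 783, 662],
                       'discount' : [0, 0, 10, 12, 0, 30, 10, 0, 0, 5, 0, 0]
})

# Mean price per (shopper, week); weeks without purchases do not appear at all.
weekly_mean = merged.groupby(['shopper', 'week'])['price'].mean()

# Mean of the weekly means, one value per shopper.
total_paid_average = weekly_mean.groupby(level='shopper').mean()

# Price points for shopper 0, as in the manual version above.
merged_price_points_0 = merged['price'] / total_paid_average.loc[0]

print(total_paid_average)
```

For shopper 0 this gives the same value as total_paid_average_0 above, because that shopper bought something in every week.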

After that, I check whether there is any relation between the shopper's purchases and discounts:

# Discounted purchases of shopper 0 in each week, as a share of ALL rows of shopper 0
# (the denominator is the shopper's total row count, not the per-week count).
idx_0_0_discount = np.multiply(merged['week'] == 0,1) * np.multiply(merged['discount'] != 0,1) * np.multiply(merged['shopper'] == 0,1)
discount_exist_0_0 = np.sum(idx_0_0_discount) / np.sum(np.multiply(merged['shopper'] == 0,1))
idx_0_0_discount = np.multiply(merged['week'] == 1,1) * np.multiply(merged['discount'] != 0,1) * np.multiply(merged['shopper'] == 0,1)
discount_exist_1_0 = np.sum(idx_0_0_discount) /  np.sum(np.multiply(merged['shopper'] == 0,1))
idx_0_0_discount = np.multiply(merged['week'] == 2,1) * np.multiply(merged['discount'] != 0,1) * np.multiply(merged['shopper'] == 0,1)
discount_exist_2_0 = np.sum(idx_0_0_discount) / np.sum(np.multiply(merged['shopper'] == 0,1))
# Average of the three weekly shares
discount_point_0 = (discount_exist_0_0 + discount_exist_1_0 + discount_exist_2_0) / 3
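Since all three weekly shares use the same denominator, their average reduces to (discounted rows of the shopper) / (all rows of the shopper) / (number of weeks). A compact sketch of that, for all shoppers at once, could look as follows; the names discounted, rows and discount_point are illustrative and not part of the code above.

```python
import pandas as pd

merged = pd.DataFrame({'week' : [0, 0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2],
                       'shopper' : [0, 0, 0, 1, 1, 0, 1, 1, 2, 0, 2, 2],
                       'product' : [63, 80, 91, 42, 77, 55, 77, 95, 77, 98, 202, 225],
                       'price' : [543, 644, 770, 620, 560, 354, 525, 667, 525, 654, 783, 662],
                       'discount' : [0, 0, 10, 12, 0, 30, 10, 0, 0, 5, 0, 0]
})

n_weeks = merged['week'].nunique()

# Number of discounted purchases per shopper (True counts as 1).
discounted = (merged['discount'] != 0).groupby(merged['shopper']).sum()

# Total number of purchases per shopper.
rows = merged.groupby('shopper').size()

# Same quantity as discount_point_0 above, for every shopper.
discount_point = discounted / rows / n_weeks

print(discount_point)
```

For shopper 0 this evaluates to 0.2, matching discount_point_0.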

Again, this gives points. Finally, I combine all the points.

The full code is below.

import pandas as pd
import numpy as np
merged = pd.DataFrame({'week' :    [0, 0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2],
                       'shopper' : [0, 0, 0, 1, 1, 0, 1, 1, 2, 0, 2, 2],
                       'product' : [63, 80, 91, 42, 77, 55, 77, 95, 77, 98, 202, 225],
                       'price' : [543, 644, 770, 620, 560, 354, 525, 667, 525, 654, 783, 662],
                       'discount' : [0, 0, 10, 12, 0, 30, 10, 0, 0, 5, 0, 0]
})

idx_0_0 = np.multiply(merged['week'] == 0,1) * np.multiply(merged['shopper'] == 0,1)
averaged_paid_price_0_0 = np.average(merged['price'][idx_0_0 == 1])
idx_0_0 = np.multiply(merged['week'] == 1,1) * np.multiply(merged['shopper'] == 0,1)
averaged_paid_price_1_0 = np.average(merged['price'][idx_0_0 == 1])
idx_0_0 = np.multiply(merged['week'] == 2,1) * np.multiply(merged['shopper'] == 0,1)
averaged_paid_price_2_0 = np.average(merged['price'][idx_0_0 == 1])

total_paid_average_0 = (averaged_paid_price_2_0 + averaged_paid_price_1_0 + averaged_paid_price_0_0)/3


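# The same averages for shoppers 1 and 2. They are not used further below.
# Note: a week in which the shopper bought nothing selects no rows, so its mean is NaN.
idx_0_0 = np.multiply(merged['week'] == 0,1) * np.multiply(merged['shopper'] == 1,1)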
averaged_paid_price_0_1 = np.mean(merged['price'][idx_0_0 == 1])
idx_0_0 = np.multiply(merged['week'] == 1,1) * np.multiply(merged['shopper'] == 1,1)
averaged_paid_price_1_1 = np.mean(merged['price'][idx_0_0 == 1])
idx_0_0 = np.multiply(merged['week'] == 2,1) * np.multiply(merged['shopper'] == 1,1)
averaged_paid_price_2_1 = np.mean(merged['price'][idx_0_0 == 1])

total_paid_average_1 = (averaged_paid_price_2_1 + averaged_paid_price_1_1 + averaged_paid_price_0_1)/3

idx_0_0 = np.multiply(merged['week'] == 0,1) * np.multiply(merged['shopper'] == 2,1)
averaged_paid_price_0_2 = np.mean(merged['price'][idx_0_0 == 1])
idx_0_0 = np.multiply(merged['week'] == 1,1) * np.multiply(merged['shopper'] == 2,1)
averaged_paid_price_1_2 = np.mean(merged['price'][idx_0_0 == 1])
idx_0_0 = np.multiply(merged['week'] == 2,1) * np.multiply(merged['shopper'] == 2,1)
averaged_paid_price_2_2 = np.mean(merged['price'][idx_0_0 == 1])

total_paid_average_2 = (averaged_paid_price_2_2 + averaged_paid_price_1_2 + averaged_paid_price_0_2)/3


merged_price_points_0 = merged['price'] / total_paid_average_0

idx_0_0_discount = np.multiply(merged['week'] == 0,1) * np.multiply(merged['discount'] != 0,1) * np.multiply(merged['shopper'] == 0,1)
discount_exist_0_0 = np.sum(idx_0_0_discount) / np.sum(np.multiply(merged['shopper'] == 0,1))
idx_0_0_discount = np.multiply(merged['week'] == 1,1) * np.multiply(merged['discount'] != 0,1) * np.multiply(merged['shopper'] == 0,1)
discount_exist_1_0 = np.sum(idx_0_0_discount) /  np.sum(np.multiply(merged['shopper'] == 0,1))
idx_0_0_discount = np.multiply(merged['week'] == 2,1) * np.multiply(merged['discount'] != 0,1) * np.multiply(merged['shopper'] == 0,1)
discount_exist_2_0 = np.sum(idx_0_0_discount) / np.sum(np.multiply(merged['shopper'] == 0,1))
discount_point_0 = (discount_exist_0_0 + discount_exist_1_0 + discount_exist_2_0) / 3

merged_price_points_0 = merged_price_points_0.T

points_list = list()
total_point = list()
for counter in range(len(merged['product'])):
    if merged['discount'][counter] != 0:
        points_list.append(discount_point_0)
    else:
        points_list.append(0)

    if merged_price_points_0[counter] > 1:
        merged_price_points_0[counter] = merged_price_points_0[counter] - 1
    else:
        merged_price_points_0[counter] = 1-merged_price_points_0[counter]
    total_point.append(merged_price_points_0[counter] +points_list[counter] )

sum_of_points = np.sum(total_point)
possibility_of_product_week3_for_0 = total_point / sum_of_points
print("Possibility of 3rd Week for 0")
for counter in range(len(merged['product'])):
    print(str(merged['product'][counter]) + "||" + str(possibility_of_product_week3_for_0[counter]))

Output

Possibility of 3rd Week for 0
63||0.005959173323190062
80||0.05166730062127548
91||0.18671231139850392
42||0.10112843920375306
77||0.0037403321922150397
55||0.1769494104222138
77||0.07938379612019782
95||0.06479016102447063
77||0.01622923798656016
98||0.12052745023456325
202||0.13097502218841128
225||0.06193736528464565
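The scoring loop above can be written without explicit iteration. The sketch below is meant to reproduce the same numbers for shopper 0: the price score is the absolute deviation of price / average from 1, the discount score is added only for discounted rows, and the total is normalised to sum to 1. The names shopper0, price_score, discount_score and share are illustrative.

```python
import numpy as np
import pandas as pd

merged = pd.DataFrame({'week' : [0, 0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2],
                       'shopper' : [0, 0, 0, 1, 1, 0, 1, 1, 2, 0, 2, 2],
                       'product' : [63, 80, 91, 42, 77, 55, 77, 95, 77, 98, 202, 225],
                       'price' : [543, 644, 770, 620, 560, 354, 525, 667, 525, 654, 783, 662],
                       'discount' : [0, 0, 10, 12, 0, 30, 10, 0, 0, 5, 0, 0]
})

shopper0 = merged[merged['shopper'] == 0]

# Average of the weekly average prices paid by shopper 0.
avg_price_0 = shopper0.groupby('week')['price'].mean().mean()

# Discounted rows of shopper 0 / all rows of shopper 0 / number of weeks.
discount_point_0 = (shopper0['discount'] != 0).sum() / len(shopper0) / merged['week'].nunique()

# |price / average - 1| covers both branches of the if/else in the loop.
price_score = (merged['price'] / avg_price_0 - 1).abs()

# Discount points only for rows that actually had a discount.
discount_score = np.where(merged['discount'] != 0, discount_point_0, 0.0)

total = price_score + discount_score
share = total / total.sum()

print(pd.DataFrame({'product': merged['product'], 'share': share}))
```

Because the shares are normalised over all twelve rows, they are relative scores that sum to 1, not independent purchase probabilities per product.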

I would suggest looking into what Chris mentioned in his comment. This is not a solid answer, but it might give you an idea. The main idea is to define the relationships between products and the reasons a shopper buys them, and to turn those into points.
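The idea from the question, using how often a shopper-product pair has appeared and how long ago it was last bought, can be sketched in the same spirit. The code below builds one row per observed pair for week 3 and combines frequency and recency into a heuristic score y. The decay factor of 0.5 and the formula itself are arbitrary assumptions chosen for illustration; the result is a ranking score, not a calibrated probability. With enough history, features like these could instead be fed to a classifier.

```python
import pandas as pd

merged = pd.DataFrame({'week' : [0, 0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2],
                       'shopper' : [0, 0, 0, 1, 1, 0, 1, 1, 2, 0, 2, 2],
                       'product' : [63, 80, 91, 42, 77, 55, 77, 95, 77, 98, 202, 225],
                       'price' : [543, 644, 770, 620, 560, 354, 525, 667, 525, 654, 783, 662],
                       'discount' : [0, 0, 10, 12, 0, 30, 10, 0, 0, 5, 0, 0]
})

next_week = merged['week'].max() + 1   # the week to predict (3)
n_weeks = merged['week'].nunique()     # number of observed weeks (3)

# One row per shopper-product pair: in how many weeks it was bought, and when last.
pairs = (merged.groupby(['shopper', 'product'])['week']
               .agg(n_weeks_bought='nunique', last_week='max')
               .reset_index())

pairs['week'] = next_week
pairs['frequency'] = pairs['n_weeks_bought'] / n_weeks
pairs['weeks_since_last'] = next_week - pairs['last_week']

# Assumed heuristic: purchase frequency, halved for every extra week since the last purchase.
decay = 0.5
pairs['y'] = pairs['frequency'] * decay ** (pairs['weeks_since_last'] - 1)

print(pairs[['week', 'shopper', 'product', 'y']])
```

This yields the same eleven shopper-product rows as the desired output in the question, with y always between 0 and 1.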

