Problem description
Currently I am working on a Python project to apply lots of business rules to fact datasets extracted from ERP systems so that I can be sure all registers are in compliance. I this case I am validating Tax values so that the company pay the right amount to IRS.
The project in the currently state is just a code with rules being applied as plain statements in the script. I don't think this is the right choice to strutturate this project, based on the idea that:
- The business rules changes overtime (based on internal and external influence)
- Those rules can grow to a considerable amount (about 50 rules per tax)
- With plain script and lots of rules the code maintenance can be a pain
In my scenario, each Tax to be validated has it set of rules. And lots of Tax need to be validated.
I am thinking of converting my project to the solution I am presenting to you below. I tried to look for a solution in the internet but I couldn't find it. I came across:
- Create my own DSL (domain specific language) to right business rules: Appears to be extremely time consuming.
- Using map function: I do not know if it fits my case.
- Using lambda function: Depending of the rule the code complexity can grown with the number of lines.
- Chaining function operations using Python Chain: I do not know if it fits my case.
I believe a good way to implement it is to use OOP (Object-oriented programming), creating a class to each Tax, applying rules to the class and calling a method to validate it.
I don't know if it is the more pythonic way of doing it. I know, for example, that Credit Card Companies validate each Purchase Approval under hundreds of Business Rules but I couldn't find a "code like pro way" of doing it so that the script can be flexible enough to add or delete rules based on future changes in the Business Flow.
That is...
I would like to kindly ask you for advice on how to properly (pythonic) structurate it. =)
What I'm thinking as a possible solution
Project directory [to be]
.
└───reconciliation
main.py
validate.py
The code [to be]
from validate import Tax
# initiate tax instances
tax_INSS = Tax(tp_tax='INSS')
tax_ISS = Tax(tp_tax='ISS')
tax_PISCOFINS = Tax(tp_tax='PIS/COFINS')
# define a business rule
def check_if_rows(df):
# this is just an example
result = len(df.index)
If result:
return True
Else
return False
# add the Business Rule as a new rule to the current tax
tax_INSS.add_rule(check_if_rows, df)
# apply all the rules to validate the tax
result = tax_INSS.validate()
# show the vector of results
# in this case, return only one result
print(result)
Part of the code [as is]
#####################
## TAX RECONCILIATION
## TAX 1
#####################
import pandas as pd
# import datasets that serve as input to the reconciliation
sap_data = pd.read_excel("Inputs/sap_data.xlsx")
synchro_data = pd.read_excel("Inputs/synchro_data.xlsx")
##########################################
# apply data wrangling step
### Cleaning data
### Transform data
### > Output df: sap_data_transformed
### synchro_data_transformed
##########################################
# [RULE 1] Filtering out Tax Payments and Accounting Movements###
sap_wdn = sap_data_transformed.query("`TP_DOC` in ['AB', 'SB']")
sap_result = sap_data_transformed.query("`TP_DOC` not in ['AB', 'SB']")
# [RULE 2] Filtering out AP Invoices returns from previous accounting period
sap_rpp = sap_result.query('`DT_DOC` < "20201201" and `AMOUNT_VAL` > 0')
sap_result = sap_result[~sap_result.index.isin(sap_rpp.index)]
# [RULE 3] Filtering out Postings for AP Invoice Reversals
sap_pir = sap_result[sap_result.duplicated(subset=['supplier_invoice_key'], keep='last')]
sap_result = sap_result[~sap_result.index.isin(sap_pir.index)]
# [RULE 4] Filtering out Late Entries of AP Invoices
sap_lei = sap_result.query('`DT_DOC` < "20201201" and `AMOUNT_VAL` < 0')
sap_result = sap_result[~sap_result.index.isin(sap_lei.index)]
### Compare the difference between both dfs
accounting_reconciliation = pd.merge(sap_result,
synchro_data_transformed,
how="outer",
on='supplier_invoice_key',
indicator=True,
validate='one_to_one',
)
#####################################################################
# after that, check if the dfs (sap_wdn, sap_rpp, sap_pir, sap_lei)
# has rows.
# if YES, then there are no errors and the reconciliation is True
# if NO, then there are errors and the reconciliation is False
#####################################################################
question from:
https://stackoverflow.com/questions/65864982/validating-data-under-a-set-of-rules-or-operations-how-to-properly-architect-a