Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
123 views
in Technique[技术] by (71.8m points)

python - Validating data under a set of rules or operations. How to properly architect a script?

Problem description

  • Business Case

Currently I am working on a Python project to apply lots of business rules to fact datasets extracted from ERP systems so that I can be sure all registers are in compliance. I this case I am validating Tax values so that the company pay the right amount to IRS.

The project in the currently state is just a code with rules being applied as plain statements in the script. I don't think this is the right choice to strutturate this project, based on the idea that:

  • The business rules changes overtime (based on internal and external influence)
  • Those rules can grow to a considerable amount (about 50 rules per tax)
  • With plain script and lots of rules the code maintenance can be a pain

In my scenario, each Tax to be validated has it set of rules. And lots of Tax need to be validated.

  • Looking for solutions

I am thinking of converting my project to the solution I am presenting to you below. I tried to look for a solution in the internet but I couldn't find it. I came across:

  1. Create my own DSL (domain specific language) to right business rules: Appears to be extremely time consuming.
  2. Using map function: I do not know if it fits my case.
  3. Using lambda function: Depending of the rule the code complexity can grown with the number of lines.
  4. Chaining function operations using Python Chain: I do not know if it fits my case.
  • Currently state

I believe a good way to implement it is to use OOP (Object-oriented programming), creating a class to each Tax, applying rules to the class and calling a method to validate it.

I don't know if it is the more pythonic way of doing it. I know, for example, that Credit Card Companies validate each Purchase Approval under hundreds of Business Rules but I couldn't find a "code like pro way" of doing it so that the script can be flexible enough to add or delete rules based on future changes in the Business Flow.

That is...

I would like to kindly ask you for advice on how to properly (pythonic) structurate it. =)


What I'm thinking as a possible solution

Project directory [to be]

.
└───reconciliation
        main.py
        validate.py

The code [to be]

from validate import Tax

# initiate tax instances
tax_INSS = Tax(tp_tax='INSS')
tax_ISS = Tax(tp_tax='ISS')
tax_PISCOFINS = Tax(tp_tax='PIS/COFINS')

# define a business rule
def check_if_rows(df):
    # this is just an example
    result = len(df.index)
    If result:
        return True
    Else
        return False

# add the Business Rule as a new rule to the current tax
tax_INSS.add_rule(check_if_rows, df)

# apply all the rules to validate the tax
result = tax_INSS.validate()

# show the vector of results
# in this case, return only one result
print(result)

Part of the code [as is]

#####################
## TAX RECONCILIATION
## TAX 1
#####################

import pandas as pd

# import datasets that serve as input to the reconciliation
sap_data = pd.read_excel("Inputs/sap_data.xlsx")
synchro_data = pd.read_excel("Inputs/synchro_data.xlsx")

##########################################
# apply data wrangling step
### Cleaning data
### Transform data
### > Output df: sap_data_transformed
###              synchro_data_transformed
##########################################

# [RULE 1] Filtering out Tax Payments and Accounting Movements###
sap_wdn = sap_data_transformed.query("`TP_DOC` in ['AB', 'SB']")
sap_result = sap_data_transformed.query("`TP_DOC` not in ['AB', 'SB']")

# [RULE 2] Filtering out AP Invoices returns from previous accounting period
sap_rpp = sap_result.query('`DT_DOC` < "20201201" and `AMOUNT_VAL` > 0')
sap_result = sap_result[~sap_result.index.isin(sap_rpp.index)]

# [RULE 3] Filtering out Postings for AP Invoice Reversals
sap_pir = sap_result[sap_result.duplicated(subset=['supplier_invoice_key'], keep='last')]
sap_result = sap_result[~sap_result.index.isin(sap_pir.index)]

# [RULE 4] Filtering out Late Entries of AP Invoices
sap_lei = sap_result.query('`DT_DOC` < "20201201" and `AMOUNT_VAL` < 0')
sap_result = sap_result[~sap_result.index.isin(sap_lei.index)]

### Compare the difference between both dfs
accounting_reconciliation = pd.merge(sap_result,
    synchro_data_transformed,
    how="outer",
    on='supplier_invoice_key',
    indicator=True,
    validate='one_to_one',
)

#####################################################################
#   after that, check if the dfs (sap_wdn, sap_rpp, sap_pir, sap_lei)
#   has rows. 
#   if YES, then there are no errors and the reconciliation is True
#   if NO, then there are errors and the reconciliation is False
#####################################################################

question from:https://stackoverflow.com/questions/65864982/validating-data-under-a-set-of-rules-or-operations-how-to-properly-architect-a

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Answer

0 votes
by (71.8m points)
Waitting for answers

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...