The problem in your code is here:
    row = ''
    for t in j:
        row = row + t
        if j.index(t) != len(j)-1:
            row = row + ','
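because j is a string when numberOfAttributes is 1, while it is a tuple with numberOfAttributes items when numberOfAttributes is greater than 1. Iterating over a string yields its individual characters, so for a single attribute the loop inserts commas between characters instead of between attributes.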
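To see the difference in isolation, here is a minimal sketch that wraps the loop above in a function and feeds it a tuple and a string. The name build_row and the sample values are made up for the demonstration and do not appear in your code.

```python
def build_row(j):
    # Same loop as in the question, wrapped in a hypothetical helper.
    row = ''
    for t in j:
        row = row + t
        if j.index(t) != len(j) - 1:
            row = row + ','
    return row

# A tuple is iterated item by item, so the attributes are joined as intended.
print(build_row(('sex=Male', 'race=White')))  # sex=Male,race=White

# A string is iterated character by character, so the comma lands between characters.
print(build_row('ab'))  # a,b
```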
So you can fix your code by changing the way row is computed, based on the type of j:
    if isinstance(j, str):
        row = j
    else:
        row = ''
        for t in j:
            row = row + t
            if j.index(t) != len(j)-1:
                row = row + ','
However, you can significantly simplify your code, making it easier to read:
    import pandas as pd
    import itertools

    def get_attributes_set(filepath, n_attributes, support_threshold):
        df = pd.read_csv(filepath)
        # Minimum number of rows a combination of values must appear in.
        required = support_threshold * len(df.index)
        final = []
        for i in itertools.combinations(df.columns, n_attributes):
            # Count each combination of values, most frequent first.
            g = df.groupby(list(i)).size().sort_values(ascending=False)
            satisfied = list(g[g > required].index)
            if satisfied:
                # The index holds strings for one column and tuples for several.
                final.append(satisfied[0] if isinstance(satisfied[0], str) else ','.join(satisfied[0]))
        return final
Testing the previous code with the following lines:
    print(get_attributes_set('census.csv', 1, .6))
    print(get_attributes_set('census.csv', 4, .6))
you get:
    ['sex=Male', 'native-country=United-States', 'race=White', 'workclass=Private', 'income=Small', 'capital-gain=None', 'capital-loss=None']
    ['native-country=United-States,race=White,capital-gain=None,capital-loss=None', 'native-country=United-States,income=Small,capital-gain=None,capital-loss=None']
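If census.csv is not at hand, the same logic can be checked on a small in-memory frame. The sketch below is only an illustration: attributes_set_from_frame is a hypothetical variant of the function above that takes a DataFrame instead of a file path, and the toy data is invented.

```python
import itertools

import pandas as pd


def attributes_set_from_frame(df, n_attributes, support_threshold):
    # Hypothetical variant of get_attributes_set that skips the CSV read.
    required = support_threshold * len(df.index)
    final = []
    for cols in itertools.combinations(df.columns, n_attributes):
        # Count each combination of values, most frequent first.
        g = df.groupby(list(cols)).size().sort_values(ascending=False)
        satisfied = list(g[g > required].index)
        if satisfied:
            top = satisfied[0]
            # One column gives a string, several columns give a tuple.
            final.append(top if isinstance(top, str) else ','.join(top))
    return final


# Invented toy data: 3 of 4 rows share the same values in both columns.
toy = pd.DataFrame({
    'sex': ['sex=Male', 'sex=Male', 'sex=Male', 'sex=Female'],
    'race': ['race=White', 'race=White', 'race=White', 'race=Black'],
})

print(attributes_set_from_frame(toy, 1, .6))
print(attributes_set_from_frame(toy, 2, .6))
```

With a threshold of .6 on four rows, a combination has to appear more than 2.4 times, so only the values shared by three rows qualify.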