I'm trying to compare 4 columns in a pandas dataframe and populate 5'th column based on the result. In normal SQL it would be something like this:
if speciality_new is null and location_new is null then 'No match found'
elif specialty <> specialty_new and location <> location_new then 'both are different'
elif specialty_new is null then 'specialty not found'
elif location_new is null then 'location not found'
else 'true'
I read that this could be achieved using np.where but my code is failing. Can someone please advice me what I'm doing wrong. This is what I wrote:
masterDf['Match'] = np.where(
masterDf[speciality_new].isnull() & masterDf[location_new].isnull(), 'No match found',
masterDf[speciality] != masterDf[speciality_new] & masterDf[location] != masterDf[location_new], 'Both specialty and location didnt match',
masterDf[speciality] != masterDf[speciality_new], 'Specialty didnt match',
masterDf[location] != masterDf[location_new], 'Location didnt match',
True)
The error message is TypeError: unsupported operand type(s) for &: 'str' and 'str'
which doesn't make any sense since '&' is the syntax for 'and'
dfsample is what I have and dfFinal is what I want
dfsample = pd.DataFrame({'ID': [1, 2, 3, 4, 5],
'speciality': ['doctor', 'nurse', 'patient', 'driver', 'director'],
'location': ['texas', 'dc', 'georgia', '', 'florida'],
'speciality_new' : ['doctor', 'nurse', 'director', 'nurse', ''],
'location_new': ['texas', 'alaska', 'georgia', 'maryland', 'florida']})
dfFinal = pd.DataFrame({'ID': [1, 2, 3, 4, 5],
'speciality': ['doctor', 'nurse', 'patient', 'driver', 'director'],
'location': ['texas', 'dc', 'georgia', '', 'florida'],
'speciality_new' : ['doctor', 'nurse', 'director', 'nurse', ''],
'location_new': ['texas', 'alaska', 'georgia', 'maryland', 'florida'],
'match': ['TRUE', 'location didn’t match', 'specialty didn’t match', 'both specialty and location didn’t match', 'specialty didn’t match']})