I have a DataFrame that contains both numeric and non-numeric values (including special characters such as '-' and spaces). I want to encode the non-numeric columns, e.g. 'Department' and 'Location', so that I can run corr(). I tried LabelEncoder(), but it raises a TypeError:

TypeError: '<' not supported between instances of 'int' and 'str'

I used this code:

from sklearn import preprocessing

le = preprocessing.LabelEncoder()

X_train['Department'] = le.fit_transform(X_train['Department'])

1 Answer

If the data is not ordinal, I wouldn't use LabelEncoder with corr(): the integer codes impose an arbitrary order on the categories, so the resulting correlations would be misleading.

pd.get_dummies(X_train['Department']) has been adequate for me when using pd.DataFrame.corr(). It creates one column per category and marks 1 in each row where the category matches the column label, otherwise 0.
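As a minimal sketch of that approach: the DataFrame below, its 'Salary' column, and all values are made up for illustration. Passing dtype=int is an assumption on my part to force 0/1 output, since recent pandas versions return booleans from get_dummies by default.

```python
import pandas as pd

# Hypothetical data: one categorical column and one numeric column.
df = pd.DataFrame({
    "Department": ["HR", "R-D", "HR", "Sales", "R-D", "Sales"],
    "Salary": [40, 70, 42, 55, 75, 50],
})

# One indicator column per category; dtype=int gives 0/1 instead of booleans.
dummies = pd.get_dummies(df["Department"], prefix="Department", dtype=int)

# Swap the original text column for its indicator columns.
encoded = pd.concat([df.drop(columns="Department"), dummies], axis=1)

# Every column is numeric now, so corr() covers all of them.
corr = encoded.corr()
print(corr)
```

Each Department_* column can then be read on its own against the numeric columns, without implying any order between departments.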

The other likely issue is mixed data types in 'Department' (integers alongside strings, which cannot be compared when the labels are sorted). That can be fixed with df['Department'] = df['Department'].astype('str'). It's probably most efficient to do this before your train-test split.
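A small sketch of the mixed-type fix, assuming a made-up 'Department' column in which one entry is the integer 7. Whether this matches the actual data in the question is a guess; the column contents here are purely illustrative.

```python
import pandas as pd
from sklearn import preprocessing

# Hypothetical column mixing strings with a stray integer.
df = pd.DataFrame({"Department": ["HR", 7, "R-D", "HR"]})

# Sorting mixed int/str values is what fails; plain Python shows the same problem.
try:
    sorted(df["Department"].tolist())
    mixed_sort_failed = False
except TypeError:
    mixed_sort_failed = True

# Cast everything to str so all labels are comparable.
df["Department"] = df["Department"].astype("str")

le = preprocessing.LabelEncoder()
codes = le.fit_transform(df["Department"])

print(mixed_sort_failed)
print(list(le.classes_))
print(list(codes))
```

After the cast, the integer 7 becomes the label '7' and is encoded like any other category.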
