Is there any way to encode non-numeric values in a dataframe column?

Question

I have a dataframe that contains both numeric and non-numeric values (including special characters such as -, spaces, etc.). I want to encode the non-numeric columns, e.g. 'Department' and 'Location', so that I can run corr(). I tried LabelEncoder(), but it raises a TypeError:

TypeError: '<' not supported between instances of 'int' and 'str'

I used this code:

from sklearn import preprocessing

le = preprocessing.LabelEncoder()

X_train['Department'] = le.fit_transform(X_train['Department'])

saliustripe · Accepted Answer · 2022-09-15 13:54:06Z


If the data is not ordinal, I wouldn't use LabelEncoder with corr(): the integer codes impose an arbitrary order on the categories, so the resulting correlations would be misleading.

pd.get_dummies(X_train['Department']) has been adequate for using pd.DataFrame.corr() for me. It creates one column per category and marks 1 in each row where the category matches the column label, otherwise 0.
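A minimal sketch of that approach, assuming a small made-up X_train (the 'Salary' column and all values are hypothetical, not from the question). Passing dtype=int is my own choice here so the dummies are 0/1 integers regardless of pandas version, since recent versions return booleans by default.

```python
import pandas as pd

# Hypothetical sample data; only the 'Department' column name comes from the question.
X_train = pd.DataFrame({
    "Department": ["Sales", "HR", "Sales", "R&D", "HR", "R&D"],
    "Salary": [50, 40, 55, 70, 42, 68],
})

# One indicator column per category, named Department_<category>.
dummies = pd.get_dummies(X_train["Department"], prefix="Department", dtype=int)

# Swap the original text column for its indicator columns.
encoded = pd.concat([X_train.drop(columns="Department"), dummies], axis=1)

# Every column is numeric now, so corr() covers all of them.
corr = encoded.corr()
print(corr)
```

Each row has exactly one indicator set, so the indicator columns are negatively correlated with each other by construction; the interesting entries are the ones between an indicator and the other numeric columns.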

The other likely issue is mixed data types in 'Department' (for example, both int and str values), which is what the TypeError points to. This can be fixed with df['Department'] = df['Department'].astype('str'). It's probably most efficient to do this before your train-test split.
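To check whether mixed types are really the cause, something like the following could help. It is only a sketch on a hypothetical df whose 'Department' column mixes strings with an integer; the values are invented for illustration.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical column mixing str and int values, including a label with spaces.
df = pd.DataFrame({"Department": ["Sales", 7, "HR", "R & D", 7]})

# Count the Python types present; more than one entry means mixed types.
type_counts = df["Department"].map(type).value_counts()
print(type_counts)

# Cast everything to str so the values can be compared and sorted.
df["Department"] = df["Department"].astype("str")

# LabelEncoder sorts the unique values, which now works on a uniform type.
le = LabelEncoder()
codes = le.fit_transform(df["Department"])
print(list(le.classes_), list(codes))
```

Note that after the cast the integer 7 becomes the string '7', so it is treated as just another category label.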

