When casting from String to Integer in Spark 2, I ran into a bug, and here I present my workaround. I later verified that the bug does not exist in Spark 3. The bug is triggered by spaces between the commas and the values in the input file.
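The JVM's own parsers show why spaces can matter. Integer parsing rejects a value such as " 10", while float parsing ignores leading and trailing whitespace. The plain Scala sketch below needs no Spark, and WhitespaceParsing, parseInt and parseFloat are hypothetical names. It assumes, without verifying it here, that Spark 2's String-to-Integer cast likewise does not trim its input while its String-to-Float cast relies on the JVM float parser. If that holds, this asymmetry would explain the behavior described in this post.

```scala
import scala.util.Try

object WhitespaceParsing {
  // String.toInt delegates to java.lang.Integer.parseInt, which throws
  // NumberFormatException on surrounding whitespace; map that to None.
  def parseInt(s: String): Option[Int] = Try(s.toInt).toOption

  // String.toFloat delegates to java.lang.Float.parseFloat, which ignores
  // leading and trailing whitespace before parsing.
  def parseFloat(s: String): Option[Float] = Try(s.toFloat).toOption

  def main(args: Array[String]): Unit = {
    val padded = " 10" // same shape as the values in the CSV file
    println(parseInt(padded))                  // None
    println(parseFloat(padded))                // Some(10.0)
    println(parseFloat(padded).map(_.toInt))   // Some(10): Float first, then Int
  }
}
```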
To reproduce the error, create a file named test_feature_engineering.csv with the following contents:
ID,XACT_DT,DUE_DT,AFT_CHRG_XACT_AMI,BILL_AMT
1,2020-01-11,2020-01-11, 10, 10
2,2020-01-11,2020-01-12, 10, 10
3,2020-01-11 10:10:01,2020-01-12 10:20:00, 10, 10
Open the spark-shell and type the following commands:
Inspecting the output above, you can verify that casting from String to Integer produces a res column containing null values in df2. The output for df3 shows that casting first to Float and then to Integer produces the desired result.
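The comparison described above can be sketched as a standalone Scala program rather than spark-shell input. The sketch assumes that the file is read with header=true and no schema inference, so every column arrives as a String. It also assumes that AFT_CHRG_XACT_AMI is the column being cast. The object name CastWorkaround and the helpers castDirect and castViaFloat are hypothetical. Under Spark 2, df2 would be expected to show null in res, and df3 the integer values.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

object CastWorkaround {
  // Direct String -> Integer cast of the space-padded column (the failing case in Spark 2).
  def castDirect(df: DataFrame): DataFrame =
    df.withColumn("res", col("AFT_CHRG_XACT_AMI").cast("integer"))

  // Workaround: String -> Float -> Integer.
  def castViaFloat(df: DataFrame): DataFrame =
    df.withColumn("res", col("AFT_CHRG_XACT_AMI").cast("float").cast("integer"))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("cast-workaround")
      .getOrCreate()

    // When reading, the CSV source keeps leading whitespace by default,
    // so values such as " 10" reach the cast unchanged.
    val df = spark.read
      .option("header", "true")
      .csv("test_feature_engineering.csv")

    val df2 = castDirect(df)
    val df3 = castViaFloat(df)
    df2.show()
    df3.show()

    spark.stop()
  }
}
```

The Float detour is the workaround from this post. Two other ways to remove the space before it reaches the Integer cast are to read with option("ignoreLeadingWhiteSpace", "true") or to wrap the column in trim(...) from org.apache.spark.sql.functions. Neither is tested here.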