python - Comparing rows of a pandas DataFrame (rows have some overlapping values)
I have a pandas DataFrame with 21 columns. I am focusing on a subset of rows that have exactly the same column values except for 6 that are unique to each row. I don't know a priori which column headings these 6 values correspond to.
I tried converting each row to an Index object and then performing a set operation on two rows. For example:
    row1 = pd.Index(sample_data[0])
    row2 = pd.Index(sample_data[1])
    row1.difference(row2)  # set difference; older pandas versions also spelled this row1 - row2
This returns an Index object containing the values unique to row1. From there I can manually deduce which columns have the unique values.
How can I programmatically grab the column headings that these values correspond to in the initial DataFrame? Or is there a way to compare two or more DataFrame rows and extract the 6 differing column values of each row, along with the corresponding headings? Ideally, it would be nice to generate a new DataFrame containing only the unique columns.
In particular, is there a way to do this using set operations?
Thank you.
You don't need an Index. Just compare the two rows directly and use the resulting booleans to filter the column names with a list comprehension.
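    import numpy as np
    import pandas as pd

    df = pd.DataFrame({"col1": np.ones(10), "col2": np.ones(10), "col3": range(2, 12)})
    row1 = df.iloc[0]  # irow() has been removed from pandas; iloc is the replacement
    row2 = df.iloc[1]
    # Boolean Series: True where the two rows differ
    unique_columns = row1 != row2
    cols = [colname for colname, unique_column in zip(df.columns, unique_columns) if unique_column]
    print(cols)  # ['col3']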
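The question also asks for a new DataFrame of the differing columns and for a set-based approach. The following is a minimal sketch of both, reusing the example frame above. The names `differs`, `diff_df`, and `set_cols` are illustrative, not from the original answer. The set variant compares (column, value) pairs rather than bare values, so the heading is kept with each value.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"col1": np.ones(10), "col2": np.ones(10), "col3": range(2, 12)})

row1 = df.iloc[0]
row2 = df.iloc[1]

# Boolean Series indexed by column name: True where the two rows differ.
differs = row1 != row2

# Column labels where the rows differ.
diff_cols = df.columns[differs.to_numpy()].tolist()

# New DataFrame holding only the differing columns for the two rows.
diff_df = df.loc[[0, 1], diff_cols]

# Set-based alternative: build (column, value) pairs so the heading
# is kept with each value, then take the set difference.
pairs1 = set(row1.items())
pairs2 = set(row2.items())
set_cols = sorted(col for col, _ in pairs1 - pairs2)
```

Note that both variants treat NaN as different from NaN, so columns with missing values in both rows would be reported as differing.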
If you know the standard value for each column, you can convert every row to a list of booleans and then to a list of the column headings that differ, i.e.:
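    standard_row = np.ones(3)
    columns = df.columns
    # One row of booleans per DataFrame row: True where the value differs from the standard
    unique_columns = df.apply(lambda x: x != standard_row, axis=1)
    # For each row, list the headings of the columns that differ
    unique_columns.apply(lambda x: [col for col, unique_column in zip(columns, x) if unique_column], axis=1)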
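When no standard row is known and more than two rows are involved, one possible approach is to keep the columns that hold more than one distinct value within the subset. This is a sketch rather than part of the original answer. The row labels in `subset` are a hypothetical choice, and the names `varying`, `varying_df`, and `varying_cols` are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"col1": np.ones(10), "col2": np.ones(10), "col3": range(2, 12)})

# Hypothetical subset of rows to compare (any list of row labels works).
subset = df.loc[[0, 1, 2]]

# A column varies within the subset if it holds more than one distinct value.
# dropna=False counts NaN as a value of its own.
varying = subset.nunique(dropna=False) > 1

# Keep only the varying columns; their headings are preserved.
varying_df = subset.loc[:, varying]
varying_cols = varying_df.columns.tolist()
```

Here `varying_df` is the new DataFrame restricted to the columns that differ, and `varying_cols` gives their headings.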