Python - Find duplicates of two columns from a CSV
I want to find the duplicate values in one column and replace them with the value from another column, in a CSV that has multiple columns. First I put the two columns of the CSV into a dictionary. Then I want to find the duplicate values in that dictionary, which has strings as both values and keys. I tried several solutions for removing duplicates from a dictionary, but got either a "not hashable" error or no result. Here is the first part of my code.
    import csv
    from collections import defaultdict
    import itertools

    mydict = {}
    index = 0
    reader = csv.reader(open(r"computing.csv", "rb"))
    for i, rows in enumerate(reader):
        if i == 0:
            continue
        if len(rows) == 0:
            continue
        k = rows[3].strip()
        v = rows[2].strip()
        if k in mydict:
            mydict[k].append(v)
        else:
            mydict[k] = [v]

    #mydict = hash(frozenset(mydict))
    print mydict

    d = {}
    while True:
        try:
            d = defaultdict(list)
            for k,v in mydict.iteritems():
                #d[frozenset(mydict.items())]
                d[v].append(k)
        except:
            continue

    writer = csv.writer(open(r"old.csv", 'wb'))
    for key, value in d.items():
        writer.writerow([key, value])
Your question is unclear. I hope I got it right.
Please give an example of the input columns and the desired output columns. Please also give a printout of the error and let us know which line caused it.
If column1=[1,2,3,1,4]
and column2=[a,b,c,d,e],
you want the output n_column1=[a,2,3,d,4]
and column2=[1,b,c,d,e].
I imagine the exception is raised in d[v].append(k),
since v is a list. You cannot use a list as a key in a dictionary.
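To make the point about list keys concrete, here is a minimal sketch. It assumes Python 3 (the question's code is Python 2), and the function name `invert_mapping` and the sample data are hypothetical. Converting each list to a tuple is only one possible workaround and may not be what the asker actually needs.

```python
from collections import defaultdict

def invert_mapping(mapping):
    """Invert {key: [values]} into {tuple(values): [keys]}.

    Lists are unhashable, so each list is converted to a tuple
    before it is used as a dictionary key.
    """
    inverted = defaultdict(list)
    for key, values in mapping.items():
        inverted[tuple(values)].append(key)
    return dict(inverted)

# Hypothetical sample data shaped like mydict in the question.
sample = {"a": ["1"], "b": ["2"], "c": ["1"]}

# Using a list directly as a key raises TypeError.
list_key_error = None
try:
    bad = {}
    bad[["1"]] = "x"
except TypeError as exc:
    list_key_error = str(exc)

result = invert_mapping(sample)
```

With tuple keys, entries that share the same list of values end up grouped under one key, which is one way to spot duplicates.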
    In [1]: x = [1,2,3,1,4]

    In [2]: y = ['a','b','c','d','e']

    In [5]: from collections import defaultdict

    In [6]: d = defaultdict(int)

    In [7]: for a in x:
       ...:     d[a] += 1

    In [8]: d
    Out[8]: defaultdict(<type 'int'>, {1: 2, 2: 1, 3: 1, 4: 1})

    In [9]: x2 = []

    In [10]: for a,b in zip(x,y):
       ....:     x2.append(a if d[a]==1 else b)
       ....:

    In [11]: x
    Out[11]: [1, 2, 3, 1, 4]

    In [12]: x2
    Out[12]: ['a', 2, 3, 'd', 4]
In your case, I guess you would have to change the code to fit. I'd do it like this:
    import csv
    from collections import defaultdict
    import itertools

    mydict = {}
    index = 0
    reader = csv.reader(open(r"computing.csv", "rb"))
    histogram = defaultdict(int)  # how often each value of column 3 occurs
    k = []
    v = []
    for i, rows in enumerate(reader):
        if i == 0:
            continue  # skip the header row
        if len(rows) == 0:
            continue  # skip empty rows
        k.append(rows[3].strip())
        v.append(rows[2].strip())
        item = k[-1]
        histogram[item] += 1

    # keep unique values, replace duplicated ones with the other column
    output_column = []
    for first_item, second_item in zip(k,v):
        output_column.append(first_item if histogram[first_item]==1 else second_item)

    writer = csv.writer(open(r"old.csv", 'wb'))
    for c1, c2 in zip(output_column, v):
        writer.writerow([c1, c2])
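The same idea can be sketched for Python 3, assuming `collections.Counter` in place of the `defaultdict(int)` histogram. The function name `replace_duplicates`, its parameters, and the in-memory sample CSV are hypothetical stand-ins for computing.csv. This is a sketch under those assumptions, not a drop-in replacement.

```python
import csv
import io
from collections import Counter

def replace_duplicates(rows, key_index=3, value_index=2):
    """Return (output, value) pairs for each non-empty row.

    A key that occurs more than once is replaced by the value
    from the other column; unique keys are kept unchanged.
    """
    pairs = [(row[key_index].strip(), row[value_index].strip())
             for row in rows if row]
    counts = Counter(key for key, _ in pairs)
    return [(key if counts[key] == 1 else value, value)
            for key, value in pairs]

# Hypothetical input standing in for computing.csv (header + 5 rows).
sample_csv = "id,name,col2,col3\n0,x,a,1\n1,x,b,2\n2,x,c,3\n3,x,d,1\n4,x,e,4\n"
reader = csv.reader(io.StringIO(sample_csv))
next(reader)  # skip the header row, like `if i == 0: continue`
output_rows = replace_duplicates(reader)

# Write to an in-memory buffer here; a real script would open a file
# with open("old.csv", "w", newline="") instead.
buffer = io.StringIO()
csv.writer(buffer).writerows(output_rows)
```

In Python 3 the csv module expects files opened in text mode with `newline=""` rather than the `"rb"` and `'wb'` modes used above.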