python - Find duplicates of two columns from csv

August 15, 2011

I want to find duplicate values in one column and replace them with the value from another column, in a CSV file that has multiple columns. First I put two columns of the CSV into a dictionary. Then I want to find the duplicate values of that dictionary, whose keys and values are strings. I tried solutions for removing duplicates from a dictionary, but I either got a "not hashable" error or no result. Here is the first part of my code:

    import csv
    from collections import defaultdict
    import itertools

    mydict = {}
    index = 0
    reader = csv.reader(open(r"computing.csv", "rb"))
    for i, rows in enumerate(reader):
        if i == 0:
            continue
        if len(rows) == 0:
            continue
        k = rows[3].strip()
        v = rows[2].strip()
        if k in mydict:
            mydict[k].append(v)
        else:
            mydict[k] = [v]

    #mydict = hash(frozenset(mydict))

    print mydict

    d = {}
    while True:
        try:
            d = defaultdict(list)
            for k, v in mydict.iteritems():
                #d[frozenset(mydict.items())]
                d[v].append(k)
        except:
            continue

    writer = csv.writer(open(r"old.csv", 'wb'))
    for key, value in d.items():
        writer.writerow([key, value])

Your question is unclear. I hope I got it right.

Please give an example of the input columns and the desired output columns. Please also give a printout of the error and let us know which line caused it.

If column1 = [1,2,3,1,4] and column2 = [a,b,c,d,e], I want the output n_column1 = [a,2,3,d,4] and column2 = [1,b,c,d,e].

I imagine the exception is raised in `d[v].append(k)`, since `v` is a list. You cannot use a list as a key in a dictionary.
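To illustrate the comment above: a list is mutable and therefore unhashable, so it cannot be a dictionary key, while a tuple with the same contents can. This probably also explains the "no result" case: the bare `except: continue` inside `while True` in the question catches this `TypeError` and retries forever. The snippet is a minimal Python 3 sketch; the contents of `mydict` are made-up sample data shaped like the question's dictionary, and `inverted` is a hypothetical name.

```python
from collections import defaultdict

# Made-up sample data shaped like the question's mydict: key -> list of strings.
mydict = {"x": ["1", "2"], "y": ["3"]}

error_message = None
inverted = defaultdict(list)
try:
    for k, v in mydict.items():
        inverted[v].append(k)  # v is a list, so hashing it raises TypeError
except TypeError as exc:
    error_message = str(exc)

# Converting each list to a tuple gives a hashable key with the same contents.
inverted = defaultdict(list)
for k, v in mydict.items():
    inverted[tuple(v)].append(k)

print(error_message)
print(dict(inverted))  # {('1', '2'): ['x'], ('3',): ['y']}
```

`frozenset(v)` would also be hashable, but it ignores order and repeated values, so lists such as `['1', '2']` and `['2', '1', '1']` would end up under the same key.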

    In [1]: x = [1,2,3,1,4]

    In [2]: y = ['a','b','c','d','e']

    In [5]: from collections import defaultdict

    In [6]: d = defaultdict(int)

    In [7]: for a in x:
       ...:     d[a] += 1

    In [8]: d
    Out[8]: defaultdict(<type 'int'>, {1: 2, 2: 1, 3: 1, 4: 1})

    In [9]: x2 = []

    In [10]: for a,b in zip(x,y):
       ....:     x2.append(a if d[a]==1 else b)
       ....:

    In [11]: x
    Out[11]: [1, 2, 3, 1, 4]

    In [12]: x2
    Out[12]: ['a', 2, 3, 'd', 4]
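The same replacement can be packaged as a small function. This Python 3 sketch swaps in `collections.Counter` for the hand-built `defaultdict(int)` histogram (both just count occurrences); `replace_duplicates` is a hypothetical name.

```python
from collections import Counter


def replace_duplicates(col1, col2):
    """Return a copy of col1 in which every value that occurs more than
    once in col1 is replaced by the value at the same position in col2."""
    counts = Counter(col1)  # value -> number of occurrences in col1
    return [a if counts[a] == 1 else b for a, b in zip(col1, col2)]


x = [1, 2, 3, 1, 4]
y = ['a', 'b', 'c', 'd', 'e']
print(replace_duplicates(x, y))  # ['a', 2, 3, 'd', 4]
```

As in the session above, `zip` stops at the shorter of the two columns, so both should have the same length.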

In your case, I guess you would have to change your code to fit. I'd do it like this:

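    import csv
    from collections import defaultdict

    histogram = defaultdict(int)  # occurrences of each rows[3] value
    k = []  # values of rows[3] (checked for duplicates)
    v = []  # values of rows[2] (used as replacements)

    with open(r"computing.csv", "rb") as infile:
        for i, rows in enumerate(csv.reader(infile)):
            if i == 0:  # skip the header row
                continue
            if len(rows) == 0:  # skip blank lines
                continue
            k.append(rows[3].strip())
            v.append(rows[2].strip())
            histogram[k[-1]] += 1

    # keep unique values, replace duplicated ones with the rows[2] value
    output_column = []
    for first_item, second_item in zip(k, v):
        output_column.append(first_item if histogram[first_item] == 1 else second_item)

    # "with" closes (and flushes) old.csv when the block ends
    with open(r"old.csv", "wb") as outfile:
        writer = csv.writer(outfile)
        for c1, c2 in zip(output_column, v):
            writer.writerow([c1, c2])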
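The code above targets Python 2 (`"rb"`/`"wb"` file modes for the `csv` module). A Python 3 sketch of the same steps might look like the following. `process`, `key_col` and `other_col` are hypothetical names; the defaults 3 and 2 mirror `rows[3]` and `rows[2]` above, and `io.StringIO` with made-up sample data stands in for the real files so the example runs on its own.

```python
import csv
import io
from collections import Counter


def process(infile, outfile, key_col=3, other_col=2):
    """Read CSV rows from infile, skip the header row and blank rows, and
    write one (new_value, other_value) row per input row to outfile.
    new_value is the key_col value if it is unique in that column,
    otherwise the other_col value from the same row."""
    rows = [row for row in list(csv.reader(infile))[1:] if row]
    keys = [row[key_col].strip() for row in rows]
    others = [row[other_col].strip() for row in rows]
    counts = Counter(keys)  # how often each key_col value occurs
    writer = csv.writer(outfile)
    for key, other in zip(keys, others):
        writer.writerow([key if counts[key] == 1 else other, other])


# Made-up sample input: a header row, five data rows and one blank line.
sample = "c0,c1,c2,c3\nr,r,a,1\nr,r,b,2\n\nr,r,c,3\nr,r,d,1\nr,r,e,4\n"
out = io.StringIO()
process(io.StringIO(sample), out)
print(out.getvalue())
```

With real files, open both in text mode with `newline=""`, as the `csv` module documentation recommends, e.g. `with open("computing.csv", newline="") as f_in, open("old.csv", "w", newline="") as f_out: process(f_in, f_out)`.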
