merge - Hadoop - look for matching names in two customer lists


I have two lists of people from different events. I want to find the names of people that match across the lists, and the companies that match. I understand that people with the same name in each list are potentially not the same people, but I still want to find the matches.

First list example:
name, company, title
john doe, acme corporation, elephant trainer
jane smith, acme corporation, ceo
john smith, widgets-r-us, janitor
(plus tens of thousands more rows)

Second list example:
name, company
fred smith, acme corporation
john smith, widgets-r-us
john smith, company xyz
jane smith, company xyz
(plus tens of thousands more rows)

Desired output:
matching names:
john smith
jane smith

matching companies:
acme corporation
widgets-r-us

I am running in an AWS environment and am new to Hadoop. Any programming language is fine. I know how to do this in Excel, but I want to be able to scale over time to more lists of names (each in its own CSV file).

Thank you kindly!

You need a mapper implementation that emits the person name or company name as a Text key, with an IntWritable value of 1.
protected void map(LongWritable key, Text value, Context context)
        throws IOException, InterruptedException {
    // Derive the person name (column 0) or company name (column 1) from the CSV line.
    String name = value.toString().split(",")[0].trim();
    // Emit the derived name, not the whole line, with a count of 1.
    context.write(new Text(name), new IntWritable(1));
}

The reduce method in the reducer would look similar to this:
public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
    int count = 0; // start at zero so the total equals the number of occurrences
    for (IntWritable val : values) {
        count += val.get();
    }
    // Emit each unique name with the number of times it appears.
    context.write(key, new IntWritable(count));
}
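
One caveat with a plain count: a name that appears twice within the same list (like john smith in the second sample list) also ends up with a count above 1, so the count alone does not show that the name occurs in both lists. A minimal sketch of one way around this, written in plain Java with no Hadoop dependency so it can be tried on small samples, is to group by key and record which list each occurrence came from. The class name ListMatcher, its methods, and the list ids "list1" and "list2" are hypothetical names for illustration only; in a real job the mapper could emit an identifier for the source file as the value instead of 1, and the reducer could collect the distinct identifiers.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical helper, not part of Hadoop: simulates the map and reduce
// steps in memory so the matching logic can be checked on small samples.
class ListMatcher {

    // "Map" step: for every data row, record (normalized field -> list id).
    // In the sample files, column 0 is the name and column 1 is the company.
    static void mapRows(List<String> rows, int column, String listId,
                        Map<String, Set<String>> grouped) {
        for (String row : rows) {
            String[] fields = row.split(",");
            if (fields.length <= column) {
                continue; // skip rows that do not have the requested column
            }
            String key = fields[column].trim().toLowerCase();
            grouped.computeIfAbsent(key, k -> new HashSet<>()).add(listId);
        }
    }

    // "Reduce" step: keep only keys seen in at least minLists distinct lists.
    static List<String> matches(Map<String, Set<String>> grouped, int minLists) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Set<String>> entry : grouped.entrySet()) {
            if (entry.getValue().size() >= minLists) {
                result.add(entry.getKey());
            }
        }
        return result;
    }

    // Convenience wrapper for two lists; results come back in sorted order
    // because the grouping map is a TreeMap.
    static List<String> matchColumn(List<String> first, List<String> second,
                                    int column) {
        Map<String, Set<String>> grouped = new TreeMap<>();
        mapRows(first, column, "list1", grouped);
        mapRows(second, column, "list2", grouped);
        return matches(grouped, 2);
    }
}
```

Calling ListMatcher.matchColumn(first, second, 0) on the sample data rows gives the matching names, and column 1 gives the matching companies. The header row of each CSV file should be dropped before the rows are passed in, otherwise the header labels themselves would show up as matches.
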
Hope this helps.

