hadoop - How to store and analyze timestamped logs in HDFS


I have a lot of log lines, each with a timestamp, that I want to store in HDFS and analyze. I want to run MapReduce jobs that process only the lines within a given time frame (the last 5 minutes, the last hour).

I'm looking for pointers to get started, and for alternatives (e.g., storing the lines in HBase, or on some other platform).

My 2 cents:

You could use HBase for that. Read in each line of the file, take out the TS field and use it as the rowkey, and store the rest of the line in a column. The table would have just 1 column. This allows faster range queries, which is what you need (last 5 mins, last hour, etc.). And to avoid RegionServer hotspotting, create pre-split tables.
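To make the rowkey idea a bit more concrete, here is a rough sketch in plain Python (no HBase client involved). Everything in it is my own assumption, not something HBase prescribes: the bucket prefix, the `NUM_BUCKETS` value, the fixed-width key layout and the helper names `row_key` and `scan_ranges` are all hypothetical. The bucket prefix is one common way to give pre-split regions something to split on, since a bare, ever-increasing timestamp would send all new writes to the last region. The CRC suffix is only there so two lines with the same timestamp do not overwrite each other.

```python
import zlib

# Hypothetical value: would normally match the number of pre-split regions.
NUM_BUCKETS = 4


def row_key(ts_millis: int, line: str) -> str:
    """Build a fixed-width key: <bucket>-<zero-padded epoch millis>-<crc of line>."""
    crc = zlib.crc32(line.encode("utf-8"))
    bucket = crc % NUM_BUCKETS  # spreads writes across the pre-split regions
    return f"{bucket:02d}-{ts_millis:013d}-{crc:08x}"


def scan_ranges(start_millis: int, stop_millis: int) -> list:
    """One (start_row, stop_row) pair per bucket for the window [start, stop).

    Zero-padding makes lexicographic order match numeric order, which is
    what a range scan (start inclusive, stop exclusive) relies on.
    """
    return [
        (f"{b:02d}-{start_millis:013d}", f"{b:02d}-{stop_millis:013d}")
        for b in range(NUM_BUCKETS)
    ]


if __name__ == "__main__":
    now = 1_700_000_000_000  # example timestamp in epoch milliseconds
    print(row_key(now, "GET /index.html 200"))
    # "Last 5 minutes" becomes one range scan per bucket.
    for start_row, stop_row in scan_ranges(now - 5 * 60 * 1000, now):
        print(start_row, stop_row)
```

The trade-off with the prefix is that a time-window query turns into `NUM_BUCKETS` scans instead of one. If you skip the prefix, you get a single scan but pre-splitting will not help much with the write hotspot.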

Alternatively, you could store the data in a Hive table partitioned by TS and do the processing through HiveQL. Or bucket the data based on TS. Easy and straightforward.
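For the partitioned route, the point is that a time-window query only has to touch a few partitions. A small sketch of that idea, assuming hourly partitions on two hypothetical columns `dt` and `hr` (the `col=value` directory naming is how Hive lays out partitions, but the column names, the granularity and the helper names here are my own choice):

```python
from datetime import datetime, timedelta, timezone


def partition_path(ts: datetime) -> str:
    """Hive-style partition directory for a timestamp (hourly granularity)."""
    return ts.strftime("dt=%Y-%m-%d/hr=%H")


def partitions_for_window(end: datetime, minutes: int) -> list:
    """List the hourly partitions that overlap the last `minutes` before `end`."""
    start = end - timedelta(minutes=minutes)
    # Round the window start down to the beginning of its hour.
    cur = start.replace(minute=0, second=0, microsecond=0)
    paths = []
    while cur <= end:
        paths.append(partition_path(cur))
        cur += timedelta(hours=1)
    return paths


if __name__ == "__main__":
    end = datetime(2024, 1, 1, 10, 3, tzinfo=timezone.utc)
    # A 5-minute window that crosses the hour boundary touches two partitions.
    print(partitions_for_window(end, 5))
```

With hourly partitions a "last 5 minutes" job still reads up to two full hours of data and has to filter on the timestamp inside them, so pick the partition granularity with your smallest window in mind.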

HTH

