hadoop - How to store and analyze timestamped logs in HDFS
I have a lot of log lines, each with a timestamp, that I want to store in HDFS and analyze. I want to run MapReduce jobs that process only the lines within a given time frame (last 5 minutes, last hour).
I'm looking for pointers to get started, and for alternatives (e.g., storing the lines in HBase? Some other platform?).
My 2 cents:
You could use HBase for that. Read in each line of the file, take out the TS field and use it as the rowkey, and store the rest of the line in a column, so the table has just 1 column. Because HBase keeps rows sorted by rowkey, this allows faster range queries, which is what you need (last 5 mins, last hour, etc.). And to avoid RegionServer hotspotting, you could create pre-split tables.
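To make the rowkey idea concrete, here is a minimal sketch in plain Python (no HBase client involved). It assumes a hypothetical log format of `<ISO-8601 timestamp> <rest of line>` and treats timestamps without a zone as UTC; the function names `rowkey`, `to_row` and `scan_range` are made up for illustration. The key is encoded as zero-padded epoch milliseconds so that the byte-wise ordering HBase uses matches chronological order.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def rowkey(ts):
    """Encode a timezone-aware timestamp as 13-digit zero-padded epoch millis.

    Fixed-width, zero-padded digits sort lexicographically in the same
    order as the timestamps themselves.
    """
    millis = (ts - EPOCH) // timedelta(milliseconds=1)
    return f"{millis:013d}"


def to_row(line):
    """Split a log line into (rowkey, value).

    Assumed (hypothetical) format: '<ISO-8601 timestamp> <rest of line>'.
    """
    ts_text, rest = line.split(" ", 1)
    ts = datetime.fromisoformat(ts_text)
    if ts.tzinfo is None:
        # Assumption: timestamps without a zone are in UTC.
        ts = ts.replace(tzinfo=timezone.utc)
    return rowkey(ts), rest


def scan_range(now, window):
    """Return (start_row, stop_row) covering the last `window` before `now`.

    HBase scans treat the start row as inclusive and the stop row as
    exclusive by default, so these two keys can be used as scan bounds.
    """
    return rowkey(now - window), rowkey(now)


if __name__ == "__main__":
    print(to_row("2013-05-01T12:00:00 GET /index.html 200"))
    now = datetime(2013, 5, 1, 12, 0, tzinfo=timezone.utc)
    print(scan_range(now, timedelta(minutes=5)))
```

One caveat with this sketch: two lines that share the same millisecond would get the same rowkey, so a real loader would probably need to append something that makes the key unique (a sequence number, host name, or similar).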
Alternatively, you could store the data in a Hive table partitioned by TS and then do the processing through HiveQL. Or you could bucket the data based on TS. Easy and straightforward.
HTH
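As a rough illustration of the partitioning idea, the sketch below (plain Python, not Hive itself) maps a timestamp to a partition directory and lists which partitions a time window touches. The partition columns `dt` and `hr` and the hourly granularity are assumptions made for the example; pick whatever granularity matches your queries. The point is that a "last 5 minutes" or "last hour" query only has to read one or two partitions instead of the whole table.

```python
from datetime import datetime, timedelta


def partition_for(ts):
    """Map a timestamp to a Hive-style partition path.

    'dt' and 'hr' are hypothetical partition column names.
    """
    return f"dt={ts:%Y-%m-%d}/hr={ts:%H}"


def partitions_in_window(end, window):
    """List the hourly partitions that overlap [end - window, end]."""
    start = end - window
    # Round the window start down to the beginning of its hour.
    cursor = start.replace(minute=0, second=0, microsecond=0)
    parts = []
    while cursor <= end:
        parts.append(partition_for(cursor))
        cursor += timedelta(hours=1)
    return parts


if __name__ == "__main__":
    now = datetime(2013, 5, 1, 12, 3)
    print(partitions_in_window(now, timedelta(minutes=5)))
```

Rows inside those partitions still need a filter on the exact timestamp, since a partition covers a full hour.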