Hadoop Assignment Help: CS4008 Instagram Micro


Use Hadoop MapReduce to aggregate and classify the data.

Requirement

The file instagram-micro.csv contains information on 5 million randomly
selected Instagram photos. Each record includes:

  1. userId - The ID of the user
  2. photoId - The ID of the photo
  3. createdTime - The time the photo was posted by the user
  4. filter - The filter type used in the photo
  5. likes - Number of likes
  6. comments - Number of comments

In this assignment, you are asked to find out which filters are popular
and which are not. Specifically, you need to:

  1. Develop a MapReduce program to count the occurrences of each filter type.
  2. Design a second mapper and reducer pair that takes the output of Step 1 as its input and ranks the filters by frequency in decreasing order. (Hint: by default, mapper output is sorted by key, not by value, e.g., aa, ab, ac, ad, ae, etc.)

The final output format is the Hadoop default: one key-value pair per line,
separated by a tab. For example:

    filterAAAA	9999
    filterBBB	5555
    filterCC	111
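
The two steps above can be sketched in plain Java, without any Hadoop dependency, to make the data flow concrete. This is only an illustrative simulation under stated assumptions, not the assignment solution: the class name `FilterRankSketch`, the column order (taken from the field list above), the naive comma split, the header check, and the sample rows are all assumptions. In a real Hadoop job, the logic of `extractFilter` would sit in a `Mapper`, the summing in a `Reducer`, and the descending order would typically come from emitting the count as the key in the second job and supplying a descending sort comparator (for example via `Job.setSortComparatorClass`).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java simulation of the two MapReduce stages (hypothetical class name).
class FilterRankSketch {

    // Assumed column order: userId,photoId,createdTime,filter,likes,comments
    static final int FILTER_COLUMN = 3;

    // Stage 1 "map": pull the filter name out of one CSV line.
    // Returns null for malformed lines and for a header row (assumption:
    // a header, if present, starts with "userId"). The naive split assumes
    // no field contains a quoted comma.
    static String extractFilter(String line) {
        String[] fields = line.split(",");
        if (fields.length <= FILTER_COLUMN || fields[0].trim().equals("userId")) {
            return null;
        }
        String filter = fields[FILTER_COLUMN].trim();
        return filter.isEmpty() ? null : filter;
    }

    // Stage 1 "reduce": add up one occurrence per photo for each filter.
    static Map<String, Long> countFilters(List<String> lines) {
        Map<String, Long> counts = new HashMap<>();
        for (String line : lines) {
            String filter = extractFilter(line);
            if (filter != null) {
                counts.merge(filter, 1L, Long::sum);
            }
        }
        return counts;
    }

    // Stage 2: order by count, largest first (ties broken by filter name),
    // and format each pair as "filter<TAB>count".
    static List<String> rank(Map<String, Long> counts) {
        List<Map.Entry<String, Long>> entries = new ArrayList<>(counts.entrySet());
        entries.sort((a, b) -> {
            int byCount = Long.compare(b.getValue(), a.getValue()); // descending
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, Long> e : entries) {
            out.add(e.getKey() + "\t" + e.getValue());
        }
        return out;
    }

    public static void main(String[] args) {
        // Made-up sample rows, for illustration only.
        List<String> rows = Arrays.asList(
                "u1,p1,100,filterBBB,3,0",
                "u2,p2,101,filterAAAA,5,1",
                "u3,p3,102,filterAAAA,0,0",
                "u4,p4,103,filterCC,1,2",
                "u5,p5,104,filterBBB,7,4",
                "u6,p6,105,filterAAAA,2,0");
        for (String line : rank(countFilters(rows))) {
            System.out.println(line);
        }
    }
}
```

The sketch sorts in memory, which is fine for a few dozen distinct filter names. In the actual second job the framework's shuffle does the sorting, which is why the hint points out that ordering is by key rather than by value.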

Submission

  1. Submit all the Java source files needed for the task, including all the mappers, reducers, and the driver.
  2. Rename your output file, part-r-00000, to your UIN without an extension (e.g. 123456789) and upload it to Blackboard.

Note: You can configure multiple mapper/reducer pairs in the driver class. For example:

    Configuration conf1 = new Configuration();
    Job job1 = Job.getInstance(conf1, "whatever name of job1");
    // ……
    Configuration conf2 = new Configuration();
    Job job2 = Job.getInstance(conf2, "whatever name of job2");
    FileInputFormat.addInputPath(job2, new Path("path to output of job1"));
    // ……
    job1.waitForCompletion(true); // execute job1
    job2.waitForCompletion(true); // execute job2 after job1 is done

Author: SafePoker
Copyright notice: Unless otherwise stated, all articles on this blog are licensed under CC BY 4.0. Please credit SafePoker when reposting.