728x90
๋ฐ˜์‘ํ˜•
๐Ÿ’ก
(Apache) Hadoop : High-Availability Distributed Object-Oriented Platform์˜ ์•ฝ์ž๋กœ ๋Œ€์šฉ๋Ÿ‰ ๋ฐ์ดํ„ฐ๋ฅผ ๋ถ„์‚ฐ์ฒ˜๋ฆฌํ•  ์ˆ˜ ์žˆ๋Š” Java๊ธฐ๋ฐ˜ ํ”„๋ ˆ์ž„์›Œํฌ
Source : https://velog.io/@shinychan95/0323-2-Hadoop-Ecosystem
  • Apache(์•„ํŒŒ์น˜) ์žฌ๋‹จ์—์„œ ๊ด€๋ฆฌํ•˜๋Š” ํ”„๋ฆฌ์›จ์–ด์ธ Hadoop Project์—์„œ ๋งŒ๋“ค์–ด์ง€๋Š” ๋ชจ๋“  ์†Œํ”„ํŠธ์›จ์–ด ์†”๋ฃจ์…˜๋“ค์˜ ์ง‘ํ•ฉ์„ ์–˜๊ธฐํ•œ๋‹ค.
  • Hadoop์€ HDFS๋ผ๋Š” ํŒŒ์ผ์ฒ˜๋ฆฌ ์‹œ์Šคํ…œ๊ณผ Yarn์ด๋ผ๋Š” ๋ฆฌ์†Œ์Šค ๊ด€๋ฆฌ ์‹œ์Šคํ…œ, MapReduce๋ผ๋Š” ๋Œ€์šฉ๋Ÿ‰ ์ฒ˜๋ฆฌ ์‹œ์Šคํ…œ์„ ๊ธฐ๋ฐ˜์œผ๋กœ ํ•˜๋Š” ํ”„๋ ˆ์ž„์›Œํฌ์ด๋‹ค.
  • ๋Œ€์šฉ๋Ÿ‰ ๋ฐ์ดํ„ฐ๋ฅผ ํšจ๊ณผ์ ์œผ๋กœ ์ฒ˜๋ฆฌํ•  ๊ฒƒ์„ ๋ชฉ์ ์œผ๋กœ ํ•˜๋Š” ๋งŽ์€ ์†Œํ”„ํŠธ์›จ์–ด๋“ค์ด ํ•จ๊ป˜ ์žˆ๋Š” ํ”Œ๋žซํผ ์ด๋ผ๊ณ  ๋ณผ ์ˆ˜ ์žˆ๋‹ค.

HDFS (Hadoop Distributed File System)

Source : https://hadoop.apache.org/docs/r1.2.1/hdfs_design.html
  • Hadoop์—์„œ ๊ด€๋ฆฌํ•˜๋Š” ๋ชจ๋“  ๋ฐ์ดํ„ฐ๋ฅผ ์ €์žฅ/๊ด€๋ฆฌํ•˜๋Š” ์‹œ์Šคํ…œ (์Šคํ† ๋ฆฌ์ง€ ๊ด€๋ฆฌ)
  • Namenode, Datanode๋กœ ๋ฐ์ดํ„ฐ๋ฅผ ๋ถ„์‚ฐ๊ด€๋ฆฌ ํ•œ๋‹ค.
  • node : ์ปดํ“จํ„ฐ ํ•œ ๋Œ€ rack : Node ์—ฌ๋Ÿฌ๋Œ€๋ฅผ ๋ฌผ๋ฆฌ์ ์œผ๋กœ ๋ฌถ์–ด๋†“์€ ๊ฒƒ, ๊ฐ™์€ ๋„คํŠธ์›Œํฌ ์Šค์œ„์น˜๋ฅผ ๊ณต์œ ํ•œ๋‹ค. cluster : ์—ฌ๋Ÿฌ rack์„ ๋ฌผ๋ฆฌ์ ์œผ๋กœ ๋ฌถ์€ ๊ฒƒ.
  • Namenode๋Š” ์‹ค์ œ ๋ฐ์ดํ„ฐ๊ฐ€ ์–ด๋–ค Datanode์— ์ €์žฅ๋˜์–ด ์žˆ๋Š”์ง€๋ฅผ ๊ด€๋ฆฌํ•˜๋Š” ๋ฉ”ํƒ€๋ฐ์ดํ„ฐ๋ฅผ ๊ด€๋ฆฌํ•˜๋Š” ๋…ธ๋“œ์ด๋‹ค.
  • Datanode์—” ์‹ค์ œ ๋ฐ์ดํ„ฐ๊ฐ€ ๋ถ„์‚ฐ ์ €์žฅ๋œ๋‹ค. (replica๋“ค์ด ์กด์žฌํ•จ)

YARN (Yet Another Resource Negotiator)

  • Hadoop์—์„œ ์ฒ˜๋ฆฌ๋˜๋Š” ๋ชจ๋“  ๋ฆฌ์†Œ์Šค๋ฅผ ๊ด€๋ฆฌํ•ด์ฃผ๋Š” ์‹œ์Šคํ…œ.
  • ์–ด๋–ค ์š”์ฒญ์ด ์–ด๋–ค ํด๋Ÿฌ์Šคํ„ฐ์— ํ• ๋‹น๋˜์–ด์•ผ ํ•˜๋Š”์ง€ ๊ด€๋ฆฌํ•œ๋‹ค.
  • Resource Manager์™€ Node Manager๊ฐ€ ๋ฆฌ์†Œ์Šค๋ฅผ ๊ด€๋ฆฌํ•œ๋‹ค.
  • Resource Manager๋Š” ๊ฐ Node Manager๊ฐ€ ๊ฐ Node์˜ ๋ฆฌ์†Œ์Šค๋ฅผ ์–ผ๋งˆ๋‚˜ ์‚ฌ์šฉํ•˜๋Š”์ง€๋ฅผ ๊ด€๋ฆฌํ•œ๋‹ค.
  • Node Manager๋Š” ๊ฐ Node๋ฅผ ๊ด€๋ฆฌํ•˜๋ฉฐ, Node์˜ ๋ฆฌ์†Œ์Šค๋ฅผ ๊ด€๋ฆฌํ•œ๋‹ค. Resource Manager์˜ ์š”์ฒญ์„ ์ˆ˜ํ–‰ํ•˜๋Š” ์—ญํ• ์ด๋‹ค.
  • Client๊ฐ€ ์ž‘์—…์„ ์š”์ฒญํ•˜๋ฉด Resource Manager๊ฐ€ Node Manager๋“ค์„ ๊ด€๋ฆฌํ•˜์—ฌ ๋ฆฌ์†Œ์Šค๋ฅผ ํ• ๋‹นํ•˜๊ณ , ์ž‘์—…์„ ์ˆ˜ํ–‰ํ•œ๋‹ค.

MapReduce

  • Hadoop์—์„œ ๋ฐ์ดํ„ฐ๋ฅผ ๋ถ„์‚ฐ ์ฒ˜๋ฆฌํ•˜๋Š” ๋ฐฉ๋ฒ• ์ค‘์— ํ•˜๋‚˜์ด๋‹ค. (๊ฐ€์žฅ ๋งŽ์ด ์“ฐ์ž„)
  • Map task์™€ Reduce task๊ฐ€ ์žˆ๋‹ค.
  • Map task๋ฅผ ์ˆ˜ํ–‰ํ•˜๋Š” ๋…ธ๋“œ๋ฅผ Mapper, Reduce task๋ฅผ ์ˆ˜ํ–‰ํ•˜๋Š” ๋…ธ๋“œ๋ฅผ Reducer๋ผ๊ณ  ํ•œ๋‹ค.

Process

  1. ์ „์ฒ˜๋ฆฌ๊ธฐ๊ฐ€ Mapper๊ฐ€ ์ผ์„ ํ•  ์ˆ˜ ์žˆ๋Š” ํ˜•ํƒœ๋กœ ๋ฐ์ดํ„ฐ๋ฅผ ์ •์˜ํ•˜์—ฌ Mapper๋“ค์—๊ฒŒ ์ „๋‹ฌํ•œ๋‹ค.
  1. Mapper๋Š” ์ฒ˜๋ฆฌํ•ด์•ผํ•˜๋Š” ๋ฐ์ดํ„ฐ๋ฅผ Reducer๊ฐ€ ์—ฐ์‚ฐํ•  ์ˆ˜ ์žˆ๋Š” ํ˜•ํƒœ๋กœ ๋ณ€ํ™˜ํ•œ๋‹ค. (mapping)
  1. Reducer์—๊ฒŒ ์ „๋‹ฌํ•˜๊ธฐ ์ „์— ๋ฐ์ดํ„ฐ๋ฅผ ๊ธฐ์ค€์— ๋งž๊ฒŒ Sort & Merge ํ•œ๋‹ค. (Shuffring)
  1. Reducer๋Š” ์ „๋‹ฌ ๋ฐ›์€ ๋ฐ์ดํ„ฐ์— ์š”์ฒญํ•œ ์—ฐ์‚ฐ์„ ์ˆ˜ํ–‰ํ•œ๋‹ค. (Reducing = sum)
  1. Reducer๋“ค์ด ์ฒ˜๋ฆฌํ•œ ๋ฐ์ดํ„ฐ๋ฅผ ํ•ฉ์นœ๋‹ค.

์š”์•ฝ

  • Hadoop Ecosystem์€ ๋Œ€์šฉ๋Ÿ‰ ๋ฐ์ดํ„ฐ๋ฅผ ์ฒ˜๋ฆฌํ•˜๋Š” ํ”„๋กœ๊ทธ๋žจ์˜ ํ”Œ๋žซํผ(๋˜๋Š” ์ง‘ํ•ฉ์ฒด).
  • HDFS, YARN, MapReduce ํ•ต์‹ฌ ์ปจ์…‰์— ๋”ฐ๋ผ ์‹ค์ œ ๋ฐ์ดํ„ฐ๋ฅผ ์ฒ˜๋ฆฌํ•œ๋‹ค.
  • Hadoop Ecosystem์— ์žˆ๋Š” ๊ตฌ์„ฑ ํ”„๋กœ๊ทธ๋žจ๋“ค์€ ๋ชจ๋‘ ์˜คํ”ˆ์†Œ์Šค๋ฉฐ, ๋Œ€์šฉ๋Ÿ‰ ๋ฐ์ดํ„ฐ ์ฒ˜๋ฆฌ๋ฅผ ์œ„ํ•ด์„œ ๊ธฐ๋Šฅ๋ณ„๋กœ ๊ตฌํ˜„๋˜๊ณ  ์žˆ๋‹ค.


Hands-on

  1. HDFS์— ๋Œ€ํ•ด์„œ ์ •๋ฆฌ๋œ ๋ธ”๋กœ๊ทธ๋ฅผ 3๊ฐœ ์ฝ๊ณ , ์•„์นด์ด๋น™ํ•ด๋ณด์„ธ์š”.
  1. YARN์— ๋Œ€ํ•ด์„œ ์ •๋ฆฌ๋œ ๋ธ”๋กœ๊ทธ๋ฅผ 3๊ฐœ ์ฝ๊ณ , ์•„์นด์ด๋น™ํ•ด๋ณด์„ธ์š”.
  1. MapReduce์— ๋Œ€ํ•ด์„œ ์ •๋ฆฌ๋œ ๋ธ”๋กœ๊ทธ๋ฅผ 3๊ฐœ ์ฝ๊ณ , ์•„์นด์ด๋น™ํ•ด๋ณด์„ธ์š”.

Uploaded by N2T

728x90
๋ฐ˜์‘ํ˜•

+ Recent posts