java - How does Hadoop split and combine its output data?


I think my question is best explained with an example. Say you are storing an image on HDFS, and the image is large enough that it is divided into four separate blocks on HDFS. When you execute a task that reads the image back, does Hadoop return 4 small files that you have to reassemble into the original image yourself? Or does Hadoop automatically join the 4 pieces back together?

Thank you!

The Hadoop Distributed File System (HDFS) stores each file in one or more blocks (and each block is replicated one or more times).

For each file, you can configure the block size and the replication factor (if you don't set them, the cluster-wide defaults are used).
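For example, both settings can be overridden per file at upload time via the generic `-D` option (`dfs.blocksize` and `dfs.replication` are the standard HDFS configuration keys; the file and destination paths here are just placeholders):

```shell
# Upload a file with a 32 MB block size and a replication factor of 3,
# overriding whatever the cluster defaults are for this one file.
hdfs dfs -D dfs.blocksize=33554432 -D dfs.replication=3 \
    -put image.jpg /data/image.jpg
```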

When you work with a file, you are working with streams of data; the NameNode acts as the central repository mapping file paths to their blocks and the locations of those blocks (the DataNodes).

Continuing the example: say you have a 32 MB block size and a 50 MB file. The file will be divided into 2 blocks (32 MB and 18 MB). If the file's configured replication factor is 3, the NameNode will try to ensure that each block is replicated to 3 DataNodes in your cluster.
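The arithmetic behind that split can be sketched in plain Java (this is only an illustration of the math, not the actual HDFS implementation; the class and method names are made up):

```java
// Sketch: how a file of a given size divides into fixed-size blocks,
// using the 50 MB file / 32 MB block size example above.
public class BlockSplit {

    // Returns the size in bytes of each block for a file of fileSize bytes.
    static long[] splitIntoBlocks(long fileSize, long blockSize) {
        // Ceiling division: a partially filled final block still counts.
        int numBlocks = (int) ((fileSize + blockSize - 1) / blockSize);
        long[] blocks = new long[numBlocks];
        for (int i = 0; i < numBlocks; i++) {
            long start = (long) i * blockSize;
            // Every block is full-size except possibly the last one.
            blocks[i] = Math.min(blockSize, fileSize - start);
        }
        return blocks;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long[] blocks = splitIntoBlocks(50 * mb, 32 * mb);
        // A 50 MB file with a 32 MB block size yields 2 blocks: 32 MB and 18 MB.
        System.out.println(blocks.length + " blocks: "
                + blocks[0] / mb + " MB and " + blocks[1] / mb + " MB");
    }
}
```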

When you try and read from this file, an FSDataInputStream is returned which, like most input streams, lets you seek to a particular byte position in the file. The DFSClient hides the details from you: for a given byte offset, it knows which block to connect to and fetches the bytes from there (moving from one block to the next as you cross block boundaries).
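Conceptually, the offset-to-block mapping the client performs is simple integer arithmetic. A minimal sketch (again illustrative only, not Hadoop's internal code):

```java
// Sketch: map an absolute byte offset in a file to the block that
// holds it and the offset within that block.
public class OffsetToBlock {

    // Returns { blockIndex, offsetWithinBlock } for the given file offset.
    static long[] locate(long fileOffset, long blockSize) {
        return new long[] { fileOffset / blockSize, fileOffset % blockSize };
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        // Seeking to byte 40 MB in a file with 32 MB blocks lands in
        // block 1 (the second block), 8 MB into that block.
        long[] loc = locate(40 * mb, 32 * mb);
        System.out.println("block " + loc[0] + ", offset " + loc[1] / mb + " MB");
    }
}
```

Once the client knows the block index, it asks the NameNode which DataNodes hold that block and streams the bytes from one of them.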

In short, to answer your question: to a client reading from HDFS, the file looks like a single continuous input stream, even though under the hood the bytes are actually being fetched from the 4 blocks in turn.
