
Avro Data Format In Hadoop For Big Data Application


In this article, Big Data Hadoop professionals introduce the Avro data format and its use in Hadoop and big data development. Read on to learn the concepts behind the Avro data format as the professionals explain them.

 Introduction to the Avro data format 

Avro is a data serialization system. It provides a fast, compact binary data format that is driven by a schema. When Avro data is read, the schema that was used to write it is always available, so no type information has to be stored with each value. This keeps serialization fast and the encoded data small.

The input data for Avro has its data types defined by the schema, which helps other developers know exactly what format to expect. When we store data with Avro, the schema is stored along with it, so the files can later be processed by programs written by other developers.
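To make the schema-driven binary format more concrete, here is a minimal Python sketch of how one record from the sample data used later in this post would be laid out in Avro's binary encoding. A long is written as a zig-zag variable-length integer, a string as its byte length followed by its UTF-8 bytes, and the fields are concatenated in schema order with no field names. The function names and the FIELDS list are our own illustration, not part of any Avro library, and the sketch covers only the record body, not the container file (header, blocks, sync markers) that the Avro tools write.

```python
# Field names and types in schema order (taken from the sample schema in this post).
FIELDS = [("id", "string"), ("name", "string"), ("timestamp", "long")]


def encode_long(n):
    """Encode an integer as an Avro long: zig-zag, then base-128 varint."""
    z = (n << 1) ^ (n >> 63)  # zig-zag: small negatives become small positives
    out = bytearray()
    while z & ~0x7F:
        out.append((z & 0x7F) | 0x80)  # low 7 bits with the continuation bit set
        z >>= 7
    out.append(z)
    return bytes(out)


def encode_string(s):
    """Encode a string as its UTF-8 byte length (a long) followed by the bytes."""
    raw = s.encode("utf-8")
    return encode_long(len(raw)) + raw


def encode_record(record, fields=FIELDS):
    """Concatenate the encoded field values in schema order."""
    encoders = {"string": encode_string, "long": encode_long}
    return b"".join(encoders[avro_type](record[name]) for name, avro_type in fields)


body = encode_record({"id": "1", "name": "Jean", "timestamp": 1366123856})
print(len(body))  # 12 bytes for this record
```

Because the reader has the schema, the field names and types never appear in the encoded bytes, which is where most of the size saving over JSON comes from.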


 Environment 

Java: JDK 1.7

Avro tools: http://www.us.apache.org/dist/avro/avro-1.7.4/java/avro-tools-1.7.4.jar

Cloudera version: CDH4.6.0

 Initial steps 

  1.  Download the Avro tools jar from the link above.
  2.  Prepare some input data in JSON format, so that we can convert it to Avro binary data and convert Avro binary data back to JSON.

schemaFile.avsc is the schema file for Avro. This is a sample avsc file:

{
  "type" : "record",
  "name" : "twitter_schema",
  "namespace" : "com.hn.avro",
  "fields" : [ {
    "name" : "id",
    "type" : "string",
    "doc" : "id of user"
  }, {
    "name" : "name",
    "type" : "string",
    "doc" : "name of user"
  }, {
    "name" : "timestamp",
    "type" : "long",
    "doc" : "Unix epoch time in seconds"
  } ],
  "doc" : "Basic information of user"
}

basicInfor.json is the actual data file, which will be converted to Avro with the Avro tools:

{"id":"1","name":"Jean","timestamp": 1366123856 }

{"id":"2","name":"Jang","timestamp": 1366489544 }

3. Put the files into HDFS with these commands:

      hadoop fs -mkdir -p /data/mysample/avroTable
      hadoop fs -put basicInfor.json /data/mysample/avroTable
      hadoop fs -put schemaFile.avsc /data/mysample/avroTableFormat
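Before converting the data, it can help to check that every line of the JSON file actually fits the schema. The following is a minimal Python sketch of such a check, using only the standard library. It handles just the two primitive types that appear in the sample schema, and the names check_record, check_lines and PY_TYPES are hypothetical helpers for this post, not part of the Avro tools.

```python
import json

# The schema from schemaFile.avsc, reduced to the parts this check needs.
SCHEMA = {
    "type": "record",
    "name": "twitter_schema",
    "fields": [
        {"name": "id", "type": "string"},
        {"name": "name", "type": "string"},
        {"name": "timestamp", "type": "long"},
    ],
}

# Assumed mapping from the Avro primitive types used above to Python types.
PY_TYPES = {"string": str, "long": int}


def check_record(record, schema):
    """Return a list of problems; an empty list means the record fits."""
    problems = []
    for field in schema["fields"]:
        name, avro_type = field["name"], field["type"]
        if name not in record:
            problems.append(f"missing field: {name}")
        elif type(record[name]) is not PY_TYPES[avro_type]:
            found = type(record[name]).__name__
            problems.append(f"{name}: expected {avro_type}, got {found}")
    return problems


def check_lines(lines, schema):
    """Check JSON-lines text; yield (line_number, problems) for bad lines."""
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # skip blank lines between records
        problems = check_record(json.loads(line), schema)
        if problems:
            yield number, problems


sample = [
    '{"id":"1","name":"Jean","timestamp": 1366123856 }',
    '{"id":"2","name":"Jang","timestamp": 1366489544 }',
]
print(list(check_lines(sample, SCHEMA)))  # an empty list: both records fit
```

To check the real file, the lines of basicInfor.json could be passed in place of the sample list, for example by iterating over the open file object.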

Code walk through 

This command converts the JSON data to Avro format:

 java -jar avro-tools-1.7.4.jar fromjson --schema-file schemaFile.avsc basicInfor.json > basicInfor.avro

These commands convert data from Avro format back to JSON, for both compressed and uncompressed data:

 java -jar avro-tools-1.7.4.jar tojson basicInfor.avro > basicInfor.json
 java -jar avro-tools-1.7.4.jar tojson basicInfor.snappy.avro > basicInfor.json

These commands extract the schema from an Avro file into an avsc file, for both compressed and uncompressed data:

 java -jar avro-tools-1.7.4.jar getschema basicInfor.avro > schemaFile.avsc
 java -jar avro-tools-1.7.4.jar getschema basicInfor.snappy.avro > schemaFile2.avsc
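Extracting the schema works because an Avro container file starts with a small header: the four magic bytes "Obj" plus the byte 1, then a metadata map that holds the writer schema under the key avro.schema (and the codec under avro.codec). The Python sketch below reads that map using only the standard library, to illustrate the idea. It is not how the Avro tools are implemented, and the function names are our own. The header itself is never compressed, so the same approach applies to a snappy-compressed file.

```python
import io
import json

MAGIC = b"Obj\x01"  # first four bytes of every Avro container file


def read_long(stream):
    """Decode one Avro long: base-128 varint, then undo the zig-zag step."""
    shift, acc = 0, 0
    while True:
        chunk = stream.read(1)
        if not chunk:
            raise EOFError("unexpected end of header")
        acc |= (chunk[0] & 0x7F) << shift
        if not chunk[0] & 0x80:  # continuation bit clear: this was the last byte
            return (acc >> 1) ^ -(acc & 1)
        shift += 7


def read_bytes(stream):
    """Read a length-prefixed byte string."""
    return stream.read(read_long(stream))


def read_metadata(data):
    """Return the header metadata map of an Avro container file."""
    stream = io.BytesIO(data)
    if stream.read(4) != MAGIC:
        raise ValueError("not an Avro container file")
    meta = {}
    while True:
        count = read_long(stream)
        if count == 0:  # a zero count ends the map
            return meta
        if count < 0:  # negative count: a byte size follows, unused here
            count = -count
            read_long(stream)
        for _ in range(count):
            key = read_bytes(stream).decode("utf-8")
            meta[key] = read_bytes(stream)


def get_schema(data):
    """Parse the writer schema stored under the 'avro.schema' key."""
    return json.loads(read_metadata(data)["avro.schema"])
```

For a local file, this could be called with the file's bytes, for example get_schema(open("basicInfor.avro", "rb").read()). Reading the whole file is fine for a small sample, although only the header is actually needed.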

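This statement creates a Hive table mapped to the Avro data in HDFS. The table location should contain Avro files such as basicInfor.avro, because the Avro input format reads Avro container files rather than JSON. After running the statement, we can query the Avro data: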

CREATE EXTERNAL TABLE avroHive
 COMMENT "Mapping avro table in Hive"
 ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
 STORED AS
 INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
 OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
 LOCATION '/data/mysample/avroTable'
 TBLPROPERTIES (
 'avro.schema.url'='/data/mysample/avroTableFormat'
 );

Verify the result 

We can inspect the local files to check the result: the avsc file for the schema, the JSON file for the JSON data, and the avro file for the data in Avro format:

cat filename

We can also check the table in Hive. Start the Hive client, then list the tables:

hive

show tables;

We hope this post helps you understand how to work with Avro files in Hadoop and in big data applications.

Big Data Hadoop professionals have introduced the Avro data format in this post. If you want to know more about the format or about Hadoop development, feel free to ask in the comments.
