What is Hive?
Hive is a data analysis application that runs on top of Hadoop.
Hive provides a SQL-like language, HiveQL, for processing data. Hive originated at Facebook, where it was built to process large sets of user and log data. It began as an Apache Hadoop subproject and is now a top-level Apache project with many contributors. Hive provides a data warehouse solution on Hadoop, covering ETL (extract, transform, and load), analysis, and reporting.
Hive is not designed for online transaction processing of data.
Hive has a mechanism for imposing structure on a variety of data formats.
By default, Hive provides connectors for comma-separated values (CSV) text files, and it also supports several other file formats for storing data, including ORC (Optimized Row Columnar), RCFile (Record Columnar File), Avro, Parquet, plain text, and SequenceFile.
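As a minimal sketch of how structure is imposed on existing files and how a storage format is chosen, the HiveQL below declares a table over comma-separated text and then copies it into an ORC table. The table names, column names, and the HDFS path are hypothetical and used only for illustration.

```sql
-- Hypothetical names and path; adjust to your own data.
-- Declare a schema over comma-separated text files that already exist in HDFS.
CREATE EXTERNAL TABLE web_logs_raw (
  user_id  BIGINT,
  url      STRING,
  visit_ts STRING
)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '/data/web_logs/raw';

-- Store the same rows in the columnar ORC format.
CREATE TABLE web_logs_orc
STORED AS ORC
AS
SELECT user_id, url, visit_ts
FROM web_logs_raw;
```

The external table only describes the files; the data stays where it is. Replacing ORC with PARQUET, AVRO, RCFILE, or SEQUENCEFILE in the STORED AS clause selects one of the other formats.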
Hive can be accessed in three ways:
- Web GUI (for example, Hue).
- Command-line interface (CLI).
- JDBC interface.
Hive can access files stored directly in Apache HDFS or in other data storage systems such as Apache HBase.
Hive queries are executed via Apache Tez, Apache Spark, or MapReduce.
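As a rough illustration, the execution engine can be chosen per session with the hive.execution.engine property before running a query. Which engines are actually available depends on the Hive version and on how the cluster is set up, and the table name page_views is hypothetical.

```sql
-- Choose the engine for this session; accepted values are mr, tez, and spark,
-- subject to what the installation supports.
SET hive.execution.engine=tez;

-- Hypothetical table: the query text is the same whichever engine runs it.
SELECT url, COUNT(*) AS visits
FROM page_views
GROUP BY url;
```

Only the SET line changes when switching engines; the HiveQL query itself is left untouched.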