2.2.1 Databases

Databases #

OLTP vs OLAP #

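OLTP (Online Transaction Processing) supports the day-to-day operation of a particular system: many short, concurrent transactions that each read or write a small number of rows.

OLAP (Online Analytical Processing) works on historical or archival data: fewer but heavier queries that scan and aggregate large volumes of rows for analysis.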
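The contrast can be illustrated with a small sketch. It uses Python's built-in `sqlite3` module purely as a stand-in; the `orders` table and its rows are made up for illustration, and a real deployment would normally use separate systems for the two workloads.

```python
import sqlite3

# In-memory database with a hypothetical `orders` table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders ("
    "id INTEGER PRIMARY KEY, customer TEXT, amount REAL, day TEXT)"
)

# OLTP-style work: short transactions that each touch a few rows.
with conn:  # commits on success, rolls back on error
    conn.executemany(
        "INSERT INTO orders (customer, amount, day) VALUES (?, ?, ?)",
        [
            ("alice", 30.0, "2024-01-01"),
            ("bob", 20.0, "2024-01-01"),
            ("alice", 50.0, "2024-01-02"),
        ],
    )
with conn:
    # Point update of a single order by primary key.
    conn.execute("UPDATE orders SET amount = ? WHERE id = ?", (35.0, 1))

# OLAP-style work: scan the accumulated history and aggregate it.
daily_totals = conn.execute(
    "SELECT day, SUM(amount), COUNT(*) FROM orders GROUP BY day ORDER BY day"
).fetchall()
print(daily_totals)
```

The inserts and the update are typical of what an OLTP system is tuned for, while the `GROUP BY` query is the kind of read-only aggregation that the OLAP engines listed in this section are built to run at scale.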

References:

OLAP #

prestodb/presto

The official home of the Presto distributed SQL query engine for big data http://prestodb.github.io

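Presto is a distributed SQL query engine for big data, developed at Facebook and designed for fast data analysis.

Features:

  1. It can combine data from multiple data sources, enabling analysis across an entire organization.

  2. It reads data directly from HDFS, so no heavy ETL work is needed before the data can be used.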
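As a rough sketch of the first point, the snippet below builds a Presto SQL string that joins tables from two sources in one query. Presto addresses tables as `catalog.schema.table`; the specific catalogs, schemas, tables, and columns used here (`hive.web.page_views`, `mysql.crm.users`, `user_id`, `country`) are hypothetical, and the snippet only constructs the query text rather than connecting to a cluster.

```python
def cross_source_query(hive_table: str, mysql_table: str) -> str:
    """Build a Presto SQL string joining a Hive table with a MySQL table.

    Both arguments are fully qualified names (catalog.schema.table).
    The column names below are assumptions made for illustration.
    """
    return (
        "SELECT u.country, count(*) AS page_views\n"
        f"FROM {hive_table} AS v\n"
        f"JOIN {mysql_table} AS u ON v.user_id = u.id\n"
        "GROUP BY u.country\n"
        "ORDER BY page_views DESC"
    )


# Hypothetical catalogs: `hive` (files on HDFS) and `mysql` (operational DB).
query = cross_source_query("hive.web.page_views", "mysql.crm.users")
print(query)
```

The resulting text could be submitted through any Presto client; the point is only that a single statement can reference tables living in different systems.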

apache/druid

Apache Druid: a high performance real-time analytics database. https://druid.apache.org/

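Druid is a distributed real-time processing system for querying and analyzing big data in real time, developed by the ad analytics company Metamarkets. It is mainly used for ad analytics, for monitoring and metrics in online advertising systems, and for network monitoring.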

apache/impala

Apache Impala https://impala.apache.org

cloudera/Impala

Real-time Query for Hadoop; mirror of Apache Impala http://impala.io


apache/kylin

Apache Kylin is an open source distributed analytics engine, originally developed at eBay Inc. and contributed to the open source community. It provides a SQL interface and multi-dimensional analysis (OLAP) on top of Hadoop, supporting extremely large datasets. http://kylin.apache.org

apache/hive

Apache Hive https://hive.apache.org/

The Apache Hive (TM) data warehouse software facilitates reading, writing, and managing large datasets residing in distributed storage using SQL. Built on top of Apache Hadoop (TM).