
dataux: a unified SQL proxy middleware integrating multiple NoSQL solutions. SQL query proxy to Elasticsearch, Mo ...


Open-source software name:

dataux

Open-source software URL:

https://gitee.com/mirrors/dataux

Open-source software description:

SQL Query Proxy to Elasticsearch, Mongo, Kubernetes, BigTable, etc.

Unify disparate data sources and files into a single federated view of your data, and query it with SQL without copying it into a data warehouse.

MySQL-compatible federated query engine for Elasticsearch, Mongo, Google Datastore, Cassandra, Google BigTable, Kubernetes, and file-based sources. This query engine hosts a MySQL protocol listener, which rewrites SQL queries to native ones (Elasticsearch, Mongo, Cassandra, Kubernetes REST API, BigTable). It works by implementing a full relational-algebra distributed execution engine to run SQL queries and poly-fill features missing from the underlying sources. So a backend key-value store such as Cassandra can now have complete WHERE clause support, as well as aggregate functions, etc.
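To make the poly-fill idea concrete, here is a minimal Go sketch (not dataux's code) of an engine applying a WHERE filter and a SUM ... GROUP BY itself, over rows that a backend could only return by scanning. The `Row` type, the helper names, and the sample rows are all made up for illustration.

```go
package main

import "fmt"

// Row is a hypothetical record as returned by a backend scan:
// column name -> value.
type Row map[string]any

// filterRows evaluates a WHERE predicate inside the query engine. This is
// the "poly-fill" idea: the backend only returns rows (e.g. a full scan of a
// key-value table) and the engine applies the filter itself.
func filterRows(rows []Row, where func(Row) bool) []Row {
	out := make([]Row, 0, len(rows))
	for _, r := range rows {
		if where(r) {
			out = append(out, r)
		}
	}
	return out
}

// greaterThan is a hypothetical compiled form of "WHERE col > n".
// Rows where col is missing or not an int never match.
func greaterThan(col string, n int) func(Row) bool {
	return func(r Row) bool {
		v, ok := r[col].(int)
		return ok && v > n
	}
}

// sumBy computes "SELECT key, SUM(col) ... GROUP BY key" over rows the
// engine already holds, for backends with no aggregation support.
// Non-int values count as 0; a missing key groups under "".
func sumBy(rows []Row, key, col string) map[string]int {
	totals := make(map[string]int)
	for _, r := range rows {
		k, _ := r[key].(string)
		v, _ := r[col].(int)
		totals[k] += v
	}
	return totals
}

func main() {
	// Made-up rows standing in for what a key-value backend returns from a scan.
	scanned := []Row{
		{"id": "k1", "team": "A", "hr": 54},
		{"id": "k2", "team": "A", "hr": 47},
		{"id": "k3", "team": "B", "hr": 33},
		{"id": "k4", "team": "C", "hr": 18},
	}
	// Equivalent of: SELECT team, SUM(hr) FROM t WHERE hr > 20 GROUP BY team
	fmt.Println(sumBy(filterRows(scanned, greaterThan("hr", 20)), "team", "hr"))
	// prints map[A:101 B:33]
}
```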

Most similar to prestodb, but written in Go and focused on making it easy to add custom data sources as well as REST API sources.

Storage Sources

Features

  • Distributed: run queries across multiple servers.
  • Hackable Sources: very easy to add a new source for your custom data, files, JSON, CSV, or storage.
  • Hackable Functions: add custom Go functions to extend the SQL language.
  • Joins: join data across heterogeneous sources.
  • Frontends: currently only the MySQL protocol is supported, but RethinkDB (for a real-time API) is planned; frontends are pluggable.
  • Backends: Elasticsearch, Google Datastore, Mongo, Cassandra, BigTable, and Kubernetes are currently implemented. CSV and JSON files and custom formats (protobuf) are in progress.
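Joins between heterogeneous sources mean the engine, not any single backend, has to match rows. A common way to do that is a hash join; the Go sketch below shows a generic inner equi-join over rows from two sources. It is the textbook technique, not necessarily what dataux's planner uses, and `Row`, `hashJoin`, and the sample data are hypothetical.

```go
package main

import "fmt"

// Row is a hypothetical record: column name -> value.
type Row map[string]any

// hashJoin is a textbook inner equi-join: build a hash table on the left
// input, then probe it once per right row. Join-key values must be
// comparable (strings, numbers); rows whose key is missing or nil never
// match, mirroring SQL NULL semantics. On a column-name clash the right
// row's value wins in this sketch.
func hashJoin(left, right []Row, leftKey, rightKey string) []Row {
	// Build phase: index left rows by join-key value.
	index := make(map[any][]Row)
	for _, l := range left {
		if k, ok := l[leftKey]; ok && k != nil {
			index[k] = append(index[k], l)
		}
	}
	// Probe phase: emit one merged row per matching left row.
	var out []Row
	for _, r := range right {
		k, ok := r[rightKey]
		if !ok || k == nil {
			continue
		}
		for _, l := range index[k] {
			merged := make(Row, len(l)+len(r))
			for c, v := range l {
				merged[c] = v
			}
			for c, v := range r {
				merged[c] = v
			}
			out = append(out, merged)
		}
	}
	return out
}

func main() {
	// Made-up rows standing in for two different backends.
	users := []Row{ // e.g. from a document store
		{"user_id": "u1", "name": "ann"},
		{"user_id": "u2", "name": "bob"},
	}
	orders := []Row{ // e.g. from a search index
		{"order_id": 1, "uid": "u1"},
		{"order_id": 2, "uid": "u1"},
		{"order_id": 3, "uid": "u3"},
	}
	for _, row := range hashJoin(users, orders, "user_id", "uid") {
		fmt.Println(row["name"], row["order_id"])
	}
	// prints:
	// ann 1
	// ann 2
}
```

One design point a real engine has to handle: different backends may return the same key as different Go types (say `int64` from one driver and `float64` from a JSON decoder), and a `map[any]` index treats those as different keys, so values would need normalizing before the build phase.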

Status

  • NOT production ready. Currently supports a few non-critical use cases (ad-hoc queries, support tools) in production.

Try it Out

There are two examples:

  1. Create a CSV database of baseball data from http://seanlahman.com/baseball-archive/statistics/
  2. Connect to Google BigQuery public datasets (you will need a project, but the free quota will probably keep it free).
    # download files to local /tmp
    mkdir -p /tmp/baseball
    cd /tmp/baseball
    curl -Ls http://seanlahman.com/files/database/baseballdatabank-2017.1.zip > bball.zip
    unzip bball.zip
    mv baseball*/core/*.csv .
    rm bball.zip
    rm -rf baseballdatabank-*

    # run a docker container locally
    docker run -e "LOGGING=debug" --rm -it -p 4000:4000 \
      -v /tmp/baseball:/tmp/baseball \
      gcr.io/dataux-io/dataux:latest

In another console, open the MySQL client:

    # connect to the docker container you just started
    mysql -h 127.0.0.1 -P4000

    -- Now create a new Source
    CREATE source baseball WITH {
      "type":"cloudstore",
      "schema":"baseball",
      "settings" : {
         "type": "localfs",
         "format": "csv",
         "path": "baseball/",
         "localpath": "/tmp"
      }
    };

    show databases;
    use baseball;
    show tables;
    describe appearances;
    select count(*) from appearances;
    select * from appearances limit 10;

BigQuery Example

    # assuming you are running locally; if you are instead in Google Cloud or Google Container Engine,
    # you don't need the credentials or volume mount
    docker run -e "GOOGLE_APPLICATION_CREDENTIALS=/.config/gcloud/application_default_credentials.json" \
      -e "LOGGING=debug" \
      --rm -it \
      -p 4000:4000 \
      -v ~/.config/gcloud:/.config/gcloud \
      gcr.io/dataux-io/dataux:latest

    # now that dataux is running, use the mysql client to connect
    mysql -h 127.0.0.1 -P 4000

Now run some queries:

    -- add a bigquery datasource
    CREATE source `datauxtest` WITH {
        "type":"bigquery",
        "schema":"bqsf_bikes",
        "table_aliases" : {
           "bikeshare_stations" : "bigquery-public-data:san_francisco.bikeshare_stations"
        },
        "settings" : {
          "billing_project" : "your-google-cloud-project",
          "data_project" : "bigquery-public-data",
          "dataset" : "san_francisco"
        }
    };

    use bqsf_bikes;

    show tables;

    describe film_locations;

    select * from film_locations limit 10;
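In the BigQuery source, `table_aliases` maps a local table name to a fully qualified `project:dataset.table` reference. The Go sketch below shows one plausible way such an alias could be resolved and split. It assumes a plain name lookup, and `TableRef`, `parseTableRef`, and `resolveAlias` are illustrative names, not dataux's implementation.

```go
package main

import (
	"fmt"
	"strings"
)

// TableRef is a fully qualified BigQuery table in the "project:dataset.table"
// form used by the table_aliases values above.
type TableRef struct {
	Project, Dataset, Table string
}

// parseTableRef splits "project:dataset.table". The last ':' separates the
// project so that domain-scoped project IDs ("example.com:proj") still parse.
// Illustrative helper only; not dataux's own parser.
func parseTableRef(s string) (TableRef, error) {
	i := strings.LastIndex(s, ":")
	if i < 0 {
		return TableRef{}, fmt.Errorf("missing ':' in %q", s)
	}
	dataset, table, ok := strings.Cut(s[i+1:], ".")
	if !ok || i == 0 || dataset == "" || table == "" {
		return TableRef{}, fmt.Errorf("want project:dataset.table, got %q", s)
	}
	return TableRef{Project: s[:i], Dataset: dataset, Table: table}, nil
}

// resolveAlias assumes aliases are a plain name lookup: a known alias maps to
// its full reference, any other name is returned unchanged.
func resolveAlias(aliases map[string]string, name string) string {
	if full, ok := aliases[name]; ok {
		return full
	}
	return name
}

func main() {
	// Same alias as in the CREATE source statement above.
	aliases := map[string]string{
		"bikeshare_stations": "bigquery-public-data:san_francisco.bikeshare_stations",
	}
	ref, err := parseTableRef(resolveAlias(aliases, "bikeshare_stations"))
	if err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", ref)
	// prints {Project:bigquery-public-data Dataset:san_francisco Table:bikeshare_stations}
}
```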

Hacking

For now, the goal is to allow this to be used as a library, so the vendor directory is not checked in. Use Docker containers or dep for now.

    # run dep ensure
    dep ensure -v

Related Projects, Database Proxies & Multi-Data QL

  • Data Accessibility: making it easier to query, access, share, and use data. Protocol shifting (for accessibility). Sharing/replication between database types.
  • Scalability/Sharding: implement sharding and connection sharing.
Name              Scaling  Ease of Access (SQL, etc.)  Comments
Vitess            Y                                    for scaling (sharding), very mature
twemproxy         Y                                    for scaling memcache
Couchbase N1QL    Y        Y                           SQL interface to Couchbase k/v (and full-text index)
prestodb                   Y                           query front end to multiple backends, distributed
cratedb           Y        Y                           all-in-one db, not a proxy, SQL to Elasticsearch
codis             Y                                    for scaling Redis
MariaDB MaxScale  Y                                    for scaling MySQL/MariaDB (sharding), mature
Netflix Dynomite  Y                                    not really SQL, just multi-store k/v
redishappy        Y                                    for scaling Redis, HAProxy
mixer             Y                                    simple MySQL sharding

We use more and more databases, flat files, message queues, etc. For databases, the primary reader/writer is fine, but secondary readers (someone investigating ad-hoc issues, for example) may end up accessing, and having to learn, many different query languages.

Credit to mixer: the MySQL connection pieces were derived from it (mixer itself was forked from Vitess).

Inspiration/Other works

In Internet architectures, data systems are typically categorized into source-of-truth systems that serve as primary stores for the user-generated writes, and derived data stores or indexes which serve reads and other complex queries. The data in these secondary stores is often derived from the primary data through custom transformations, sometimes involving complex processing driven by business logic. Similarly data in caching tiers is derived from reads against the primary data store, but needs to get invalidated or refreshed when the primary data gets mutated. A fundamental requirement emerging from these kinds of data architectures is the need to reliably capture, flow and process primary data changes.

from Databus

Building

I plan on getting the vendor directory checked in soon so the build will work. However, I am currently trying to figure out how to organize packages to allow use both as a library and as a daemon (see how minimal main.go is, to encourage your own builtins and data sources).

    # for just docker

    # ensure /vendor has correct versions
    dep ensure -update

    # build binary
    ./.build

    # build docker
    docker build -t gcr.io/dataux-io/dataux:v0.15.1 .
