diff --git a/zh/01-intro.md b/zh/01-intro.md
new file mode 100644
index 0000000..93a8ebb
--- /dev/null
+++ b/zh/01-intro.md
@@ -0,0 +1,64 @@
# Introduction

## Downtime Roulette

![Gambling With Uptime](../assets/decor/roulette.png)

Picture a roulette wheel in a casino, where any particular number has a 1 in 37 chance of being hit. Now imagine you could place a single bet that a given number will *not* be hit (odds of about 97.3% in your favor), and that winning pays out 10 times your wager. Would you make that bet? I'd reach for my wallet so fast my thumb would catch fire.

Now imagine you could make the same bet again, but you only win if the wheel spins in your favor 100 times in a row; otherwise you lose. Would you still play? Winning a single spin might be easy, but over many trials the odds are not in your favor.

People make these sorts of bets with data all the time. A single server has a good chance of remaining available. When you run a cluster of thousands of servers, or handle billions of requests, the chance of any one piece breaking down becomes the rule.

Set against a billion opportunities, a one-in-a-million disaster is commonplace.

## What is Riak

Riak is an open-source, distributed key/value database built for high availability, fault tolerance, and near-linear scalability. In short, Riak has remarkably high uptime and grows with you.

As the modern world stitches itself together with increasingly intricate connections, major shifts are occurring in information management. The web and networked devices have spurred an explosion of data collection and access unseen in the history of the world. The amount of data being stored and managed continues to grow at a staggering rate while, at the same time, more people than ever require fast and reliable access to it. This trend is known as *Big Data*.

### Always Bet on Riak

Riak's strengths are high *volume* (lots of data to read and write whenever you need it), high *velocity* (it responds easily to growth), and high *variety* of information (you can store any type of data as a value).

Riak was built as a solution to real Big Data problems, based on the *Amazon Dynamo* design. Dynamo is a highly available design, meaning that it responds to requests quickly at very large scales, even if your application is storing and serving terabytes of data a day. Riak had already been used in production before it was open-sourced in 2009. It is currently used by GitHub, Comcast, Voxer, Disqus, and others, with the larger systems storing hundreds of terabytes of data and handling several gigabytes per node daily.

Riak is written in the Erlang programming language. Erlang was chosen for its strong support for concurrency, solid distributed communication, hot code loading, and fault tolerance. It runs on a virtual machine, so running Riak requires an Erlang installation.

So should you use Riak? A good rule of thumb for potential users is to ask yourself whether every moment of downtime will cost you in some way (money, users, and so on). Not all systems require that much uptime, and if yours doesn't, Riak may not be for you.

## About This Book

This is not an "install and follow along" guide; it is a "read and comprehend" guide. Don't feel compelled to have Riak, or even a computer, handy when you start this book. You may feel like installing it at some point, and if so, instructions can be found in the [Riak docs](http://docs.basho.com).

In my opinion, the most important part of this book is the [Concepts chapter](#concepts). If you already have some knowledge it may start slow, but it picks up speed quickly. After laying the theoretical groundwork, we'll move on to helping [developers](#developers) use Riak, by learning how to query it and tinker with some settings. Finally, we'll cover the basic details that [operators](#operators) should know, such as how to set up a Riak cluster, configure some values, use optional tools, and more.

## What's New in 2.0

Riak 2.0 represents a major shift in the capabilities and focus of Riak as a data store. Riak has always been principally concerned with operational simplicity, and that has not changed. But when design decisions had to be made, operations had always won out over developer needs. That is changing. With the release of 2.0, we've added features that developers have long wanted to see, namely the following:

* __Strong Consistency__. Riak is still eventually consistent, but now you have a choice. Riak is now the easiest database with which to manage this tradeoff, able to slide smoothly along the spectrum between AP and CP on a per-bucket basis.
* __Better Search__. Riak's creators have improved search by leveraging the power of the Solr search engine. You now get all of the queryability of distributed Solr, without the hassle of manual indexing.
* __Datatypes__. Riak has historically provided storage flexibility by allowing any binary object to be stored. That's still the case, but you now have the option of storing distributed maps, sets, counters, and flags that automatically converge in the face of conflicts.
* __Security__. A long-standing request whose day has finally come: native group/user access controls.
* __Bucket types__. You can now support unlimited custom bucket properties, without the overhead of the old gossip protocol.
* __Ring resizing__. Finally! Whereas you were previously limited to a fixed ring size, you now have the option to dynamically increase or decrease the number of vnodes in your cluster.
* __Other improvements__. We've also made many other improvements, like simplified configuration management (no more messing with `app.config` and `vm.args`), reduced sibling explosions (via a new logical clock called DVV), improved internal metadata sharing (reducing gossip chatter), better AAE, and more.

This book also includes a new chapter, written by John Daily, to help guide developers in writing effective applications with Riak. We hope you enjoy the new, improved, *Not Quite So Little Riak Book*.
diff --git a/zh/02-concepts.md b/zh/02-concepts.md
new file mode 100644
index 0000000..e08134e
--- /dev/null
+++ b/zh/02-concepts.md
@@ -0,0 +1,449 @@
# Concepts

Believe me, dear reader, when I suggest that thinking in a distributed fashion is awkward. When I first encountered Riak, I was not prepared for some of its more preternatural concepts. Our brains just aren't hardwired to think in a distributed, asynchronous manner. Richard Dawkins coined the term *Middle World* for the serial, rote land humans encounter every day, which exists between the extreme strangeness of quarks and the vastness of outer space.

We don't think clearly about these extremes because we don't encounter them daily, and the same goes for distributed computation and storage. So we create models and tools that bring the physical behavior of scattered parallel resources in line with our more ordinary synchronous terms. While Riak takes great pains to simplify the hard parts, it does not pretend that they don't exist. Just as you can never hope to program at an expert level without any knowledge of memory or CPU management, you cannot safely develop a highly available cluster without a firm grasp of a few underlying concepts.

## The Landscape

The existence of databases like Riak is the result of two basic trends: accessible technology spurring different data requirements, and gaps in the data management market.

First, as technology has steadily improved and costs have dropped, vast amounts of computing power and storage are now within the grasp of nearly anyone. Along with an increasingly interconnected world driven by the web and by shrinking, cheaper computers (like smartphones), this has catalyzed exponential data growth and a demand for more predictability and speed from savvier users. In other words, more data is being created on the front end, while more data is being managed on the back end.

Second, relational database management systems (RDBMS) had become focused over the years on a standard set of use cases, like business intelligence. They were also technically tuned to squeeze performance out of a single larger server, for example by optimizing disk access, even as cheap commodity (and virtualized) servers made horizontal growth increasingly attractive. As cracks in relational implementations became apparent, custom implementations arose in response to specific problems that relational databases had not originally envisioned.

These new databases were collected under the moniker *NoSQL*, and Riak is of their ilk.

### Database Models

Modern databases can be loosely grouped by the way they represent data. Although I'm presenting five main types (the last four are considered NoSQL models), the lines are often blurred: you can use some key/value stores as document stores, and you can use a relational database to store key/value data.

  1. **Relational**. Traditional databases, usually modeled and queried with SQL. They are useful for data that can be stored in a highly structured schema yet still demands flexible querying. Scaling a relational database (RDBMS) traditionally happens by way of more powerful hardware (vertical growth).

     Examples: *PostgreSQL*, *MySQL*, *Oracle*
  2. **Graph**. These exist for highly interconnected data. They excel at modeling complex relationships between nodes, and many implementations can handle multiple billions of nodes and relationships (or edges and vertices). I tend to include *triplestores* and *object DBs* as specialized variants.

     Examples: *Neo4j*, *Graphbase*, *InfiniteGraph*
  3. **Document**. Document datastores model hierarchical values called documents, represented in formats such as JSON or XML, and do not enforce a document schema. They generally support distribution across multiple servers (horizontal growth).

     Examples: *CouchDB*, *MongoDB*, *Couchbase*
  4. **Columnar**. Popularized by [Google's BigTable](http://research.google.com/archive/bigtable.html), this form of database exists to scale across multiple servers, and groups similar data into column families. Column values can be individually versioned and managed, though families are defined in advance, not unlike RDBMS schemas.

     Examples: *HBase*, *Cassandra*, *BigTable*
  5. **Key/Value**. Key/value, or KV stores, are conceptually like hashtables, where values are stored and accessed by an immutable key. They range from single-server varieties like *Memcached*, used for high-speed caching, to multi-datacenter distributed systems like *Riak Enterprise*.

     Examples: *Riak*, *Redis*, *Voldemort*

## Riak Components

Riak is a Key/Value (KV) database, built from the ground up to safely distribute data across a cluster of physical servers, called nodes. A Riak cluster is also known as a Ring (we'll cover the reason for that later).

### Key and Value

![A Key is an Address](../assets/decor/addresses.png)

Key/value is the most basic construct in all of computerdom. You can think of a key like a home address, such as Bob's house with the unique key 5124, while the value would be maybe Bob (and his stuff).

```javascript
hashtable["5124"] = "Bob"
```

Retrieving Bob is as easy as going to his house.

```javascript
bob = hashtable["5124"]
```

Let's say that poor old Bob dies, and Claire moves into this house. The address remains the same, but the contents have changed.

```javascript
hashtable["5124"] = "Claire"
```

Successive requests for `5124` will now return `Claire`.

### Buckets

Addresses in Riakville are more than a house number, but also a street. There could be another 5124 on another street, so the way we can ensure a unique address is by requiring both, as in *5124 Main Street*.

*Buckets* in Riak are analogous to street names: they provide logical [namespaces](http://en.wikipedia.org/wiki/Namespace) so that identical keys in different buckets will not conflict.

For example, while Alice may live at *5122 Main Street*, there may be a gas station at *5122 Bagshot Row*.

```javascript
main["5122"] = "Alice"
bagshot["5122"] = "Gas"
```

Certainly you could have just named your keys `main_5122` and `bagshot_5122`, but buckets allow for cleaner key naming, and have other benefits, such as custom properties. For example, to add new Riak Search 2.0 indexes to a bucket, you might tell Riak to index all values under a bucket like this:

```javascript
main.props = {"search_index":"homes"}
```

Buckets are so useful in Riak that all keys must belong to a bucket. There is no global namespace. The true definition of a unique key in Riak is actually `bucket/key`.

### Bucket Types

Starting in Riak 2.0, there now exists a level above buckets, called bucket types. Bucket types are groups of buckets with a similar set of properties. So for the example above, it would be like a bucket of buckets:

```javascript
places["main"]["5122"] = "Alice"
places["bagshot"]["5122"] = "Gas"
```

The benefit here is that a group of distinct buckets can share properties.

```javascript
places.props = {"search_index":"anyplace"}
```

This has practical implications. Previously, you were limited in how many custom bucket properties Riak could support, because any slight change from the default would have to be propagated to every other node in the cluster (via the gossip protocol). If you had ten thousand custom buckets, that's ten thousand values that were routinely sent amongst every member. Quickly, your system could be overloaded with that chatter, called a *gossip storm*.

With the addition of bucket types, and the improved communication mechanism that accompanies them, there's no limit to your bucket count. Bucket types also make managing multiple buckets easier: since every bucket of a type inherits the common properties, you can make across-the-board changes trivially.

Due to their versatility (and downright necessity in some cases) and improved performance, Basho recommends using bucket types whenever possible from this point into the future.

For convenience, we call a *type/bucket/key + value* pair an *object*, sparing ourselves the verbosity of "X key in the Y bucket with the Z type, and its value".

### Replication

**Replication** is the act of duplicating data across multiple servers. Riak replicates by default.

The obvious benefit of replication is that if one node goes down, nodes that contain replicated data remain available to serve requests. In other words, the system remains *available*.

For example, imagine you have a list of country keys, whose values are those countries' capitals. If all you do is replicate that data to 2 servers, you would have 2 duplicate databases.

![Replication](../assets/replication.svg)

The downside with replication is that you are multiplying the amount of storage required for every duplicate. There is also some network overhead with this approach, since values must also be routed to all replicated nodes on write. But there is a more insidious problem with this approach, which I will cover shortly.
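To make that cost concrete, here is a toy sketch in plain JavaScript (an illustration only, not Riak's implementation) of what naive full replication implies: every logical write fans out to every copy, which is exactly where the extra storage and network overhead comes from.

```javascript
// Two duplicate databases, as in the countries example above.
const replicas = [new Map(), new Map()];

// One logical write becomes one physical write per replica.
function put(key, value) {
  for (const replica of replicas) replica.set(key, value);
}

put("Spain", "Madrid");
console.log(replicas.map(r => r.get("Spain"))); // [ 'Madrid', 'Madrid' ]
```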

### Partitions

A **partition** is how we divide a set of keys onto separate physical servers. Rather than duplicate values, we pick one server to exclusively host a range of keys, and the other servers to host the remaining non-overlapping ranges.

With partitioning, our total capacity can increase without any big expensive hardware, just lots of cheap commodity servers. If we decided to partition our database into 1000 parts across 1000 nodes, we have (hypothetically) reduced the amount of work any particular server must do to 1/1000th.

For example, if we partition our countries into 2 servers, we might put all countries beginning with letters A-N into Node A, and O-Z into Node B.

![Partitions](../assets/partitions.svg)

There is a bit of overhead to the partition approach. Some service must keep track of what range of values live on which node. A requesting application must know that the key `Spain` will be routed to Node B, not Node A.

There's also another downside. Unlike replication, simple partitioning of data actually *decreases* uptime. If one node goes down, that entire partition of data is unavailable. This is why Riak uses both replication and partitioning.
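The routing rule that "some service must keep track of" can be as small as a single function. Here is a toy sketch of the alphabetical split above (plain JavaScript; real systems, Riak included, hash keys rather than splitting on first letters):

```javascript
const nodeA = new Map();
const nodeB = new Map();

// The contract every client must know: A-N lives on Node A, O-Z on Node B.
const route = key => (key[0].toUpperCase() <= "N" ? nodeA : nodeB);

route("France").set("France", "Paris"); // lands on Node A
route("Spain").set("Spain", "Madrid"); // lands on Node B
console.log(route("Spain").get("Spain")); // "Madrid"
```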

### Replication+Partitions

Since partitions allow us to increase capacity, and replication improves availability, Riak combines them. We partition data across multiple nodes, as well as replicate that data into multiple nodes.

Where our previous example partitioned data into 2 nodes, we can replicate each of those partitions into 2 more nodes, for a total of 4.

Our server count has increased, but so has our capacity and reliability. If you're designing a horizontally scalable system by partitioning data, you must deal with replicating those partitions.

The Riak team suggests a minimum of 5 nodes for a Riak cluster, and replicating to 3 nodes (this setting is called `n_val`, for the number of *nodes* on which to replicate each object).

![Replication Partitions](../assets/replpart.svg)

### The Ring

Riak applies *consistent hashing* to map objects along the edge of a circle (the Ring).

Riak partitions are not mapped alphabetically (as we used in the examples above); instead, a partition marks a range of key hashes (the SHA-1 function applied to a key). The maximum hash value is 2^160, and it is divided into some number of partitions---64 partitions by default (the Riak config setting is `ring_creation_size`).

Let's walk through what all that means. If you have the key `favorite`, applying the SHA-1 algorithm would return `7501 7a36 ec07 fd4c 377a 0d2a 0114 00ab 193e 61db` in hexadecimal. With 64 partitions, each has 1/64 of the `2^160` possible values, making the first partition range from 0 to `2^154-1`, the second from `2^154` to `2*2^154-1`, and so on, up to the last partition, which covers `63*2^154` to `2^160-1`.

We won't do all of the math, but trust me when I say `favorite` falls within the range of partition 3.

If we visualize our 64 partitions as a ring, `favorite` falls here.

![Riak Ring](../assets/ring0.svg)

"Didn't he say that Riak suggests a minimum of 5 nodes? How can we put 64 partitions on 5 nodes?" We just give each node more than one partition, each of which is managed by a *vnode*, or *virtual node*.

We count around the ring of vnodes in order, assigning each node to the next available vnode, until all vnodes are accounted for. So partition/vnode 1 would be owned by Node A, vnode 2 owned by Node B, up to vnode 5 owned by Node E. Then we continue by giving Node A vnode 6, Node B vnode 7, and so on, until our vnodes have been exhausted, leaving us this list.

* A = [1,6,11,16,21,26,31,36,41,46,51,56,61]
* B = [2,7,12,17,22,27,32,37,42,47,52,57,62]
* C = [3,8,13,18,23,28,33,38,43,48,53,58,63]
* D = [4,9,14,19,24,29,34,39,44,49,54,59,64]
* E = [5,10,15,20,25,30,35,40,45,50,55,60]

So far we've partitioned the ring, but what about replication? When we write a new value to Riak, it will replicate the result to some number of nodes, defined by a setting called `n_val`. In our 5 node cluster it defaults to 3.

So when we write our `favorite` object to vnode 3, it will be replicated to vnodes 4 and 5. This places the object in physical nodes C, D, and E. Once the write is complete, even if node C crashes, the value is still available on 2 other nodes. This is the secret of Riak's high availability.

We can visualize the Ring with its vnodes, managing nodes, and where `favorite` will go.

![Riak Ring](../assets/ring1.svg)

The Ring is more than just a circular array of hash partitions. It's also a system of metadata that gets copied to every node. Each node is aware of every other node in the cluster, which nodes own which vnodes, and other system data.

Armed with this information, requests for data can target any node. It will horizontally access data from the proper nodes, and return the result.

## Practical Tradeoffs

So far we've covered the good parts of partitioning and replication: highly available when responding to requests, and inexpensive capacity scaling on commodity hardware. With the clear benefits of horizontal scaling, why is it not more common?

### CAP Theorem

Classic RDBMS databases are *write consistent*. Once a write is confirmed, successive reads are guaranteed to return the newest value. If I save the value `cold pizza` to my key `favorite`, every future read will consistently return `cold pizza` until I change it.

But when values are distributed, *consistency* might not be guaranteed. In the middle of an object's replication, two servers could have different results. When we update `favorite` to `cold pizza` on one node, another node might contain the older value `pizza`, because of a network connectivity problem. If you request the value of `favorite` on either side of a network partition, two different results could possibly be returned---the database is inconsistent.

If consistency should not be compromised in a distributed database, we can choose to sacrifice *availability* instead. We may, for instance, decide to lock the entire database during a write, and simply refuse to serve requests until that value has been replicated to all relevant nodes. Clients have to wait until their results can be brought into a consistent state (ensuring all replicas will return the same value), or their requests fail if the nodes have trouble communicating. For many high-traffic read/write use-cases, like an online shopping cart where even minor delays will cause people to just shop elsewhere, this is not an acceptable sacrifice.

This tradeoff is known as Brewer's CAP theorem. CAP loosely states that you can have a C (consistent), A (available), or P (partition-tolerant) system, but you can only choose 2. Assuming your system is distributed, you're going to be partition-tolerant, meaning that your network can tolerate packet loss. If a network partition occurs between nodes, your servers still run. So your only real choices are CP or AP. Riak 2.0 supports both modes.

### Strong Consistency

Since version 2.0, Riak supports strong consistency (SC), as well as high availability (HA). "Waitaminute!" I hear you say, "doesn't that break the CAP theorem?" Not the way Riak does it. Riak supports setting a bucket type property as strongly consistent. Any bucket of that type is now SC, meaning that a request is either successfully replicated to a majority of partitions, or it fails (if you want to sound fancy at parties, just say "Riak SC uses a variant of the vertical Paxos leader election algorithm").

This, naturally, comes at a cost. As we know from the CAP theorem, if too many nodes are down, the write will fail. You'll have to repair your node or network, and try the write again. In short, you've lost high availability. If you don't absolutely need strong consistency, consider staying with the high availability default, and tuning it to your needs as we'll see in the next section.

### Tunable Availability with N/R/W

A question the CAP theorem demands you answer with a distributed system is: do I give up strong consistency, or give up ensured availability? If a request comes in, do I lock out requests until I can enforce consistency across the nodes? Or do I serve requests at all costs, with the caveat that the database may become inconsistent?

Riak's solution is based on Amazon Dynamo's novel approach of a *tunable* AP system. It takes advantage of the fact that, though the CAP theorem is true, you can choose what kind of tradeoffs you're willing to make. Riak is highly available to serve requests, with the ability to tune its level of availability---nearing, but never quite reaching, strong consistency. If you want strong consistency, you'll need to create a special SC bucket type, which we'll see in a later chapter.

Riak allows you to choose how many nodes you want to replicate an object to, and how many nodes must be written to or read from per request. These values are settings labeled `n_val` (the number of nodes to replicate to), `r` (the number of nodes read from before returning), and `w` (the number of nodes written to before considered successful).

A thought experiment may help clarify things.

![NRW](../assets/nrw.svg)

### N

With our 5 node cluster, having an `n_val=3` means values will eventually replicate to 3 nodes, as we've discussed above. This is the *N value*. You can set other values (R,W) to equal the `n_val` number with the shorthand `all`.

### W

But you may not wish to wait for all nodes to be written to before returning. You can choose to wait for all 3 to finish writing (`w=3` or `w=all`), which means my values are more likely to be consistent. Or you could choose to wait for only 1 complete write (`w=1`), and allow the remaining 2 nodes to write asynchronously, which returns a response quicker but increases the odds of reading an inconsistent value in the short term. This is the *W value*.

In other words, setting `w=all` would help ensure your system was more likely to be consistent, at the expense of waiting longer, with a chance that your write would fail if fewer than 3 nodes were available (meaning, over half of your total servers are down).

A failed write, however, is not necessarily a true failure. The client will receive an error message, but the write will typically still have succeeded on some number of nodes smaller than the *W* value, and will typically eventually be propagated to all of the nodes that should have it.

### R

Reading involves similar tradeoffs. To ensure you have the most recent value, you can read from all 3 nodes containing objects (`r=all`). Even if only 1 of 3 nodes has the most recent value, we can compare all nodes against each other and choose the latest one, thus ensuring some consistency. Remember when I mentioned that RDBMS databases were *write consistent*? This is close to *read consistency*. Just like `w=all`, however, the read will fail unless 3 nodes are available to be read. Finally, if you only want to quickly read any value, `r=1` has low latency, and is likely consistent if `w=all`.

In general terms, the N/R/W values are Riak's way of allowing you to trade lower consistency for more availability.
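As a preview of the next chapter, `r` and `w` can be set per request over Riak's HTTP API as plain query parameters. A minimal sketch, assuming a local node on the default port 8098 and Node.js 18+ run as an ES module (for top-level `await`):

```javascript
const RIAK = "http://localhost:8098";

// Fast, possibly stale read: respond as soon as one replica replies.
const read = await fetch(`${RIAK}/buckets/food/keys/favorite?r=1`);
console.log(read.status, await read.text());

// Careful write: don't report success until all replicas have the value.
const write = await fetch(`${RIAK}/buckets/food/keys/favorite?w=all`, {
  method: "PUT",
  headers: { "Content-Type": "text/plain" },
  body: "cold pizza",
});
console.log(write.status); // 204 when the write succeeds
```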

### Logical Clock

If you've followed thus far, I only have one more conceptual wrench to throw at you. I wrote earlier that with `r=all`, we can "compare all nodes against each other and choose the latest one." But how do we know which is the latest value? This is where logical clocks like *vector clocks* (aka *vclocks*) come into play.

Vector clocks measure a sequence of events, just like a normal clock. But since we can't reasonably keep the clocks on dozens, or hundreds, or thousands of servers in sync (without really exotic hardware, like geosynchronized atomic clocks, or quantum entanglement), we instead keep a running history of updates, and look for logical, rather than temporal, causality.

Let's use our `favorite` example again, but this time we have 3 people trying to come to a consensus on their favorite food: Aaron, Britney, and Carrie. These people are called *actors*, i.e. the things responsible for the updates. We'll track the value each actor has chosen along with the relevant vector clock.

(To illustrate vector clocks in action, we're cheating a bit. Riak doesn't track vector clocks via the client that initiated the request, but rather, via the server that coordinates the write request; nonetheless, the concept is the same. We'll cheat further by disregarding the timestamp that is stored with vector clocks.)

When Aaron sets the `favorite` object to `pizza`, a vector clock could contain his name and the number of updates he's performed.

```yaml
bucket: food
key: favorite

vclock: {Aaron: 1}
value: pizza
```

Britney now comes along, and reads `favorite`, but decides to update `pizza` to `cold pizza`. When using vclocks, she must provide the vclock returned from the request she wants to update. This is how Riak can help ensure you're updating a previous value, and not merely overwriting with your own.

```yaml
bucket: food
key: favorite

vclock: {Aaron: 1, Britney: 1}
value: cold pizza
```

At the same time as Britney, Carrie decides that pizza was a terrible choice, and tries to change the value to `lasagna`.

```yaml
bucket: food
key: favorite

vclock: {Aaron: 1, Carrie: 1}
value: lasagna
```

This presents a problem, because there are now two vector clocks in play that diverge from `{Aaron: 1}`. By default, Riak will store both values.

Later in the day Britney checks again, but this time she gets the two conflicting values (aka *siblings*, which we'll discuss in more detail in the next chapter), with two vclocks.

```yaml
bucket: food
key: favorite

vclock: {Aaron: 1, Britney: 1}
value: cold pizza
---
vclock: {Aaron: 1, Carrie: 1}
value: lasagna
```

It's clear that a decision must be made. Perhaps Britney knows that Aaron's original request was for `pizza`, and thus two people generally agreed on `pizza`, so she resolves the conflict by choosing that value and providing a new vclock.

```yaml
bucket: food
key: favorite

vclock: {Aaron: 1, Carrie: 1, Britney: 2}
value: pizza
```

Now we are back to the simple case, where requesting the value of `favorite` will just return the agreed upon `pizza`.

If you're a programmer, you may notice that this is not unlike a version control system, like **git**, where conflicting branches may require manual merging into one.
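Here is a toy model of the comparison being made (plain JavaScript, not Riak's internals): clock A *descends from* clock B when A has seen at least every update B has. If neither descends from the other, the values conflict and both are kept.

```javascript
function descends(a, b) {
  return Object.keys(b).every(actor => (a[actor] || 0) >= b[actor]);
}

const brit = { Aaron: 1, Britney: 1 };   // cold pizza
const carrie = { Aaron: 1, Carrie: 1 };  // lasagna

console.log(descends(brit, carrie)); // false
console.log(descends(carrie, brit)); // false -> conflict: keep both as siblings

// Britney's resolving write descends from both clocks, so it wins cleanly.
const resolved = { Aaron: 1, Carrie: 1, Britney: 2 };
console.log(descends(resolved, brit) && descends(resolved, carrie)); // true
```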

### Datatypes

New in Riak 2.0 is the concept of datatypes. In the preceding logical clock example, we were responsible for resolving the conflicting values. This is because in the normal case, Riak has no idea what objects you're giving it. That is to say, Riak values are *opaque*. This is actually a powerful construct, since it allows you to store any type of value you want, from plain text, to semi-structured data like XML or JSON, to binary objects like images.

When you decide to use datatypes, you've given Riak some information about the type of object you want to store. With this information, Riak can figure out how to resolve conflicts automatically for you, based on some pre-defined behavior.

Let's try another example. Let's imagine a shopping cart at an online retailer. You can think of a shopping cart like a set of items, so each key in our cart contains a *set* of values.

Let's say you log into the retailer's website on your laptop with your username *ponies4evr*, and choose the Season 2 DVD of *My Little Pony: Friendship is Magic*. This time, the logical clock will act more like Riak's, where the node that coordinates the request will be the actor.

```yaml
type: set
bucket: cart
key: ponies4evr

vclock: {Node_A: 1}
value: ["MYPFIM-S2-DVD"]
```

Once the DVD was added to the cart bucket, your laptop runs out of batteries. So you take out your trusty smartphone, and log into the retailer's mobile app. You decide to also add the *Bloodsport III* DVD. Little did you know, a temporary network partition caused your write to redirect to another node. This partition had no knowledge of your other purchase.

```yaml
type: set
bucket: cart
key: ponies4evr

vclock: {Node_B: 1}
value: ["BS-III-DVD"]
```

Happily, the network hiccup was temporary, and thus the cluster heals itself. Under normal circumstances, since the logical clocks did not descend from one another, you'd end up with siblings like this:

```yaml
type: set
bucket: cart
key: ponies4evr

vclock: {Node_A: 1}
value: ["MYPFIM-S2-DVD"]
---
vclock: {Node_B: 1}
value: ["BS-III-DVD"]
```

But since the bucket was designed to hold a *set*, Riak knows how to automatically resolve this conflict. In the case of conflicting sets, it performs a set union. So when you go to check out of the cart, the system returns this instead:

```yaml
type: set
bucket: cart
key: ponies4evr

vclock: [{Node_A: 1}, {Node_B: 1}]
value: ["MYPFIM-S2-DVD", "BS-III-DVD"]
```

Datatypes will never return conflicts. This is an important claim to make, because as a developer, you get all of the benefits of dealing with a simple value, along with all of the benefits of a distributed, available system. You don't have to think about handling conflicts. It would be like a version control system (*git*, *svn*, etc.) where you never had to merge code---the VCS simply *knew* what you wanted.

How this all works is beyond the scope of this document. Under the covers it's implemented by something called [CRDTs](http://docs.basho.com/riak/2.0.0/theory/concepts/crdts/) \(Conflict-free Replicated Data Types). What's important to note is that Riak supports four datatypes: *map*, *set*, *counter*, *flag* (a boolean value). Best of all, maps can nest arbitrarily, so you can create a map whose values are sets, counters, or even other maps. It also supports plain string values called *register*s.

We'll see how to use datatypes in the next chapter.
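Before we do, here's a toy illustration of why no sibling ever surfaces for a set (plain JavaScript; real Riak sets are CRDTs that also track removals, which a bare union does not): conflicting replicas converge by set union.

```javascript
const laptopCart = new Set(["MYPFIM-S2-DVD"]); // written via Node_A
const phoneCart = new Set(["BS-III-DVD"]);     // written via Node_B

// The merge rule for conflicting sets: take the union of both replicas.
const merged = new Set([...laptopCart, ...phoneCart]);
console.log([...merged]); // [ 'MYPFIM-S2-DVD', 'BS-III-DVD' ]
```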

### Riak and ACID

Unlike single-node databases like Neo4j or PostgreSQL, Riak does not support *ACID* transactions. Locking across multiple servers would kill write availability, and equally concerning, increase latency. While ACID transactions promise *Atomicity*, *Consistency*, *Isolation*, and *Durability*, Riak and other NoSQL databases follow *BASE*, or *Basically Available*, *Soft state*, *Eventually consistent*.

The BASE acronym was meant as shorthand for the goals of non-ACID-transactional databases like Riak. It is an acceptance that distribution is never perfect (basically available), all data is in flux (soft state), and that strong consistency is untenable (eventually consistent) if you want high availability.

Look closely at promises of distributed transactions---they're often couched in some diminishing adjective or caveat like *row transactions* or *per-node transactions*, which basically mean *not transactional* in the terms you would normally use to define it. I'm not claiming it's impossible, but it's certainly worth due consideration.

As your server count grows---especially as you introduce multiple datacenters---the odds of partitions and node failures drastically increase. My best advice is to design for it up front.

## Wrapup

Riak is designed to bestow a range of real-world benefits, but equally, to handle the fallout of wielding such power. Consistent hashing and vnodes are an elegant solution to horizontally scaling across servers. N/R/W allows you to dance with the CAP theorem by fine-tuning against its constraints. And vector clocks allow another step closer to consistency by allowing you to manage conflicts that will occur at high load.

We'll cover other technical concepts as needed, including the gossip protocol, hinted handoff, and read-repair.

Next we'll review Riak from the user (developer) perspective. We'll check out lookups, take advantage of write hooks, and examine alternative query options like secondary indexing, search, and MapReduce.
diff --git a/zh/03-developers.md b/zh/03-developers.md
new file mode 100644
index 0000000..c5a9633
--- /dev/null
+++ b/zh/03-developers.md
@@ -0,0 +1,1002 @@
# Developers

_We're going to hold off on the details of installing Riak for now. If you'd like to get started on your own, it's easy enough by following the [install documentation](http://docs.basho.com/riak/latest/) on the website (http://docs.basho.com). Otherwise, this is a great section to read while sitting on a train without an Internet connection._

Developing against a Riak database is quite easy once you understand some of its finer points. It is a key/value store in the technical sense (you associate values with keys, and retrieve them using the same keys), but it offers much more. You can embed hooks to run before or after a write, or index data for fast retrieval. Riak has SOLR-based search, and lets you run MapReduce functions to extract and aggregate data across a huge cluster in a relatively short timespan. We'll also show some bucket-specific settings you can configure.

## Lookup

Since Riak is a KV database, the most basic commands are setting and getting values. We'll use the HTTP interface, via curl, but we could just as easily use Erlang, Ruby, Java, or any other supported language.

The basic structure of a Riak request is setting a value, reading it, and maybe eventually deleting it. The actions map onto HTTP methods (PUT, GET, POST, DELETE).

```bash
PUT    /types/<type>/buckets/<bucket>/keys/<key>
GET    /types/<type>/buckets/<bucket>/keys/<key>
DELETE /types/<type>/buckets/<bucket>/keys/<key>
```

For the examples in this chapter, let's set an environment variable `$RIAK` that points to our access node's URL.

```bash
export RIAK=http://localhost:8098
```

### PUT

The simplest write command in Riak is putting a value. It requires a key, a value, and a bucket. Putting the value `pizza` into the key `favorite`, under the `food` bucket with the `items` bucket type, looks like this (in curl, all HTTP methods are prefixed with `-X`):

```bash
curl -XPUT "$RIAK/types/items/buckets/food/keys/favorite" \
  -H "Content-Type:text/plain" \
  -d "pizza"
```

I slipped a few things in there. The `-d` flag denotes that the next string will be the value. We've kept things simple with the string `pizza`, declaring it as text with the header `-H 'Content-Type:text/plain'`. This defines the HTTP MIME type of this value as plain text. We could have set any value at all, be it XML or JSON, even an image or a video. Riak does not care what data is uploaded, so long as the object size doesn't get much larger than 4MB (a soft limit, but one that is unwise to exceed).

### GET

The next command reads the value `pizza` under `items`/`food`/`favorite`.

```bash
curl -XGET "$RIAK/types/items/buckets/food/keys/favorite"
pizza
```

This is the simplest form of read, responding with only the value. Riak holds much more information, which you can access if you read the entire response, including the HTTP header.

In `curl` you can access the full response by way of the `-i` flag. Let's perform the above query again, adding that flag (`-XGET` is the default curl method, so we can leave it off).

```bash
curl -i "$RIAK/types/items/buckets/food/keys/favorite"
HTTP/1.1 200 OK
X-Riak-Vclock: a85hYGBgzGDKBVIcypz/fgaUHjmdwZTImMfKcN3h1Um+LAA=
Vary: Accept-Encoding
Server: MochiWeb/1.1 WebMachine/1.9.0 (someone had painted...
Last-Modified: Wed, 10 Oct 2012 18:56:23 GMT
ETag: "1yHn7L0XMEoMVXRGp4gOom"
Date: Thu, 11 Oct 2012 23:57:29 GMT
Content-Type: text/plain
Content-Length: 5

pizza
```

The anatomy of HTTP is a bit beyond this little book, but let's look at a few parts worth noting.
#### Status Codes
The first line gives the HTTP version 1.1 response code `200 OK`. You may be familiar with the common website code `404 Not Found`. There are many kinds of [HTTP status codes](http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html), and the Riak HTTP interface stays true to their intent: **1xx Informational**, **2xx Success**, **3xx Further Action**, **4xx Client Error**, **5xx Server Error**.

Different actions can return different response/error codes. Complete lists can be found in the [official API docs](http://docs.basho.com/riak/latest/references/apis/).
#### Timings
A block of headers represents different timings for the object or the request.

* **Last-Modified** - The last time this object was modified (created or updated).
* **ETag** - An *[entity tag](http://en.wikipedia.org/wiki/HTTP_ETag)* which can be used for client-side cache validation.
* **Date** - The time of the request.
* **X-Riak-Vclock** - A logical clock which we'll cover in more detail later.
#### Content
These describe the HTTP body of the message (in Riak's terms, the value).

* **Content-Type** - The type of value, such as `text/xml`.
* **Content-Length** - The length, in bytes, of the message body.

Some other headers, like `Link`, will be covered later in this chapter.

### POST

Similar to PUT, POST will save a value. But with POST a key is optional. All it requires is a bucket name (and it should include a type), and it will generate a key for you.

Let's add a JSON value to represent a person under the `json`/`people` type/bucket. The response header is where a POST returns the key it generated for you.

```bash
curl -i -XPOST "$RIAK/types/json/buckets/people/keys" \
  -H "Content-Type:application/json" \
  -d '{"name":"aaron"}'
HTTP/1.1 201 Created
Vary: Accept-Encoding
Server: MochiWeb/1.1 WebMachine/1.9.2 (someone had painted...
Location: /riak/people/DNQGJY0KtcHMirkidasA066yj5V
Date: Wed, 10 Oct 2012 17:55:22 GMT
Content-Type: application/json
Content-Length: 0
```

You can extract this key from the `Location` value. Other than not being pretty, this key is treated the same as if you had defined your own key via PUT.
#### Body
You may note that no body was returned with the response. For any kind of write, you can add the `returnbody=true` parameter to force a value to return, along with value-related headers like `X-Riak-Vclock` and `ETag`.

```bash
curl -i -XPOST "$RIAK/types/json/buckets/people/keys?returnbody=true" \
  -H "Content-Type:application/json" \
  -d '{"name":"billy"}'
HTTP/1.1 201 Created
X-Riak-Vclock: a85hYGBgzGDKBVIcypz/fgaUHjmdwZTImMfKkD3z10m+LAA=
Vary: Accept-Encoding
Server: MochiWeb/1.1 WebMachine/1.9.0 (someone had painted...
Location: /riak/people/DnetI8GHiBK2yBFOEcj1EhHprss
Last-Modified: Tue, 23 Oct 2012 04:30:35 GMT
ETag: "7DsE7SEqAtY12d8T1HMkWZ"
Date: Tue, 23 Oct 2012 04:30:35 GMT
Content-Type: application/json
Content-Length: 16

{"name":"billy"}
```

This is true for both PUTs and POSTs.

### DELETE

The final basic operation is deleting keys, which is similar to getting a value, but sends the DELETE method to the `type`/`bucket`/`key`.

```bash
curl -XDELETE "$RIAK/types/json/buckets/people/keys/DNQGJY0KtcHMirkidasA066yj5V"
```

A deleted object in Riak is internally marked as deleted by writing a marker known as a *tombstone*. Unless configured otherwise, another process called a *reaper* will later finish deleting the marked objects.

This detail isn't normally important, except to understand two things:

1. In Riak, a *delete* is actually a *read* and a *write*, and should be considered as such when calculating read/write ratios.
2. Checking for the existence of a key is not enough to know if an object exists. You might be reading a key after it has been deleted, so you should check for tombstone metadata (the sketch below shows the basic flow).
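A minimal sketch of that delete-then-check flow over the HTTP API (Node.js 18+, run as an ES module; it assumes a local node and reuses the generated key from the POST example above):

```javascript
const RIAK = "http://localhost:8098";
const url = `${RIAK}/types/json/buckets/people/keys/DNQGJY0KtcHMirkidasA066yj5V`;

// The DELETE writes a tombstone; Riak acknowledges with 204 No Content.
const del = await fetch(url, { method: "DELETE" });
console.log(del.status); // 204

// An immediate re-read normally answers 404 Not Found.
const get = await fetch(url);
console.log(get.status); // 404
```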

### Lists

Riak provides two kinds of lists. The first lists all *buckets* in your cluster, while the second lists all *keys* under a specific bucket. Both of these actions are called in the same way, and come in two variations.

The following will give us all of our buckets as a JSON object.

```bash
curl "$RIAK/types/default/buckets?buckets=true"

{"buckets":["food"]}
```

And this will give us all of our keys under the `food` bucket.

```bash
curl "$RIAK/types/default/buckets/food/keys?keys=true"
{
  ...
  "keys": [
    "favorite"
  ]
}
```

If we had very many keys, this could clearly take a while. So Riak also provides the ability to stream keys. `keys=stream` will keep the connection open, returning results in chunks of arrays. When it has exhausted its list, it closes the connection. You can see the details through curl in verbose (`-v`) mode (much of that response has been stripped out below).

```bash
curl -v "$RIAK/types/default/buckets/food/keys?keys=stream"
...

* Connection #0 to host localhost left intact
...
{"keys":["favorite"]}
{"keys":[]}
* Closing connection #0
```

You should note that list actions should *not* be used in production (they really are expensive operations). But they are useful for development, investigations, or for running occasional analytics at off-peak hours.

## Conditional requests

It is possible to use conditional requests with Riak, but these are fragile due to the nature of its availability/eventual consistency model.

### GET

When retrieving values from Riak via HTTP, a last-modified timestamp and an [ETag](https://en.wikipedia.org/wiki/HTTP_ETag) are included. These may be used for future `GET` requests; if the value has not changed, a `304 Not Modified` status will be returned.

For example, let's assume you receive the following headers.

```bash
Last-Modified: Thu, 17 Jul 2014 21:01:16 GMT
ETag: "3VhRP0vnXbk5NjZllr0dDE"
```

Note that the quotes are part of the ETag.

If the ETag is used via the `If-None-Match` header in the next request:

```bash
curl -i "$RIAK/types/default/buckets/food/keys/dinner" \
  -H 'If-None-Match: "3VhRP0vnXbk5NjZllr0dDE"'
HTTP/1.1 304 Not Modified
Vary: Accept-Encoding
Server: MochiWeb/1.1 WebMachine/1.10.5 (jokes are better explained)
ETag: "3VhRP0vnXbk5NjZllr0dDE"
Date: Mon, 28 Jul 2014 19:48:13 GMT
```

Similarly, the last-modified timestamp may be used with `If-Modified-Since`:

```bash
curl -i "$RIAK/types/default/buckets/food/keys/dinner" \
  -H 'If-Modified-Since: Thu, 17 Jul 2014 21:01:16 GMT'
HTTP/1.1 304 Not Modified
Vary: Accept-Encoding
Server: MochiWeb/1.1 WebMachine/1.10.5 (jokes are better explained)
ETag: "3VhRP0vnXbk5NjZllr0dDE"
Date: Mon, 28 Jul 2014 19:51:39 GMT
```

### PUT & DELETE

When adding, updating, or removing content, the HTTP headers `If-None-Match`, `If-Match`, `If-Modified-Since`, and `If-Unmodified-Since` can be used to specify ETags and timestamps.

If the specified condition cannot be met, a `412 Precondition Failed` status will be the result.

## Bucket Types/Buckets

Although we've been using bucket types and buckets as namespaces up to now, they are capable of more.

Different use cases will dictate whether a bucket is heavily written to, or mostly read from. You may use one bucket to store logs, one bucket could store session data, while another may store shopping cart data. Sometimes low latency is important, while other times it's high durability. And sometimes we just want buckets to react differently when a write occurs.

### Quorum

The basis of Riak's availability and tolerance is that it can read from, or write to, multiple nodes. Riak allows you to adjust these N/R/W values (which we covered under Concepts) on a per-bucket basis.

### N/R/W

N is the number of total nodes that a value should be replicated to, defaulting to 3. But we can set this `n_val` to less than the total number of nodes.

Any bucket property, including `n_val`, can be set by sending a `props` value as a JSON object to the bucket URL. Let's set the `n_val` to 5 nodes, meaning that objects written to `cart` will be replicated to 5 nodes.

```bash
curl -i -XPUT "$RIAK/types/default/buckets/cart/props" \
  -H "Content-Type: application/json" \
  -d '{"props":{"n_val":5}}'
```

You can take a peek at the bucket's properties by issuing a GET to the bucket.

*Note: Riak returns unformatted JSON. If you have a command-line tool like jsonpp (or json_pp) installed, you can pipe the output there for easier reading. The results below are a subset of all the `props` values.*

```bash
curl "$RIAK/types/default/buckets/cart/props" | jsonpp
{
  "props": {
    ...
    "dw": "quorum",
    "n_val": 5,
    "name": "cart",
    "postcommit": [],
    "pr": 0,
    "precommit": [],
    "pw": 0,
    "r": "quorum",
    "rw": "quorum",
    "w": "quorum",
    ...
  }
}
```

As you can see, `n_val` is 5. That's expected. But you may also have noticed that the `props` returned both `r` and `w` as `quorum`, rather than a number. So what is a *quorum*?
#### Symbolic Values
A *quorum* is one more than half of all the total replicated nodes (`floor(N/2) + 1`). This figure is important, since if more than half of all nodes are written to, and more than half of all nodes are read from, then you will get the most recent value (under normal circumstances).

Here's an example with the above `n_val` of 5 ({A,B,C,D,E}). Your `w` is a quorum (which is `3`, or `floor(5/2)+1`), so a PUT may respond successfully after writing to {A,B,C} ({D,E} will eventually be replicated to). Immediately after, a read quorum may GET values from {C,D,E}. Even if D and E have older values, you have pulled a value from node C, meaning you will receive the most recent value.

What's important is that your reads and writes *overlap*. As long as `r+w > n`, in the absence of sloppy quorums (below), you'll be able to get the newest values. In other words, you'll have a reasonable level of consistency.

A `quorum` is an excellent default, since you're reading and writing from a balance of nodes. But if you have specific requirements, like a log that is often written to but rarely read, you might find it makes more sense to wait for a successful write from a single node, but read from all of them. This affords you an overlap.

```bash
curl -i -XPUT "$RIAK/types/default/buckets/logs/props" \
  -H "Content-Type: application/json" \
  -d '{"props":{"w":"one","r":"all"}}'
```

* `all` - All replicas must reply, which is the same as setting `r` or `w` equal to `n_val`
* `one` - Setting `r` or `w` equal to `1`
* `quorum` - A majority of the replicas must respond, that is, "half plus one".
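Circling back to the overlap arithmetic above, here's a quick sanity check in plain JavaScript (a toy calculator, not anything Riak ships):

```javascript
const quorum = n => Math.floor(n / 2) + 1;
const overlaps = (r, w, n) => r + w > n;

const n = 5;
console.log(quorum(n)); // 3
console.log(overlaps(quorum(n), quorum(n), n)); // true: quorum reads see quorum writes

// The log-bucket tuning above: w=1 with r=all (r=n) still overlaps.
console.log(overlaps(n, 1, n)); // true
```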

### Sloppy Quorum

In a perfect world, a strict quorum would be sufficient for most write requests. However, at any given moment a node could go down, or the network could partition, or squirrels could get caught in the tubes, triggering the unavailability of a required node. This is known as a strict quorum. Riak defaults to what's known as a *sloppy quorum*, meaning that if any primary (expected) node is unavailable, the next available node in the ring will accept the request.

Think of it like this. Say you're out drinking with your friend. You order 2 drinks (W=2), but before they arrive, she steps away for a moment. If you were a strict quorum, you could only refuse both drinks, since the required people (N=2) are unavailable. But you'd rather be a sloppy drunk... er, I mean, sloppy quorum. Rather than refusing the drinks, you take both, one accepted on her behalf (you also get to pay).

![A Sloppy Quorum](../assets/decor/drinks.png)

When she returns, you slide her drink over. This is known as *hinted handoff*, which we'll look at again in the next chapter. For now it's enough to know that there's a difference between the default sloppy quorum (W) and a strict quorum of primary nodes (PW).
#### More than R's and W's
Some other values you may have noticed in the bucket's `props` object are `pw`, `pr`, and `dw`.

`pr` and `pw` ensure that that many *primary* nodes are available before a read or write. Riak will read from or write to backup nodes during a network partition or some other server outage. The p prefix ensures that only primary nodes are used, *primary* meaning the vnode which matches the bucket plus N successive vnodes.

(We mentioned above that `r+w > n` provides a reasonable level of consistency, but that this is violated when sloppy quorums are involved. `pr+pw > n` allows a much stronger assertion of consistency, although there are always scenarios, involving conflicting writes or significant disk failures, where even that may not be enough.)

Finally, `dw` represents the minimal number of *durable* writes necessary for success. For a normal `w` write to count as a success, a vnode need only promise that the write has started, with no guarantee that the write has reached disk, i.e., is durable. The `dw` setting means the backend service (for example Bitcask) has agreed to write the value. Although a high `dw` value is slower than a high `w` value, there are cases where this extra enforcement is good to have, such as when dealing with financial data.
#### Per Request
It's worth noting that these values (excepting `n_val`) can be overridden *per request*.

Consider a scenario in which you have data that you find very important (say, a credit card checkout), and want to help ensure it is written to every relevant node's disk before success. You could add `?dw=all` to the end of your write.

```bash
curl -i -XPUT "$RIAK/types/default/buckets/cart/keys/cart1?dw=all" \
  -H "Content-Type: application/json" \
  -d '{"paid":true}'
```

If any of the nodes currently responsible for the data cannot complete the request (i.e., hand off the data to the storage backend), the client will receive a failure message. This doesn't necessarily mean the write failed: if two of three primary vnodes successfully wrote the value, it should be available for future requests. Thus, trading availability for consistency by forcing a high `dw` or `pw` value can result in unexpected behavior.

### Hooks

Another utility of buckets is their ability to enforce behaviors on writes by way of hooks. You can attach functions to run either before or after a value is committed to a bucket.

Precommit hooks are functions that run before a write is committed. A precommit hook has the ability to cancel a write altogether if the incoming data is considered bad in some way. A simple precommit hook is to check if a value exists at all.

I put my custom Erlang code files under the Riak installation, at `./custom/my_validators.erl`.

```erlang
-module(my_validators).
-export([value_exists/1]).

%% Object size must be greater than 0 bytes
value_exists(RiakObject) ->
  Value = riak_object:get_value(RiakObject),
  case erlang:byte_size(Value) of
    0 -> {fail, "A value sized greater than 0 is required"};
    _ -> RiakObject
  end.
```

Then compile the file.

```bash
erlc my_validators.erl
```

Install the file by informing Riak of your new code via an `advanced.config` file that lives alongside `riak.conf` in each node, then restart each node.

```bash
{riak_kv,
  {add_paths, ["./custom"]}
}
```

Then you need to set the Erlang module (`my_validators`) and function (`value_exists`) as a JSON value in the bucket's precommit array: `{"mod":"my_validators","fun":"value_exists"}`.

```bash
curl -i -XPUT "$RIAK/types/default/buckets/cart/props" \
  -H "Content-Type:application/json" \
  -d '{"props":{"precommit":[{"mod":"my_validators","fun":"value_exists"}]}}'
```

If you try to POST to the `cart` bucket without a value, you should expect a failure.

```bash
curl -XPOST "$RIAK/types/default/buckets/cart/keys" \
  -H "Content-Type:application/json"
A value sized greater than 0 is required
```

You can also write precommit functions in JavaScript, though Erlang code will execute faster.

Post-commits are similar in form and function, except they run after the write has been performed. Key differences:

* The only supported language is Erlang.
* The function's return value is ignored, so it cannot cause a failure message to be sent to the client.

## Datatypes

A new feature in Riak 2.0 is datatypes. Rather than the opaque values of days past, these new additions allow a user to define the type of values that are accepted under a given bucket type. In addition to the automatic conflict resolution benefits listed in the previous chapter, you also interact with datatypes in a different way.

In normal Riak operations, as we've seen, you put a value with a given key into a type/bucket object. If you wanted to store a map, say, as a JSON object representing a person, you would put the entire object with every field/value as one operation.

```bash
curl -XPOST "$RIAK/types/json/buckets/people/keys/joe" \
  -H "Content-Type:application/json" \
  -d '{"name_register":"Joe", "pets_set":["cat"]}'
```

But if you wanted to add a `fish` as a pet, you'd have to replace the entire object.

```bash
curl -XPOST "$RIAK/types/json/buckets/people/keys/joe" \
  -H "Content-Type:application/json" \
  -d '{"name_register":"Joe", "pets_set":["cat", "fish"]}'
```

As we saw in the previous chapter, this runs the risk of conflicting, thus creating siblings.

```
{"name_register":"Joe", "pets_set":["cat"]}
{"name_register":"Joe", "pets_set":["cat", "fish"]}
```

But if we used a map, we'd instead post updates to create the map. So let's assume the bucket type `map` is of the map datatype (we'll see how operators can assign datatypes to bucket types in the next chapter). This command will insert a map object with two fields (`name_register` and `pets_set`).

```bash
curl -XPOST "$RIAK/types/map/buckets/people/keys/joe" \
  -H "Content-Type:application/json" \
  -d '{
    "update": {
      "name_register": "Joe",
      "pets_set": {
        "add_all": "cat"
      }
    }
  }'
```

Next, we want to update the `pets_set` contained within `joe`'s map. Rather than setting Joe's name and his pet cat again, we only need to inform the object of the change. Namely, we want to add a `fish` to his `pets_set`.

```bash
curl -XPOST "$RIAK/types/map/buckets/people/keys/joe" \
  -H "Content-Type:application/json" \
  -d '{
    "update": {
      "pets_set": {
        "add": "fish"
      }
    }
  }'
```

This has a few benefits. Firstly, we don't need to send duplicate data. Secondly, it doesn't matter what order the two requests happen in; the outcome will be the same. Thirdly, because the operations are CmRDTs, there is no possibility of a datatype returning siblings, making your client code that much easier.

As we've noted before, there are four Riak datatypes: *map*, *set*, *counter*, *flag*. The object type is set as a bucket type property. However, when populating a map, as we've seen, you must suffix the field name with the datatype that you wish to store: \*\_map, \*\_set, \*\_counter, \*\_flag. For plain string values, there's a special \*\_register datatype suffix.

You can read more about [datatypes in the docs](http://docs.basho.com/riak/latest/dev/using/data-types).

## Entropy

Entropy is a byproduct of eventual consistency. In other words: although eventual consistency says a write will replicate to other nodes in time, there can be a bit of delay during which not all nodes contain the same value.

That difference is *entropy*, and so Riak has created several *anti-entropy* strategies (abbreviated as *AE*). We've already talked about how an R/W quorum can deal with differing values when a write/read request overlaps on at least one node. Riak can repair entropy, or allow you the option to do so yourself.

Riak has two basic strategies to address conflicting writes.

### Last Write Wins

The most basic, and least reliable, strategy for curing entropy is called *last write wins*. It's the simple idea that the last write, based on a node's system clock, will overwrite an older one. This is currently the default behavior in Riak (since the `allow_mult` property defaults to `false`). You can also set the `last_write_wins` property to `true`, which improves performance by never retaining vector clock history.

In practice, this is for when you don't much care about the true order of operations, or the possibility of losing data, and would rather have speed and simplicity. Since it's impossible to keep server clocks truly in sync (without the proverbial geosynchronized atomic clocks), this is a best guess as to what "last" means, down to the nearest millisecond.
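A toy sketch of the rule (plain JavaScript, not Riak's implementation): whichever value carries the later node-local timestamp overwrites the other, however wrong that clock may be.

```javascript
const resolveLWW = (a, b) => (b.timestamp > a.timestamp ? b : a);

const v1 = { value: "pizza", timestamp: 1351728808000 };
const v2 = { value: "cold pizza", timestamp: 1351728809000 };

// One millisecond of clock skew is enough to silently discard a write.
console.log(resolveLWW(v1, v2).value); // "cold pizza"
```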

### Vector Clocks

As we saw under [Concepts](#practical-tradeoffs), *vector clocks* are Riak's way of tracking a true sequence of events of an object. Let's take a look at using vector clocks to allow for a more sophisticated conflict resolution approach than simply retaining the last-written value.

### Siblings

*Siblings* occur when you have conflicting values, with no clear way for Riak to know which value is correct. As of Riak 2.0, as long as you use a custom (not `default`) bucket type that isn't a datatype, conflicting writes should create siblings. This is a good thing, since it ensures no data is ever lost.

In the case where you forgo a custom bucket type, Riak will try to resolve these conflicts itself if the `allow_mult` parameter is configured to `false`. You should generally set your buckets to retain siblings, to be resolved by the client, by ensuring `allow_mult` is `true`.

```bash
curl -i -XPUT "$RIAK/types/default/buckets/cart/props" \
  -H "Content-Type:application/json" \
  -d '{"props":{"allow_mult":true}}'
```

Siblings arise in a couple of cases.

1. A client writes a value using a stale (or missing) vector clock.
2. Two clients write at the same time with the same vector clock value.

We used the second scenario to manufacture a conflict in the previous chapter when we introduced the concept of vector clocks, and we'll do so again here.

### Creating an Example Conflict

Imagine we create a shopping cart for a single refrigerator, but several people in a household are able to order food for it. Because losing an order would result in an unhappy household, Riak is using a custom bucket type `shopping`, which keeps the default `allow_mult=true`.

First Casey (a vegan) puts 10 orders of kale in the cart.

Casey writes `[{"item":"kale","count":10}]`.

```bash
curl -i -XPUT "$RIAK/types/shopping/buckets/fridge/keys/97207?returnbody=true" \
  -H "Content-Type:application/json" \
  -d '[{"item":"kale","count":10}]'
HTTP/1.1 200 OK
X-Riak-Vclock: a85hYGBgzGDKBVIcypz/fgaUHjmTwZTImMfKsMKK7RRfFgA=
Vary: Accept-Encoding
Server: MochiWeb/1.1 WebMachine/1.9.0 (someone had painted...
Last-Modified: Thu, 01 Nov 2012 00:13:28 GMT
ETag: "2IGTrV8g1NXEfkPZ45WfAP"
Date: Thu, 01 Nov 2012 00:13:28 GMT
Content-Type: application/json
Content-Length: 28

[{"item":"kale","count":10}]
```

Note the opaque vector clock returned by Riak (via the `X-Riak-Vclock` header). That same value will be returned with any read request issued for that key until another write occurs.

His roommate Mark reads the order and adds milk. In order for Riak to track the update history properly, Mark includes the most recent vector clock with his PUT.

Mark writes `[{"item":"kale","count":10},{"item":"milk","count":1}]`.

```bash
curl -i -XPUT "$RIAK/types/shopping/buckets/fridge/keys/97207?returnbody=true" \
  -H "Content-Type:application/json" \
  -H "X-Riak-Vclock:a85hYGBgzGDKBVIcypz/fgaUHjmTwZTImMfKsMKK7RRfFgA=" \
  -d '[{"item":"kale","count":10},{"item":"milk","count":1}]'
HTTP/1.1 200 OK
X-Riak-Vclock: a85hYGBgzGDKBVIcypz/fgaUHjmTwZTIlMfKcMaK7RRfFgA=
Vary: Accept-Encoding
Server: MochiWeb/1.1 WebMachine/1.9.0 (someone had painted...
Last-Modified: Thu, 01 Nov 2012 00:14:04 GMT
ETag: "62NRijQH3mRYPRybFneZaY"
Date: Thu, 01 Nov 2012 00:14:04 GMT
Content-Type: application/json
Content-Length: 54

[{"item":"kale","count":10},{"item":"milk","count":1}]
```

If you look closely, you'll notice that the vector clock changed with the second write request.

* a85hYGBgzGDKBVIcypz/fgaUHjmTwZTImMfKsMKK7RRfFgA= (after the write by Casey)
* a85hYGBgzGDKBVIcypz/fgaUHjmTwZTIlMfKcMaK7RRfFgA= (after the write by Mark)

Now let's consider a third roommate, Andy, who loves almonds. Before Mark updates the shared cart with milk, Andy retrieves Casey's kale order and appends almonds. As with Mark, Andy's update includes the vector clock as it existed after Casey's original write.

Andy writes `[{"item":"kale","count":10},{"item":"almonds","count":12}]`.

```bash
curl -i -XPUT "$RIAK/types/shopping/buckets/fridge/keys/97207?returnbody=true" \
  -H "Content-Type:application/json" \
  -H "X-Riak-Vclock:a85hYGBgzGDKBVIcypz/fgaUHjmTwZTImMfKsMKK7RRfFgA=" \
  -d '[{"item":"kale","count":10},{"item":"almonds","count":12}]'
HTTP/1.1 300 Multiple Choices
X-Riak-Vclock: a85hYGBgzGDKBVIcypz/fgaUHjmTwZTInMfKoG7LdoovCwA=
Vary: Accept-Encoding
Server: MochiWeb/1.1 WebMachine/1.9.0 (someone had painted...
Last-Modified: Thu, 01 Nov 2012 00:24:07 GMT
ETag: "54Nx22W9M7JUKJnLBrRehj"
Date: Thu, 01 Nov 2012 00:24:07 GMT
Content-Type: multipart/mixed; boundary=Ql3O0enxVdaMF3YlXFOdmO5bvrs
Content-Length: 491


--Ql3O0enxVdaMF3YlXFOdmO5bvrs
Content-Type: application/json
Etag: 62NRijQH3mRYPRybFneZaY
Last-Modified: Thu, 01 Nov 2012 00:14:04 GMT

[{"item":"kale","count":10},{"item":"milk","count":1}]
--Ql3O0enxVdaMF3YlXFOdmO5bvrs
Content-Type: application/json
Etag: 7kfvPXisoVBfC43IiPKYNb
Last-Modified: Thu, 01 Nov 2012 00:24:07 GMT

[{"item":"kale","count":10},{"item":"almonds","count":12}]
--Ql3O0enxVdaMF3YlXFOdmO5bvrs--
```

Whoa! What's all that?

Since there was a conflict between what Mark and Andy set the fridge value to, Riak kept both values.

### VTag

Since we're using the HTTP client, Riak returned a `300 Multiple Choices` code with a `multipart/mixed` MIME type. It's up to you to parse the results (or you can request a specific value by its Etag, also called a Vtag).

Issuing a plain get on the `shopping/fridge/97207` key will also return the vtags of all siblings.

```
curl "$RIAK/types/shopping/buckets/fridge/keys/97207"
Siblings:
62NRijQH3mRYPRybFneZaY
7kfvPXisoVBfC43IiPKYNb
```

What can you do with a vtag? Namely, you request the value of a specific sibling by its `vtag`. To get the first sibling in the list (Mark's milk):

```bash
curl "$RIAK/types/shopping/buckets/fridge/keys/97207?vtag=62NRijQH3mRYPRybFneZaY"
[{"item":"kale","count":10},{"item":"milk","count":1}]
```

If you want to retrieve all sibling data, tell Riak that you'll accept the multipart message by adding `-H "Accept:multipart/mixed"`.

```bash
curl "$RIAK/types/shopping/buckets/fridge/keys/97207" \
  -H "Accept:multipart/mixed"
```

### Resolving Conflicts

When we have conflicting writes, we want to resolve them. Since that problem is typically *use-case specific*, Riak defers it to us: our application must decide how to proceed.

For our example, let's merge the values into a single result set, taking the larger *count* when the *item* is the same. When done, write the new results back to Riak with the vclock of the multipart object, so Riak knows you're resolving the conflict, and you'll get a new vector clock in return.

Successive reads will then receive a single (merged) result.

```bash
curl -i -XPUT "$RIAK/types/shopping/buckets/fridge/keys/97207?returnbody=true" \
  -H "Content-Type:application/json" \
  -H "X-Riak-Vclock:a85hYGBgzGDKBVIcypz/fgaUHjmTwZTInMfKoG7LdoovCwA=" \
  -d '[{"item":"kale","count":10},{"item":"milk","count":1},\
      {"item":"almonds","count":12}]'
```

### Last write wins vs. siblings

Your data and your business needs will dictate which approach is appropriate. You don't need to choose one strategy globally; instead, feel free to take advantage of Riak's buckets to specify which data uses siblings and which blindly retains the last value written.

A quick recap of the two configuration values you'll want to set:

* `allow_mult` defaults to `false`, which means that the last write wins.
* Setting `allow_mult` to `true` instructs Riak to retain conflicting writes as siblings.
* `last_write_wins` defaults to `false`, which (perhaps counter-intuitively) still means that the behavior is last write wins: `allow_mult` is the key parameter for this behavioral toggle.
* Setting `last_write_wins` to `true` will optimize writes by assuming that previous vector clocks have no inherent value.
* Setting both `allow_mult` and `last_write_wins` to `true` is unsupported and will result in undefined behavior.

### Read Repair

When a successful read happens but not all replicas agree on the value, this triggers a *read repair*. This means that Riak will update the replicas with the most recent value. This can happen either when an object is not found (the vnode has no copy) or when a vnode contains an older value (older meaning that it is an ancestor of the newest vector clock). Unlike `last_write_wins` or manual conflict resolution, read repair is (obviously, we hope, by the name) triggered by a read rather than a write.

If your nodes get out of sync (for example, if you increase the `n_val` on a bucket), you can force read repair by performing a read operation for all of that bucket's keys. The reads may return "not found" the first time, but later reads will pull the newest values.
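A sketch of that forced read repair over the HTTP API (Node.js 18+ as an ES module; it assumes the `$RIAK` node from the examples above, and remember that key listing is expensive, so reserve this for off-peak hours):

```javascript
const RIAK = "http://localhost:8098";

// List every key in the bucket, then read each one; any replica holding a
// missing or older copy gets repaired as a side effect of the read.
const res = await fetch(`${RIAK}/buckets/food/keys?keys=true`);
const { keys } = await res.json();

for (const key of keys) {
  await fetch(`${RIAK}/buckets/food/keys/${encodeURIComponent(key)}`);
}
console.log(`read ${keys.length} keys to trigger read repair`);
```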

### Active Anti-Entropy (AAE)

Although resolving conflicting data during get requests via read repair is sufficient for most needs, data that is never read can eventually be lost as nodes fail and are replaced.

Riak supports active anti-entropy (AAE) to proactively identify and repair inconsistent data. This feature also helps recover lost data in the event of disk corruption or administrative mistakes.

The overhead of this feature is minimized by maintaining sophisticated hash trees ("Merkle trees"), which make it easy to compare data sets between vnodes, and it can be disabled if necessary.

## Querying

So far we've only dealt with key/value lookups. The truth is, key/value is a pretty powerful mechanism that spans a spectrum of use cases. However, sometimes we need to look up data by value, rather than by key. Sometimes we need to perform some calculations, or aggregations, or search.

### Secondary Indexing (2i)

A *secondary index* (2i) is a data structure that lowers the cost of finding non-key values. Like many other databases, Riak has the ability to index data. However, since Riak has no real knowledge of the data it stores (values are just binary), it indexes metadata, defined by a naming pattern to be either integers or binary values.

If your installation is configured to use 2i (as shown in the next chapter), simply writing a value to Riak with an index header will make it indexed, provided the header is prefixed by `X-Riak-Index-` and suffixed by `_int` for an integer, or `_bin` for a string.

```bash
curl -i -XPUT $RIAK/types/shopping/buckets/people/keys/casey \
  -H "Content-Type:application/json" \
  -H "X-Riak-Index-age_int:31" \
  -H "X-Riak-Index-fridge_bin:97207" \
  -d '{"work":"rodeo clown"}'
```

Queries can be done in two forms: exact match and range. Let's add a couple more people and see what we get: `mark` is `32` and `andy` is `35`; they both share fridge `97207`.

Which people own fridge `97207`? It's a quick lookup to receive the keys that have matching index values.

```bash
curl "$RIAK/types/shopping/buckets/people/index/fridge_bin/97207"
{"keys":["mark","casey","andy"]}
```

With those keys, it's a simple lookup to get the bodies.

The other query option is an inclusive range match. This finds all people under the age of `32`, by searching between `0` and `32`.

```bash
curl "$RIAK/types/shopping/buckets/people/index/age_int/0/32"
{"keys":["mark","casey"]}
```

And that's about it. 2i is a basic form of indexing, but one with solid utility.

### MapReduce

MapReduce is a method of aggregating large amounts of data by separating the processing into two phases, map and reduce, which are themselves executed in parts. Map is executed per object to convert/extract some value, then those mapped values are reduced into some aggregate result. What do we gain from this structure? It's predicated on the idea that it's cheaper to move the algorithms to where the data lives, than to transfer massive amounts of data to a single server to run a calculation.

This method, popularized by Google, can be seen in a wide array of NoSQL databases. In Riak, you execute a MapReduce job on a single node, which then propagates to the other nodes. The results are mapped and reduced, then further reduced down to the calling node and returned.

![MapReduce Returning Name Char Count](../assets/mapreduce.svg)

Let's assume we have a bucket for log values that stores messages prefixed by either INFO or ERROR. We want to count the number of INFO logs that contain the word "cart".

```bash
LOGS=$RIAK/types/default/buckets/logs/keys
curl -XPOST $LOGS -d "INFO: New user added"
curl -XPOST $LOGS -d "INFO: Kale added to shopping cart"
curl -XPOST $LOGS -d "INFO: Milk added to shopping cart"
curl -XPOST $LOGS -d "ERROR: shopping cart cancelled"
```

MapReduce jobs can be written as either Erlang or JavaScript code. This time we'll go the easy route and write JavaScript. You execute MapReduce by posting JSON to the `/mapred` path.

```bash
curl -XPOST "$RIAK/mapred" \
  -H "Content-Type: application/json" \
  -d @- \
<<EOF
{
  "inputs":"logs",
  "query":[{
    "map":{
      "language":"javascript",
      "source":"function(riakObject, keydata, arg) {
        var m = riakObject.values[0].data.match(/^INFO.*cart/);
        return [(m ? m.length : 0 )];
      }"
    },
    "reduce":{
      "language":"javascript",
      "source":"function(values, arg) {
        return [values.reduce(
          function(total, v){ return total + v; }, 0)
        ];
      }"
    }
  }]
}
EOF
```

The response is the reduced count. With the four logs posted above, two INFO entries contain the word "cart", so the result is `[2]`.

#### MR + 2i

Another option when using MapReduce is to combine it with secondary indexes. You can pipe the results of a 2i query into a MapReducer, simply by specifying the index you wish to use, along with either a `key` for an index lookup, or `start` and `end` values for a ranged query.

```json
  ...
  "inputs":{
    "bucket":"people",
    "index": "age_int",
    "start": 18,
    "end": 32
  },
  ...
```

MapReduce in Riak is a powerful way of pulling data out of an otherwise plain key/value store. But we have one more way of finding data in Riak.

### Search 2.0

Search 2.0 is a complete, from-scratch re-imagining of search in Riak. It's an extension to Riak that lets you perform searches to find values in a Riak cluster. Unlike the original Riak Search, Search 2.0 leverages distributed Solr to perform the inverted indexing and the management of matched values.

Before using Search 2.0, you'll have to have it installed and a bucket set up with an index (those details can be found in the next chapter).

The simplest example is a full-text search. Here we add `ryan` to the `people` table (with a default index).

```bash
curl -XPUT "$RIAK/types/default/buckets/people/keys/ryan" \
  -H "Content-Type:text/plain" \
  -d "Ryan Zezeski"
```

To execute a search, request `/solr/<index>/select` along with any distributed [Solr parameters](http://wiki.apache.org/solr/CommonQueryParameters). Here we query for documents that contain a word starting with `zez`, request the results as json (`wt=json`), and return only the Riak key (`fl=_yz_rk`).

```bash
curl "$RIAK/solr/people/select?wt=json&omitHeader=true&fl=_yz_rk&q=zez*"
{
  "response": {
    "numFound": 1,
    "start": 0,
    "maxScore": 1.0,
    "docs": [
      {
        "_yz_rk": "ryan"
      }
    ]
  }
}
```

With the matching `_yz_rk` keys, you can retrieve the bodies with a simple Riak lookup.

Search 2.0 supports Solr 4.0, which includes filter queries, ranges, page scores, start values, and rows (the last two are useful for pagination). You can also receive snippets of matching [highlighted text](http://wiki.apache.org/solr/HighlightingParameters) (`hl`, `hl.fl`), which is useful for building a search engine (and is used for [search.basho.com](http://search.basho.com)). You can perform searches, stats, geolocation, bounding shapes, or any other distributed Solr feature.

### Tagging

Another useful feature of Search 2.0 is the tagging of values. Tagging values gives additional context to a Riak value. The current implementation requires that all tagged values begin with `X-Riak-Meta`, and be listed under a special header named `X-Riak-Meta-yz-tags`.

```bash
curl -XPUT "$RIAK/types/default/buckets/people/keys/dave" \
  -H "Content-Type:text/plain" \
  -H "X-Riak-Meta-yz-tags: X-Riak-Meta-nickname_s" \
  -H "X-Riak-Meta-nickname_s:dizzy" \
  -d "Dave Smith"
```

To search by the `nickname_s` tag, just query the field name followed by a colon and the value.

```bash
curl "$RIAK/solr/people/select?wt=json&omitHeader=true&q=nickname_s:dizzy"
{
  "response": {
    "numFound": 1,
    "start": 0,
    "maxScore": 1.4054651,
    "docs": [
      {
        "nickname_s": "dizzy",
        "id": "dave_25",
        "_yz_ed": "20121102T215100 dave m7psMIomLMu/+dtWx51Kluvvrb8=",
        "_yz_fpn": "23",
        "_yz_node": "dev1@127.0.0.1",
        "_yz_pn": "25",
        "_yz_rk": "dave",
        "_version_": 1417562617478643712
      }
    ]
  }
}
```

Note that the returned `docs` also contain `"nickname_s":"dizzy"` as a value. All tagged values will be returned in matching results.

### Datatypes

One of the more powerful combinations in Riak 2.0 is datatypes plus Search. If you set both a datatype and a search index in a bucket type's properties, values you set are indexed as you'd expect. Map fields are indexed as their given types, sets as multi-field strings, counters as integers, and flags as booleans. Nested maps are also indexed, dot-separated, and queryable in that manner as well.

For example, remember Joe, from the datatypes section? Let's assume this `people` bucket is indexed. And let's also add another pet.

```bash
curl -XPUT "$RIAK/types/map/buckets/people/keys/joe" \
  -H "Content-Type:application/json" \
  -d '{"update": {"pets_set": {"add":"dog"}}}'
```

Then let's search for `pets_set:dog`, filtering to show only the `type/bucket/key`.

```json
{
  "response": {
    "numFound": 1,
    "start": 0,
    "maxScore": 1.0,
    "docs": [
      {
        "_yz_rt": "map",
        "_yz_rb": "people",
        "_yz_rk": "joe"
      }
    ]
  }
}
```

Bravo! You've now found the object you wanted. Thanks to Solr's customizable schema, you can even store the fields you want to return, if saving a second lookup is important to you.

This provides the best of both worlds. You can update and query values without fear of conflicts, and can query Riak based on field values. It doesn't take much imagination to see that this combination effectively turns Riak into a scalable, stable, highly available document datastore. Throw strong consistency into the mix (which we'll do in the next chapter) and you can store and query pretty much anything in Riak, in any way.

If you're wondering to yourself, "What exactly does MongoDB provide, again?", well, I didn't ask that. You did. But it is a good question...

Well, let's move on.

## Wrap-up

Riak is a distributed data store with several additions that improve upon standard key-value lookups, like specifying replication values. Since values in Riak are opaque, many of these methods either require custom code to extract and give meaning to values, such as *MapReduce*, or allow for header metadata to provide an added descriptive dimension to the object, such as *secondary indexes* or *Search*.

Next, we'll peek further under the hood and show you how to set up and manage a cluster of your own.
diff --git a/zh/04-operators.md b/zh/04-operators.md
new file mode 100644
index 0000000..275299e
--- /dev/null
+++ b/zh/04-operators.md
@@ -0,0 +1,1809 @@
# Operators

In some ways, Riak is downright mundane in its role as the easiest NoSQL database to operate. Want more servers? Add them. A network cable is cut at 2am? Deal with it after a few more hours of sleep. But while Riak is reliable, understanding this integral part of your application stack is still important.

We've covered the core concepts of Riak, and I've provided a taste of how to use it, but there is more to the database than that. There are details you should know if you plan to operate a Riak cluster of your own.

## Clusters

Up to this point you've read about "clusters" and the "Ring" in nebulous summation. What exactly do we mean, and what are the practical implications of these details for Riak developers and operators?

A *cluster* in Riak is a managed collection of nodes that share a common Ring.

### The Ring

The *Ring* in Riak is actually a two-fold concept.

Firstly, the Ring represents the consistent hash partitions (the partitions managed by vnodes). This partition range is treated as circular, from 0 to 2^160-1 and back to 0 again. (If you're wondering, yes, this means that we are limited to 2^160 nodes, which is a limit of 1.46 quindecillion, or `1.46 x 10^48`, nodes per cluster. For comparison, there are only `1.92 x 10^49` [silicon atoms on Earth](http://education.jlab.org/qa/mathatom_05.html).)

When we consider replication, the N value defines how many nodes an object is replicated to. Riak makes a best attempt at spreading that value to as many nodes as it can, so it copies to the next N adjacent nodes, starting with the primary partition and counting around the Ring; if it reaches the last partition, it loops around back to the first one.

Secondly, the Ring is also used as a shorthand for describing the state of the circular hash ring I just mentioned. This Ring (aka *Ring State*) is a data structure that gets passed around between nodes, so each knows the state of the entire cluster. Which node manages which vnodes? If a node gets a request for an object managed by other nodes, it consults the Ring and forwards the request to the proper nodes. It's a local copy of a contract that all of the nodes agree to follow.

Obviously, this contract needs to stay in sync between all of the nodes. If a node is permanently taken offline or a new one added, the other nodes need to readjust, balancing the partitions around the cluster, then updating the Ring with this new structure. This Ring state gets passed between the nodes by means of a *gossip protocol*.

### Gossip and CMD

Riak has two methods of keeping nodes current on the state of the Ring. The first, and oldest, is the *gossip protocol*. If a node's state in the cluster is altered, information is propagated to other nodes. Periodically, nodes will also send their status to a random peer for added consistency.

A newer method of information exchange in Riak is *cluster metadata* (CMD), which uses a more sophisticated method (plum-tree, DVV consistent state) to pass large amounts of metadata between nodes. The superiority of CMD is one of the benefits of using bucket types in Riak 2.0, discussed below.

In both cases, propagating changes in the Ring is an asynchronous operation, and can take a couple minutes depending on Ring size.

<h3>How Replication Uses the Ring</h3>

Even if you are not a programmer, it's worth taking a look at this Ring
example. It's also worth remembering that partitions are managed by vnodes,
and in conversation the two terms are sometimes interchanged, though I'll try
to be more precise here.

Let's start with Riak configured to have 8 partitions, which are set via
`ring_size` in the `etc/riak.conf` file (we'll dig deeper into this file
later).

```bash
## Number of partitions in the cluster (only valid when first
## creating the cluster). Must be a power of 2, minimum 8 and maximum
## 1024.
##
## Default: 64
##
## Acceptable values:
##   - an integer
ring_size = 8
```

In this example, I have a total of 4 Riak nodes running on `riak@AAA.cluster`,
`riak@BBB.cluster`, `riak@CCC.cluster`, and `riak@DDD.cluster`, each with two
partitions (and thus vnodes).

Riak has the amazing, and dangerous, `attach` command that attaches an Erlang
console to a live Riak node, with access to all of the Riak modules.

The `riak_core_ring:chash(Ring)` function extracts the total count of
partitions (8), with an array of numbers representing the start of each
partition, some fraction of the 2^160 number, and the node name that
represents a particular Riak server in the cluster.

```bash
$ bin/riak attach
(riak@AAA.cluster)1> {ok,Ring} = riak_core_ring_manager:get_my_ring().
(riak@AAA.cluster)2> riak_core_ring:chash(Ring).
{8,
 [{0,'riak@AAA.cluster'},
  {182687704666362864775460604089535377456991567872, 'riak@BBB.cluster'},
  {365375409332725729550921208179070754913983135744, 'riak@CCC.cluster'},
  {548063113999088594326381812268606132370974703616, 'riak@DDD.cluster'},
  {730750818665451459101842416358141509827966271488, 'riak@AAA.cluster'},
  {913438523331814323877303020447676887284957839360, 'riak@BBB.cluster'},
  {1096126227998177188652763624537212264741949407232, 'riak@CCC.cluster'},
  {1278813932664540053428224228626747642198940975104, 'riak@DDD.cluster'}]}
```

To discover which partition the bucket/key `food/favorite` object would be
stored in, for example, we execute
`riak_core_util:chash_key( {<<"food">>, <<"favorite">>} )` and get a wacky
160 bit Erlang number we named `DocIdx` (document index).

Just to illustrate that the Erlang binary value is a real number, the next
line converts it to a more readable format, similar to the ring partition
numbers.

```bash
(riak@AAA.cluster)3> DocIdx =
(riak@AAA.cluster)3> riak_core_util:chash_key({<<"food">>,<<"favorite">>}).
<<80,250,1,193,88,87,95,235,103,144,152,2,21,102,201,9,156,102,128,3>>

(riak@AAA.cluster)4> <<I:160/integer>> = DocIdx. I.
462294600869748304160752958594990128818752487427
```

With this `DocIdx` number, we can order the partitions, starting with the
first number greater than `DocIdx`. The remaining partitions are in numerical
order, until we reach zero, then we loop around and continue to exhaust the
list.

```bash
(riak@AAA.cluster)5> Preflist = riak_core_ring:preflist(DocIdx, Ring).
[{548063113999088594326381812268606132370974703616, 'riak@DDD.cluster'},
 {730750818665451459101842416358141509827966271488, 'riak@AAA.cluster'},
 {913438523331814323877303020447676887284957839360, 'riak@BBB.cluster'},
 {1096126227998177188652763624537212264741949407232, 'riak@CCC.cluster'},
 {1278813932664540053428224228626747642198940975104, 'riak@DDD.cluster'},
 {0,'riak@AAA.cluster'},
 {182687704666362864775460604089535377456991567872, 'riak@BBB.cluster'},
 {365375409332725729550921208179070754913983135744, 'riak@CCC.cluster'}]
```

So what does all this have to do with replication? With the above list, we
simply replicate a write down the list N times. If we set N=3, then the
`food/favorite` object will be written to the `riak@DDD.cluster` node's
partition `5480631...` (I truncated the number here), `riak@AAA.cluster`
partition `7307508...`, and `riak@BBB.cluster` partition `9134385...`.

If something has happened to one of those nodes, like a network split
(confusingly also called a partition---the "P" in "CAP"), the remaining
active nodes in the list become candidates to hold the data.

So if the node coordinating the write could not reach node
`riak@AAA.cluster` to write to partition `7307508...`, it would then attempt
to write that partition `7307508...` to `riak@CCC.cluster` as a fallback
(it's the next node in the preflist after the 3 primaries).

The way that the Ring is structured allows Riak to ensure data is always
written to the appropriate number of physical nodes, even in cases where one
or more physical nodes are unavailable. It does this by simply trying the next
available node in the preflist.
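
The N value itself is just a bucket property, so you can see it (or change
it) through the same HTTP interface used in the developer chapter. A purely
illustrative sketch, assuming the `$RIAK` base URL from earlier; note that
shrinking `n_val` on a bucket that already holds data is generally a bad
idea:

```bash
# Sketch: set N=2 on the food bucket, so writes land on only the
# first two partitions of its preflist
curl -XPUT "$RIAK/buckets/food/props" \
  -H "Content-Type: application/json" \
  -d '{"props":{"n_val":2}}'
```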

<h3>Hinted Handoff</h3>

When a node goes down, data is replicated to a backup node. This is
not permanent; Riak will periodically examine whether each vnode
resides on the correct physical node and hands them off to the proper
node when possible.

As long as the temporary node cannot connect to the primary, it will continue
to accept write and read requests on behalf of its incapacitated brethren.

Hinted handoff not only helps Riak achieve high availability, it also
facilitates data migration when physical nodes are added or removed from the
Ring.


## Managing a Cluster

Now that we have a grasp of the general concepts of Riak, how users query it,
and how Riak manages replication, it's time to build a cluster. It's so easy
to do, in fact, that I didn't bother discussing it for most of this book.

<h3>Install</h3>

The Riak docs have all of the information you need to
[install](http://docs.basho.com/riak/latest/tutorials/installation/) it per
operating system. The general sequence is:

1. Install Erlang.

2. Get Riak from a package manager (a la `apt-get` or Homebrew), or build from
   source (the results end up under `rel/riak`, with the binaries under
   `bin`).

3. Run `riak start`.

Install Riak on four or five nodes---five being the recommended safe minimum
for production. Fewer nodes are OK during software development and testing.
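
As a concrete (if hypothetical) example, on a Debian/Ubuntu machine with the
Basho package repository already configured, the whole sequence might look
like this. Package names and repositories vary by platform, so treat this as
a sketch and defer to the install docs linked above:

```bash
# Sketch only: assumes Basho's apt repository is already set up
sudo apt-get install riak   # pulls in the Riak package and Erlang runtime
riak start                  # boot the node
riak ping                   # a healthy node answers: pong
```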

<h3>Command Line</h3>

Most Riak operations can be performed through the command line. We'll concern
ourselves with two commands: `riak` and `riak-admin`.

<h4>riak</h4>

Simply typing the `riak` command will give a usage list. If you want more
information, you can try `riak help`.

```bash
Usage: riak <command>
where <command> is one of the following:
    { help | start | stop | restart | ping | console | attach
      attach-direct | ertspath | chkconfig | escript | version | getpid
      top [-interval N] [-sort { reductions | memory | msg_q }] [-lines N] } |
      config { generate | effective | describe VARIABLE } [-l debug]

Run 'riak help' for more detailed information.
```

Most of these commands are self explanatory, once you know what they mean.
`start` and `stop` are simple enough. `restart` means to stop the running node
and restart it inside of the same Erlang VM (virtual machine), while `reboot`
will take down the Erlang VM and restart everything.

You can print the current running `version`. `ping` will return `pong` if the
server is in good shape; otherwise you'll get the
*just-similar-enough-to-be-annoying* response `pang` (with an *a*), or a
simple `Node X not responding to pings` if it's not running at all.

`chkconfig` is useful if you want to ensure your `etc/riak.conf` is not broken
(that is to say, it's parsable). I mentioned `attach` briefly above, when we
looked into the details of the Ring---it attaches a console to the local
running Riak server so you can execute Riak's Erlang code. `escript` is
similar to `attach`, except you pass in a script file of commands you wish to
run automatically.
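
The `config` subcommand in the usage listing above also deserves a mention:
it can show how a setting in `etc/riak.conf` resolves on the running node. A
small sketch, pairing it with `chkconfig`:

```bash
# Validate that etc/riak.conf parses, then inspect one setting
riak chkconfig
riak config describe ring_size
```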

<h4>riak-admin</h4>

The `riak-admin` command is the meat of operations, the tool you'll use most
often. This is where you'll join nodes to the Ring, diagnose issues, check
status, and trigger backups.

```bash
Usage: riak-admin { cluster | join | leave | backup | restore | test |
                    reip | js-reload | erl-reload | wait-for-service |
                    ringready | transfers | force-remove | down |
                    cluster-info | member-status | ring-status | vnode-status |
                    aae-status | diag | status | transfer-limit | reformat-indexes |
                    top [-interval N] [-sort reductions|memory|msg_q] [-lines N] |
                    downgrade-objects | security | bucket-type | repair-2i |
                    search | services | ensemble-status }
```

For more information on commands, you can try `man riak-admin`.

A few of these commands are deprecated, and many don't make sense without a
cluster, but some we can look at now.

`status` outputs a list of information about this cluster. It's mostly the
same information you can get from `/stats` via HTTP, although the coverage is
not exact (for example, `riak-admin status` returns `disk`, and `/stats`
returns some computed values like `gossip_received`).

```bash
$ riak-admin status
1-minute stats for 'riak@AAA.cluster'
-------------------------------------------
vnode_gets : 0
vnode_gets_total : 2
vnode_puts : 0
vnode_puts_total : 1
vnode_index_reads : 0
vnode_index_reads_total : 0
vnode_index_writes : 0
vnode_index_writes_total : 0
vnode_index_writes_postings : 0
vnode_index_writes_postings_total : 0
vnode_index_deletes : 0
...
```

New JavaScript or Erlang files (as we wrote in the [developers](#developers)
chapter) are not usable by the nodes until they are informed about them by the
`js-reload` or `erl-reload` command.

`riak-admin` also provides a little `test` command, so you can perform a
read/write cycle to a node, which I find useful for testing a client's ability
to connect, and the node's ability to write.

Finally, `top` is an analysis command checking the Erlang details of a
particular node in real time. Different processes have different process ids
(Pids), use varying amounts of memory, queue up so many messages at a time
(MsgQ), and so on. This is useful for advanced diagnostics, and is especially
useful if you know Erlang or need help from other users, the Riak team, or
Basho.

![Top](../assets/top.png)
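
For instance, the `test` command mentioned above gives a quick thumbs-up that
a node can accept reads and writes. The exact wording of the output may
differ between versions, but it looks roughly like this:

```bash
$ riak-admin test
Successfully completed 1 read/write cycle to 'riak@AAA.cluster'
```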

<h3>Making a Cluster</h3>

With several solitary nodes running---assuming they are networked and are able
to communicate to each other---launching a cluster is the simplest part.

Executing the `cluster` command will output a descriptive set of commands.

```bash
$ riak-admin cluster
The following commands stage changes to cluster membership. These commands
do not take effect immediately. After staging a set of changes, the staged
plan must be committed to take effect:

   join <node>                    Join node to the cluster containing <node>

   leave                          Have this node leave the cluster and shutdown

   leave <node>                   Have <node> leave the cluster and shutdown

   force-remove <node>            Remove <node> from the cluster without
                                  first handing off data. Designed for
                                  crashed, unrecoverable nodes

   replace <node1> <node2>        Have <node1> transfer all data to <node2>,
                                  and then leave the cluster and shutdown

   force-replace <node1> <node2>  Reassign all partitions owned by <node1>
                                  to <node2> without first handing off data,
                                  and remove <node1> from the cluster.

Staging commands:
   plan                           Display the staged changes to the cluster
   commit                         Commit the staged changes
   clear                          Clear the staged changes
```

To create a new cluster, you must `join` another node (any will do). Taking a
node out of the cluster uses `leave` or `force-remove`, while swapping out an
old node for a new one uses `replace` or `force-replace`.

I should mention here that using `leave` is the nice way of taking a node out
of commission. However, you don't always get that choice. If a server happens
to explode (or simply smoke ominously), you don't need its approval to remove
it from the cluster, but can instead mark it as `down`.

But before we worry about removing nodes, let's add some first.

```bash
$ riak-admin cluster join riak@AAA.cluster
Success: staged join request for 'riak@BBB.cluster' to 'riak@AAA.cluster'
$ riak-admin cluster join riak@AAA.cluster
Success: staged join request for 'riak@CCC.cluster' to 'riak@AAA.cluster'
```

Once all changes are staged, you must review the cluster `plan`. It will give
you all of the details of the nodes that are joining the cluster, and what it
will look like after each step or *transition*, including the `member-status`,
and how the `transfers` plan to handoff partitions.

Below is a simple plan, but there are cases when Riak requires multiple
transitions to enact all of your requested actions, such as adding and
removing nodes in one stage.

```bash
$ riak-admin cluster plan
=============================== Staged Changes ==============
Action         Nodes(s)
-------------------------------------------------------------
join           'riak@BBB.cluster'
join           'riak@CCC.cluster'
-------------------------------------------------------------


NOTE: Applying these changes will result in 1 cluster transition

#############################################################
                After cluster transition 1/1
#############################################################

================================= Membership ================
Status     Ring    Pending    Node
-------------------------------------------------------------
valid     100.0%     34.4%    'riak@AAA.cluster'
valid       0.0%     32.8%    'riak@BBB.cluster'
valid       0.0%     32.8%    'riak@CCC.cluster'
-------------------------------------------------------------
Valid:3 / Leaving:0 / Exiting:0 / Joining:0 / Down:0

WARNING: Not all replicas will be on distinct nodes

Transfers resulting from cluster changes: 42
  21 transfers from 'riak@AAA.cluster' to 'riak@CCC.cluster'
  21 transfers from 'riak@AAA.cluster' to 'riak@BBB.cluster'
```

Making changes to cluster membership can be fairly resource intensive, so Riak
defaults to only performing 2 transfers at a time. You can choose to alter
this `transfer-limit` using `riak-admin`, but bear in mind that the higher the
number, the more normal operations will be impinged.

At this point, if you find a mistake in the plan, you have the chance to
`clear` it and try again. When you are ready, `commit` the cluster to enact
the plan.

```bash
$ riak-admin cluster commit
Cluster changes committed
```

Without any data, adding a node to a cluster is a quick operation. However,
with large amounts of data to be transferred to a new node, it can take quite
a while before the new node is ready to use.
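
While those transfers run, you can keep an eye on them, and throttle them,
from any node. A sketch (syntax for the limit may vary slightly by version):

```bash
# Show any handoffs currently in flight
$ riak-admin transfers

# Allow 4 concurrent transfers instead of the default 2
$ riak-admin transfer-limit 4
```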

<h3>Status Options</h3>

To check on a launching node's progress, you can run the `wait-for-service`
command. It will output the status of the service and stop when it's finally
up. In this example, we check the `riak_kv` service.

```bash
$ riak-admin wait-for-service riak_kv riak@CCC.cluster
riak_kv is not up: []
riak_kv is not up: []
riak_kv is up
```

You can get a list of available services with the `services` command.

You can also see if the whole ring is ready to go with `ringready`. If the
nodes do not agree on the state of the ring, it will output `FALSE`, otherwise
`TRUE`.

```bash
$ riak-admin ringready
TRUE All nodes agree on the ring ['riak@AAA.cluster','riak@BBB.cluster',
                                  'riak@CCC.cluster']
```

For a more complete view of the status of the nodes in the ring, you can check
out `member-status`.

```bash
$ riak-admin member-status
================================= Membership ================
Status     Ring    Pending    Node
-------------------------------------------------------------
valid      34.4%      --      'riak@AAA.cluster'
valid      32.8%      --      'riak@BBB.cluster'
valid      32.8%      --      'riak@CCC.cluster'
-------------------------------------------------------------
Valid:3 / Leaving:0 / Exiting:0 / Joining:0 / Down:0
```

And for more details of any current handoffs or unreachable nodes, try
`ring-status`. It also lists some information from `ringready` and
`transfers`. Below I turned off the C node to show what it might look like.

```bash
$ riak-admin ring-status
================================== Claimant =================
Claimant:  'riak@AAA.cluster'
Status:     up
Ring Ready: true

============================== Ownership Handoff ============
Owner:      dev1 at 127.0.0.1
Next Owner: dev2 at 127.0.0.1

Index: 182687704666362864775460604089535377456991567872
  Waiting on: []
  Complete:   [riak_kv_vnode,riak_pipe_vnode]
...

============================== Unreachable Nodes ============
The following nodes are unreachable: ['riak@CCC.cluster']

WARNING: The cluster state will not converge until all nodes
are up. Once the above nodes come back online, convergence
will continue. If the outages are long-term or permanent, you
can either mark the nodes as down (riak-admin down NODE) or
forcibly remove the nodes from the cluster (riak-admin
force-remove NODE) to allow the remaining nodes to settle.
```
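
For reference, the `services` command mentioned above returns an Erlang list
of the services running on the node; on a plain KV node the output should
look something like this:

```bash
$ riak-admin services
[riak_kv,riak_pipe]
```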

If all of the above information options about your nodes weren't enough, you
can list the status of each vnode per node, via `vnode-status`. It'll show
each vnode by its partition number, give any status information, and a count
of each vnode's keys. Finally, you'll get to see each vnode's backend
type---something I'll cover in the next section.

```bash
$ riak-admin vnode-status
Vnode status information
-------------------------------------------

VNode: 0
Backend: riak_kv_bitcask_backend
Status:
[{key_count,0},{status,[]}]

VNode: 91343852333181432387730302044767688728495783936
Backend: riak_kv_bitcask_backend
Status:
[{key_count,0},{status,[]}]

VNode: 182687704666362864775460604089535377456991567872
Backend: riak_kv_bitcask_backend
Status:
[{key_count,0},{status,[]}]

VNode: 274031556999544297163190906134303066185487351808
Backend: riak_kv_bitcask_backend
Status:
[{key_count,0},{status,[]}]

VNode: 365375409332725729550921208179070754913983135744
Backend: riak_kv_bitcask_backend
Status:
[{key_count,0},{status,[]}]
...
```

Some commands we did not cover are either deprecated in favor of their
`cluster` equivalents (`join`, `leave`, `force-remove`, `replace`,
`force-replace`), or flagged for future removal, like `reip` (use
`cluster replace` instead).

I know this was a lot to digest, and probably pretty dry. Walking through
command line tools usually is. There are plenty of details behind many of the
`riak-admin` commands, too numerous to cover in such a short book. I encourage
you to toy around with them on your own installation.


## New in Riak 2.0

Riak has been a project since 2009. And in that time, it has undergone a few
evolutions, largely technical improvements, such as more reliability and data
safety mechanisms like active anti-entropy.

Riak 2.0 was not a rewrite, but rather, a huge shift in how developers who use
Riak interact with it. While Basho continued to make backend improvements
(such as better cluster metadata) and simplified using existing options
(`repair-2i` is now a `riak-admin` command, rather than code you must
execute), the biggest changes are immediately obvious to developers. But many
of those improvements also make administration easier for operators. So here
are a few highlights of the new 2.0 interface options.

<h3>Bucket Types</h3>

A centerpiece of the new Riak 2.0 features is the addition of a higher-level
bucket configuration namespace called *bucket types*. We discussed the general
idea of bucket types in the previous chapters, but one major departure from
standard buckets is that they are created via the command line. This means
that operators with server access can manage the default properties that all
buckets of a given bucket type inherit.

Bucket types have a set of tools for creating, managing and activating them.

```bash
$ riak-admin bucket-type
Usage: riak-admin bucket-type <command>

The follow commands can be used to manage bucket types for the cluster:

   list                           List all bucket types and their activation status
   status <type>                  Display the status and properties of a type
   activate <type>                Activate a type
   create <type> <json>           Create or modify a type before activation
   update <type> <json>           Update a type after activation
```

It's rather straightforward to `create` a bucket type. The JSON string
accepted after the bucket type name can contain any valid bucket properties.
Any bucket that uses this type will inherit those properties. For example, say
that you wanted to create a bucket type whose `n_val` was always 1 (rather
than the default 3), named `unsafe`.

```bash
$ riak-admin bucket-type create unsafe '{"props":{"n_val":1}}'
```

Once you create the bucket type, it's a good idea to check its `status`, and
ensure the properties are what you meant to set.

```bash
$ riak-admin bucket-type status unsafe
```

A bucket type is not active until you propagate it through the system by
calling the `activate` command.

```bash
$ riak-admin bucket-type activate unsafe
```

If something is wrong with the type's properties, you can always `update` it.

```bash
$ riak-admin bucket-type update unsafe '{"props":{"n_val":1}}'
```

You can update a bucket type after it's activated. All of the changes that you
make to the type will be inherited by every bucket under that type.

Of course, you can always get a `list` of the current bucket types in the
system. The list will also say whether the bucket type is activated or not;
see the sketch after this section.

Other than that, there's nothing particularly interesting about bucket types
from an operations point of view, per se. Sure, there are some cool internal
mechanisms at work, such as metadata propagated via a path laid out by a
plum-tree and causally tracked by dotted version vectors. But that's only code
plumbing. What's most interesting about bucket types are the new features you
can take advantage of: datatypes, strong consistency, and search.
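
For reference, `list` prints one type per line along with its activation
status; for the type created above, the output would look something like
this (exact format may vary by version):

```bash
$ riak-admin bucket-type list
unsafe (active)
```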

<h3>Datatypes</h3>

Datatypes are useful for engineers, since they no longer have to consider the
complexity of manual conflict merges that can occur in fault situations. They
can also put less stress on the system, since larger objects need only
communicate their changes, rather than reinsert the full object.

Riak 2.0 supports four datatypes: *map*, *set*, *counter*, and *flag*. You
create a bucket type with a single datatype. It's not required, but it's often
good form to name the bucket type after the datatype you're setting.

```bash
$ riak-admin bucket-type create maps '{"props":{"datatype":"map"}}'
$ riak-admin bucket-type create sets '{"props":{"datatype":"set"}}'
$ riak-admin bucket-type create counters '{"props":{"datatype":"counter"}}'
$ riak-admin bucket-type create flags '{"props":{"datatype":"flag"}}'
```

Once a bucket type is created with the given datatype, you need only activate
it. Developers can then use the datatype as we saw in the previous chapter,
but hopefully this example makes clear the benefit of naming bucket types
after their datatype.

```bash
curl -XPUT "$RIAK/types/counters/buckets/visitors/keys/index_page" \
  -H "Content-Type:application/json" \
  -d 1
```
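
Reading the value back is an ordinary GET. Under the same assumptions as
above, the response carries the datatype and its current value, roughly like
this:

```bash
curl "$RIAK/types/counters/buckets/visitors/keys/index_page"
# => {"type":"counter","value":1}
```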

<h3>Strong Consistency</h3>

Strong consistency (SC) is the opposite of everything that Riak stands for.
Where Riak is all about high availability in the face of network or server
errors, strong consistency is about safety over liveness. Either the network
and servers are working perfectly, or the reads and writes fail. So why on
earth would we ever want to provide SC and give up HA? Because you asked for
it. Really.

There are some very good use-cases for strong consistency. For example, when a
user is completing a purchase, you might want to ensure that the system is
always in a consistent state, or fail the purchase. Communicating that a
purchase was made when it in fact was not is a bad user experience. The
opposite is even worse.

While Riak will continue to be primarily an HA system, there are cases where
SC is useful, and developers should be allowed to choose without having to
install an entirely new database. So all you need to do is activate it in
`riak.conf`.

```bash
strong_consistency = on
```

One thing to note is, although we generally recommend you have five nodes in a
Riak cluster, it's not a hard requirement. Strong consistency, however,
requires three nodes. It will not operate with fewer.

Once the SC system is active, you'll lean on bucket types again. Only buckets
that live under a bucket type set up for strong consistency will be strongly
consistent. This means that you can have some HA buckets and other SC buckets
in the same database. Let's call our SC bucket type `strong`.

```bash
$ riak-admin bucket-type create strong '{"props":{"consistent":true}}'
$ riak-admin bucket-type activate strong
```

That's all the operator should need to do. Developers can use the `strong`
bucket similarly to other buckets.

```bash
curl -XPUT "$RIAK/types/strong/buckets/purchases/keys/jane" \
  -d '{"action":"buy"}'
```

Jane's purchases will either succeed or fail. They will not be eventually
consistent. If a write fails, of course, she can try again.

What if your system is having problems with strong consistency? Basho has
provided a command to interrogate the current status of the subsystem
responsible for SC, named ensemble. You can check it out by running
`ensemble-status`.

```bash
$ riak-admin ensemble-status
```

It will give you the best information it has as to the state of the system.
For example, if you didn't enable `strong_consistency` in every node's
`riak.conf`, you might see this.

```bash
============================== Consensus System ===============================
Enabled:     false
Active:      false
Ring Ready:  true
Validation:  strong (trusted majority required)
Metadata:    best-effort replication (asynchronous)

Note: The consensus subsystem is not enabled.

================================== Ensembles ==================================
There are no active ensembles.
```

In the common case when all is working, you should see an output similar to
the following:

```bash
============================== Consensus System ===============================
Enabled:     true
Active:      true
Ring Ready:  true
Validation:  strong (trusted majority required)
Metadata:    best-effort replication (asynchronous)

================================== Ensembles ==================================
 Ensemble     Quorum        Nodes      Leader
-------------------------------------------------------------------------------
   root       4 / 4         4 / 4      riak@riak1
    2         3 / 3         3 / 3      riak@riak2
    3         3 / 3         3 / 3      riak@riak4
    4         3 / 3         3 / 3      riak@riak1
    5         3 / 3         3 / 3      riak@riak2
    6         3 / 3         3 / 3      riak@riak2
    7         3 / 3         3 / 3      riak@riak4
    8         3 / 3         3 / 3      riak@riak4
```

This output tells you that the consensus system is both enabled and active,
and lists details about all known consensus groups (ensembles).

There is plenty more information about the details of strong consistency in
the online docs.

<h3>Search 2.0</h3>

From an operations standpoint, search is deceptively simple. Functionally,
there isn't much you should need to do with search, other than activate it in
`riak.conf`.

```bash
search = on
```

However, looks are deceiving. Under the covers, Riak Search 2.0 actually runs
the search index software called Solr. Solr runs as a Java service. All of the
code required to convert an object that you insert into a document that Solr
can recognize (by a module called an *Extractor*) is Erlang, and so is the
code which keeps the Riak objects and Solr indexes in sync through faults (via
AAE), as well as all of the interfaces, security, stats, and query
distribution. But since Solr is Java, we have to manage the JVM.

If you don't have much experience running Java code, let me distill most
problems for you: you need more memory. Solr is a memory hog, easily requiring
a minimum of 2 GiB of RAM dedicated only to the Solr service itself. This is
in addition to the 4 GiB of RAM minimum that Basho recommends per node. So,
according to math, you need a minimum of 6 GiB of RAM to run Riak Search. But
we're not quite through yet.

The most important settings in Riak Search are the JVM options. These options
are passed into the JVM command line when the Solr service is started, and
most of the options chosen are excellent defaults. I recommend not getting too
hung up on tweaking those, with one notable exception.

```bash
## The options to pass to the Solr JVM. Non-standard options,
## i.e. -XX, may not be portable across JVM implementations.
## E.g. -XX:+UseCompressedStrings
##
## Default: -d64 -Xms1g -Xmx1g -XX:+UseStringCache -XX:+UseCompressedOops
##
## Acceptable values:
##   - text
search.solr.jvm_options = -d64 -Xms1g -Xmx1g -XX:+UseStringCache -XX:+UseCompressedOops
```

In the default setting, Riak gives 1 GiB of RAM to the Solr JVM heap. This is
fine for small clusters with small, lightly used indexes. You may want to bump
those heap values up---the two args of note are: `-Xms1g` (minimum heap size 1
gigabyte) and `-Xmx1g` (maximum heap size 1 gigabyte). Push those to 2 or 4
(or even higher) and you should be fine.

In the interest of completeness, Riak also communicates with Solr internally
through a port, which you can configure (along with an optional JMX port). You
should never need to connect to this port yourself.

```bash
## The port number which Solr binds to.
## NOTE: Binds on every interface.
##
## Default: 8093
##
## Acceptable values:
##   - an integer
search.solr.port = 8093

## The port number which Solr JMX binds to.
## NOTE: Binds on every interface.
##
## Default: 8985
##
## Acceptable values:
##   - an integer
search.solr.jmx_port = 8985
```

There's generally no great reason to alter these defaults, but they're there
if you need them.

I should also note that, thanks to fancy bucket types, you can associate a
bucket type with a search index. You associate buckets (or types) with indexes
by adding a `search_index` property with the name of a Solr index. Like so,
assuming that you've created a Solr index named `my_index`:

```bash
$ riak-admin bucket-type create indexed '{"props":{"search_index":"my_index"}}'
$ riak-admin bucket-type activate indexed
```

Now, any object that a developer puts into Yokozuna under that bucket type
will be indexed.

There's a lot more to search than we can possibly cover here without making it
a book in its own right. You may want to check out the following documentation
on docs.basho.com for more details.

* [Riak Search Settings](http://docs.basho.com/riak/latest/ops/advanced/configs/search/)
* [Using Search](http://docs.basho.com/riak/latest/dev/using/search/)
* [Search Details](http://docs.basho.com/riak/latest/dev/advanced/search/)
* [Search Schema](http://docs.basho.com/riak/latest/dev/advanced/search-schema/)
* [Upgrading Search from 1.x to 2.x](http://docs.basho.com/riak/latest/ops/advanced/upgrading-search-2/)
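
In case you're wondering where `my_index` in the association example above
came from: an index can be created ahead of time through the HTTP interface.
A sketch, assuming the default schema and the usual `$RIAK` base URL:

```bash
# Create the Solr index referenced by the "indexed" bucket type above
curl -XPUT "$RIAK/search/index/my_index"
```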

<h3>Security</h3>

Riak has lived quite well in the first five years of its life without
security. So why did Basho add it now? With the kind of security you get
through a firewall, you can only get coarse-grained control: someone can
either access the system or not, with a few restrictions, depending on how
cleverly you write your firewall rules.

With the addition of Security, Riak now supports authentication (identifying
a user) and authorization (restricting user access to a subset of commands)
of users and groups. Access can also be restricted to a known set of sources.
The security design was inspired by the full-featured rules in PostgreSQL.

Before you decide to enable security, you should consider this checklist in
advance.

1. If you use security, you must upgrade to Riak Search 2.0. The old Search
   will not work (neither will the deprecated Link Walking). Check any Erlang
   MapReduce code for invocations of Riak modules other than
   `riak_kv_mapreduce`. Enabling security will prevent those from succeeding
   unless those modules are available via `add_path`.
2. Make sure that your application code is using the most recent drivers.
3. Define users and (optionally) groups, and their sources.
4. Grant the necessary permissions to each user/group.

With that out of the way, you can `enable` security with a command-line
option (you can `disable` security as well). You can optionally check the
`status` of security at any time.

```bash
$ riak-admin security enable
$ riak-admin security status
Enabled
```

Adding users is as easy as the `add-user` command. A username is required,
and can be followed with any key/value pairs. `password` and `groups` are
special cases, but everything is free form. You can alter existing users as
well. Users can belong to any number of groups, and inherit a union of all
group settings.

```bash
$ riak-admin security add-group mascots type=mascot
$ riak-admin security add-user bashoman password=Test1234
$ riak-admin security alter-user bashoman groups=mascots
```

You can see the list of all users via `print-users`, or all groups via
`print-groups`.

```bash
$ riak-admin security print-users
+----------+----------+----------------------+---------------------+
| username |  groups  |       password       |       options       |
+----------+----------+----------------------+---------------------+
| bashoman | mascots  |983e8ae1421574b8733824| [{"type","mascot"}] |
+----------+----------+----------------------+---------------------+
```

Creating users and groups is nice and all, but the real reason for doing this
is so we can distinguish authorization between different users and groups. You
`grant` or `revoke` `permissions` to users and groups by way of the command
line, of course. You can grant or revoke a permission on everything (`any`),
on a certain bucket type, or on a specific bucket.

```bash
$ riak-admin security grant riak_kv.get on any to all
$ riak-admin security grant riak_kv.delete on any to admin
$ riak-admin security grant search.query on index people to bashoman
$ riak-admin security revoke riak_kv.delete on any to bad_admin
```

There are many kinds of permissions, one for every major operation or set of
operations in Riak. It's worth noting that you can't add search permissions
without search enabled.
+ +* __riak\_kv.get__ --- Retrieve objects +* __riak\_kv.put__ --- Create or update objects +* __riak\_kv.delete__ --- Delete objects +* __riak\_kv.index__ --- Index objects using secondary indexes (2i) +* __riak\_kv.list\_keys__ --- List all of the keys in a bucket +* __riak\_kv.list\_buckets__ --- List all buckets +* __riak\_kv.mapreduce__ --- Can run MapReduce jobs +* __riak\_core.get\_bucket__ --- Retrieve the props associated with a bucket +* __riak\_core.set\_bucket__ --- Modify the props associated with a bucket +* __riak\_core.get\_bucket\_type__ --- Retrieve the set of props associated with a bucket type +* __riak\_core.set\_bucket\_type__ --- Modify the set of props associated with a bucket type +* __search.admin__ --- The ability to perform search admin-related tasks, like creating and deleting indexes +* __search.query__ --- The ability to query an index + +Finally, with our group and user created, and given access to a subset of permissions, we have one more major item to deal with. We want to be able to filter connection from specific sources. + +```bash +$ riak-admin security add-source all| [