Skip to content

Commit

Permalink
25章323
Browse files Browse the repository at this point in the history
  • Loading branch information
qiangmzsx committed Jun 14, 2023
1 parent 59ffe9b commit 52a3d1a
Showing 1 changed file with 2 additions and 2 deletions.
Original file line number Diff line number Diff line change
Expand Up @@ -268,7 +268,7 @@ Serving jobs are, in many ways, more naturally suited to failure resistance than

However, there are also multiple serving applications that do not naturally fit that pattern. The canonical example would be any server that you intuitively describe as a “leader” of a particular system. Such a server will typically maintain the state of the system (in memory or on its local filesystem), and if the machine it is running on goes down, a newly created instance will typically be unable to re-create the system’s state. Another example is when you have large amounts of data to serve—more than fits on one machine—and so you decide to shard the data among, for instance, 100 servers, each holding 1% of the data, and handling requests for that part of the data. This is similar to statically assigning work to batch job workers; if one of the servers goes down, you (temporarily) lose the ability to serve a part of your data. A final example is if your server is known to other parts of your system by its hostname. In that case, regardless of how your server is structured, if this specific host loses network connectivity, other parts of your system will be unable to contact it.[^11]

然而,也有多个服务应用程序不适合这种模式。最典型的例子是你直观地描述为特定系统的“领导者”的任何服务器。这样的服务器通常会维护系统的状态(在内存中或在其本地文件系统中),如果它所运行的机器出现故障,新创建的实例通常无法重新创建系统的状态。另一个例子是,当你有大量的数据需要服务--超过一台机器所能容纳的--于是你决定将数据分片,比如说,100台服务器,每台都持有1%的数据,并处理这部分数据的请求。这类似于将工作静态地分配给批处理工作的worker;如果其中一个服务器发生故障,你就会(暂时)失去为部分数据服务的能力。最后一个示例是,系统的其他部分是否知道服务器的主机名。在这种情况下,无论服务器的结构如何,如果此特定主机失去网络连接,系统的其他部分将无法与之联系。
然而,也有多个服务应用程序不适合这种模式。典型的例子是任何你直观地描述为某个特定系统“领导者”的服务器。这样的服务器通常会维护系统的状态(在内存中或在其本地文件系统中),如果它所运行的机器出现故障,新创建的实例通常无法重新创建系统的状态。另一个例子是,当你有大量的数据需要服务--超过一台机器所能容纳的--于是你决定将数据分片,比如说,100台服务器,每台都持有1%的数据,并处理这部分数据的请求。这类似于将工作静态地分配给批处理工作的worker;如果其中一个服务器发生故障,你就会(暂时)失去为部分数据服务的能力。最后一个示例是,系统的其他部分是否知道服务器的主机名。在这种情况下,无论服务器的结构如何,如果此特定主机失去网络连接,系统的其他部分将无法与之联系。

> [^8]: Like all categorizations, this one isn’t perfect; there are types of programs that don’t fit neatly into any of the categories, or that possess characteristics typical of both serving and batch jobs. However, like most useful categorizations, it still captures a distinction present in many real-life cases.
>
Expand Down Expand Up @@ -296,7 +296,7 @@ The simplest way of dealing with this is extracting all storage to an external s

处理这个问题的最简单方法是将所有的存储提取到外部存储系统。这意味着任何应该在服务单一请求(在服务工作的情况下)或处理一个数据块(在批处理的情况下)的范围内生存的东西都需要存储在机器外的持久性存储中。如果所有的本地状态都是不可变的,那么让应用程序具有抗故障能力应该是相对容易的。

Unfortunately, most applications are not that simple. One natural question that might come to mind is, “How are these durable, persistent storage solutions implemented are *they* cattle?” The answer should be “yes.” Persistent state can be managed by cattle through state replication. On a different level, RAID arrays are an analogous concept; we treat disks as transient (accept the fact one of them can be gone) while still maintaining state. In the servers world, this might be realized through multiple replicas holding a single piece of data and synchronizing to make sure every piece of data is replicated a sufficient number of times (usually 3 to 5). Note that setting this up correctly is difficult (some way of consensus handling is needed to deal with writes), and so Google developed a number of specialized storage solutions[^13] that were enablers for most applications adopting a model where all state is transient.
Unfortunately, most applications are not that simple. One natural question that might come to mind is, “How are these durable, persistent storage solutions implemented are *they* cattle?” The answer should be “yes.” Persistent state can be managed by cattle through state replication. On a different level, RAID arrays are an analogous concept; we treat disks as transient (accept the fact one of them can be gone) while still maintaining state. In the servers world, this might be realized through multiple replicas holding a single piece of data and synchronizing to make sure every piece of data is replicated a sufficient number of times (usually 3 to 5). Note that setting this up correctly is difficult (some way of consensus handling is needed to deal with writes), and so Google developed a number of specialized storage solutions[^13] that were enablers for most applications adopting a model where all state is transient.

不幸的是,大多数应用并不那么简单。可能会想到的一个自然而然问题是:"这些持久的存储解决方案是如何实现的—它们是**吗?" 答案应该是 "是的"。牛可以通过状态复制来管理持久状态。在不同的层面上,RAID阵列是一个类似的概念;我们将磁盘视为暂时的(接受其中一个可以消失的事实),同时仍保持主要状态。在服务器世界中,这可以通过多个副本来实现,多个副本保存一个数据段并进行同步,以确保每个数据段都被复制足够的次数(通常为3到5次)。请注意,正确设置此选项很困难(需要某种一致性处理方式来处理写操作),因此Google开发了许多专门的存储解决方案13,这些解决方案是采用所有状态都是瞬态的模型的大多数应用程序的推动者。

Expand Down

0 comments on commit 52a3d1a

Please sign in to comment.