
Has there been any CP system passing a call-me-maybe test first time apart from Zookeeper [1]?

[1] https://aphyr.com/posts/291-call-me-maybe-zookeeper



FoundationDB. Kyle didn't bother running Jepsen against FDB because FoundationDB's internal testing was much more rigorous than Jepsen. The FoundationDB team ran it themselves and it passed with flying colors:

http://blog.foundationdb.com/call-me-maybe-foundationdb-vs-j...

Sadly, FDB has been bought by Apple[1], and you can't download it anymore. I sincerely hope FoundationDB gets open-sourced or something.

[1] http://techcrunch.com/2015/03/24/apple-acquires-durable-data...


I believe the real reason was that FDB was not open sourced.

Jepsen tests cannot prove that a system is safe; they can only prove that it isn't.

Very often he looks into the source code to figure out how the system operates; that way he can find weaknesses and write tests for his framework that demonstrate the issue.

I wouldn't trust any company that uses Jepsen to show that their product is safe.


[deleted]


He has said [1] that each test takes literally months to do. I am not surprised that he picks the most popular databases, given the amount of work.

[1] https://github.com/rethinkdb/rethinkdb/issues/1493#issuecomm...


Actually, look at some of the things that @aphyr has tweeted about FoundationDB:

https://twitter.com/aphyr/status/542792308492484608

https://twitter.com/obfuscurity/status/405016890306985984



Postgres can survive a network partition? I wasn't aware that master-master or sharding+replication was in the box yet?


Did you read the article? It's not about PostgreSQL in a distributed setup:

> Even though the Postgres server is always consistent, the distributed system composed of the server and client together may not be consistent. It’s possible for the client and server to disagree about whether or not a transaction took place.


To me that's not surprising. If the client connection drops, you might not know whether the transaction committed or not... the only way to know is to reconnect and inspect to see what happened.

If you want to avoid that kind of problem, use 2PC.

Do you (or the author) see this as a bug, or just something that might surprise people who haven't thought through the guarantees?
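The reconnect-and-inspect approach described above can be sketched generically. This is a minimal, self-contained simulation (not real Postgres client code; the server class and method names are illustrative): the client tags each transaction with a unique id, and when the commit's acknowledgement is lost, it reconnects and queries for that id to resolve the ambiguity.

```python
import uuid

class FakeServer:
    """Simulates a database server whose commit ACK can be lost in transit."""
    def __init__(self):
        self.committed = set()

    def commit(self, txn_id, drop_ack=False):
        # The server commits regardless; only the acknowledgement may be lost.
        self.committed.add(txn_id)
        if drop_ack:
            raise ConnectionError("connection dropped before ACK arrived")
        return True

    def was_committed(self, txn_id):
        return txn_id in self.committed


def safe_commit(server, txn_id):
    """Commit; on an ambiguous failure, reconnect and inspect the outcome."""
    try:
        return server.commit(txn_id, drop_ack=True)
    except ConnectionError:
        # Outcome is indeterminate: the only safe move is to ask the server
        # (after the partition heals) whether the transaction took place.
        return server.was_committed(txn_id)


server = FakeServer()
txn_id = str(uuid.uuid4())
assert safe_commit(server, txn_id) is True  # commit happened despite lost ACK
```

The key design point is that the client never interprets a dropped connection as an abort; it defers the decision until it can observe the server's actual state.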


It looks like you didn't read the article either -- it describes how Postgres uses 2PC, but 2PC doesn't stop the server from committing something that the client is unaware of because the final ACK went missing.

Quoting the article:

> The 2PC protocol says that we must wait for the acknowledgement message to arrive in order to decide the outcome. If it doesn’t arrive, 2PC deadlocks. It’s not a partition-tolerant protocol. Waiting forever isn’t realistic for real systems, so at some point the client will time out and declare an error occurred. The commit protocol is now in an indeterminate state.


What is the significance of that though? As you say, it's not partition-tolerant, so while partitioned, the system is down. As soon as the network issue is resolved, you can determine the state of your transaction.

It would be foolish for a client to issue a COMMIT and then assume the transaction aborted because of a connection drop. The client should wait until the connection can be reestablished and determine the real transaction state before making a decision based on it.

It's the same issue as with a power failure during fsync. The durability of that transaction is indeterminate, but it doesn't matter because the system is down. Before the system comes back, it will go through recovery, and either find the commit record or not, thus getting back to a determinate state.
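The fsync analogy above maps to a simple rule: on restart, recovery replays the log and treats a transaction as durable if and only if its commit record made it to disk. A toy sketch of that rule (record names and log format are made up for illustration, not Postgres internals):

```python
def recover(wal):
    """Replay a write-ahead log: a txn is durable iff its commit record exists."""
    committed, changes = set(), {}
    for record in wal:
        if record[0] == "write":
            _, txn, key, value = record
            changes.setdefault(txn, []).append((key, value))
        elif record[0] == "commit":
            committed.add(record[1])
    # Apply only the changes belonging to committed transactions.
    state = {}
    for txn in committed:
        for key, value in changes.get(txn, []):
            state[key] = value
    return state

# Power failed mid-fsync: txn 2's commit record never reached the log.
wal = [("write", 1, "a", 10), ("commit", 1), ("write", 2, "b", 20)]
assert recover(wal) == {"a": 10}  # txn 2 is discarded; state is determinate
```

Either the commit record survived (the transaction happened) or it didn't (it effectively never happened); in both cases the system comes back in a determinate state.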


HBase. With the caveats that it's piggybacking off the success of ZooKeeper, and that the tests were not run by Kyle himself:

https://eng.yammer.com/call-me-maybe-hbase/


Note that Robert Yokota's addendum[1] points out that HBase "cannot achieve both consistency and availability." His earlier results did not take into account that HBase clients continuously retry failed ops. (According to Nicolas Liochon, upon its death a server's regions are moved to another node.) Failures start rolling in once network partition(s) extend beyond the configured timeout. Kyle [2] was not impressed:

> "During the network partition, no requests are successful" is not the best result for a CP system, IMO.

HBase should provide partial availability in the face of partitions.

[1] http://eng.yammer.com/call-me-maybe-hbase-addendum/

[2] https://twitter.com/aphyr/status/509841011816665088


No system can achieve 100% consistency and 100% availability in the presence of partitions. It's kind of wacky that Aphyr compared those replicated consensus tools and eventually consistent stores with HBase. HBase does not use a consensus protocol for replication; it uses HDFS. HBase is not eventually consistent: there is a single authoritative server for reads and writes of a lexicographic range of keys (a region), which writes immutable store files and a WAL to HDFS.

Partial availability might be achievable for reads, by allowing non-authoritative regionservers to read the HDFS WAL + storefiles, but only with a significant amount of effort and latency and a limit on total cluster size, so it really isn't realistic. I've personally been burned by the client retry thing, though, and there's not really a better solution when you consider the types of workloads HBase is actually used for and its incredibly variable latency (it aims only for consistency and very high AVERAGE throughput, at the cost of extremely high WORST-CASE latency).

One solution here could be a configurable filesystem queue for clients; this is how you build resilient high-throughput pipelines. HBase is used in some places for OLTP, but only when the read load is very, very low, so the effort to make reads more highly available would be in vain.
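The "configurable filesystem queue for clients" idea could look roughly like this: writes are first appended to a local durable queue, and a background drainer retries them against the store, decoupling client availability from the store's latency spikes. A sketch with an in-memory stand-in for both the queue and the store (all class and method names are hypothetical, not HBase API):

```python
import collections

class WriteQueue:
    """Client-side queue: enqueue locally, drain to the store with retries."""
    def __init__(self, store, max_retries=3):
        self.store = store
        self.queue = collections.deque()
        self.max_retries = max_retries

    def put(self, key, value):
        # Always succeeds locally, even if the store is partitioned away.
        self.queue.append((key, value))

    def drain(self):
        """Attempt to flush queued writes in order; stop if the store stays down."""
        flushed = 0
        while self.queue:
            key, value = self.queue[0]
            for _ in range(self.max_retries):
                if self.store.write(key, value):
                    self.queue.popleft()
                    flushed += 1
                    break
            else:
                break  # store still unreachable; keep the remaining writes queued
        return flushed


class FlakyStore:
    """Fails the first call, then recovers (simulating a healed partition)."""
    def __init__(self):
        self.calls = 0
        self.data = {}

    def write(self, key, value):
        self.calls += 1
        if self.calls == 1:
            return False
        self.data[key] = value
        return True


store = FlakyStore()
q = WriteQueue(store)
q.put("row1", "v1")
q.put("row2", "v2")
assert q.drain() == 2 and store.data == {"row1": "v1", "row2": "v2"}
```

A real implementation would persist the queue to disk so buffered writes survive a client crash, which is what makes this pattern useful for high-throughput ingest pipelines.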


No database can "achieve both consistency and availability" during a partition.

Also, if you follow the rest of the twitter conversation you may realize, as they did, that only requests to the minority partition are unsuccessful - which is exactly what you want from CP.



