Gaming

Just setup a git repository for gaming at github.com/dragonghy/game.git (private). Document is put at github wiki. Some notes here.

Server/VPS

myhosting.com providers a nice VPS service which only costs me ~20 dollars per month. It’s provided with full root access and reasonable network speed/delay.

Use github.com to host the project costs $7/month. Actually that’s the price for 5 private repositories. Now it supports issue tracking and updating with git commits like “fix #xxx”, “close #xxx” or just refer to an issue with “#xxx”.

Display Engine

Two engines appear nice: PyGame (local) and Crafty (web).

PyGame provides a featured example called “Dead Creature Rising”, which feeds my need perfectly. Here’s the link. http://pygame.org/project-Dead+Creatures+Rising-988-.html

Crafty is a simple framework that allows javascript web game development. Check out this tutorial with DEMO! http://dailyjs.com/2011/02/11/crafty/

Code Document

We’ll follow Python’s conventional doc format as in epy doc. It’s not so featured as other documentation tools but more straight forward and simple. http://epydoc.sourceforge.net/

Now I just need schedule some regular coding slots on it and maybe grab some other guys to work on it together!

Django

Django is yet another web host framework using Python. It has some interesting features.

1. it provides data model that allows you easily connect data structure to websites with automatic generated pages.

2. template pages with Python integrated.

3. caching integrated, which sounds awesome.

I believe the first feature is how it gets successful. Similar to Thrift data model, Django here provides the possibility of easy monitoring, easy sharing and easy debugging.

Git tips

“git rebase” is another way to resolve conflicts. It basically allows you to apply all the changes on another revision. Specifically, “git rebase trunk” allows you to merge into the trunk.codes.

In addition, “git rebase -i trunk” can be used to merge commits from the last fetch: simply replace all the “pick” with “s” except the first line.

“git amend <-a>” allows you to merge current diffs into the previous commit without creating a new one. Very useful.

“git reset –soft ^HEAD” cancels the most recent commit without changing the code.

Using x++ in pthread programs is dangerous

Ignore this post. It’s totally a mistake caused by operator precedence.

Consider the following c-function.

void worker_thread(int tid, pthread_mutex_t x, int *pcount) {

pthread_mutex_lock(x);

fprintf(stderr, “thread %d obtains mutex, count = %d, pcount = %p.\n”,

tid, *pcount, pcount);

*pcount++;

pthread_mutex_unlock(x);

}

This function basically locks a mutex, increase the value of *pcount by 1 then returns. If you run 4 threads on this function, however, you’ll find out that the value of *pcount is never changed.

thread 0 obtains mutex, count = 0, pcount = 0x7fff24c6e6f0.
thread 1 obtains mutex, count = 0, pcount = 0x7fff24c6e6f0.
thread 2 obtains mutex, count = 0, pcount = 0x7fff24c6e6f0.
thread 3 obtains mutex, count = 0, pcount = 0x7fff24c6e6f0.

If you use “*pcount = *pcount + 1″ instead of *pcount++, you will get the expected output as

thread 0 obtains mutex, count = 0, pcount = 0x7fff24c6e6f0.
thread 1 obtains mutex, count = 1, pcount = 0x7fff24c6e6f0.
thread 2 obtains mutex, count = 2, pcount = 0x7fff24c6e6f0.
thread 3 obtains mutex, count = 3, pcount = 0x7fff24c6e6f0.

Now let’s take a look at the assembly code when using *pcount = *pcount + 1.It’s like following.

279 400dbd: 48 8b 45 e8 mov -0x18(%rbp),%rax
280 400dc1: 48 89 c7 mov %rax,%rdi
281 400dc4: e8 07 fe ff ff callq 400bd0 <pthread_mutex_lock@plt>
282 400dc9: 48 8b 45 f0 mov -0x10(%rbp),%rax
283 400dcd: 8b 08 mov (%rax),%ecx
284 400dcf: 48 8b 05 2a 13 20 00 mov 0x20132a(%rip),%rax # 602100 <stderr@@GLIBC_2.2.5>
285 400dd6: 48 8b 75 f0 mov -0x10(%rbp),%rsi
286 400dda: 8b 55 fc mov -0x4(%rbp),%edx
287 400ddd: 49 89 f0 mov %rsi,%r8
288 400de0: be 80 16 40 00 mov $0x401680,%esi
289 400de5: 48 89 c7 mov %rax,%rdi
290 400de8: b8 00 00 00 00 mov $0x0,%eax
291 400ded: e8 fe fd ff ff callq 400bf0 <fprintf@plt>
292 400df2: 48 8b 45 f0 mov -0x10(%rbp),%rax
293 400df6: 8b 00 mov (%rax),%eax
294 400df8: 8d 50 01 lea 0x1(%rax),%edx
295 400dfb: 48 8b 45 f0 mov -0x10(%rbp),%rax
296 400dff: 89 10 mov %edx,(%rax)
297 400e01: 48 8b 45 e8 mov -0x18(%rbp),%rax
298 400e05: 48 89 c7 mov %rax,%rdi
299 400e08: e8 23 fe ff ff callq 400c30 <pthread_mutex_unlock@plt>

In line 292~296, the value of *pcount is increased by 1. However, in the assembly code using *pcount++, we cannot find similar instructions.

279 400dfd: 48 8b 45 e8 mov -0x18(%rbp),%rax
280 400e01: 48 89 c7 mov %rax,%rdi
281 400e04: e8 07 fe ff ff callq 400c10 <pthread_mutex_lock@plt>
282 400e09: 48 83 45 f0 04 addq $0x4,-0x10(%rbp)
283 400e0e: 48 8b 45 f0 mov -0x10(%rbp),%rax
284 400e12: 8b 08 mov (%rax),%ecx
285 400e14: 48 8b 05 f5 12 20 00 mov 0x2012f5(%rip),%rax # 602110 <stdout@@GLIBC_2.2.5>
286 400e1b: 48 8b 75 f0 mov -0x10(%rbp),%rsi
287 400e1f: 8b 55 fc mov -0x4(%rbp),%edx
288 400e22: 49 89 f0 mov %rsi,%r8
289 400e25: be b0 16 40 00 mov $0x4016b0,%esi
290 400e2a: 48 89 c7 mov %rax,%rdi
291 400e2d: b8 00 00 00 00 mov $0x0,%eax
292 400e32: e8 f9 fd ff ff callq 400c30 <fprintf@plt>
293 400e37: 48 8b 45 e8 mov -0x18(%rbp),%rax
294 400e3b: 48 89 c7 mov %rax,%rdi
295 400e3e: e8 2d fe ff ff callq 400c70 <pthread_mutex_unlock@plt>

The reason is likely to be, “*pcount++” is optimized out by the compiler. So, using x++ in pthread programs is dangerous.

GCC version: 4.6.3

A survey on popular storage systems

BigTable, Google File System / Colossus

GFS and BigTable, the first two papers to describe a large scale, distributed file system or storage system, come from Google’s internal storage system serving indexing and many other application.

Google File System adopts a very straightforward master server to chunk server design and is implemented in user space. It partitions files into 64-MB chunks and stores copies of each chunks in several chunk servers. The master node, doesn’t necessarily store the actual chunks, but rather all the metadata associated with the chunks, e.g. the tables mapping chunk-label to chunk server, the location of copies. In GFS, read requests are forwarded to corresponding chunk servers and replied from chunk server to client directly for scalability. Write-request is a little bit complicated because the master node has to give one chunk server the permission of modification. This is handled by a “lease protocol”, in which the Master grants permission to a process for a finite period of time during which no other process can will be granted the permission.

GFS

GFS achieves great scalability and throughput: 100MB/s and even more for read request. However, it adopts a single serialized master-node design which could become the potential bottleneck of the whole system. This issue is address in following papers as I remembered. After all, as the authors claimed, making things simple is the most important lesson they learned.

BigTable is another Google internal product built on GFS and Chubby to provide a compressed, high performance and proprietary data storage system. It adopts a master server to tablets server design, where each tablet server takes care of 10 to 1,000 tablets (like pages) and master server maintains the metadata of tablet locations and handles (forwards) client request. Different from GFS’s single-master design, BigTable leverages Chubby to maintain a master-server in a group of 5 nodes. When a client request comes, the master node looks up for corresponding tablet server in one disk operation and forward the request to that tablet server. In addition, BigTable is able to recover from crash by reconstructing tablets from logs stored in GFS.

Hmm, a pretty good and brief design for large scale, distributed tables.

“In 2012, Google moved its search to a revamped file system known as Colossus. Whereas GFS was built for batch operations (i.e. happened in background before actually applied to a live website), Colossus is built for realtime services.” Another interesting thing here is that “Google has also dropped MapReduce but uses a new platform called Caffeine that operates more like a database”. One change in Colossus is that now they’re using multiple master nodes instead of the single one in GFS! Here’re some internal notes of Colossus.

  • Next-generation cluster-level file system
  • Automatically sharded metadata layer
  • Data typically written using Reed-Solomon (1.5x)
  • Client-driven replication, encoding and replication
  • Metadata space has enabled availability analyses

      Google developed a bunch of new storage systems or computation system but published none of them. From this document, we can see how quick the requirement is changing.

      HBase

      “HBase is an open source, non-relational, distributed database modeled after Google’s BigTable and is written in Java. It is developed as part of Apache Software Foundation‘s Apache Hadoop project and runs on top of HDFS (Hadoop Distributed Filesystem), providing BigTable-like capabilities for Hadoop. That is, it provides a fault-tolerant way of storing large quantities of sparse data.” – wiki

      Tables in HBase serve as input and output for MapReduce jobs run in Hadoop and may be accessed by Java API.  HBase is not a directly replacement of SQL Database because of performance. However, nowadays it’s serving data-driven websites including Facebook’s Messaging Platform.

      Facebook moves to HBase+MySQL from Cassandra in 2010, but why it seems to be a better way to serve Facebook’s large data-driven website? Instant Messenger itself  does not likely to require any large scale storage system but the message history does. Those could be quite simple structured queries like “SELECT message WHERE username=Alex”. In this scenario, Cassandra’s hybrid database won’t benefit much.

      “In my opinion, these differing histories have resulted in HBase being more suitable for data warehousing, and large scale data processing and analysis (for example, such as that involved when indexing the Web) and Cassandra being more suitable for real time transaction processing and the serving of interactive data.” from Dominic’s Blog.

      In my understanding, HBase provides a better throughput, fault tolerance, compression rate and locality, whereas Cassandra is doing better in latency. The former advantages could be achieved by leveraging a huge number of data servers, a better data partition strategy and replication strategy. To reduce latency, you need better caching strategy, fewer round-trip between master nodes and data servers.  

      Amazon Dynamo

      Dynamo is a highly available key-value store service used in Amazon.com’s ecommerce operations. One of Dynamo’s authors, Avinash Lakshman, joined Facebook after the project and started the Cassandra project.

      Different from previous key-value store system and databases, Dynamo sacrifices isolation and weakens consistency to guarantee high availability even in the event of failure, network partitions and errors. In addition, Dynamo provides the “always writable” service with extensive object versioning, and leaves conflict resolution to the read queries.

      To achieve those goals, Dynamo leverages distributed hash table for the underlying architecture and guarantees eventual consistency instead of strong consistency. Instead of resolving conflicts during writes, Dynamo creates a new object (actually a new versioning of the object) for any write. Each object instance is associated with a vector clock which includes the information of where it comes from. During the read operation, the node merges all versions in *its* current knowledge base through *user-defined* functions. Object information is propagated through the p2p network to obtain eventual consistency.

      Another interesting design of Dynamo is its “ring” partition schema. Each Dynamo instance has a “ring” space, where the storage hosts are randomly distributed on the ring (sometimes a storage host can map to multiple virtual nodes on the ring). Any data will be hashed into a position on the ring, and it will be stored at the first virtual node whose position is equal or larger than the hash value. The “ring” schema allows fast node insertion/deletion and failure handling. In addition, it’s very flexible.

      However, the random “ring” schema is not a stable design intuitively. It’s possible that all replicas of a file all locate in the same data center. In the event of a entire database outage, you could loose all the copies of that file. On the other hand, if the underlying storage system guarantees to replicate the file in datacenter, this kind of data loss could be avoided.

      Linkedin Voldemort

      Leave empty

      Apache Cassandra

      As an Apache Software Foundation top-level project initially developed by Facebook, Apache Cassandra was described as a BigTable data model running on an *Amazon Dynamo-like infrastructure*. It’s used by Facebook to power their Index Search feature until 2010. Nevertheless, Cassandra is still continued to be used by Twitter but not Tweets.

      It’s a structured key-value store with following features.

      • It provides tunable consistency, which basically means the client application is able to determine the level of consistency for any given read/write operation.
        Cassandra allows different level of consistency for the tradeoff between efficiency and correctness. This could be a good approach but leaving the per-query choices to application developers who are more likely to commit incorrect codes. In addition, it’s hard to predict the influence of queries with different consistency level on the same piece of data.
        • Keys map to multiple values, which are grouped into column families. Column families are fixed when Cassandra is created but columns can be added to a family at any time. In addition, columns are added to specific keys, so different keys can have different number of columns in any given family. Values from a column family for each key are stored together. This is somehow similar to a row-oriented store.
          Let’s think about how this feature can make a difference. If a SQL query requires at most one column in each column family, Cassandra performs quite like a traditional key-value store. But if a query requires multiple columns in a column family, e.g. join two columns, Cassandra is able to accelerate the computation because of better locality.
          On the other hand, this data management strategy doesn’t make strong assumption for the number of columns so that it could be much easier for developers to misuse columns without a warning.
          • Additional features include: using the BigTable way of modeling, eventual consistency, and the Gossip protocol, a master-master way of serving read and write requests inspired by Amazon’s Dynamo.

          First Post

          Just to make it non-empty.

          Follow

          Get every new post delivered to your Inbox.