This project was created on 07-04-2007, is hosted at https://sourceforge.net/projects/gridspace/, and is still under heavy development, so please fetch the newest version of this README file from the site. (If the site is down, please check again in a day or two.)

A Novel Grid System Powered by Cell

========
ABSTRACT
========

The architecture of the Cell processor makes it ideal for Grid applications. We have created a Python-based, Cell-powered Grid system. By extending the Python implementation, we let any node in the Grid access any Python object anywhere in the Grid. Python code is split into segments, which are spread to other nodes and executed in parallel. In addition, a JIT (Just-In-Time) compiler, currently under development, converts Python VM code into SPE instructions that run on the SPE cores for a large speedup.

============
INTRODUCTION
============

When I first heard the name "Cell", another word immediately came to mind: "Grid". A Cell BE (Cell Broadband Engine) has one PPE (PowerPC Processor Element), which serves as the manager, and eight SPEs (Synergistic Processor Elements), which provide the bulk of its computing power. Compared with a general-purpose processor, each SPE is rather simple and has only 256 KB of local storage, enough to run just a single code segment. This architecture is ideal for Grid applications: code segments can "roam" from one Cell to another. If we can divide a Grid application into many code segments, then the more segments we have, the more computing power we can put to work.

However, before this can come true, we have to solve two problems:

1. Programming the SPE is not easy for many programmers, especially those who have never heard of vector processors, SIMD, and the like. Even for very experienced programmers, coding in assembly language takes too much effort. A system that is too hard to use is useless, and a Grid system only works well at large scale, so we must lower the difficulty enough to attract many potential users.

2. A new Grid infrastructure is needed. Most existing Grid infrastructures are designed for general-purpose platforms, most of which are based on scalar processors. Their dispatching is also task-based: they can only queue whole tasks and have no notion of dividing a program into code segments. We need a new Grid infrastructure that is lightweight, easy to use, and smart enough to require minimal configuration.

After comparing the alternatives, we decided to use a modern language as the Grid language, and we chose Python. Our first step is to extend Python to make it usable in a Grid.

=========
GRIDSPACE
=========

Python has a very important concept called the namespace, which can be described as follows:

1. naming: a name maps to an object.
2. space: names are organized into namespaces, and each name is confined to its own namespace.

Namespaces are a foundation of Python, but they have a limitation: each namespace is bound to its process. For example, we can never access a Python object in another Python process directly. The only way to access it is to transfer it into our own process through an inter-process communication mechanism, such as IPC shared memory, a pipe, a UNIX socket, or a TCP socket.

By extending Python, we created GridSpace, which breaks this per-process limitation and is accessible Grid-wide. Any object, including numbers, functions, classes, files, and so on, can be placed under GridSpace and then accessed from Python anywhere in the Grid. Example.1 illustrates the idea:

----------------------
Machine A in Grid runs:

import GridSpace

def add_func(x, y):
    return x + y

GridSpace.a.obj = 1
GridSpace.add.obj = add_func
----------------------
Machine B in Grid runs:

import GridSpace

print(GridSpace.add.obj(GridSpace.a.obj, 2))  # should print 3
----------------------
Example.1

In this example, the two machines are connected to a network and configured as nodes of a Grid.
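To make the attribute-style access of Example.1 more concrete, here is a minimal single-process sketch of how a `name.obj` namespace can be built on Python's `__getattr__`/`__setattr__` hooks. It is not the GridSpace implementation: the classes `Space` and `_Slot` are hypothetical, and a plain dict stands in for the Grid-wide store that the real system keeps across nodes. Two `Space` views over one dict play the roles of Machine A and Machine B.

```python
class _Slot:
    """One named entry of the space; its value lives in the backing store."""

    def __init__(self, store, name):
        # object.__setattr__ bypasses our own __setattr__ below.
        object.__setattr__(self, "_store", store)
        object.__setattr__(self, "_name", name)

    def __getattr__(self, attr):
        # Only reached for attributes not found normally, i.e. "obj".
        if attr == "obj":
            try:
                return self._store[self._name]
            except KeyError:
                raise AttributeError(f"{self._name}.obj is not set") from None
        raise AttributeError(attr)

    def __setattr__(self, attr, value):
        if attr != "obj":
            raise AttributeError("only the 'obj' attribute can be assigned")
        self._store[self._name] = value


class Space:
    """Attribute-style namespace: space.a.obj reads/writes store['a']."""

    def __init__(self, store=None):
        # Hypothetical: a real Grid-wide space would reach a store shared
        # across nodes; here any dict-like object will do.
        self._store = {} if store is None else store

    def __getattr__(self, name):
        if name.startswith("_"):
            # Keep Python's own probes (copy, pickle, ...) out of the store.
            raise AttributeError(name)
        return _Slot(self._store, name)


# Two views over one shared store stand in for Machine A and Machine B.
shared = {}
machine_a = Space(shared)
machine_b = Space(shared)


def add_func(x, y):
    return x + y


machine_a.a.obj = 1
machine_a.add.obj = add_func

print(machine_b.add.obj(machine_b.a.obj, 2))  # prints 3
```

Keeping the value behind a separate `.obj` attribute means `space.a` is always a handle that can be created on demand, while only `.obj` touches the (possibly remote) store.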
In Example.1, Machine B fetches the data object 'a' and the function object 'add', both of which live on Machine A, calls 'add' with 'a' and a second argument '2' supplied locally, and finally gets the result 3. This is a simple demonstration of the Data Grid and Computing Grid that GridSpace provides.

=======================
PERFORMANCE ENHANCEMENT
=======================

Example.1 may seem no different from a Web Service: it looks like a simple RPC (Remote Procedure Call). Here we introduce an additional capability of GridSpace that a Web Service does not offer:

----------------------
Machine A in Grid runs:

import GridSpace

def add_func(x, y):
    return x + y

GridSpace.a.obj = list(range(1, 1000001))  # a very large object: [1, 2, 3, ..., 1000000]
GridSpace.add.obj = add_func
----------------------
Machine B in Grid runs:

import GridSpace

GridSpace.a.obj.append(3)
print(sum(GridSpace.a.obj))
----------------------
Example.2

In this example, the data object 'a' is very large. So, instead of transferring 'a' to Machine B, GridSpace forwards the 'append' operation and the 'sum' function to Machine A, performs them there, and transfers only the result of 'sum' back to Machine B.

Moreover, computations can be executed in parallel automatically:

----------------------
Machine A runs:

import GridSpace

def VeryHeavyCalc(x):
    # This function consumes a lot of CPU power
    ...
    ...

for i in range(10):
    print(VeryHeavyCalc(i))
----------------------
Example.3

In this example, the function 'VeryHeavyCalc' consumes a lot of CPU power, but each call is computed independently of the others. So the GridSpace system automatically parallelizes the loop by dispatching VeryHeavyCalc(0) to node 1, VeryHeavyCalc(1) to node 2, VeryHeavyCalc(2) to node 3, and so on. It then collects the results and prints them after all nodes have finished their calculations.

============
CELL POWERED
============

And there is still more!
Consider the following example:

----------------------
import GridSpace

def VeryHeavyCalc(x):
    # This function consumes a lot of CPU power
    def do_calc(y):
        if y > 50:
            y = y + 1
        else:
            y = y - 1
        return y
    return list(map(do_calc, range(100 * (x + 1))))

for i in range(10):
    print(VeryHeavyCalc(i))
----------------------
Example.4

This example is very similar to the previous one; the only difference is that we have written out the implementation of 'VeryHeavyCalc'. Here, 'VeryHeavyCalc' checks each value in range(100*(x+1)), adding 1 to those greater than 50 and subtracting 1 from the rest.

We have created a tool that lets programmers write SPE C code directly in Python, much like inline assembly in C. We are also developing a JIT compiler that converts Python VM code into SPE C code; that code is then compiled by our tool, inserted into Python as a code segment, and dispatched to an SPE to run.

In this example, the function 'VeryHeavyCalc' will be detected by the profiler, sent to the SPE compiler, and compiled into SPE instructions. The Python implementation of 'VeryHeavyCalc' will then be replaced by those SPE instructions, which run much faster.

==========
CONCLUSION
==========

With GridSpace, the programmer only needs to write Python code! The GridSpace system analyzes the code, compiles the segments suited to the SPE, distributes the segments suited to parallel execution, and optimizes data access patterns to minimize data transfer. GridSpace also adds no new Python syntax and requires no special library or API, so it is fully compatible with the current Python language.
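Example.3 describes how independent calls are dispatched to different nodes and their results collected in order. The minimal sketch below imitates that dispatch-and-collect pattern on a single machine, applied to the VeryHeavyCalc of Example.4. It is not GridSpace's scheduler: `dispatch_and_collect` and its `nodes` parameter are hypothetical, and the standard library's `concurrent.futures.ThreadPoolExecutor` stands in for Grid nodes. Because of CPython's global interpreter lock, threads do not speed up CPU-bound pure-Python code like this; the sketch only shows the structure of the dispatch.

```python
from concurrent.futures import ThreadPoolExecutor


def VeryHeavyCalc(x):
    # Same logic as Example.4: +1 for values above 50, -1 for the rest.
    def do_calc(y):
        return y + 1 if y > 50 else y - 1

    return list(map(do_calc, range(100 * (x + 1))))


def dispatch_and_collect(func, args, nodes=4):
    """Hypothetical helper: run func(arg) for every arg on `nodes` workers.

    Each worker thread stands in for one Grid node. Results come back in the
    order of `args`, no matter which worker finishes first.
    """
    with ThreadPoolExecutor(max_workers=nodes) as pool:
        # pool.map submits all calls, then yields results in input order.
        return list(pool.map(func, args))


results = dispatch_and_collect(VeryHeavyCalc, range(10))
for i, r in enumerate(results):
    print(i, len(r), r[:3])
```

Swapping in `concurrent.futures.ProcessPoolExecutor` would use several cores of one machine, provided the pool is created under an `if __name__ == "__main__":` guard; spreading the calls across separate Grid nodes is the part GridSpace itself is meant to add.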