(Desktop) Grid Computing
toucheatout 2006-03-22 19:22 Grid Computing
A grid, as meant here, is the collection of a large number of computers, all connected and coordinated to execute a given computation while leaving each individual computer free of collaborating or withdrawing its support, expectedly or not, temporarily or not. The withdrawal can occur for a variety of reasons, and in a different sort of ways.
While leaving many skeptics at first, projects as SETI@home raised the doubts, and nowadays the power is patent: as of January 2006, Folding@home sustained over 200 TFLOPS, while SETI@home remains at a good 100. Several similar projects enjoy, to a different extent, such a supercalculator power.
General current work and challenges
The most basic and fundamental caracteristics of a desktop grid is the very large scale at which it operates and the inherent node volatility. This last property creates interesting problems in crash recovery and consistency constraints. It opens speculations on near-future available computing power. To add insult to injury, no only hosts, but also whole parts of the GC network may disappear without notice.
Assessing the potential of the grid for efficient usage
That is, monitoring the different components of the grid to be able to handle delays and cope with host volatilization. This clashes with the potential presence of firewalls (forcing the use of proxies, thus adding centralization). Indeed, as in the case of XtremWeb ([1]), while the dispatcher should be accessible from the internet, workers and clients well may be behind firewalls. This leaves no other choice than imposing a timeout, and/or a keepalive/heartbeat system. The firewalled hosts also will be responsible for pushing their general reports, results and demands of tasks.
- Build a model of the participating machines computational power (See [1] for the case of Soft Real Time)
- Jointly, build a network model as bandwidth can be as critical as CPU cycles and both are intimately woven together (see [1] and [2]).
Fault tolerance
Fault-tolerant MPI (flavors of MPICH-v, see [6] or MPICH-v at LRI)
Fault-tolerant RPC (RPC-v, see [8])
Mechanisms to prevent data loss or task scheduling discrepancies upon a host crash or timeout due to silence in out of range computing conditions.
A monitoring mechanism with the previously mentionned keepalives and timeouts. The interest is common to those two ends, performance and fault tolerance.
Redundant computing minimizes the chance of job failure at the expense of efficiency.
Redundant storage of data and indexes. This also can lead to performance embetterment if parallel upload/download is possible (a la raid or P2P).
(pessimistic) event logging to backtrack to last good known position (similar idea as journaled filesystems)
Security/Integrity
- Find result validation models that are more efficient than double checking.
- Find and refine security models, as sandboxing (implemented as a LKM in Linux and in mostly safechecks the system calls. Ptrace implementation for self-observation exists - the sole name of ptrace however reminds of its awful race condition threat allowing for code injection). This shields the worker processes on the host.
- Insure the inocuity of programs that run on the grid computing hosts against the hosts themselves.
- Parallel programmation of programs in a LSDS.
- Timely and transparent management for data needed by the grid's participant (see below).
- Usage of asymetric cryptography for securing transfers and/or identifying the resources (together with SHA1/md5 checksums).
- Use of user accounts.
- Event logging
Data distribution approaches for LSDS
FTP and bittorrent for data distribution
As shown in [3], The bittorrent method to share files does not suffer from the linearity of a single FTP server. For large ammounts of data that has to spread to more than a dozen of hosts, bittorrent is clearly the winner. This could lead to a hybrid protocol using FTP (possibly mirrored or dynamically pooled) for small code files (apps, "unintensive" app data), and bittorrent for bulkier, widespreadly needed files (e.g. the big file all the workers would need to have for processing). Intuitively, it is suitable for data that can't be part of the parallelism.
JUXMEM
JUXMEM ([5]) seems one of the most promising project dedicated to the data distribution for GC systems. It relies on an very appropriate problematic: Transparently and permanently keeping available the data necessary for a grid to function efficiently. Its inspiration is based on a blend of DSM and P2P methods (P2P for the large scale distributed, volatile environment and DSM for its efficient principles of transparent management and consistency).
IRISA website maintains a list of JUXMEM-related publications.
Current other projects of interest in GC
- CONDOR, originaly a cycle-stealing project that expanded to high throughput batch scheduler. It accepts participation of clusters and desktop machines. Some modules allow for secure storage, monitoring, ...
- BOINC, the very popular VC platform.
- OAR, a resource manager (or batch scheduler) for large clusters.
- Globus Alliance, a community around grid computing. They offer an opensource toolkit.It is however rated unsuitable for
XtremWeb Desktop Grid Computing
Regular XtremWeb operations
XtremWeb is based on a 3-tiers architecture: clients, workers and servers. The client submits applications, creates users, launches jobs to and retrieve results from the server, which schedules to the different workers the chunks of work and collects the results. The server not only has the charge of re-assembling the different outputs, it also has to validate the results (problem of trust in or reliability of the workers), manage the volatility of the nodes, garantee the correct and timely transmission of the necessary input data (code and data), or at least pointers to them.
To cope with the massive firewall eruption over the net, the generally used process is that the server must remain directly accessible and the workers will transmit their processing output, as they cannot be reached directly.
This limits the knowlege the server can have on the node availability, as active monitoring is unfeasible.
As real-time checks are impossible, it forces the use of timeouts or keepalives from the worker to
push availability information to the server. It is at the expense of messages that the server does not necessarily need, and forces the usage of somewhat outdated information.
This obviously poses additional constraints on the processing time boundaries (soft real time).
The same remark applies to client / server connexions, as the client will likely be behind a firewall. It reduces the reactivity, as the clients have to ask about their job status and are not informed directly, as soon as possible (security concern in case of FTP transfers as storage "theft").
Common cases when a desktop will not compute for the grid
- User intervention
- If some human being is using a machine - as shown by mouse and keyboard activity, it will not honor any further computation request, until the human being is out of the way.
- CPU usage is above a limit
- Even without human interaction, some scheduled tasks or punctual load will force the machine to release the resources occupied for the Grid.
- Task is outside allowed working hours
XtremWeb fault tolerance model
Based on mechanisms designed for garanteed successful recovery on crash without optimization, as described in [9].
XtremWeb security model
It is based on the sandbox
XtremWeb user, apps and job management
see XtremWeb Documentation.
Personal research interests in the domain of Grid Computing
- Enhance the network and host availability models, through (possibly distributed, peer-to-peer) monitoring.
- Based on a network and host model, try to instill auto-organization amongst the Grid to provide higher-level means of control and comprehension (possibility of managing at self-agregated group level). This is more interesting the event of real desktop computing and not securely connected clusters of machines.
- Define roles that machine can automatically take, as a function of their availability, resource potential and track record (a la GNUtella).
- Find reliable algorithms to scatter data across the grid so that their access time and their very presence is, if not "garanteed", at least reasonably bounded.
- Systems where contributed time could be punctually requested back by participants (or obtain distributed, "99.999% secured" temporary storage space) - get something out of "lost" cpu cycles or disk space by giving them and borrowing sometime later when in need - generates a pool of shared resources: Job latency and capacity of answer/efficiency?
- Code or partial results (processes ?) migration capacities. Tasks or bags of tasks relocation: self-organizing and readily available/pre-prepared backup plans.
Future directions and ideas
- Estimate job (resp. task) completion time
- Unify the server/worker model and allow for redundancy of data indexes, try to induce by interest for participants or way of usage a pyramidal self-organization (Would be easier to consider level 2 workers as having a pool of workers behind them, "recruited" by them or "self"-organized with bandwidth, estimated processing power, required storage, coherent set of applications, etc... requirements in mind...). It would allow to consider a set of machine, and in the case that they are not on the very same network(country?) (and thus are likely to all go down at same time, etc...), it would allow reasoning on some volume and not individuals.
- Enhance the security without inflicting too much penalty on the performance
- Find judicious models and organization to enhance the availability of the input data (ie manage the bandwidth and storage usage with respect to the consumption of data resources - governed by the processing speed and the input/output data size for a job of given execution time T - that is roughly (input_size + output_size) / speed). Output data may be left out in case time isn't a factor, and slow periods when jobs are scarce and thus results easily transmitted.
- Allow for jobs that are running constantly, as soon as new data are available (eg SETI). [I guess this is the case, easy enough with scripting]
- As above, but runs a program with an undefinite ammount of time (typically, a server, e.g. database). Needs worker relocation (a la COMET for instance), a sound, distributed service index/directory and is clearly linked with GITS.
- Automating the client job submission / retrieval / cleanup (macros with conditionals).
- Automating (partially, give a sticky option) server migration.
- Distributed service directory (link with hierarchical dispatcher/worker org), opens a "Ghost In The Shell©" opportunity
- Distributed (?) host evaluation (uptime, var(uptime), min(up), max(up), max/avg(CPUPower), statistics on when the resource is available - timespans, frequency of availability at a given hour, avg/var(CPU time consumed by hour)).
- Workers stats on task execution caracteristics by application (cpu time, avg time, std deviation etc...).
- ?? worker-side compilation/interpretation (of java/python/...) to bypass cross-compilation and platform dependance. Example: a XtremWeb macro for a job with gcc as binary them a second one with the compiled result. We suppose that the sheer number of clients allows for finding suitable host.
- Use, as in the configuration file of workers (but determine it at runtime), an array of servers to connect to, ordered by the potential bandwidth, storage space... and relevance with respect to the worker's already running apps.
- Maintain a distributed peer directory, for matching (self-organization ?). With the element below, be able to create groups from the measuresOleg Lodygensky.
- Periodically evaluate the potential of peers (bandwidth, storage, processing, etc...
- Give every element an evaluation of how he "feels" - that is, underused, underconnected, storage...
- Manage timing constraints: get a model of available power/CPU cycles and let the workers report an impossibility. It would then be useful to keep a pool of ready servers just to catch up in case of "unexpected" volatilization.
- Refining the computation abort system in case of common Desktops (e.g. authorize finishing of certain tasks, allow to renice to a high value when user interaction comes in, ...)
- Propose a system of fair exchange of resources in some cases (eg. company A buys cpu time, B sells surplus sometime, C does both. Exchange time with peers as a preferred way, and propose for people necessitating temporary horsepower a very small fee for the grid usage, redistributed to participants... (w/ paypal or similar ?). Leave open the general interests projects or allow a mix-and-match of usages (for free project 40%,20% to rent etc... -> propose that XW is running despite user action in some cases and to a certain extent).
- Alternatively, in a model with each node being server & worker, exchange jobs with others and keep track of the 'debts' (rather who could i fairly ask for time).
- Distributed storage: master copy + working copy or master + distributed chunks (lowers individual node data storage requirements). Or a comnbination of those 3 items...
- For modelling a la P2P, maintain a list of 'suitable partners' (wrt apps and data size): Idea of "distance" between nodes.
- Security workaround concerning FTP security problems as described in [3]: one-time passwords or write-only worker access (clients still have to get rw (or write only to put, read only to fetch if they are separate directories - opposite of the worker). The second solution leaves a breach only if client & workers can find a way of interacting with another channel than the grid.
- For soft real-time: keep a pool of resources on "power servers" for catching up in case of task failure and proximity of the deadline. The overall spare resources have to be defined and managed with respect to an estimation of the ratio of the tasks (or CPU cycles) that are likely to be in the red zone and therefore need the use of the spare resources.
Glossary
FLOPS: FLoating point Operations Per Second.
LSDS: Large Scale Distributed System.
GC: Global Computing. (also Garbage Collecting in other context)
DSM: Distributed Shared Memory.
P2P: Peer-to-Peer.
References
You may head on to Gilles Fedak's research page to encounter paper's downloadable PDFs and more.
- [1] Towards Soft Real-Time Applications on Enterprise Desktop Grids
- Derrick Kondo , Bruno Kindarji, Gilles Fedak et Franck Cappello.
In Proceedings of 6th International Symposium on Cluster Computing and the Grid CCGRID06 Singapore, 2006.
- [2] The Computational and Storage Potential of Volunteer Computing
- David Anderson & Gilles Fedak.
In Proceedings of 6th International Symposium on Cluster Computing and the Grid CCGRID06 Singapore, 2006.
- [3] Collaborative Data Distribution with BitTorrent for Computational Desktop Grids
- Baohua Wei, Gilles Fedak et Franck Cappello.
ISPDC'05 Lille, France, 2005.
- [4] Computing on Large Scale Distributed Systems: XtremWeb Architecture, Programming Models, Security, Tests and Convergence with Grid
- Franck Cappello, Samir Djilali, Gilles Fedak, Thomas Herault, Frédéric Magniette, Vincent Néri et Oleg Lodygensky. To appear in FGCS Future Generation Computer Science, 2004.
- [5] JuxMem: An Adaptive Supportive Platform for Data Sharing on the Grid
- Gabriel Antoniu, Luc Bougé, Mathieu Jan. Internal Publication, September 2003.
- [6] MPICH-V: a Multiprotocol Fault Tolerant MPI
- Aurélien Bouteiller, Thomas Herault, Géraud Krawezik, Pierre Lemarinier, Franck Cappello. To appear in International Journal of High Performance Computing and Applications.
- [7] XtremLab: A System for Characterizing Internet Desktop Grids
- Paul Malecot, Derrick Kondo et Gilles Fedak. Poster in International Symposium on High Performance Distributed Computing HPDC'06 Paris, 2006.
- [8] RPC-V: Toward Fault-Tolerant RPC for Internet COnnected Desktop Grids with Volatile Nodes
.
- Samir Djilali, Thomas Herault, Oleg Lodygenski, Tangui Morlier, Gilles Fedak and Franck Cappello. To appear in SC'04 Pittsburg, 2004.
- [9] XtremWeb & Condor : sharing resources between Internet connected Condor pools.
- O. Lodygensky, G. Fedak, F. Cappello, V. Neri, M. Livny, D. Thain. CCGRID2003, workshop on Global Computing on Personal Devices, IEEE Press, May 2003.
- . .
- . .
To be continued...
Webographie
To be finished, cleaned up and formated...
http://www.irisa.fr/paris/Biblio/www/Keyword/JUXMEM.html
https://www.irisa.fr/centredoc/publis/PI/2003/irisapublication.2006-01-27.8385812841
http://content.ix2.net/arc/t-8515.html
http://www.irisa.fr/paris/web/juxmem.html
http://www.irisa.fr/paris/web/P2P.html
http://www.irisa.fr/paris/web/G5K.html
http://hal.inria.fr/inria-00000987
https://gforge.inria.fr/projects/adage/
http://oar.imag.fr/
http://hal.inria.fr/inria-00000976
http://hal.inria.fr/inria-00000973
http://runtime.futurs.inria.fr/PadicoTM/
http://www-faculty.cs.uiuc.edu/%7Eindy/
https://www.grid5000.fr/mediawiki/index.php/Grid5000:Home
|
|
|
yro.slashdot.org - Your Rights online
|
|
|
|
nytimes.com New York Times - International
|
|
|
|