Thursday, November 12, 2009

Branch-and-Bound, Graphical Models, Structured Grids

Branch-and-Bound Pattern

The pattern has 4 basic operations branching, evaluation, bounding, and pruning operations. The problem is split into sub problems like branches and this decomposition can be done using some other patterns like geometric decomposition pattern or so, and then these pieces can be parallelized and run on multiple cores. Each sub problem get evaluated and if it has a feasible solution then proceed it for the bounding and then to prune so that we will have optimal solution.

I like the flowchart and it’s very nicely described. I like the algorithmic style of presenting the paper as it is appropriate for this one.

Graphical Models Pattern

Looks like this pattern is still in process and didn’t give me a feel for it be a parallel pattern. It only mentioned how transformations and searching of trees can be parallelized. I am not sure if I understand the whole problem that author is trying to solve. Looking forward for class and then reread it again…

Structured Grids Pattern

This pattern explains how to parallelize operations on a large grid or array. The pattern uses geometric decomposition to decompose the problem into smaller arrays and then operations are performed in parallel in each array. Ghost nodes are used to figure out any dependencies of neighboring nodes.

I like the double buffering and adaptive mesh refinement idea s to refine the rendering. I have read some adaptive mesh refinements before in gaming to reduce the number of vertices and edges in a 3D model. Usually this is performed during game development and cached somewhere. Overall I like this paper batter than others I have read for this week’s class.

Shared Queue, Speculation, Circuits

I would like to say a pattern paper with more than 8-10 pages may be boring and it may not keep users undivided attention in reading it. So Prof. Johnson is clever asking us to write a pattern with 6 or so pages.
All these 3 pattern papers are too long to me and for last few pages I struggled to keep my attention. Concepts for patterns are clear but they may have done a better job keeping it simple and clear with good examples. Here is my brief review on these patterns and looking forward for good discussion on these though…
Shared Queue Pattern

Shared queues are well known in application development community. A centralized share queues are easier to implement. A well known use of shared queue is with a thread pool. This can be implemented as individual threads take tasks directly from the queue or a master thread distributes the tasks to the threads.

This paper used the examples with separate locks for push and pull but I think it is easier to use and maintain a single lock for the entire queue. Two locks might be more depending on the number threads at a time in the queue and maximum size of the queue.


Speculation Pattern

This speculation is similar to branch prediction that I learned in CS421. Given example is not very clear to me and I may get better understanding of this pattern with respect to parallel world and speculation.

Circuits Pattern

As I am learning about the time bounds that depend on the input size from the algorithms class (As algorithm teaches to be linear or polynomial in execution of the algorithm with respect to input size) this pattern’s problem is eye catching for me.
I like the example but I wish they showed it from high level language. I have to listen to class to get more insight in this one….

Monday, November 9, 2009

PRES: Probabilistic Replay with Execution Sketching on Multiprocessors

This paper is interesting to read (took few passes to get hold of what authors are saying) and like the ideas presented. Looking forward for good discussion in the class tomorrow --

What do you expect from a debugging tool?

Let me examine the internal state of the application throughout the execution of it so that I can see what is going on inside while it is executing the code.

What debugging tools do you usually use? What are good and bad about them?

I use Rational Software products server web sphere server (based of Eclipse) run in debug mode and look at the system out and error logs to debug the app and server allow to be started in debug mode and we can insert the break points so that when server steps in to that place it will stop and take details of the details and we can also do line debugging as we step through the app.
It is good because it will show us the variable values and method interaction etc… at time that were setup through break points. However, they are bad for concurrent programming debugging.

I wish this tool have a way to more naturally step through the interactions rather than stepping through the breakpoints or prescheduled items and produce the results of the values and interactions as it completes the task.

What are the challenges of debugging a sequential program? What are additional challenges of debugging a parallel program?

Debugging in a sequential program is relatively easy with a tool. Just give some known input with an expected output, and step through the application to see how the input is changed it resulted in bad or unwanted output than what was expected. Debugging parallel programs is harder as we need to know how the work is divided and what to expect the results for the piece that is getting executed on a single etc…

How important is replay for debugging?

If we can’t reproduce the bug and know where it is coming from then it is very hard to be able to fix it and know that it (same bug) won’t be occur again in the future. Debugging allow us to efficiently reproduce the bug and lead us or show us the potential reason for the bug so that we can fix it.


If you need to design a deterministic replay system, how are you going to do it for sequential programs? Does it work for parallel programs? If no, how to make it work?

For sequential programs, we can take state information at various points in the program that would give enough information to known where the inconsistencies are occurring and then the deterministic algorithm can use this information to find the reply. The same idea can be applied to parallel programs, but they may need more current state information gathered from different cores and give to deterministic algorithm to find the reply back.

Loop Parallelism - Task Queue - Graph Partitioning

These pattrens are neatly documented and easy to understand while i was reading. I will try to answer James questions on these to continue conversation...

Loop Parallelism

1. The author says the goal of incremental parallelization is to "evolve" a sequential program into a parallel program. If you have experience in refactoring code from sequential to parallel, how did your approach differ from the steps presented in the paper?

Paper gave good approach in my view. If it were me, probably I start with mostly used areas (frequency) and analyze it first and then combine with rest of the code. If the entire app is difficult to change then look for the pieces that can be improved and apply parallelism to them.

2. Many of us have used Eclipse or another IDE, with their respective plugins, for execution profiling. What has been your experience with these tools? What is your preferred method of locating the code in question (automatic or manual)?

I have used profiling facilities that are provided in Rational Software Architect (It is based on the eclipse platform). I used it to write a plug-in which will give us the code coverage analysis (How much of your code is covered through your test cases). My preferred method is automatic.

3. The author states, "The application of many transformations to many loops may be necessary before efficient performance is achieved." What is your opinion on this statement?

It depends on complexity and also understanding of the underlying loop more than anything else. So more you know about the loop and its potential transformation the more you can optimize the transformation.

Task Queue

How would you compare the Task Queue pattern to other similar distribution patterns like divide-and-conquer and fork/join pattern?

We have a queue or queues that have multiple tasks coming in, so we take each task (divide) and work on it and when we are done with it we post the results to caller(conquer step).

Graph Partitioning

This was probably one of the more math-oriented patterns I have seen and/or read. How often have you experienced a situation where the graph partition strategy would have been useful? If never, can you think of any future situations where this pattern might be come in handy and could be applied

Any situation that can be represented as graph and have some known structure (like bipartite or so on) then we should be able to use this pattern. I can think of a situation where I need to program matching and find best matching of the pairs (which is typical graph theory app) and we can represent that as bipartite graph and use this pattern. It may not be anytime soon using this pattern in my work…

Tuesday, November 3, 2009

Task Parallelism, Recursive Splitting, Discrete Event

Task Parallelism

As name suggested task parallelism is nothing but task level parallelism, breaking a large task into many smaller ones, and then proceeding to find which ones can operate in parallel. This pattern has lot of details in it along with some hardware details that are relevant for the parallelism (needed 2 passes to grasp what author is saying about). I like the context and Forces section but thought solution section may need some more improvement with more clear information on how to approach the solution.

Recursive Splitting

This pattern describes how one can effectively implement parallel execution for the recursive algorithm that splits the data and each part of the recursion will execute as a separate task. This paper talked about following as solution steps for algorithmic strategy involving recursive splitting:

1. Express problem recursively with more than one task generated per call
2. Use a balanced data structure, if possible
3. Use a fork-join or task-queue implementation
4. Use optimizations to improve locality
I like the clarity and simplicity of the paper.

Discrete Event

This pattern is interesting one. Here the problem described for this pattern(from paper) :

Suppose a computational pattern can be decomposed into groups of semi-independent tasks interacting in an irregular fashion. The interaction is determined by the flow of data between them which implies ordering constraints between the tasks. How can these tasks and their interaction be implemented so they can execute concurrently?

And solution (from paper):

A good solution is based on expressing the data flow using abstractions called events, with each event having a task that generates it and a task that processes it. Because an event must be generated before it can be processed, events also define ordering constraints between the tasks. Computation within each task consists of processing events.

I like the approach of how it makes easier to detect deadlocks, but like see more examples in implementation point of view.

Armstrong Thesis – Chapter 5:

This chapter describes how to program fault-tolerance using Erlang. The reading describes strategies for programming fault-tolerance, supervision hierarchies, and well-behaved functions.


A fault-tolerant system should be organized into a hierarchy of tasks. The highest level tasks should run the application according to the software design specification. If this task cannot be performed then a simpler lower level task should be performed. I like the strategies mentioned in this chapter and I am thinking it will provide robust and fault tolerable system. I like the idea of having supervision hierarchies. The tree representation is a nice way of distributing error recovery responsibilities to nodes, so that each node can monitor the errors in its descendent nodes.


Described error recovery mechanism looks good but it may be difficult to identify and define the errors in order to write the error recovery mechanism, I am thinking it may be nice idea if we can write some generic error recovery mechanisms and let the application have some intelligence to see which type of error recovery that It can apply to and as the application more used then programmers can better guess the error styles and errors so outer layer app should be able to pick up these added error recoveries and use them. Well-behaved functions (WBF) are a way of determining what an error is(probably this should build that intelligence in choosing the right error category like I mentioned above). WBF are functions that should not generate an exception unless an error is encountered. So this allows the programmer to interpret any exception generated by a WBF as an error. I like the rules specified to when to raise the exceptions.