This is not intended to be a list of requirements for contributions to Omega, nor are the points held with unwavering conviction. They are merely some observations about my own and other people's coding practices.
A common approach when defining a class is to provide several constructors, each taking different types of arguments, and to have the body of each constructor process its arguments to fill in the fields of the instance. A more flexible way to do this is to have the processing done in a method and to have the constructor call that method. The reason is that we can then call the method from other methods after the object has been initialized via the constructor. In other words, we can re-initialize the object.
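As a rough sketch of this idea, the class below has two constructors that do nothing but forward to initialization methods. The class Range and its init methods are hypothetical names chosen for illustration and are not taken from the Omega sources.

```java
// Hypothetical class used only to illustrate the pattern.
class Range {
    protected double low;
    protected double high;

    // Each constructor simply forwards to an initialization method.
    Range(double low, double high) {
        init(low, high);
    }

    Range(String spec) {
        init(spec);
    }

    // The real work lives here, so it can be called again later.
    public void init(double low, double high) {
        if (low > high) {
            throw new IllegalArgumentException("low must not exceed high");
        }
        this.low = low;
        this.high = high;
    }

    // Parses a specification of the form "low:high", e.g. "0:10".
    public void init(String spec) {
        String[] parts = spec.split(":");
        init(Double.parseDouble(parts[0].trim()),
             Double.parseDouble(parts[1].trim()));
    }

    public double width() {
        return high - low;
    }
}
```

After `Range r = new Range(0.0, 10.0);`, a later call such as `r.init("2:5")` re-initializes the same object. One caveat worth keeping in mind: if a subclass overrides init, the constructor will invoke the override before the subclass's own field initializers have run.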
A convenient corollary of this paradigm relates to defining fields. Firstly, I declare (almost) all instance fields protected and all static fields public (since they are usually constants or are initialized via a static block when the class is loaded). Immediately after declaring a field, say X, I define two methods to get and set it. Each returns an object of the same type/class as the field, and the set method accepts an argument of that type, to which the field is set.
    protected boolean X = true;
    public boolean X() { return(X); }
    public boolean X(boolean value) { X = value; return(X()); }
Of course, we can call other methods in the set method and re-initialize other fields. A constructor might look something like
    public Classname(boolean v) { X(v); }
and a call after the object has been created,
    obj.X(false);
amounts to the same thing as constructing a new object.
Private fields and methods are useful, but only in unusual circumstances. The reason is that the philosophy is to allow others to extend existing classes, and declaring a field private makes this harder. Of course, derived classes should use the accessor methods for fields whether the field is public, protected or private. However, there are circumstances where it is more convenient to have direct access to the object stored in the field. For example,
    synchronized(field)
can be more convenient in certain cases than
    synchronized(field()) ...
A key aspect of Omega is that users can combine different objects to create new environments, tools, etc., and that they can dynamically replace one component with another that performs the same operations but with a different implementation. A simple example of how this might work arises in the optimization setup used in the initial version of the LME code. Here, we have a general class BasicOptimizer which performs the iterations until convergence has been achieved. We allow the user to specify
If these fields were declared with class types, say A and C, then a user could only reparameterize the iterator by adding a class derived from the base class expected by the iterator. So a new algorithm would have to extend A. In many cases this is the most reasonable approach: one simply wants to override one or two methods. In other cases, however, an entirely new class structure is more suitable, and extension brings with it too many details from the parent class. This is when we need the field to be declared as an interface, say Algorithm. Then we can have arbitrary classes (Newton-Raphson, Nelder-Mead, some genetic algorithms, etc.), each of which implements this interface, and we can set the iterator's field to an object of any of these classes.
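The following is a minimal sketch of what such a pluggable field could look like. Only the names BasicOptimizer and Algorithm come from the text; the step method, the two implementing classes and the convergence test are assumptions made for the example, and the real BasicOptimizer is certainly richer than this.

```java
// Assumed shape of the interface: one step of an iterative scheme.
interface Algorithm {
    double step(double current);
}

// Newton-Raphson applied to f(x) = x^2 - 2 (hypothetical example class).
class NewtonSqrtTwo implements Algorithm {
    public double step(double x) {
        return x - (x * x - 2.0) / (2.0 * x);
    }
}

// A class with no relation to NewtonSqrtTwo: it takes half of each Newton step.
class DampedNewtonSqrtTwo implements Algorithm {
    public double step(double x) {
        return x - 0.5 * (x * x - 2.0) / (2.0 * x);
    }
}

// Sketch of an optimizer whose algorithm field is typed by the interface.
class BasicOptimizer {
    protected Algorithm algorithm;
    protected double tolerance = 1e-10;
    protected int maxIterations = 200;

    BasicOptimizer(Algorithm algorithm) {
        algorithm(algorithm);
    }

    public Algorithm algorithm() { return algorithm; }
    public Algorithm algorithm(Algorithm value) { algorithm = value; return algorithm(); }

    // Iterates until successive values differ by less than the tolerance.
    public double optimize(double start) {
        double current = start;
        for (int i = 0; i < maxIterations; i++) {
            double next = algorithm().step(current);
            if (Math.abs(next - current) < tolerance) {
                return next;
            }
            current = next;
        }
        return current;
    }
}
```

Because the field is declared as Algorithm, a call such as `opt.algorithm(new DampedNewtonSqrtTwo())` swaps the implementation on an existing optimizer without either algorithm class having to extend the other.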
Additionally, other objects can register an interest in being notified by the optimizer at the end of each iteration. For example, one listener might update a collection of plots displaying the successive values of the different parameters, another might write the values to the screen or to a file, and another might allow the user to modify the current parameter values to "help" the algorithm in determining the next suitable step.
In order to do this, these objects implement a particular interface which declares a method (iterationPerformed) that the BasicOptimizer invokes to pass an event to each of them. (Event notification can also be done in a background thread.) Here, the interface is necessary to provide a common method amongst the listeners so that the optimizer can call it. This way of dispatching method calls to arbitrary anonymous objects is a key to providing an easily extensible system.
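A sketch of this listener arrangement might look as follows. Only the method name iterationPerformed is taken from the text; IterationEvent, IterationListener, NotifyingOptimizer and the halving loop that stands in for a real optimization are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical event carrying the iteration number and current value.
class IterationEvent {
    final int iteration;
    final double value;

    IterationEvent(int iteration, double value) {
        this.iteration = iteration;
        this.value = value;
    }
}

// The common method that lets the optimizer call any listener.
interface IterationListener {
    void iterationPerformed(IterationEvent event);
}

class NotifyingOptimizer {
    protected List<IterationListener> listeners = new ArrayList<>();

    public void addIterationListener(IterationListener listener) {
        listeners.add(listener);
    }

    // Stand-in for a real optimization loop: halves the value each iteration
    // and notifies every registered listener afterwards.
    public double run(double start, int iterations) {
        double current = start;
        for (int i = 1; i <= iterations; i++) {
            current = current / 2.0;
            IterationEvent event = new IterationEvent(i, current);
            for (IterationListener listener : listeners) {
                listener.iterationPerformed(event);
            }
        }
        return current;
    }
}
```

A listener that prints each value could then be registered with `opt.addIterationListener(e -> System.out.println(e.iteration + ": " + e.value));`. The optimizer knows nothing about the listener beyond the interface.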
Perhaps the simplest rules to follow to make code thread safe are the following. Firstly, pass values via arguments, not from quasi-global fields/variables that might be used by two threads concurrently. This boils down to the following simple idea:
If a value is to be used only within the descendants of the current call stack (i.e. within methods called by the current one), pass it to each of these methods as an argument. This involves more labour, as one has to declare each method to take additional arguments, even if a particular method is only an intermediary which passes the value on to another method which actually uses it.
    void A() { int x = 1; B(x); }
    void B(int x) { ... y = C(x); }
    int C(int x) { return x + 2; }
An example of using this is in the Function class. If we cache the evaluated arguments in a field to make them available to other methods, errors will occur if this function instance is evaluated in two separate threads, or even simply interleaved with a second call within the same thread, as in
    foo(x, y, foo(a, b, c))
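The interleaving problem can be reproduced in a single thread. The two classes below are hypothetical stand-ins for the Function class, not its actual code: UnsafeSum caches evaluated arguments in a field, while SafeSum passes them down the call stack as an argument.

```java
import java.util.function.DoubleSupplier;

// Caches evaluated arguments in a field shared by every call on this object.
class UnsafeSum {
    protected double[] values;

    double eval(DoubleSupplier... args) {
        values = new double[args.length];
        for (int i = 0; i < args.length; i++) {
            // Evaluating an argument may re-enter eval() and replace the field.
            values[i] = args[i].getAsDouble();
        }
        return total();
    }

    double total() {
        double sum = 0.0;
        for (double v : values) {
            sum += v;
        }
        return sum;
    }
}

// Keeps evaluated arguments in a local variable and passes them on explicitly.
class SafeSum {
    double eval(DoubleSupplier... args) {
        double[] values = new double[args.length];
        for (int i = 0; i < args.length; i++) {
            values[i] = args[i].getAsDouble();
        }
        return total(values);
    }

    double total(double[] values) {
        double sum = 0.0;
        for (double v : values) {
            sum += v;
        }
        return sum;
    }
}
```

With `UnsafeSum f = new UnsafeSum();`, the nested call `f.eval(() -> 1.0, () -> 2.0, () -> f.eval(() -> 10.0, () -> 20.0, () -> 30.0))` returns 60.0 rather than the expected 63.0, because the inner call replaces the cached array before the outer call sums it. The same nested call on a SafeSum returns 63.0.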
On the other hand, we often need to store a value to be used in another call stack, such as when another event occurs and a callback is invoked, when another user command is issued, etc. These are essentially cross-task variables. Wherever the variables live (an instance field, a static class field, a database), access to them must be synchronized.
More specifically, when an object is shared across different call stacks, access to it by two or more threads must be synchronized if and only if its value can change. The standard example is where two fields are used to store joint information.
    int size;
    LinkedList list;
When we add a new entry to the list, we should increment the value of size, and similarly decrement it when we remove an entry. (This example would of course be better expressed by having the size field in the LinkedList, but we use it here for example purposes only.)
A simple way to synchronize is at the method level. We can declare one or more methods to be synchronized to indicate to the JVM that only one thread can be executing any one of these methods on a given object at a time. In other words, it locks the entire object this. This is a coarse level of granularity: other synchronized methods of the same object cannot proceed in parallel, even if they do not touch the objects being modified.
We can use finer resolution synchronization with a little more work. We can declare a critical section in which only one thread can be executing at a given time using the synchronized statement, not the method modifier.
    public void add(Object el) {
        ...
        synchronized(list) {
            list.add(el);
            size++;
        }
        ...
    }
This means that the code in the ... can proceed concurrently with other code, without locking the entire object. The object that one synchronizes on can be this, giving the same effect as the synchronized method but with finer granularity for parallelism.
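To make this concrete, here is a small self-contained sketch of the size/list pairing with the critical section limited to the two joint updates. CountedList, its consistent method and the CountedListDemo helper are hypothetical names introduced for the example.

```java
import java.util.LinkedList;

// Hypothetical class pairing a list with a separately stored size.
class CountedList {
    protected int size;
    protected final LinkedList<Object> list = new LinkedList<>();

    public void add(Object el) {
        // Preparation that touches no shared state runs outside the lock.
        String entry = String.valueOf(el);
        synchronized (list) {
            // Critical section: the two fields are updated together.
            list.add(entry);
            size++;
        }
    }

    public boolean remove(Object el) {
        String entry = String.valueOf(el);
        synchronized (list) {
            boolean removed = list.remove(entry);
            if (removed) {
                size--;
            }
            return removed;
        }
    }

    public int size() {
        synchronized (list) {
            return size;
        }
    }

    // Reads both fields under the same lock and checks that they agree.
    public boolean consistent() {
        synchronized (list) {
            return size == list.size();
        }
    }
}

class CountedListDemo {
    // Starts several threads that each add perThread entries, then waits for them.
    static CountedList fill(int threads, int perThread) {
        CountedList counted = new CountedList();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            final int id = t;
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    counted.add(id + ":" + i);
                }
            });
            workers[t].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return counted;
    }
}
```

Note that every method that reads or writes either field synchronizes on the same object, list. Synchronizing writers alone would not be enough, since a reader could still observe the list after the add but before the increment.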
When the code in the body of a method that does not need to be synchronized is "small", using a synchronized statement may be less efficient than declaring the method to be synchronized. The reason is that most JVMs check whether a method is synchronized before invoking it and acquire the lock as part of the invocation; this is internal functionality in the JVM. A synchronized statement, by contrast, is compiled to explicit op-codes (monitorenter and monitorexit), and hence to additional JVM instructions.
So generally, use synchronized statements. It is worth the extra effort in terms of clarity and performance in most cases. Consider using synchronized methods instead if one is frequently synchronizing on this and the methods have very little code that does not need to be in the critical section.
To be specific, suppose we are defining a class A. Then, we would provide constructors
    class A {
        public A(A a) {
            this(a, true);
        }
        public A(A a, boolean deepCopy) {
            field1(a.field1());
            field2(a.field2());
        }
    }
Additionally, we might introduce interfaces Copyable and ConstructorCopyable. The first indicates that the class implements a method copy which takes an argument of the same class and a boolean indicating whether the copy should be deep or shallow. The second indicates that the class has a constructor as in the paragraph above, which allows us to avoid looking for it unnecessarily.
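One possible shape for these two interfaces is sketched below. The generic type parameter, the choice to have copy return the receiving object, and the Samples class are all assumptions made for illustration. The text only fixes the names Copyable, ConstructorCopyable and copy.

```java
// Assumed form: copies the state of source into this object.
interface Copyable<T> {
    T copy(T source, boolean deepCopy);
}

// Marker interface: the class promises a (T, boolean) copy constructor.
interface ConstructorCopyable {
}

// Hypothetical class holding an array, so deep and shallow copies differ.
class Samples implements Copyable<Samples>, ConstructorCopyable {
    protected double[] values;

    Samples(double[] values) {
        values(values);
    }

    Samples(Samples other) {
        this(other, true);
    }

    Samples(Samples other, boolean deepCopy) {
        copy(other, deepCopy);
    }

    public double[] values() { return values; }
    public double[] values(double[] value) { values = value; return values(); }

    // A deep copy clones the array; a shallow copy shares it with the source.
    public Samples copy(Samples source, boolean deepCopy) {
        values(deepCopy ? source.values().clone() : source.values());
        return this;
    }
}
```

Since an interface cannot declare a constructor, ConstructorCopyable can only act as a marker here; code that relies on it would still need to locate the constructor itself, for instance by reflection.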
While the import command in Java is useful, it can be abused and lead to very confusing code for a reader. We suggest that classes from other packages (other directories or entirely different distributions) being referenced in a class definition be either imported explicitly by name,
    import org.omegahat.Environment.Databases.ObjectDatabase;
    public boolean foo(ObjectDatabase db) { ... }
or referred to by their fully qualified name,
    public boolean foo(org.omegahat.Environment.Databases.ObjectDatabase db) {
Feel free to consult me if there is a need to change the installation or configuration setup.