![]() |
|
The current IISS program uses a homeade language for formula evaluation in cells, involving elements of Mathematica and image-specific languages. There are no data "types" in the language per se; the lex/yacc parser itself handles strings, numbers, and operators and represents them in terms of a parse tree for the rest of the program to digest. It is a pure-functional language, in the sense that it has no conception of statements (only expressions) and assignment. It currently has no conditional or iteration capability.
The language itself, while quite suitable for its current role, suffers from severe implementation defects, including a very complex mechanism for adding new operators to the language, the lack of much-desired data types such as lists, and the above-mentioned lack of iteration / conditionals. Nevertheless, Josh laid out a very clear and clever system using this language and operators contained in shared libraries, and the operator gap can easily be bridged using this new model.
On the other hand, even with the push for additional features in the language, it is possible that the language's fundamental limits could be reached. I believe these limits are:
If another language is to be used, the best candidate appears to be Python. I will assume that the language in should be a general-purpose interpreted language. Perl, more of a system language, has a very large and strange syntax, emphasizes system administration-related actions like string manipulation far more than object manipulation, and is generally harder to embed IMHO. Tcl has been used before by Levoy in a spreadsheet, but is far weaker than most other languages in terms of object support ("everything is a string", although later versions apparantly attenuate this). On the bright side Tcl is extremely embeddable and extendable.
Python (http://www.python.org) offers a good blend of the features most desirable for this project. From the FAQ:
Python is an "interpreted, interactive, object- oriented programming language" that includes modules, exceptions, dynamic typing, VHL dynamic data types, and classes. "... remarkable power with very clear syntax."
The language itself is very stable and has existed since 1991. It is portable with versions for Unix, Win32, Macintosh, and MS-DOS. It is both extendable and embeddable in C. The language is said to be good for cases
"where a great deal of dynamicism, ease of use, power, and flexibility are required.. When augmented with extensions, Python becomes a convenient 'glue' language that helps make heterogenous collections of unrelated packages work together." (again from the faq)
This report describes the experiences in developing a prototype Python-based plugin system that could be applied directly to a new evaluation system for the next-generation spreadsheet, iCE (information Cell Environment). Several of these systems are described in detail, with their relative advantages, disadvantages and unspecified (but known) implementation details with relatively unknown required development times.
The final evaluation system has the following feature goals:
The current way how formulas are evaluated is presented in comparison to the other models.
A lex/yacc grammar transforms the input formula string into a parse tree. This grammar is pretty large (74k). Once this tree is constructed, backwards chaining is used as mentioned in the IISS papers to calculate dependencies. Evaluation is simply traversing this parse tree, and looking up the formulas in rather a rather long and complex procedure (or so I'm told) and running a set of internal operator functions.
"The Reference Model" is the term I'll use for the shared library-based plugin scheme devised several months ago and partially implemented by Josh early this year. It was meant as an ad-hoc solution for the purposes of experimentation and presentation, and explicitly does not have functionality for dependency tracking and its support for binary operators is minimal. Still, it is still useful for the two mentioned reasons, and dependencies and binary ops can be added to the model.
The model revolves around the concept of recursive parsing. The application parses the top-level formula text, which will either be some sort of frame reference or an operator (of the form foo[]). It is assumed that the plugin was located in a plugin directory of shared libraries, and is initialized; it will be called, with its argument string its sole argument. ie foo["bar.tif", 2] resolves to something like eval("\"bar.tif\", 2"). Inside this evaluation function, the plugin again parses the formulas one-deep, calling other plugin evaluators if necessary.
This model of parsing was prompted by the desire to add additional functionality to the language that tended to be focused on specific operators. The canonical example was lists, which Josh mentions in his plugin notes:
"If each operator parses its own argument list, this allows the argument types to be extended. The best example is in the current iiss, with the addition of Read_multiband[]. We wanted an argument that took a list as its value: src_channels->{3, 2, 1}. However the {}'s weren't allowed in the language and I had to create arguments detailing how many channels were to be read and which ones. However, if Read_multiband[] parsed its own argument string, it could have supported this. I'm sure there is an argument for not allowing the language to be changed, but not from me."
Initialization in greater detail. It is assumed that each plugin .so has a known interface of the form:
extern "C" iissOperatorPlugin *constructor(void) {
return new iissOperatorPlugin;
}
That is, there exists a single C function that returns a C++ instance of the desired plugin class. This class looks like:
class iissOperatorPlugin {
public:
iissOperatorPlugin();
~iissOperatorPlugin();
char *name();
char *version();
char *usage();
bool parse(char *arguments);
iissData evaluate(char *arguments);
};
Upon shared library load, name() and version() are located and called to register the plugin in the application registry, and its constructor is called automatically.
When a formula is entered and a top-level operator is parsed, its plugin's parse() method is called. This method ensures the syntactical correctness of the arguments, and recursively calls other parse()s. It returns true if the syntax is correct. As an added bonus, it allows for some semantic parsing as well; for instance, certain formulas may contain file references which may be checked for existence. If a file was typed incorrectly in a formula with several other long calculations, much time could potentially be saved thanks to the skipped evaluation.
In any case, once the parse completes the evaluation begins with the same arguments. The arguments are once again parsed and the result evaluated and returned.
All that is needed to extend the language is to write a new plugin as a shared library, which means to implement all the methods in iissOperatorPlugin. The overhead involved is contained entirely in the parsing operation, as initialization, destruction, identification, versioning, and help text are features that may be common to all other evaluation systems and in any case are easy to implement. This parsing overhead, however, is very significant; in effect, it's like asking to contain a large portion of the entire IISS parsing system (74k, as of `du /usr/mvl1/fraser/iiss/new/src/parse`) inside a single function. And this language is to be modified to support new implicit data types as necessary!
In actuality, the vast majority of parsing operations will be the same across all plugins, so that the top-level iissOperatorPlugin will probably end up containing a bunch of standard parsing methods, designed so that any given plugin can lop off a section of the argument and parse it itself if desired. I have no idea how this would work, unfortunately.
Redefining the language using custom parsing algorithms is neat, but a perhaps more direct way to solve the same problem is to use a language which already includes any data structure you'll probably ever need.
First of all, it must be noted that Python supports all the language features of the IISS language, including operators (as functions, methods, and certain unary and binary operators listed at http://www.python.org/doc/current/ref/operators.html multidimensional array indexing in the C-sense (effectively, arrays of array), explicit array slicing (of the syntax a[4..7]), keyword arguments, etc. A properly modified lex/yacc grammar could convert old-language IISS formulas into new formulas written in Python, and there honestly wouldn't be much difference in the formulas themselves.
A neat feature of Python that will come into heavy play with plugins is that functions are first-class objects that may be assigned to. Suppose we want a global-namespace operator Add, and two module operators Internal.Add and Khoros.Add. To use the Khoros Add operator, simply
Add = Khoros.Add
Embedding Python into another application is actually fairly simple. A static library exists, libpython1.5.a, that encapsulates the entire python interpreter. This may be linked into the application. Py_Initialize() takes care of most of the interpreter initialization inside the application, and code can be issued to the interpreter in a variety of ways: as a string (PyRun_SimpleString), from a file (PyRun_SimpleFile), and as an expression (PyRun_String, which returns the result as a python object). Additional functions exist for importing modules, handling functions and objects, and the like.
Extending Python is generally harder. Classes, functions and objects are contained in a module, which is then imported. The API for exporting classes and functions is rather complicated and won't be discussed in detail here; for some more info, see http://www.python.org/doc/current/ext/ext.html and http://www.python.org/doc/current/api/api.html. Essentially, any access to a class (including method calls) are made through getattr() calls which resolve names into objects for each class; these effectively bind function calls. Classes, methods, and objects are referenced in the module scale as static arrays that are passed to the interpreter. One of these methods will typically be a class constructor for each class, traditionally named just like the class itself. Most of the code deals with PyObjects, or Python objects that represent any class/object in Python; these are translated to and from objects the core plugin code can understand. Additionally the plugins need to be aware of Python's reference-counting system and must increment and decrement refcounts as necessary.
A fairly nice C++ interface to this low-level C API exists; it is called CXX (http://cxx.sourceforge.net). The basic idea behind it is that reference counts are hidden behind classes which take care of them on construction and destruction, Python exceptions generate C++ exceptions, the STL is usable for operations on Python data, and C++ classes represent modules and extension types that may be instantiated instead of using C method/object tables.
The following links show a development snapshow of a prototype CXX system using an image data type and a read operator:
The module "q", in q.cxx, contains the qImage data type encapsulating an image width, image height, and 24-bpp RGB-encoded image data. This extension type is in qImage.h and qImage.cxx. The module "readJPEG" (in readJPEG.cxx) encapsulates the readJPEG operator which reads a JPEG from a file and returns a qImage. Note that libReadJPEG.cpp fully encapsulates the core plugin behavior and in fact was unmodified from an earlier version which used libReadJPEG.cpp directly in a shared library from a C++ program. I'm not honestly sure if all this compiles right now but it certainly illustrates the coding style used in CXX: very template-heavy and class-based.
The resulting libraries are quite large (~400k apiece), although it appears that most of the code is contained in template instantiations. Additionally Python itself must be linked with C++ (which means a home-build Python interpreter on virtually all platforms, since Python is usually linked with C).
As is clearly evident in qImage.h, a lot of effort is needed for functions in one extension to be resolvable in another. Here, a function table is passed from one extension to another through Python, then some preprocessor trickery is used to convert function calls into function array references. The low-level operations on qImage (and thus on iCEData) are in fact done using C functions, not C++.
Currently the biggest issue with CXX is that it does not like to be embedded in another application, although I am confident this is overcomable. CXX is in a bit of development limbo right now; although the last release was relatively recent (March 2000), the current author has indicated that he does not want to continue development himself. Another developer has stepped in though and there appear to be no doubts that development will continue. Also note that the CXX_Extension code that handles extension types (and upon which most of qImage and thus iCE data types are built) is in a very early stage and will probably change a good deal in the future.
Another option that was considered was SWIG ( http://www.swig.org), the Software Wrapper Interface Generator. This is an automatic way to interface several high-level languages (including tcl/tk, python, perl, etc) with C/C++ code. An interface file would be written for each C++ file to export to Python, SWIG is run on it, and it automagically generates all the wrapper code necessary to play with the C++ from within Python. However its C++ support is in a very developmental stage. It does not support many C++ features that may be useful, such as operator/function overloading, templates, namespaces, and pointers to member functions. Additionally its default naming patterns do not appear to be optimal, so that we would have to continually rename methods to make them more palatable in formulas. In short, SWIG was apparantly never designed to provide an scriptable interface to end-users, more of a scriptable interface for developers, and its C++ support is too lacking at the moment. (But what it does accomplish, it does damn well.) It will not be used for now.
Described at a low level of implementation, and ignoring the entire evaluation model, a CXX-based (and Python-based in general) evaluation system would be used to handle formulas as follows:
import iCEOperatorPlugin;so that iCEOperatorPlugin.iCEOperator would reference a given operator.
Add = Khoros.Add
There exist three general uses for Python in iCE:
This is what I believe the spreadsheet should eventually do, given enough development time.
The dual execution methods of user scripting and formula evaluation should be integrated into a console abstraction. All meaningful data manipulation goes through the console; the user and the application each have paths to it. A formula assignment in the spreadsheet exists as a literal assignment statement in the console, potentially duplicated by an equivalent statement typed in by the user. The application will store the data itself for performance reasons, but that's a minor detail.
The console is the most natural way to develop custom operators. Formulas can be typed into the console directly, including the use of variables. Functions can be constructed that can be immediately used in formulas, and those functions can easily be moved to modules as first-class data operators.
As an instantiation of the scripting mechanism the console inherits all the advantages and disadvantages thereof.
Multithreading will be implicit in this model, but the user won't see it. That is, the computation commands issued to the console will be engineered to encapsulate any multithreading as tightly as possible. The spreadsheet data will require numerous locks on it to prevent threads from clashing.
This console model implies a seperation of data manipulation from data display. It is easy enough to break this seperation - after all, the application has the data - but will deeply confuse both the user and the interpreter (which will likely hold dependency information in Python). Any external modifications made to the data must have reflected dependency information.
As far as clear disadvantages go: The console is essentially a new idea for iCE/iiss so it will doubtlessly take a few major revs to work the kinks out. Levoy essentially used a tcl console in his spreadsheet application, and it looks like he thought it was a good idea, but I think our situation is sufficiently different than his that we won't be able to speed up development much by examining his code in more detail. Finally, severally potentially huge implementation details were glossed over here.
A couple useful hybrid models exist between the extremes of the current system, the reference model, and the described Python-based console model ("The Right Thing"):
While it is impossible to perfectly predict what will get us in trouble, here are some things that I can at least think of right now:
Note that problems common to all systems (such as dependency tracking) are ignored.
While Python holds immense promise as a formula and scripting language, its implementation as an evaluation system contains numerous fuzzy areas which could significantly slow down development. The reference model, although it was never meant as a general solution, could work pretty well in simple cases and would be easier to code (but I sure wouldn't want to be the one in charge of parsing). The language-translation hybrid scheme inherits many of the evaluation problems of "The Right Thing" without the benefits of Python as a formula language.
The current parser appears much easier to use in the new application, but there are apparantly hidden pitfalls to its use, mostly relating to it being tightly coupled to the rest of the IISS codebase and data types. Other than that, the operator binding system (ie the giant "if/else" procedure) would have to be rewritten from scratch, but that would be pretty much all of the major work that would need to be done for the evaluation itself. The plugin loading would follow the reference model (scan a dir, load libs, and add names), and in fact the plugin ABI would remain the same.
In light of expected issues when porting the old evaluator, it is probably best to use the reference model to get a quick and somewhat powerful evaluation system working for the demo. Its implementation is well-understood and partially operational, and Josh (who has worked on the reference model and the old evaluator more than I have) vastly prefers coding for the reference model compared to the old evaluator.
Nevertheless, I think that Python is still The Right Thing. This means that the functionality of each plugin should be seperated from the wrapper code as much as possible, preferrably to the point of using seperate files. That way, seperate wrapper files can be used and the plugin code itself wouldn't change.
The evaluation system as a whole is to be encapsulated in a class called Evaluator, a top-level application class initialized on app load. Different versions of this Evaluator would encapsulate both the immediate "modified IISS" system and Python systems, as well as a console. The data is still stored somewhere else in the program, although the Evaluator will have to know where. Methods needed include:
init()
{add/del/group/...}{Cell/Frame/...}(...)
modify{Cell/Frame/...}(char *newFormula)
issueCommand(const char *string) (would basically be ignored for
modified IISS)
--- Rich Tollerton 12 May 2000
Updated 18 May 2000 for XHTML well-formedness and additional disadvantages of using the current system provided by Josh; changed conclusion to support using the reference model.
|
|
|
|
|
Image.h | 631 bytes | |
|
|
Makefile | 1 KB | |
|
|
index.html.bk | 29 KB | |
|
|
index2.html | 27 KB | |
|
|
libReadJPEG.cpp | 2 KB | |
|
|
q.cxx | 1 KB | |
|
|
qImage.cxx | 1 KB | |
|
|
qImage.h | 2 KB | |
|
|
readJPEG.cxx | 969 bytes |