
Thomas R. Dean

Research Interests

In a broader context, I believe that different developers conceptualize software in different ways. Methodologies that are appropriate for one person may not be appropriate for others. Thus I believe that prescriptive methodologies are doomed to failure in the general case. Methodologies that are supportive rather than prescriptive are required. To this end I am interested in anything that supports the act of software design, development and evolution without forcing a particular way of thinking on the software professional. I have a particular interest in new programming concepts and environments for software specification, development and evolution. I am part of the Software Technology Laboratory, the Software Engineering Research Laboratory and the Research in Security, Kingston (RISK) research group.

Recent Publications and Talks

Long Bio and CV

Research Topics:

Design Recovery and Software Evolution

Design Recovery is an activity that happens throughout the development cycle, not just during the maintenance phase. Lethbridge & Singer have shown that the most common activity of a software developer is searching. Support environments must be able to support this activity in an intuitive and interactive way. My interests in this area are:

  • Underlying Source Code Analysis and Manipulation Techniques
  • Applying Language Analysis Techniques to Network Security
  • Evolution of Legacy Systems

Source Code Analysis and Manipulation

Source code in this context means more than just the text of the program in the implementation language. It can include anything from low level machine code to executable graphical representations of systems. The source code for the system is what is actually executed and what is actually deployed. It therefore remains the final authority on what actually happens (but not what should happen). My research in this area is based on finding ways in which various types of analysis can be applied to real world problems throughout the development cycle. My focus is on finding information in the source code (of all types) to help the developer accomplish the task at hand.

There are several issues in the area. The first is that the grammars appropriate for compilation of text representations of these languages are usually not appropriate for analysis. They are designed for efficient translation to some particular implementation representation. Grammars for use by tools that extract higher level design information organize the language in different ways. So one task in the area is to develop analysis grammars for various languages including procedural languages (C/Java/Pascal/Perl), markup languages (HTML, XML) and scripting languages.

Analysis techniques differ based on the type of language as well. Techniques for procedural languages do not directly translate to mixed environment languages such as Microsoft ASP or Java Server Pages. They also may not directly translate to analysis of graphical languages.

Network Security

Network security depends not only on the security of the protocol but also on the security of the implementation of the protocol. Many of the common compromises are not weaknesses in the protocol but failures of the implementation. Standard conformance testing does not usually cover these cases. This research applies source code analysis and manipulation techniques to security protocols. In the preliminary stages, we are using these techniques to analyze the protocols directly, looking for constraints in the protocol that all implementations must enforce. These constraints are used to generate test data for black box style testing that focuses on testing those constraints in real implementations. Future work will expand to analysis of the code under test, both to identify potential failures and to identify the source of test failures. This research is carried out in collaboration with Scott Knight at the Royal Military College of Canada.
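As a rough illustration of constraint-directed black box testing, the Python sketch below generates packets that each violate a single constraint. The packet layout (a 1-byte type, a 2-byte length field, then the payload) and all function names are assumptions made for this example; they are not taken from any protocol or tool described on this page.

```python
import struct

HEADER = "!BH"  # assumed layout: 1-byte type, 2-byte big-endian payload length
HEADER_SIZE = struct.calcsize(HEADER)


def build_packet(msg_type: int, payload: bytes) -> bytes:
    """Encode a well-formed packet for the assumed layout."""
    return struct.pack(HEADER, msg_type, len(payload)) + payload


def satisfies_length_constraint(packet: bytes) -> bool:
    """Constraint: the length field must equal the actual payload size."""
    if len(packet) < HEADER_SIZE:
        return False
    _, declared = struct.unpack(HEADER, packet[:HEADER_SIZE])
    return declared == len(packet) - HEADER_SIZE


def violating_mutants(packet: bytes) -> list:
    """Derive test packets from a valid one, each breaking the constraint."""
    msg_type, declared = struct.unpack(HEADER, packet[:HEADER_SIZE])
    payload = packet[HEADER_SIZE:]
    mutants = []
    # Rewrite the length field with boundary values that differ from the truth.
    for bad in (0, (declared + 1) & 0xFFFF, 0xFFFF):
        if bad != declared:
            mutants.append(struct.pack(HEADER, msg_type, bad) + payload)
    # Keep the declared length but drop the final byte of the packet.
    mutants.append(packet[:-1])
    return mutants
```

Each mutant would be sent to the implementation under test, which should reject it; a crash or silent acceptance points to a constraint that the implementation fails to enforce.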

Legacy Systems

Legacy systems, more appropriately called heritage or vintage software, represent enormous investments on the part of the organizations that own the software. A typical service company (bank, retail, etc.) may have between 40 and 120 million lines of code. This code represents all of the past and current business practices of the organization. This software is mission critical, and will never be rewritten or replaced (although individual modules may be replaced from time to time). Most of these organizations take special care to make sure that the structure and quality of the software does not get out of control.

My research in this area is:

  • identifying classes of maintenance tasks that are amenable to various levels of automation, and,
  • developing techniques for automating maintenance tasks

Given my broader goals, when developing automation support for maintenance tasks I will be seeking automation techniques and maintenance methodologies that are broadly applicable (supportive) rather than narrow (prescriptive).

Current Research Projects

Whole Website Understanding Project (WWSUP)

This project, in collaboration with Dr. Cordy in the School of Computing, involves the analysis and modeling of all tiers of a website, from the client side (JavaScript and applets) through the dynamic server languages (JSP, ASP) to the back end (Enterprise Beans, stored procedures in databases). In particular, we are seeking models and automated analyses that transcend the boundaries between the languages (HTML, XHTML, CSS, VB, ECMAScript, Perl, Java). We hope to provide a comprehensive model and to support automated evolution for tasks that involve coordinating changes to more than one language and technology. Some recent work in this area includes migrating embedded Java to custom tags in JSP, web application slicing and clone detection, and migration of JSP to AJAX.

SCL: A language for describing Network Protocols

SCL is a language used to describe the constraints on a protocol that must be enforced by any implementation. It is based on the ISO standard ASN.1 notation, extended with XML markup. It currently has markup that specifies constraints within a packet, and we are in the process of adding state information to the notation. We are also building an Eclipse plugin for the test environment.

Automated Security Testing of Stateful Network Protocols

Our research to date has focussed on request/response protocols. This research extends our approach to state-based protocols. Currently we are testing our approach on SMB, the Windows file sharing protocol.
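To make the step from request/response to state-based testing concrete, here is a minimal Python sketch that derives out-of-state test sequences from a protocol state machine. The states and message names describe a made-up session protocol chosen for illustration; they are not real SMB messages, and the approach shown is only one plausible way to generate such tests.

```python
# Hypothetical session protocol: (current state, message) -> next state.
TRANSITIONS = {
    ("start", "NEGOTIATE"): "negotiated",
    ("negotiated", "LOGIN"): "authenticated",
    ("authenticated", "OPEN"): "file_open",
    ("file_open", "READ"): "file_open",
    ("file_open", "CLOSE"): "authenticated",
}
MESSAGES = sorted({message for _, message in TRANSITIONS})


def run(sequence, state="start"):
    """Return the state reached, or None if some message is not allowed."""
    for message in sequence:
        state = TRANSITIONS.get((state, message))
        if state is None:
            return None
    return state


def out_of_state_tests(valid_sequence):
    """Extend every prefix of a valid sequence with one disallowed message."""
    tests = []
    for i in range(len(valid_sequence) + 1):
        prefix = list(valid_sequence[:i])
        state = run(prefix)
        for message in MESSAGES:
            if (state, message) not in TRANSITIONS:
                tests.append(prefix + [message])
    return tests
```

For the valid sequence NEGOTIATE, LOGIN, OPEN this yields sequences such as a READ before any negotiation, each of which a correct implementation should refuse.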

Task directed modeling of conventional languages

While most people think of legacy systems as systems written in COBOL, PL/I or RPG, any system that has been deployed can be viewed as a legacy system. This means that languages such as C, C++ and Java are also legacy languages. There have been some general models made of these languages, such as the Datrix and Hungarian models for C++. While these are useful for many tasks, it may be better to build models specific to a particular maintenance task. The task directed models may contain a general core, and be extended to include specifics for each of the maintenance tasks. Some examples of maintenance tasks are migration of user interfaces (terminal to web), addition of enterprise messaging, and conversions between language dialects (different compilers).

Design Recovery of scripting languages

A fair amount of research into models of conventional languages has occurred in the community. Little or no research has been done on scripting languages such as shell languages, Tcl or Perl. Some interesting questions are:

  • To what extent do the various middle models apply to interactions at the scripting level?
  • How do we model the interaction between scripting languages and conventional languages (program call in shell/Tcl/Perl, module invocation in Perl)?
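As a starting point for the second question, the Python sketch below extracts the programs invoked by a shell script, which could serve as "calls" edges from the script to external programs in a design model. It is deliberately naive and rests on assumptions: it splits on command separators with a regular expression, ignores quoting, subshells and here-documents, and the keyword and builtin lists are abbreviated. A real extractor would use a proper shell grammar.

```python
import re

# Abbreviated, assumed lists; a real tool would cover the full shell syntax.
SKIP_SEGMENT = {"for", "case", "fi", "done", "esac"}
CONTROL_PREFIX = {"if", "then", "else", "elif", "do", "while", "until"}
BUILTINS = {"cd", "echo", "export", "exit", "read", "set"}

# Separators between simple commands: ||, &&, |, and ;
SEPARATORS = re.compile(r"\|\||&&|[|;]")


def invoked_programs(script_text):
    """Return the external programs a script appears to call, in order."""
    programs = []
    for line in script_text.splitlines():
        code = line.split("#", 1)[0]  # drop comments (and the shebang line)
        for segment in SEPARATORS.split(code):
            words = segment.split()
            if not words or words[0] in SKIP_SEGMENT:
                continue
            while words and words[0] in CONTROL_PREFIX:
                words = words[1:]  # look past control-flow keywords
            if not words:
                continue
            command = words[0]
            if "=" in command or command in BUILTINS:
                continue  # variable assignment or shell builtin
            programs.append(command)
    return programs
```

Applied to a script containing the pipeline `cat "$LOG" | grep ERROR | perl summarize.pl`, the function reports `cat`, `grep` and `perl`; following the `perl` edge on to `summarize.pl` is where the interaction between the two language levels would need to be modeled.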

Automated Maintenance

Automated maintenance can take several forms. In the most advanced state, it involves building a process that automates a particular maintenance task. For example, an automated year 2000 tool might analyze and remediate legacy applications with a minimum of manual intervention. A more modest state would be a set of reports designed to assist manual tasks. Which form is most appropriate depends on the task and the size of the code base.

Automated Translation

This project is concerned with translating one high level language to another. The goal is to produce a result that is not only operationally equivalent, but also idiomatic. In addition to translating the language, the task usually involves translating the environment as well. For example, translating a legacy application to Java means not only translating the language, but the transaction environment as well.

As an example, Perl is a common language for writing dynamic content on the web. One project is to analyze the server side code (CGI or mod_perl) and translate it to JSP/Enterprise Beans. There are several components to this transformation. The first is the obvious language component (Perl -> JSP/Java). The second is the packaging (Perl modules to Enterprise Java Beans). The third is the analysis and translation of any modules imported by the Perl modules, which may be written in other languages. The last is the analysis and translation of any transaction requirements of the application.

Copyright © Thomas R. Dean, 2008. Contact Information: