RESEARCH INTERESTS AND ACTIVITIES

 

 

 

ONLINE SOFTWARE

 

 

RESEARCH INTERESTS (In rough reverse chronological order)






Reasoning through learning

Inductive learning is often ineffective because it is underconstrained.
Reasoning is often ineffective because it is overconstrained.

Imperative: A model of reasoning/learning that combines:

  • the reasoner's ability to use structured models of the world with
  • the learner's ability to choose between possibilities by observing the world.



Approach:

  1. An expert provides a numeric or rule-based model of the world.
  2. The expert may annotate certain portions of that model as unknown or uncertain.
  3. The system induces a predictive model by refining the uncertain portions of the theory.
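To make the three steps concrete, here is a minimal Python sketch under strong simplifying assumptions. The "theory" is a single numeric formula, the expert's annotation is just a list of candidate values for one uncertain coefficient, and refinement means choosing the candidate with the lowest squared error on observed data. The names `theory` and `refine_uncertain` and the least-squares criterion are illustrative choices, not the method described in the dissertation.

```python
from itertools import product
from typing import Callable, Dict, List, Sequence, Tuple

Params = Dict[str, float]

# Hypothetical annotated theory: the expert supplies the form y = k*x + offset,
# knows offset, but marks k as uncertain by listing plausible values for it.
def theory(x: float, params: Params) -> float:
    return params["k"] * x + params["offset"]

def refine_uncertain(
    model: Callable[[float, Params], float],
    known: Params,
    uncertain: Dict[str, Sequence[float]],
    observations: List[Tuple[float, float]],
) -> Params:
    """Pick values for the uncertain parameters that minimize squared
    prediction error on the observations. Exhaustive search over the
    candidate grid, so only practical for a few uncertain parameters."""
    names = list(uncertain)
    best: Params = {}
    best_err = float("inf")
    for combo in product(*(uncertain[n] for n in names)):
        params = {**known, **dict(zip(names, combo))}
        err = sum((model(x, params) - y) ** 2 for x, y in observations)
        if err < best_err:
            best, best_err = params, err
    return best

observations = [(0.0, 2.1), (1.0, 4.9), (2.0, 8.2)]
fitted = refine_uncertain(theory, {"offset": 2.0}, {"k": [1.0, 2.0, 3.0, 4.0]}, observations)
print(fitted)  # {'offset': 2.0, 'k': 3.0}
```

The known part of the theory is never touched; only the annotated parameter is chosen by observing the world, which is the division of labor described above.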



Benefit:

Using an annotated theory makes it practical to apply learning to much more complex problems.
It allows the expert to make effective use of what is known about a problem, even when portions of the complete picture are missing.



Status:

My CogSci91 paper documents the intuition behind the approach.
The planning book chapter describes early applications to numeric domains.
An extended version of my AAAI workshop paper compares the approach to other learning techniques.



Ongoing Work:

  • A short (30-page) dissertation manuscript provides the best and most up-to-date description of the approach.
  • Using these annotated theories, I have encoded chemical and physical models of portions of the protein-folding process at the atomic level. I plan to use these to learn predictive models (see Bioinformatics below).





Datamining

If an organization expends the resources required to collect a large body of data, then it has often also invested effort in studying that data.

Imperative:

Thus it is important for datamining tools to be able to utilize existing expertise.



Status:

  • Above I describe an approach for incorporating partial models to focus the mining of predictive models.
  • Below I discuss a datamining application within biology.





Bioinformatics

The human genome project and others like it have resulted in a tremendous influx of raw DNA and protein sequence data. Making use of this raw data requires making high-level predictions about genes and proteins from it.

Imperative:

Predict a protein's shape (fold class) from its primary sequence.

 

Approach:

  • Build partial electrochemical and physical models of protein folding at the atomic level.
  • Combine several types of available protein-structure data into a single atom-level description for each protein.
  • Mine this transformed data for rules predicting protein fold-class.



Status:

Preliminary results are shown in the last section of my dissertation proposal.



Ongoing Work:

  • Initial results were obtained on a CM-5 supercomputer; I am currently porting the algorithms to run on other available supercomputers.
  • Once the port is complete, I will finish testing the system's performance given a new source of information: multiple alignments of many proteins within a single fold class. Having many instances of each position within a particular fold gives the learner a position-by-position estimate of its predictiveness and variance. This estimate will direct learning toward low-noise positions when making predictions.
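One way such a position-by-position estimate could be computed is sketched below. It assumes the multiple alignment is a list of equal-length strings with `-` marking gaps, and it substitutes per-column Shannon entropy for the predictiveness/variance estimate mentioned above. The `column_weights` function and its 1/(1 + entropy) weighting are hypothetical, chosen only to show low-noise (conserved) positions receiving more weight.

```python
import math
from collections import Counter
from typing import List

def column_entropy(column: str) -> float:
    """Shannon entropy (bits) of the residues in one alignment column, ignoring gaps."""
    residues = [c for c in column if c != "-"]
    if not residues:
        return float("inf")  # an all-gap column carries no information
    n = len(residues)
    return -sum((k / n) * math.log2(k / n) for k in Counter(residues).values())

def column_weights(alignment: List[str]) -> List[float]:
    """Weight each aligned position so that conserved (low-entropy) columns
    count more: weight = 1 / (1 + entropy), and 0 for all-gap columns."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("aligned sequences must have equal length")
    weights = []
    for i in range(length):
        h = column_entropy("".join(seq[i] for seq in alignment))
        weights.append(0.0 if math.isinf(h) else 1.0 / (1.0 + h))
    return weights

alignment = ["ACDE", "ACDF", "ACEG", "AC-H"]
print([round(w, 3) for w in column_weights(alignment)])  # [1.0, 1.0, 0.521, 0.333]
```

A learner could then scale each position's contribution by its weight, so that fully conserved positions dominate and highly variable ones are discounted.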





Personal Data Management

Cheap information processing technology has resulted in a dramatic increase in the quantity and diversity of information that we as humans are now responsible for.

Imperative:  Personal Data Management Tools that:

  • Have a variety of "zero-overhead" data input methods.
  • Preserve content transparently across many platforms and formats.
  • Allow the user to view (and update) a common data set from multiple perspectives.



History

This has been a part-time hobby for some time, and I have played with a number of ideas:

PAL: This monster is many tens of thousands of lines of LISP code. It runs as a detached process on the internet, and I connect to it via a special UNIX shell I wrote. It runs _all_ of a user's shells on all computers on the net. This approach certainly allows it to accept and integrate information from many sources (it also had mail/gopher/web input filters). Unfortunately, this approach is nearly impossible to make portable, and it is tightly tied to the internet.



Ongoing Work:

It seemed a shame to develop tools only I can use, so recently I have started building new tools on top of EMACS. This platform handles many details (like editing and display) and is portable across both WINDOWS and UNIX systems.

I have a couple of tools that are functional and modular, but not yet polished applications. They include:

1.      A hypertext info manager.

2.      A bidirectional file updating mechanism that works across WINDOWS and UNIX.

3.      A source code documentation manager.
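Item 2 above can be illustrated with a simple "newest copy wins" rule. The Python sketch below is not the EMACS tool; it assumes both directory trees are reachable as local paths and that their modification times are comparable, and the `sync_newest` name is hypothetical. It deliberately ignores deletions and files edited on both sides since the last run, which a real bidirectional updater has to handle.

```python
import shutil
import tempfile
from pathlib import Path
from typing import List

def sync_newest(a: Path, b: Path) -> List[str]:
    """Make trees a and b agree: for each relative file path, copy the more
    recently modified copy over the older (or missing) one. Deletions and
    edits made on both sides since the last sync are NOT handled."""
    rels = set()
    for root in (a, b):
        rels.update(p.relative_to(root) for p in root.rglob("*") if p.is_file())
    copied = []
    for rel in sorted(rels):
        pa, pb = a / rel, b / rel
        ta = pa.stat().st_mtime_ns if pa.exists() else -1
        tb = pb.stat().st_mtime_ns if pb.exists() else -1
        if ta == tb:
            continue  # same timestamp: assume already in sync
        src, dst = (pa, pb) if ta > tb else (pb, pa)
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)  # copy2 keeps the mtime, so a rerun is a no-op
        copied.append(rel.as_posix())
    return copied

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        left, right = Path(d, "left"), Path(d, "right")
        (left / "notes").mkdir(parents=True)
        right.mkdir()
        (left / "notes" / "todo.txt").write_text("from left")
        (right / "plan.txt").write_text("from right")
        print(sync_newest(left, right))  # ['notes/todo.txt', 'plan.txt']
        print(sync_newest(left, right))  # []
```

Because `shutil.copy2` preserves timestamps, both copies end up with identical modification times, which is what lets a second run recognize that nothing has changed.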


Progress on this interest is rather hit-or-miss since I don't actually get paid to work on this...
I also have a U.S. Robotics Pilot. This is a really great PDA; its only shortcoming is its carrying case, so I've built my own.




SELECTED PUBLICATIONS

 

"Plausible Inference: A Knowledge-Intensive Approach to Induction.
  (An Application of PI to the Problem of Hidden Homology Modeling)"

Doctoral Thesis. University of Illinois: Technical Report UIUC-DCS-R-97-2004. Urbana, IL. 1998. (PS, PDF)

"Plausible Inference: A Knowledge-Intensive, Inductive Approach to Domain Modeling."

University of Illinois: Technical Report UIUC-DCS-R-97-2004. Urbana, IL. 1997. (Prelim)

"Dynamic-Bias Induction."

American Association for Artificial Intelligence Fall Symposium Series on Relevance (AAAI-94). New Orleans, LA. 1994. pp. 164-67. (Extended Version)

"A First Theory of Plausible Inference and Its Use in Continuous Domain Planning."

Machine Learning Methods for Planning, Steven Minton (ed.). San Mateo, CA: Morgan Kaufmann. 1993. pp. 93-124. (Paper, Figures)

"An Alternative to Deduction."

Proceedings of the Thirteenth Annual Conference of the Cognitive Science Society. Chicago, IL. 1991. pp. 837-841. (Paper)

"Making SME Greedy and Pragmatic."

Proceedings of the Twelfth Annual Conference of the Cognitive Science Society. Cambridge, MA. 1990. pp. 61-68. (Paper)


Note: A number of my publications were assembled with old-fashioned cut and paste; while this was most expedient at the time, it means that some of these documents do not include figures.

SOFTWARE BY DAN


Most of the stuff I code for personal use is never polished and general enough to distribute, but every once in a while... Comments & bug reports welcome:
MMDDYYYY-oblio@sneakemail.com.


Directory: My downloads directory.

Descriptions: Short descriptions for files in the downloads directory. Several apps are described in more detail immediately below this section.

Publications: Publications in the directory are described here.


 

TRACER: A source-level tracer for LISP.

Features:

·         Full screen source-code view of a LISP execution.

·         Records execution, so one can move forward and backward to find a bug.

·         Five levels of granularity for tracing.

·         Ten pages of online documentation.

·         Simple. Installs as a single file (docs are embedded).

·         Runs at 1/10 interpreter speed.

Limitations:

·         Not well polished.

·         Does not correctly record destructively modified objects (e.g., those changed with (setf (car ...))).

Installation & Use:

1.      Download the file, compile it with COMPILE-FILE, and load it into LISP.

2.      Type (TR :load "lisp-source-file") to load a file for tracing.

3.      Type (TR '--form-to-trace--) to record and display an execution trace.

4.      Type (TR) to view the last trace recorded.

5.      Type h while in the tracer for online help.

Requirements: Common LISP and an ANSI display (e.g., a VT100 or xterm).
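The record-then-browse idea behind TRACER (record an execution once, then move forward and backward through it) can be sketched in Python with the standard `sys.settrace` hook. This is an illustration of the technique, not a port of tracer.lisp: the `Recorder` class is hypothetical, it records only function names, line numbers, and a shallow copy of local variables, and it has no full-screen display.

```python
import sys
from typing import Any, Callable, Dict, List, Tuple

Step = Tuple[str, int, Dict[str, Any]]  # (function name, line number, locals)

class Recorder:
    """Record each line executed by Python code called through run(),
    then browse the recording with a cursor that moves both ways."""

    def __init__(self) -> None:
        self.steps: List[Step] = []
        self.pos = 0

    def _trace(self, frame, event, arg):
        if event == "line":
            # dict(...) is only a shallow copy: an object that is later
            # modified in place shows its final state (compare TRACER's
            # limitation with destructively modified objects).
            self.steps.append((frame.f_code.co_name, frame.f_lineno, dict(frame.f_locals)))
        return self._trace  # keep tracing lines inside this frame

    def run(self, fn: Callable[..., Any], *args: Any) -> Any:
        self.steps, self.pos = [], 0
        sys.settrace(self._trace)
        try:
            return fn(*args)
        finally:
            sys.settrace(None)

    def _move(self, delta: int) -> Step:
        if not self.steps:
            raise IndexError("nothing recorded yet")
        self.pos = max(0, min(self.pos + delta, len(self.steps) - 1))
        return self.steps[self.pos]

    def forward(self) -> Step:
        return self._move(1)

    def back(self) -> Step:
        return self._move(-1)

def total(xs):
    acc = 0
    for x in xs:
        acc += x
    return acc

rec = Recorder()
print(rec.run(total, [1, 2, 3]))  # 6
print(rec.forward())  # second recorded step inside total()
print(rec.back())     # back to the first step
```

Recording everything up front costs time, which is consistent with the slowdown TRACER reports, but it is what makes stepping backward possible without re-running the program.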

BM-CONVERTER: Convenient remote access to your bookmarks, for you and others.

Converts Netscape's long list of bookmarks into a hierarchy of lists. I use this all the time for fast access to my home machine's bookmarks when I am remote. Features:

·         Very simple installation and use. (5 min. for simple setup)

·         Three interfaces. Examples: Frame, One Window, Dual Browser.

·         Possible to link directly to internal nodes. (This allows you to maintain specialized hot-link lists on your homepage as subsections of your hot lists.)

·         Can also be set up as a CGI script (less efficient, but always up to date).

Installation & Use:

1.      Edit the first line to point to your Perl executable.

2.      To make it executable, type: chmod a+x bm-converter; rehash

3.      Execute the script. It will generate three files; link your favorite into your homepage. Enjoy.

Requirements: Netscape bookmarks, PERL
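BM-CONVERTER itself is a Perl script. As an illustration of the conversion it performs, the Python sketch below uses the standard `html.parser` module to turn the `<DL>`/`<DT>`/`<H3>` nesting of a Netscape bookmarks file into nested lists. The node layout (dicts with `title`, `href`, and `children`) and the `parse_bookmarks` name are assumptions; unlike the real tool, it generates no HTML pages and has no CGI mode.

```python
from html.parser import HTMLParser
from typing import Any, Dict, List, Optional

Node = Dict[str, Any]

class BookmarkParser(HTMLParser):
    """Build a tree from a Netscape bookmarks file: each <H3> names a
    folder whose contents are the <DL> that follows it; each <A> is a link."""

    def __init__(self) -> None:
        super().__init__()
        self.root: List[Node] = []
        self.stack: List[List[Node]] = [self.root]  # innermost open list last
        self.pending: Optional[Node] = None  # folder still waiting for its <DL>
        self.text: Optional[List[str]] = None  # collects <H3>/<A> text
        self.href = ""

    def handle_starttag(self, tag, attrs):
        if tag == "dl":
            self.stack.append(self.pending["children"] if self.pending else self.stack[-1])
            self.pending = None
        elif tag in ("h3", "a"):
            self.text = []
            self.href = dict(attrs).get("href") or ""

    def handle_data(self, data):
        if self.text is not None:
            self.text.append(data)

    def handle_endtag(self, tag):
        if tag == "dl" and len(self.stack) > 1:
            self.stack.pop()
        elif tag in ("h3", "a") and self.text is not None:
            title = "".join(self.text).strip()
            self.text = None
            if tag == "h3":
                self.pending = {"title": title, "children": []}
                self.stack[-1].append(self.pending)
            else:
                self.stack[-1].append({"title": title, "href": self.href})

def parse_bookmarks(html: str) -> List[Node]:
    parser = BookmarkParser()
    parser.feed(html)
    parser.close()
    return parser.root

def show(nodes: List[Node], depth: int = 0) -> None:
    for node in nodes:
        print("  " * depth + node["title"] + ("/" if "children" in node else ""))
        show(node.get("children", []), depth + 1)

SAMPLE = """<!DOCTYPE NETSCAPE-Bookmark-file-1>
<TITLE>Bookmarks</TITLE>
<H1>Bookmarks</H1>
<DL><p>
    <DT><H3>Lisp</H3>
    <DL><p>
        <DT><A HREF="http://www.lisp.org/">Lisp.org</A>
    </DL><p>
    <DT><A HREF="http://www.perl.com/">Perl</A>
</DL><p>
"""

show(parse_bookmarks(SAMPLE))
```

Once the tree exists, emitting one page per folder (or a frame view) is a straightforward walk over it.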

AUTO-BACK: Daily backup utility.

A simple utility that makes daily backups of user files to a specified tar file.
Features:

·         Very simple installation and use. (5 min. for simple setup)

·         Well polished with good documentation.

·         Fairly configurable definition of files to be backed up.

Installation & Use:

1.      Save the auto-back file into a directory on your path.

2.      cd to that directory and type: chmod a+x auto-back; rehash

3.      Type: auto-back -setup

Requirements: UNIX (with find, more, and grep) and gtar
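AUTO-BACK is a shell script built on `find` and `gtar`. A rough Python equivalent of its core step, selecting user files by pattern and writing them to a specified tar file, can be sketched with the standard `tarfile` and `fnmatch` modules. The pattern-based configuration, the optional age cutoff, and the names `select_files` and `daily_backup` are assumptions for illustration, not AUTO-BACK's actual options.

```python
import fnmatch
import tarfile
import tempfile
import time
from pathlib import Path
from typing import Iterable, List, Optional

def select_files(root: Path, include: Iterable[str], exclude: Iterable[str] = (),
                 max_age_days: Optional[float] = None) -> List[Path]:
    """Files under root whose path (relative to root) matches some include
    pattern and no exclude pattern; if max_age_days is given, only files
    modified within that many days. Note fnmatch's * also matches '/'."""
    include, exclude = list(include), list(exclude)
    cutoff = None if max_age_days is None else time.time() - max_age_days * 86400
    chosen = []
    for p in sorted(root.rglob("*")):
        if not p.is_file():
            continue
        rel = p.relative_to(root).as_posix()
        if not any(fnmatch.fnmatch(rel, pat) for pat in include):
            continue
        if any(fnmatch.fnmatch(rel, pat) for pat in exclude):
            continue
        if cutoff is not None and p.stat().st_mtime < cutoff:
            continue
        chosen.append(p)
    return chosen

def daily_backup(root: Path, dest: Path, include: Iterable[str], exclude: Iterable[str] = (),
                 max_age_days: Optional[float] = None) -> List[str]:
    """Write the selected files into a gzip-compressed tar file at dest,
    replacing any existing file there, and return the archived names.
    Keep dest outside root, or old archives will be swept into new ones."""
    files = select_files(root, include, exclude, max_age_days)
    names = []
    with tarfile.open(dest, "w:gz") as tar:
        for p in files:
            name = p.relative_to(root).as_posix()
            tar.add(p, arcname=name)
            names.append(name)
    return names

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        home = Path(d, "home")
        (home / "src").mkdir(parents=True)
        (home / "src" / "tracer.lisp").write_text("(defun tr ())")
        (home / "notes.txt").write_text("todo")
        (home / "core").write_text("crash dump")
        print(daily_backup(home, Path(d, "backup.tgz"), include=["*.lisp", "*.txt"]))
        # ['notes.txt', 'src/tracer.lisp']
```

Run once a day (e.g., from cron), a call such as `daily_backup(Path.home(), Path("/tmp/home-backup.tgz"), include=["*.lisp", "*.el"])` would archive LISP and EMACS LISP sources from anywhere in the home tree, since `*` in `fnmatch` also matches across directory separators.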

SYS-TIME: A very accurate LISP timer.

Like the LISP function TIME, but three to six orders of magnitude more accurate. It is intended to measure the efficiency of single LISP commands, e.g., a hash-table access versus an alist access.
Features:

·         Very accurate (e.g., it can determine the length of a list from the execution time of LENGTH on that list).

·         Simple.

·         Portable.

·         Gives a measure of its own inaccuracy.

·         Documentation in header of source file.

Installation & Use:

1.      Save the sys-time.lisp file in your file system.

2.      Load the file into LISP.

3.      To determine the time needed to format an integer, type: (sys-time '(format nil "~A" 17))

Requirements: Common LISP
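A common way to time a single cheap operation far below the clock's resolution is to run it many times, subtract the cost of an equally long empty loop, and use the spread across repeated trials as a measure of the timer's own inaccuracy. The Python sketch below follows that recipe. It is an assumption about the general technique, not a description of how sys-time.lisp works internally (its source header documents that), and `sys_time` here is just a Python function named after the LISP one.

```python
import statistics
import time
from typing import Callable, Tuple

def _loop_time(fn: Callable[[], object], n: int) -> float:
    """Wall-clock seconds for n back-to-back calls of fn."""
    start = time.perf_counter()
    for _ in range(n):
        fn()
    return time.perf_counter() - start

def _empty() -> None:
    return None

def sys_time(fn: Callable[[], object], n: int = 100_000, trials: int = 5) -> Tuple[float, float]:
    """Estimate seconds per call of fn.

    Each trial times n calls of fn and n calls of an empty function and
    subtracts the two, so loop and call overhead largely cancel. Returns
    (median per-call time, max - min across trials); the second value is
    a rough measure of the estimate's own inaccuracy."""
    per_call = []
    for _ in range(trials):
        per_call.append((_loop_time(fn, n) - _loop_time(_empty, n)) / n)
    return statistics.median(per_call), max(per_call) - min(per_call)

if __name__ == "__main__":
    # Rough analogue of (sys-time '(format nil "~A" 17)) from the LISP version.
    t, err = sys_time(lambda: format(17, "d"))
    print(f"format(17): {t * 1e9:.1f} ns per call (spread {err * 1e9:.1f} ns)")
```

Using the median rather than the mean keeps a single trial disturbed by, say, garbage collection from skewing the estimate, while the reported spread still exposes that disturbance.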


DOWNLOADS DIRECTORY


LISP

tracer.lisp

Full-screen LISP source level debugger.

sys-time.lisp

Very accurate execution timer.


MODULES

 

db.lisp

Module: Disk-based hash table. Ported to: LUCID, ALLEGRO, KCL, and CL.

io.lisp

Module: Full screen I/O. Command trees. Supports X-windows and ANSI terminals.

boot.lisp

Module: Portable low-level extensions to Common LISP: Processes, Universal struct access, UNIX environment, etc.

 

 


EMACS LISP

 

mdoc.el

Code/Documentation Synchronizer.

inf-ext.el

Extensions to EMACS Info that allow one to use .info files as a hypertext personal information manager.

mirror.el

Tree-based file copying and mirroring (Based on ANGE-FTP).


PERL / SHELL (SH)

 

auto-back

Makes daily backups of user files to a specified tar file.

bm-converter

Converts a Netscape bookmarks file from a long list to a set of nested lists.