The function make-lexer converts lexical analyzer specifications
into lexical analyzer procedures.
Return a lexical analyzer built from specification.
A lexer is built from a specification that consists of regexps and actions. The regexps are listed in order of precedence, each matching a particular token type.
The actions are either a token-id (a symbol or #f), or a procedure.
If an action is a token-id, <i>, then when a matching token is
read, the list (<i> <lexeme>) is returned (where <lexeme>
is a string consisting of the matched characters).
If an action is a procedure, the procedure is called with two arguments: the lexeme string and the input port. The lexer returns whatever the procedure returns.
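The two kinds of action described above can be sketched in a few lines of Scheme. This is only an illustration of the dispatch rule, not the lexer's actual code; the name apply-action is hypothetical and is not part of the lexer module.

```scheme
;; Hypothetical helper illustrating how an action is applied to a lexeme.
;; ACTION is either a token-id (a symbol or #f) or a procedure.
(define (apply-action action lexeme port)
  (if (procedure? action)
      ;; Procedure action: call it with the lexeme string and the input port.
      (action lexeme port)
      ;; Token-id action: return the list (<i> <lexeme>).
      (list action lexeme)))

;; A token-id action yields a two-element list.
(apply-action '<semi> ";" (current-input-port))

;; A procedure action decides its own return value.
(apply-action (lambda (lexeme port)
                (list '<number> (string->number lexeme)))
              "42"
              (current-input-port))
```

The first call evaluates to (<semi> ";") and the second to (<number> 42). The port is passed through untouched here because the sample procedure ignores it.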
Here is a sample specification:
(define-public ctax-lexer-spec
  `(
    ;; Block/Statement Structuring
    ("{" <lbrace>)
    ("}" <rbrace>)
    (";" <semi>)
    ;; Defining Functions
    ("public" <public>)
    ("static" <static>)
    ("auto" <auto>)
    ;; Flow Control Keywords
    ("if" <if>)
    ("else" <else>)
    ("for" <for>)
    ("while" <while>)
    ("return" <return>)
    ("do" <do>)
    ("break" <break>)
    ("continue" <continue>)
    ;; Numbers
    ("[0-9]\\+\\.\\?[0-9]*" ,(lambda (token port)
                               (list '<number> (string->number token))))))
The last item in the specification may optionally use the symbol
else instead of a regexp. That case covers all lexemes not
matched by the preceding regexps. If no else case is provided,
then unmatched lexemes return tokens of type <default>.
Generally speaking, given a specification, the lexer will return the longest matching lexeme. If two cases both match, the lexer will use the one that occurs first in the specification.
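The selection rule can be sketched as follows. To keep the example self-contained, it makes a simplifying assumption: each rule is a literal string rather than a regexp, and a rule matches when the input begins with that string. The names prefix? and pick-rule are hypothetical and are not part of the lexer module.

```scheme
;; Hypothetical sketch of "longest match wins, earliest rule breaks ties".
;; Each rule is a list (literal-string token-id); real rules use regexps.

;; True if STR begins with PREFIX.
(define (prefix? prefix str)
  (and (<= (string-length prefix) (string-length str))
       (string=? prefix (substring str 0 (string-length prefix)))))

;; Return the rule that should fire for INPUT, or #f if none matches.
(define (pick-rule rules input)
  ;; Walk the rules in order, keeping the best match seen so far.
  ;; A later rule replaces it only when its match is strictly longer,
  ;; so the rule listed first wins a tie.
  (let loop ((rules rules) (best #f))
    (cond ((null? rules) best)
          ((and (prefix? (car (car rules)) input)
                (or (not best)
                    (> (string-length (car (car rules)))
                       (string-length (car best)))))
           (loop (cdr rules) (car rules)))
          (else (loop (cdr rules) best)))))

(define sample-rules
  '(("do" <do>) ("double" <double>) ("do" <other>)))

(pick-rule sample-rules "double x")
(pick-rule sample-rules "do x")
```

The first call selects ("double" <double>) because it is the longest match. The second selects ("do" <do>): both "do" rules match two characters, and the one listed first is kept.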
The longest-match rule may be overridden for a particular type of lexeme by putting the keyword :shortest after the action in the lexer specification. If such a lexeme type is ever matched, it is returned immediately without consuming additional characters to look for a longer match.