CHUNGA - Portable chunked streams for Common Lisp


 

Abstract

Chunga implements streams capable of chunked encoding on demand as defined in RFC 2616. For an example of how these streams can be used see Drakma.

The library needs a Common Lisp implementation that supports Gray streams and relies on David Lichteblau's trivial-gray-streams to offer portability between different Lisps.

Chunga is currently not optimized towards performance - it is rather intended to be easy to use and (if possible) to behave correctly.

The code comes with a BSD-style license so you can basically do with it whatever you want.

Download the current version or visit the project on GitHub.


 

Contents

  1. Download and installation
  2. Support
  3. The Chunga dictionary
    1. Chunked streams
      1. chunked-stream
      2. chunked-input-stream
      3. chunked-output-stream
      4. chunked-io-stream
      5. make-chunked-stream
      6. chunked-stream-stream
      7. chunked-stream-input-chunking-p
      8. chunked-stream-output-chunking-p
      9. chunked-input-stream-extensions
      10. chunked-input-stream-trailers
    2. Conditions
      1. chunga-condition
      2. chunga-error
      3. chunga-warning
      4. syntax-error
      5. parameter-error
      6. input-chunking-body-corrupted
      7. input-chunking-unexpected-end-of-file
    3. RFC 2616 parsing
      1. with-character-stream-semantics
      2. read-line*
      3. read-http-headers
      4. token-char-p
      5. read-token
      6. read-name-value-pair
      7. read-name-value-pairs
      8. assert-char
      9. skip-whitespace
      10. read-char*
      11. peek-char*
      12. trim-whitespace
      13. *current-error-message*
      14. *accept-bogus-eols*
      15. *treat-semicolon-as-continuation*
      16. as-keyword
      17. as-capitalized-string
  4. Acknowledgements

 

Download and installation

Chunga together with this documentation can be downloaded from GitHub. The current version is 1.1.8. Chunga will only work with Lisps where the character codes of all Latin-1 characters coincide with their Unicode code points (which is the case for all current implementations I know).

The easiest way to install Chunga is with Quicklisp.
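
As a minimal sketch, assuming Quicklisp is already installed and loaded in the running Lisp image, loading Chunga from the REPL might look like this:

```lisp
;; Assumption: Quicklisp is already set up in this image.
(ql:quickload :chunga)

;; Once the system is loaded, the CHUNGA package should exist.
(find-package :chunga)
```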

The current development version of Chunga can be found at https://github.com/edicl/chunga.
 

Support

The development version of Chunga can be found on GitHub. Please use the GitHub issue tracker to submit bug reports. Patches are welcome; please submit them as GitHub pull requests.
 

The Chunga dictionary

Chunked streams

Chunked streams are the core of the Chunga library. You create them using the function MAKE-CHUNKED-STREAM which takes an open binary stream (called the underlying stream) as its single argument. A binary stream in this context means that if it's an input stream, you can apply READ-SEQUENCE to it where the sequence is an array of element type OCTET, and similarly for WRITE-SEQUENCE and output streams. (Note that this specifically holds for bivalent streams like socket streams.)

A chunked stream behaves like an ordinary Lisp stream of element type OCTET with the addition that you can turn chunking on and off for input as well as for output. With chunking turned on, data is read or written according to the definition in RFC 2616.
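
The following sketch illustrates a round trip through output and input chunking. It assumes that FLEXI-STREAMS is available and uses its in-memory binary streams as stand-ins for a socket; the helper names CHUNK-ENCODE and CHUNK-DECODE are hypothetical and not part of Chunga.

```lisp
(ql:quickload '(:chunga :flexi-streams))

(defun chunk-encode (octets)
  "Return OCTETS wrapped in chunked encoding as a new octet vector."
  (flex:with-output-to-sequence (out)
    (let ((chunked (chunga:make-chunked-stream out)))
      ;; Everything written from here on is sent in chunks.
      (setf (chunga:chunked-stream-output-chunking-p chunked) t)
      (write-sequence octets chunked)
      ;; Switching chunking off is expected to flush pending data and
      ;; to write the terminating zero-length chunk.
      (setf (chunga:chunked-stream-output-chunking-p chunked) nil))))

(defun chunk-decode (octets)
  "Return the payload carried by the chunked-encoded vector OCTETS."
  (let ((chunked (chunga:make-chunked-stream
                  (flex:make-in-memory-input-stream octets))))
    (setf (chunga:chunked-stream-input-chunking-p chunked) t)
    ;; Read until the last chunk has been consumed and EOF is reached.
    (coerce (loop for octet = (read-byte chunked nil nil)
                  while octet
                  collect octet)
            '(vector (unsigned-byte 8)))))

;; Decoding the encoded payload should give back the original octets.
(let ((payload (flex:string-to-octets "Hello, chunked world!")))
  (equalp payload (chunk-decode (chunk-encode payload))))
```

With a real connection, the underlying stream would typically be a bivalent socket stream instead of an in-memory stream.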


[Standard class]
chunked-stream


Every chunked stream returned by MAKE-CHUNKED-STREAM is of this type which is a subtype of STREAM.


[Standard class]
chunked-input-stream


A chunked stream is of this type if its underlying stream is an input stream. This is a subtype of CHUNKED-STREAM.


[Standard class]
chunked-output-stream


A chunked stream is of this type if its underlying stream is an output stream. This is a subtype of CHUNKED-STREAM.


[Standard class]
chunked-io-stream


A chunked stream is of this type if it is both a CHUNKED-INPUT-STREAM as well as a CHUNKED-OUTPUT-STREAM.


[Function]
make-chunked-stream stream => chunked-stream


Creates and returns a chunked stream (a stream of type CHUNKED-STREAM) which wraps stream. stream must be an open binary stream.


[Specialized reader]
chunked-stream-stream (stream chunked-stream) => underlying-stream


Returns the underlying stream of the chunked stream stream.
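
As a small illustration of how the classes above relate to the underlying stream, the sketch below wraps one input stream and one output stream. The in-memory streams come from FLEXI-STREAMS, which is an assumption made for the demonstration; DESCRIBE-WRAPPERS is a hypothetical helper.

```lisp
(ql:quickload '(:chunga :flexi-streams))

(defun describe-wrappers ()
  "Return a list of booleans describing two freshly wrapped streams."
  (let* ((in (flex:make-in-memory-input-stream #(1 2 3)))
         (out (flex:make-in-memory-output-stream))
         (chunked-in (chunga:make-chunked-stream in))
         (chunked-out (chunga:make-chunked-stream out)))
    (list
     ;; The wrapper's class follows the direction of the wrapped stream.
     (typep chunked-in 'chunga:chunked-input-stream)
     (typep chunked-out 'chunga:chunked-output-stream)
     ;; Both are chunked streams.
     (typep chunked-in 'chunga:chunked-stream)
     (typep chunked-out 'chunga:chunked-stream)
     ;; The reader gives back the very stream that was wrapped.
     (eq (chunga:chunked-stream-stream chunked-in) in))))

(describe-wrappers)
```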


[Generic reader]
chunked-stream-input-chunking-p object => generalized-boolean


Returns a true value if object is of type CHUNKED-INPUT-STREAM and if input chunking is currently enabled.


[Specialized writer]
(setf (chunked-stream-input-chunking-p (stream chunked-input-stream)) new-value)


This function is used to switch input chunking on stream on or off. Note that input chunking will usually be turned off automatically when the last chunk is read.


[Generic reader]
chunked-stream-output-chunking-p object => generalized-boolean


Returns a true value if object is of type CHUNKED-OUTPUT-STREAM and if output chunking is currently enabled.


[Specialized writer]
(setf (chunked-stream-output-chunking-p (stream chunked-output-stream)) new-value)


This function is used to switch output chunking on stream on or off.


[Specialized reader]
chunked-input-stream-extensions (stream chunked-input-stream) => extensions


Returns an alist of attribute/value pairs corresponding to the optional "chunk extensions" which might have been encountered when reading from stream.


[Specialized reader]
chunked-input-stream-trailers (stream chunked-input-stream) => trailers


Returns the optional "trailer" HTTP headers which might have been sent after the last chunk, i.e. directly before input chunking ended on stream. The format of trailers is identical to that returned by READ-HTTP-HEADERS.
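
A sketch of how extensions and trailers might be inspected after a chunked body has been read completely. The body is written by hand here, the in-memory stream comes from FLEXI-STREAMS (an assumption), and the helpers CRLF-OCTETS and READ-CHUNKED-BODY as well as the header name X-Checksum are made up for the example.

```lisp
(ql:quickload '(:chunga :flexi-streams))

(defun crlf-octets (&rest lines)
  "Terminate each of LINES with CRLF and return Latin-1 octets."
  (flex:string-to-octets
   (with-output-to-string (s)
     (dolist (line lines)
       (write-string line s)
       (write-char #\Return s)
       (write-char #\Linefeed s)))))

(defun read-chunked-body (octets)
  "Return the decoded text, the chunk extensions, and the trailers."
  (let ((chunked (chunga:make-chunked-stream
                  (flex:make-in-memory-input-stream octets))))
    (setf (chunga:chunked-stream-input-chunking-p chunked) t)
    ;; Read up to EOF so that the last chunk and the trailers are seen.
    (let ((text (map 'string #'code-char
                     (loop for octet = (read-byte chunked nil nil)
                           while octet
                           collect octet))))
      (values text
              (chunga:chunked-input-stream-extensions chunked)
              (chunga:chunked-input-stream-trailers chunked)))))

;; One five-octet chunk with an extension, the last chunk, one trailer
;; header, and the empty line that ends the trailers.
(read-chunked-body
 (crlf-octets "5;foo=bar" "hello" "0" "X-Checksum: abc" ""))
```

The second and third return values are the alists described above; both are only meaningful once the chunks have actually been read.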

Conditions

Here are conditions which might be signalled if something bad happens with a chunked stream.


[Condition]
chunga-condition


All conditions signalled by Chunga are of this type. This is a subtype of CONDITION.


[Error]
chunga-error


All errors signalled by Chunga are of this type. This is a subtype of CHUNGA-CONDITION and of STREAM-ERROR, so STREAM-ERROR-STREAM can be used to access the offending stream.


[Warning]
chunga-warning


All warnings signalled by Chunga are of this type. This is a subtype of CHUNGA-CONDITION and of WARNING.


[Error]
syntax-error


An error of this type is signalled if Chunga encounters wrong or unknown syntax when reading data. This is a subtype of CHUNGA-ERROR.


[Error]
parameter-error


An error of this type is signalled if a function was called with inconsistent or illegal parameters. This is a subtype of CHUNGA-ERROR.


[Condition type]
input-chunking-body-corrupted


A condition of this type is signalled if an unexpected character (octet) is read while reading from a chunked stream with input chunking enabled. This is a subtype of CHUNGA-ERROR.


[Condition type]
input-chunking-unexpected-end-of-file


A condition of this type is signalled if we reach an unexpected EOF on a chunked stream with input chunking enabled. This is a subtype of CHUNGA-ERROR.
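
Because the errors above share CHUNGA-ERROR as a supertype, a single handler can cover them. The sketch below feeds a deliberately truncated chunked body to a chunked stream; it assumes FLEXI-STREAMS for the in-memory stream, and TRY-DECODE is a hypothetical helper.

```lisp
(ql:quickload '(:chunga :flexi-streams))

(defun try-decode (octets)
  "Decode OCTETS as a chunked body.
Returns the list of payload octets, or the condition object if Chunga
signals an error while reading."
  (let ((chunked (chunga:make-chunked-stream
                  (flex:make-in-memory-input-stream octets))))
    (setf (chunga:chunked-stream-input-chunking-p chunked) t)
    (handler-case
        (loop for octet = (read-byte chunked nil nil)
              while octet
              collect octet)
      ;; Covers SYNTAX-ERROR, INPUT-CHUNKING-BODY-CORRUPTED and friends.
      (chunga:chunga-error (condition)
        condition))))

;; The chunk header announces five octets, but only three follow.
(try-decode
 (flex:string-to-octets
  (concatenate 'string "5" (string #\Return) (string #\Linefeed) "hel")))
```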

RFC 2616 parsing

Chunga needs to know a bit about RFC 2616 syntax in order to cope with extensions and trailers. As these functions are in there anyway, they're exported, so they can be used by other code like for example Drakma.

Note that all of these functions are designed to work on binary streams, specifically on streams with element type (UNSIGNED-BYTE 8). They will not work with character streams. (But the "bivalent" streams offered by many Lisp implementations will do.) They must be called within the context of WITH-CHARACTER-STREAM-SEMANTICS.


[Macro]
with-character-stream-semantics statement* => result*


Executes the statement* forms in such a way that functions within this section can read characters from binary streams (treating octets as the Latin-1 characters with the corresponding code points). All the functions below must be wrapped with this macro. If your code uses several of these functions which interact on the same stream, all of them must be wrapped with the same macro. See the source code of Drakma or Hunchentoot for examples of how to use this macro.


[Function]
read-line* stream &optional log-stream => line


Reads and assembles characters from the binary stream stream until a carriage return is read. Makes sure that the following character is a linefeed. If *ACCEPT-BOGUS-EOLS* is not NIL, then the function will also accept a lone carriage return or linefeed as a line break. Returns the string of characters read excluding the line break. Additionally logs this string to log-stream if it is not NIL.

See WITH-CHARACTER-STREAM-SEMANTICS.
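
A minimal sketch of reading one CRLF-terminated line from a binary stream. The in-memory stream from FLEXI-STREAMS is an assumption made for the demonstration, and FIRST-LINE is a hypothetical helper.

```lisp
(ql:quickload '(:chunga :flexi-streams))

(defparameter *crlf* (coerce '(#\Return #\Linefeed) 'string))

(defun first-line (string)
  "Return the first CRLF-terminated line of STRING, read as octets."
  (let ((in (flex:make-in-memory-input-stream
             (flex:string-to-octets string))))
    ;; The parsing functions must run inside this macro.
    (chunga:with-character-stream-semantics
      (chunga:read-line* in))))

(first-line (concatenate 'string
                         "GET / HTTP/1.1" *crlf*
                         "Host: example.org" *crlf*))
```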


[Function]
read-http-headers stream &optional log-stream => headers


Reads HTTP header lines from the binary stream stream (except for the initial status line which is supposed to be read already) and returns a corresponding alist of names and values where the names are keywords and the values are strings. Multiple lines with the same name are combined into one value, the individual values separated by commas. Header lines which are spread across multiple lines are recognized and treated correctly. (But see *TREAT-SEMICOLON-AS-CONTINUATION*.) Additionally logs the header lines to log-stream if it is not NIL.

See WITH-CHARACTER-STREAM-SEMANTICS.
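
The sketch below parses a small header block into an alist. It assumes FLEXI-STREAMS for the in-memory binary stream; PARSE-HEADERS is a hypothetical helper.

```lisp
(ql:quickload '(:chunga :flexi-streams))

(defparameter *crlf* (coerce '(#\Return #\Linefeed) 'string))

(defun parse-headers (string)
  "Parse the header block in STRING and return an alist of headers."
  (let ((in (flex:make-in-memory-input-stream
             (flex:string-to-octets string))))
    (chunga:with-character-stream-semantics
      (chunga:read-http-headers in))))

;; Two header lines followed by the empty line that ends the block.
(defparameter *headers*
  (parse-headers (concatenate 'string
                              "Content-Type: text/html" *crlf*
                              "Content-Length: 42" *crlf*
                              *crlf*)))

;; Names are keywords, values are strings.
(cdr (assoc :content-type *headers*))
(cdr (assoc :content-length *headers*))
```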


[Function]
read-token stream => token


Read characters from the binary stream stream while they are token constituents (according to RFC 2616). It is assumed that there's a token character at the current position. The token read is returned as a string. Doesn't signal an error (but simply stops reading) if END-OF-FILE is encountered after the first character.

See WITH-CHARACTER-STREAM-SEMANTICS.


[Function]
token-char-p char => generalized-boolean


Returns a true value if the Lisp character char is a token constituent according to RFC 2616.
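
For illustration, letters and the hyphen are token constituents, whereas separators such as the semicolon and the space are not. TOKEN-FLAGS is a hypothetical helper.

```lisp
(ql:quickload :chunga)

(defun token-flags (characters)
  "Return a list of T or NIL, one per element of CHARACTERS."
  (mapcar (lambda (char)
            (if (chunga:token-char-p char) t nil))
          characters))

(token-flags '(#\a #\- #\; #\Space))
```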


[Function]
read-name-value-pair stream &key value-required-p cookie-syntax => pair


Reads a typical (in RFC 2616) name/value or attribute/value combination from the binary stream stream - a token followed by a #\= character and another token or a quoted string. Returns a cons of the name and the value, both as strings. If value-required-p is NIL (the default is T), the #\= sign and the value are optional. If cookie-syntax is true (the default is NIL), the value is read like the value of a cookie header.

See WITH-CHARACTER-STREAM-SEMANTICS.


[Function]
read-name-value-pairs stream &key value-required-p cookie-syntax => pairs


Uses READ-NAME-VALUE-PAIR to read and return an alist of name/value pairs from the binary stream stream. It is assumed that the pairs are separated by semicolons and that the first char read (except for whitespace) will be a semicolon. The parameters are used as in READ-NAME-VALUE-PAIR. Stops reading in case of END-OF-FILE (instead of signaling an error).

See WITH-CHARACTER-STREAM-SEMANTICS.
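
A sketch of parsing semicolon-separated parameters of the kind found after a media type. The in-memory stream comes from FLEXI-STREAMS (an assumption), and PARSE-PARAMETERS is a hypothetical helper.

```lisp
(ql:quickload '(:chunga :flexi-streams))

(defun parse-parameters (string)
  "Parse parameters like \"; a=b; c=d\" and return an alist of strings."
  (let ((in (flex:make-in-memory-input-stream
             (flex:string-to-octets string))))
    (chunga:with-character-stream-semantics
      (chunga:read-name-value-pairs in))))

;; The input starts with a semicolon, as the function expects.
;; One value is a plain token, the other a quoted string.
(defparameter *parameters*
  (parse-parameters "; charset=utf-8; boundary=\"abc def\""))

(cdr (assoc "charset" *parameters* :test #'string=))
(cdr (assoc "boundary" *parameters* :test #'string=))
```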


[Function]
assert-char stream expected-char => char


Reads the next character from the binary stream stream and checks if it is the character expected-char. Signals an error otherwise.

See WITH-CHARACTER-STREAM-SEMANTICS.


[Function]
skip-whitespace stream => char-or-nil


Consumes characters from the binary stream stream until an END-OF-FILE is encountered or a non-whitespace (according to RFC 2616) character is seen. This character is returned (or NIL in case of END-OF-FILE).

See WITH-CHARACTER-STREAM-SEMANTICS.
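
The lower-level functions can be combined into small hand-written parsers. The sketch below uses READ-TOKEN, SKIP-WHITESPACE, and ASSERT-CHAR on the same stream within one WITH-CHARACTER-STREAM-SEMANTICS form. FLEXI-STREAMS is assumed for the in-memory stream, and PARSE-ASSIGNMENT is a hypothetical helper.

```lisp
(ql:quickload '(:chunga :flexi-streams))

(defun parse-assignment (string)
  "Parse text like \"name = value\" into a cons of two strings."
  (let ((in (flex:make-in-memory-input-stream
             (flex:string-to-octets string))))
    ;; All calls share the stream, so they share one macro form.
    (chunga:with-character-stream-semantics
      (let ((name (chunga:read-token in)))
        (chunga:skip-whitespace in)
        ;; Signals an error unless the next character is #\=.
        (chunga:assert-char in #\=)
        (chunga:skip-whitespace in)
        (cons name (chunga:read-token in))))))

(parse-assignment "name = value")
```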


[Function]
read-char* stream => char


Reads and returns the next character from the binary stream stream.

See WITH-CHARACTER-STREAM-SEMANTICS.


[Function]
peek-char* stream &optional eof-error-p eof-value => boolean


Returns a true value if a character can be read from the binary stream stream. If eof-error-p has a true value, an error is signalled if no character remains to be read. eof-value specifies the value to return if eof-error-p is false and the end of the file has been reached.

See WITH-CHARACTER-STREAM-SEMANTICS.


[Function]
trim-whitespace string &key start end => string'


Returns a version of the string string (between start and end) where spaces and tab characters are trimmed from the start and the end.
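
A short illustration; the sample string is made up for the example.

```lisp
(ql:quickload :chunga)

;; A string with leading spaces and a trailing tab and space.
(defparameter *padded*
  (concatenate 'string "  foo" (string #\Tab) " "))

;; Spaces and tabs are removed from both ends.
(chunga:trim-whitespace *padded*)
```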


[Special variable]
*current-error-message*


Used by the parsing functions in this section as an introduction to a standardized error message. Should be bound to a string or NIL if one of these functions is called.


[Special variable]
*accept-bogus-eols*


Some web servers do not respond with a correct CRLF line ending for HTTP headers but with a lone linefeed or carriage return instead. If this variable is bound to a true value, READ-LINE* will treat a lone LF or CR character as an acceptable end of line. The initial value is NIL.
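
As a sketch, binding the variable around the parsing code lets READ-LINE* cope with input that uses lone linefeeds. FLEXI-STREAMS is assumed for the in-memory stream, and READ-TWO-LINES is a hypothetical helper.

```lisp
(ql:quickload '(:chunga :flexi-streams))

(defun read-two-lines (string)
  "Read two lines from STRING, tolerating lone CR or LF line endings."
  (let ((in (flex:make-in-memory-input-stream
             (flex:string-to-octets string)))
        ;; Bound only for the dynamic extent of this call.
        (chunga:*accept-bogus-eols* t))
    (chunga:with-character-stream-semantics
      (list (chunga:read-line* in)
            (chunga:read-line* in)))))

;; Both lines end with a lone linefeed instead of CRLF.
(read-two-lines
 (concatenate 'string
              "first" (string #\Linefeed)
              "second" (string #\Linefeed)))
```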


[Special variable]
*treat-semicolon-as-continuation*


According to John Foderaro, Netscape v3 web servers bogusly split Set-Cookie headers over multiple lines which means that we'd have to treat Set-Cookie headers ending with a semicolon as incomplete and combine them with the next header. This will only be done if this variable has a true value, though. Its default value is NIL.


[Function]
as-keyword string &key destructivep => keyword


Converts the string string to a keyword where all characters are uppercase or lowercase, taking into account the current readtable case. Might destructively modify string if destructivep is true which is the default. "Knows" several HTTP header names and methods and is optimized to not call INTERN for these.


[Function]
as-capitalized-string keyword => capitalized-string


Kind of the inverse of AS-KEYWORD. Has essentially the same effect as STRING-CAPITALIZE but is optimized for "known" keywords like :CONTENT-LENGTH or :GET.
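
A brief illustration of both directions, assuming the standard readtable case. Since destructivep defaults to true, the example passes a fresh copy rather than a literal string.

```lisp
(ql:quickload :chunga)

;; COPY-SEQ protects the literal from possible destructive modification.
(defparameter *name-keyword*
  (chunga:as-keyword (copy-seq "Content-Length")))

;; And back to the capitalized header name.
(defparameter *name-string*
  (chunga:as-capitalized-string :content-length))
```

Passing :destructivep nil instead of copying the string would be another way to keep the argument intact.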

 

Acknowledgements

Thanks to Jochen Schmidt's chunking code in ACL-COMPAT for inspiration. This documentation was prepared with DOCUMENTATION-TEMPLATE.