Implementation note

This document explains the implementation details of SASAX-RSS.

It should help you write a parser for your own XML document format on the SASAX framework.

ATTENTION: The source code shown in this document gives higher priority to simplicity of explanation than to exact agreement with the distributed source.

Please do not confuse the code in this document with the code in the distribution.

Class names

In this document, abbreviated class names are used. Complete names are shown below.

Classes of SAX framework

Notation      Full name
Attributes    org.xml.sax.Attributes
SAXException  org.xml.sax.SAXException

Classes of SASAX framework

For these classes, please refer to the JavaDoc, either in the documentation distribution or on the web.

Notation              Full name
AbstractElement       jp.ne.dti.lares.foozy.sasax.AbstractElement
CompositeElement      jp.ne.dti.lares.foozy.sasax.CompositeElement
LooseDateTimeElement  jp.ne.dti.lares.foozy.sasax.LooseDateTimeElement
Element               jp.ne.dti.lares.foozy.sasax.Element
ElementDrivenHandler  jp.ne.dti.lares.foozy.sasax.ElementDrivenHandler
Notification          jp.ne.dti.lares.foozy.sasax.Notification
ParseContext          jp.ne.dti.lares.foozy.sasax.ParseContext

Classes of SASAX-RSS parser

Notation        Full name
ChannelElement  jp.ne.dti.lares.foozy.sasax.rss.ChannelElement
ItemElement     jp.ne.dti.lares.foozy.sasax.rss.ItemElement
RSSRootElement  jp.ne.dti.lares.foozy.sasax.rss.RSSRootElement

Classes of SASAX-RSS GUI

Notation             Full name
Channel              jp.ne.dti.lares.foozy.sasax.rss.Channel
ChannelNotification  jp.ne.dti.lares.foozy.sasax.rss.ChannelNotification
Item                 jp.ne.dti.lares.foozy.sasax.rss.Item
ItemNotification     jp.ne.dti.lares.foozy.sasax.rss.ItemNotification

SASAX-RSS parser

The parser part of SASAX-RSS is simple enough that you can read and understand its implementation, provided you have already read the SASAX tutorial and understand the RSS specification.

One of the few things not described in the SASAX tutorial is the pair of "ignore"/"ordered" parameters of the CompositeElement constructor.

That constructor was introduced in SASAX 1.2, and these parameters mean:

ignore
Whether the element accepts unknown (i.e. not registered) sub-elements. According to the RSS specification, unknown elements are allowed only as immediate sub-elements of the "channel", "image", "item" and "textinput" elements.
ordered
Whether sub-elements must appear in the order of registration. According to the RSS specification, element order is not restricted.
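
To make the two rules concrete, here is a minimal sketch that models them outside SASAX. ChildRules and its accepts() method are hypothetical names invented for this illustration; they are not part of SASAX, and the real CompositeElement constructor signature is not reproduced here. The sketch assumes that "ignore = true" means unknown sub-elements are skipped silently and that "ordered = true" means known sub-elements may not appear before an earlier-registered one.

```java
import java.util.List;

// Hypothetical helper, NOT a SASAX class: it only models the two rules.
final class ChildRules {
    private final List<String> registered; // child names in registration order
    private final boolean ignore;          // true: unknown children are skipped
    private final boolean ordered;         // true: registration order is enforced

    ChildRules(List<String> registered, boolean ignore, boolean ordered) {
        this.registered = registered;
        this.ignore = ignore;
        this.ordered = ordered;
    }

    /** Returns true if the observed child sequence satisfies both rules. */
    boolean accepts(List<String> observed) {
        int lastIndex = -1;
        for (String name : observed) {
            int index = registered.indexOf(name);
            if (index < 0) {
                if (!ignore) {
                    return false; // unknown child, and unknown children are not allowed
                }
                continue; // unknown child is skipped
            }
            if (ordered && index < lastIndex) {
                return false; // known child came after a later-registered one
            }
            lastIndex = index;
        }
        return true;
    }
}
```

With the RSS-like setting (ignore = true, ordered = false), a sequence such as "link", "dc:date", "title" is accepted; with the opposite setting it is rejected twice over, once for the unknown "dc:date" and once for the order.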

Deserialization from XML document

Separation from SASAX-RSS parser

The SASAX-RSS parser only examines whether the specified XML document is valid as an RSS document.

To get information out of an RSS document (e.g. the details of "channel", the list of "item" and so on), you must do something more, and in almost all cases that "something" is filling in an information container.

This is a kind of deserialization of an (information container) object from an XML document, and it is specific to the container definition. Therefore the deserialization implementation, which lives in the GUI part of SASAX-RSS, is separated from the SASAX-RSS parser.

There are several ways to get the parsing result in SASAX:

get the value after parsing:
This is the easiest way, but it does not suit complex documents (e.g. getting a list of values from a repeated element).
override the notifyDetermined() method:
In classes derived from AbstractElement, this gives you the parsing result immediately at determination, but it reduces the re-usability of both the Element implementation class and the result handling logic.
receive start/end events as a Notification:
Implementing the deserialization logic as a Notification isolates the SASAX-based XML parsing logic from the deserialization logic that is specific to your application.

So, SASAX-RSS chooses the last approach.
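
The isolation gained by the last approach can be sketched with a plain listener pattern. Everything in this sketch (EndListener, TextElement, MapFiller) is a hypothetical stand-in written for this note; the real SASAX Notification receives Element, ParseContext and Attributes arguments and is not reproduced here. The point is only that the parsing side and the container side never refer to each other directly.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a notification interface (not the SASAX one).
interface EndListener {
    void ended(String elementName, String text);
}

// Hypothetical parsing-side element: it knows nothing about containers.
final class TextElement {
    private final String name;
    private final List<EndListener> listeners = new ArrayList<>();

    TextElement(String name) {
        this.name = name;
    }

    void addListener(EndListener listener) {
        listeners.add(listener);
    }

    // Would be called by the (imaginary) parser when the element is closed.
    void fireEnded(String text) {
        for (EndListener listener : listeners) {
            listener.ended(name, text);
        }
    }
}

// Application-side deserialization: it knows nothing about parsing.
final class MapFiller implements EndListener {
    final Map<String, String> values = new LinkedHashMap<>();

    @Override
    public void ended(String elementName, String text) {
        values.put(elementName, text); // store the value under the element name
    }
}
```

Because TextElement only sees the EndListener interface, the same element class can feed a different container simply by registering a different listener.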

Deserialization implementation

For example, the deserialization code for the RSS "channel" element is shown below. It is taken from ChannelNotification, which is an implementation of Notification.

In the example below, "channel_" is a member field of type Channel. Channel is the container for RSS channel information.


public void elementStarted(Element element,
                           ParseContext context,
                           Attributes attributes)
    throws SAXException
{
    // Get "rdf:about" attribute ("rdf" is the RDF namespace, not the RSS one)
    String about =
        attributes.getValue("http://www.w3.org/1999/02/22-rdf-syntax-ns#",
                            "about");
    channel_.setValue("rdf:about", // name of value
                      about);
}

Procedure of "channel" element start

The highlighted parts are where information is taken from the XML document.


public void elementEnded(Element element,
                         ParseContext context)
    throws SAXException
{
    ChannelElement channel = (ChannelElement)element;
    // Get "rss:title" value
    channel_.setValue("title", 
                      channel.title.getString(true));
    // Get "rss:link" value
    channel_.setValue("link",
                      channel.link.getString(true));
    // Get "rss:description" value
    channel_.setValue("description",
                      channel.description.getString(true));
}

Procedure of "channel" element end

Deserialization procedures that require attribute values of the element are placed in elementStarted(), and the others, which require the sub-elements to be determined, are placed in elementEnded(). It is not difficult, is it?
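
The same start/end split exists in plain SAX, which may help if you are more familiar with it. The following is a minimal sketch using only the standard org.xml.sax and javax.xml.parsers APIs, not SASAX; the class name ChannelSketch and its parse() helper are made up for this illustration. Note that the "rdf:about" attribute is looked up under the RDF namespace URI, and that the parser must be namespace-aware for such lookups to work.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical plain-SAX handler; it is not part of SASAX-RSS.
final class ChannelSketch extends DefaultHandler {
    static final String RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";
    static final String RSS_NS = "http://purl.org/rss/1.0/";

    final Map<String, String> values = new LinkedHashMap<>();
    private final StringBuilder text = new StringBuilder();

    @Override
    public void startElement(String uri, String localName, String qName,
                             Attributes attributes) {
        text.setLength(0); // start collecting text for this element
        if (RSS_NS.equals(uri) && "channel".equals(localName)) {
            // Attributes are complete as soon as the start tag is seen.
            values.put("rdf:about", attributes.getValue(RDF_NS, "about"));
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        // Text content is complete only when the end tag is seen.
        if (RSS_NS.equals(uri)
            && ("title".equals(localName) || "link".equals(localName))) {
            values.put(localName, text.toString().trim());
        }
    }

    /** Parses the given XML text and returns the collected values. */
    static Map<String, String> parse(String xml) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true); // needed for uri/localName lookups
            ChannelSketch handler = new ChannelSketch();
            factory.newSAXParser()
                   .parse(new InputSource(new StringReader(xml)), handler);
            return handler.values;
        } catch (Exception e) {
            throw new IllegalArgumentException("cannot parse document", e);
        }
    }
}
```

The attribute value is stored in startElement() and the text values in endElement(), which mirrors the division between elementStarted() and elementEnded() described above.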

At parsing time, you should add the Notification for deserialization to the corresponding element, as shown below.


// Create information container
Channel channel = new Channel();
// Create RSS document parsing Element
RSSRootElement root = new RSSRootElement(null);
// Add Notification to "channel" element
root.channel.addNotification(new ChannelNotification(channel));

          :

ElementDrivenHandler handler = new ElementDrivenHandler(root);
handler.parse(reader); // parse RSS XML document
// here, Channel is deserialized from XML document

Deserialization procedure

Extension

This section explains how to add extension elements and get values from them, by showing the implementation code for the Dublin Core "date" element under the RSS "item" element.

In the SASAX-RSS GUI source files, the parts specific to the Dublin Core "date" element are surrounded by ">>>> DC:DATE" and "<<<< DC:DATE".

There are only four such parts. One of them is a shared string symbol definition, and another is for display. The other two are the real deserialization code: "registration of element" and "deserialization from element".

Register element

LooseDateTimeElement can be used for the Dublin Core "date" element, and the element registration code looks like the following.


LooseDateTimeElement dcDate =
new LooseDateTimeElement(root.item,
                         "http://purl.org/dc/elements/1.1/",
                         "date",
                         null // meaning GMT time zone
                         );
// register dcDate element under "item" as optional
root.item.addOptionalItem("dc:date", dcDate);

Element for "date" of "Dublin Core"
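
As a rough idea of what "loose" date-time handling with a default time zone can look like, here is a minimal sketch built on java.time. It is not the implementation of LooseDateTimeElement, and the set of formats that class really accepts is not stated in this note; LooseDates and its parse() method are hypothetical names. The sketch assumes three ISO 8601 forms (date-time with offset, date-time without offset, date only) and, like the registration code above, treats a null zone as GMT.

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.OffsetDateTime;
import java.time.ZoneId;
import java.time.ZoneOffset;
import java.time.format.DateTimeParseException;

// Hypothetical helper, NOT the SASAX implementation.
final class LooseDates {
    /** Parses a date or date-time; a null default zone means GMT (UTC). */
    static Instant parse(String text, ZoneId defaultZone) {
        ZoneId zone = (defaultZone != null) ? defaultZone : ZoneOffset.UTC;
        String value = text.trim();
        try {
            // Full form with explicit offset, e.g. 2004-05-06T07:08:09+09:00
            return OffsetDateTime.parse(value).toInstant();
        } catch (DateTimeParseException ignored) {
            // fall through to the next, looser form
        }
        try {
            // Date-time without offset: interpret it in the default zone
            return LocalDateTime.parse(value).atZone(zone).toInstant();
        } catch (DateTimeParseException ignored) {
            // fall through to the date-only form
        }
        // Date only: midnight in the default zone (throws if this fails too)
        return LocalDate.parse(value).atStartOfDay(zone).toInstant();
    }
}
```

An explicit offset in the text always wins; the default zone is consulted only when the text itself carries no time zone information.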

Deserialize from element

The deserialization code in ItemNotification for the "item" element looks like the following.


public void elementEnded(Element element,
                         ParseContext context)
    throws SAXException
{
    ItemElement item = (ItemElement)element;

        :

    LooseDateTimeElement dcDate =
        (LooseDateTimeElement)(item.getComponent("dc:date"));
    item_.setValue("dc:date", dcDate.getDate(false));
}

Deserialization from "date" of "Dublin Core"

In the example above, "item_" is a member field of type Item. Item is the container for RSS item information.