ABSTRACT
Interactive multimedia documents bring real power to the field of communication. By integrating different media types, such documents offer greater expressive power to catch the user's attention. Consequently, several research efforts have focused on how to manage multimedia documents. We present a system that supports both the specification of constraints among media components and the generation of multimedia presentations. The main characteristic of this system is that it can be used by non-programmer users. The proposed system is mainly based on an object-oriented document model and offers users a methodology to specify and then generate their presentations in a simple way.
INTRODUCTION
MultiMedia Documents (MMDs) combine, in time and space, different types of elements such as video, audio, still pictures, text and synthesized images. These data types offer more expressive power and more opportunities to catch the user's attention: for example, a virtual visit (with video and images) of a house for sale conveys more information than a textual description.
Designing and managing a multimedia document are challenging tasks that require an author to specify different types of information at different levels. This includes the selection of the components to be included in the document, the structure and behavior of the multimedia items and, not least, the possible user interactions at run-time.
For this reason, we must distinguish between the specification (or editing) phase of the temporal scenario and its presentation (or execution) phase. The editing and presentation operations are carried out at different times and possibly by different users: the first task is performed by the author, the second by the reader (who can be the author). An authoring system for multimedia documents must handle these two phases, because it is essential for the author to switch easily from the editing phase to the execution phase in order to gradually test and improve the document presentation. Document management is even more delicate for interactive documents, which allow the reader to interact during the presentation.
Many systems have tackled the problem of authoring complexity in different ways (Bulterman and Hardman, 2005) and various tendencies have guided the efforts towards tools for multimedia document editing. Nevertheless, all the suggested systems share a common component: a multimedia document model that integrates media descriptions together with temporal and spatial specifications (Buchanan and Zellweger, 2005; Bulterman and Hardman, 2005).
In some studies, the specification of the constraints that generate scenarios is based on an absolute time axis. In this context, HyTime (ISO/IEC, 1997) offers a model able to represent a great number of document configurations, as long as the media items have a deterministic temporal behavior.
Other systems allow scenario specification with programming languages or scripts, such as MHEG (Geyer et al., 1997) and the prototypes GLUE (Berkom, 1998) and GLASS (Geyer et al., 1997). This kind of standard, based on the object-oriented programming paradigm, requires programming skills, and following the ordering of components by merely reading a script becomes a tedious task.
Research on scenario specification by execution structures has also been proposed. In this approach, a document is modeled by a control flow diagram, which specifies the interactions between the document components. The constraints on these items are then described in an easier and richer way. Many proposals were made along this research axis and various models have been provided. Some of these models are extensions of Petri Nets (PN), such as OCPN (Little and Ghafoor, 1993) and HTSPN (Willrich et al., 2000). Although the PN approach is simple and offers consistency checks, it presents some disadvantages, mainly with regard to the readability of the specifications and the reuse of the document.
Another proposed way to represent specifications by execution structures is to use hierarchical graphs, where the logical structure of the document is used to describe the temporal synchronization. The strength of this approach is the possibility to organize the document into independent modules on which the synchronization primitives are applied (Sampaio, 2003).
CMIFED was for a long time the most representative tool of this approach (Villard, 2000). However, a new standard for MMD design and editing was proposed within the W3C consortium to mitigate the CMIFED weaknesses (SMIL, 2001). This standard uses a declarative language and proposes an integration document format. As SMIL is based on XML, whose objective is to support several types of applications, it plays a significant role in the specification and management of MMDs. Nevertheless, SMIL has so far presented some limits in constraint specification. It proposes two ordering relations that allow only coarse-grained synchronization and supports neither intra-media synchronization (inside a medium) nor lip synchronization (between a video and an audio stream). Moreover, contrary to current trends, SMIL does not support any extensibility mechanism for basic media items, nor context-dependent navigation (Tran-Thuong, 2003). It is, however, interesting to note that SMIL is still evolving and more advanced versions are expected in the future.
The standards and systems presented above are only a representative sample of the efforts on editing and presenting MMDs. Many other systems exist, such as Madeus (Jourdan et al., 1998), MPGS (Bertino et al., 2000) and Cuypers (Celentano and Gaggi, 2002); each of them presents original features compared with the others.
The variety of multimedia approaches reflects the large number of requirements that have to be covered by a multimedia authoring system. However, these needs are only partially fulfilled by existing applications. In our view, the most appropriate way to deal with interactive multimedia documents is to follow a clear methodology from the beginning of the editing phase to the end of the presentation phase.
This study focuses on the document model and the system presentation. The verification and quality of service aspects are developed elsewhere.
O2DM: A MULTIMEDIA DOCUMENT MODEL
Based on the study of the strengths and gaps of existing models, we propose an Object Oriented Document Model (O2DM) to specify scenarios and the requested constraints in a multimedia document. The use of an object-oriented approach is justified by the contributions of the associated technology:
• Powerful concepts such as reusability;
• Ease of design, development and evolution of systems;
• Management of the user-machine interaction.
Moreover, object-oriented technology allows extensibility and thus makes it possible to define and use new data types. This property is very important given the continuous evolution of data processing.
A document modeled with O2DM can be defined as a set of basic elements organized according to a hierarchical structure and ordered according to temporal and spatial constraints (Labed and Boufaida, 2005). The model offers a three-level description including: the content description of multimedia objects, the component synchronization dealing with the spatial and temporal characteristics, and the environment interactions.
Multimedia objects: The content structure describes the multimedia items to be presented in a document. This level is concerned with the media types, their formats and the associated attributes.
Component types: Media components are the data that will be presented in documents. In order to provide a better expression of documents, a formatter (i.e., authoring system) must permit documents to include a rich variety of media types (Buchanan and Zellweger, 2005).
Currently, O2DM supports six basic types: text, image, audio, video, clip and the composed type. Whereas the first four types are widely known in the literature, we define a clip as a parallel composition of a video sequence with an audio item to form a single entity. This composition is carried out so that any alteration of one component is reproduced on the entire element. For example, in an advertising movie, a request to stop the video images automatically implies the stop of the associated audio track, which is not automatically the case for audio and video sequences that are merely synchronized together.
On the other hand, considering that a multimedia document can be handled as a component in other documents, we introduce the composed type. This medium contains elements with external formats such as HTML and XML pages, scripts, programs or even documents created by multimedia editors. This type follows the idea of imported objects found in well-known document editors and makes it possible to enrich the MMD with the variety of the handled information.
Therefore, an MMD is considered as a composition of objects classified into two main classes (display and audio) according to the output device on which their playout is achieved.
The display objects class: This main class includes all media objects that are displayed on monitor devices. It is further specialized into five subclasses: O_Text for text objects, O_Image for images and graphs, O_Video for films and videos, O_Clip for audio-visual sequences and O_Compose for external data.
The audio objects class (O_Audio): This class includes media objects played on audio devices without any screen presentation.
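As an illustration, this class hierarchy can be sketched as follows. This is a minimal Python sketch, not the authors' implementation: only the subclass names (O_Text, O_Image, O_Video, O_Clip, O_Compose, O_Audio) come from the model, while the base classes and attributes shown are assumptions added for readability.

```python
# Minimal sketch of the O2DM media class hierarchy (subclass names from the
# model; the base classes MediaObject/DisplayObject are illustrative assumptions).

class MediaObject:
    """Common root: every media item has a unique name and a source."""
    def __init__(self, name, source):
        self.name = name        # unique identifier within the document
        self.source = source    # URL or local access path (no local copy needed)

class DisplayObject(MediaObject):
    """Objects rendered on a monitor device."""

class O_Text(DisplayObject): pass      # text objects
class O_Image(DisplayObject): pass     # images and graphs
class O_Video(DisplayObject): pass     # films and videos
class O_Clip(DisplayObject): pass      # audio-visual sequences treated as one entity
class O_Compose(DisplayObject): pass   # external data (HTML/XML pages, scripts, ...)

class O_Audio(MediaObject): pass       # objects played on audio devices only
```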
Multimedia attributes: The specification of media properties is an important issue in MMDs. In fact, it characterizes the expressive capacity of the model and makes it possible to define the treatment corresponding to each medium (capitalize a letter, degrade a color, etc.).
In O2DM, multimedia object attributes are self-explanatory and are either global or specific (Labed and Boufaida, 2005). The global attributes are those associated with all objects, whatever their type: the media name (or identifier), which must be unique in the document; its basic type; and the source attribute, which indicates the place where the object is stored (it can be a URL if the media is retrieved over the network, or an access path in the case of a database object). The source attribute makes it possible to simply reference a piece of data without local storage. This O2DM property gives additional power to the model and allows a media item to appear several times without any storage or redundancy problems. To these parameters, O2DM adds the author name and the creation date, which are optional (useful when considering data search), and the data format and size, which are essential when dealing with the quality of the document.
Other attributes and properties are bound to media objects according to their basic type. These are the specific attributes associated with some types, such as the volume for audio objects. These attributes constitute the handling and access information of the media data, such as the object coding format (wave or midi for sound, jpeg or bmp for images, etc.) and the temporal and spatial attributes. They are also used to carry out effects at the presentation step (management of the volume in an audio/video presentation, color degradation, etc.) and to manage the quality of service that must be respected at the MMD presentation step.
A study of these properties shows that the attributes can be classified into two sets: temporal attributes and spatial ones.
Temporal attributes: Three main attributes specify the temporal dimension of a multimedia object: durat, control and iterat.
Durat: Specifies the presentation duration of an object. Ideally, it is represented by three values: lower dl, preferable dp and higher dh. This division allows the management of uncertainties caused by the delays of the resources from which the component is retrieved. Since O2DM allows an object to be presented more than once and at different moments in a document, the duration is specified by a set of triplets such as Durat = {(dl1, dp1, dh1), ..., (dln, dpn, dhn)} with n ≥ 1.
Control: Specifies whether the object duration is fixed or can be modified by the system (within the duration range) in critical situations.
Iterat: Specifies the number of sequential repetitions of an object in a presentation. If iterat is equal to zero, the object is presented just once; otherwise, when the object presentation ends, it is replayed from the beginning.
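The following sketch illustrates how these temporal attributes could be held by a media object. It is a hypothetical Python rendering: the attribute names durat, control and iterat come from the model, while the concrete representation (dataclass, seconds as the unit) is an assumption.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Hypothetical sketch of the O2DM temporal attributes (names from the model;
# the representation below is an assumption).
@dataclass
class TemporalAttributes:
    # One (dl, dp, dh) triplet per appearance of the object in the document:
    # lower, preferable and higher admissible durations, in seconds.
    durat: List[Tuple[float, float, float]] = field(default_factory=list)
    control: bool = False   # True: duration fixed; False: system may adjust within the range
    iterat: int = 0         # 0 = play once, k > 0 = replay k additional times

# Example: a video shown twice, preferably 60 s each time, never repeated.
video_time = TemporalAttributes(durat=[(55, 60, 65), (55, 60, 65)],
                                control=False, iterat=0)
```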
Spatial attributes: For each displayed object, the system creates a presentation window according to the user needs. The layout of an object in its associated window is specified with three attributes:
Sup-Corner (X, Y): Which corresponds to the upper-left corner of the window.
Width: Which represents the width of the object presentation window.
Height: Which represents the height of the object presentation window.
Thus, a presentation window PS (Sup-Corner, Width, Height) is specified for each object; it can be modified (reduced or enlarged) within controlled limits at execution time.
Table 1 summarizes the media attributes, whereas the code below illustrates an attribute specification for a video retrieved from a hard disk and presented for 1 min.
[Original code listing not reproduced in this version.]
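Since the original listing is not available, the following hypothetical Python sketch shows what such a specification could look like, reusing the attribute names introduced above (name, type, source, format, durat, control, iterat, Sup-Corner, Width, Height); the concrete syntax and values are assumptions, not the original code.

```python
# Hypothetical sketch of an attribute specification for a video object stored
# on a local disk and presented for about one minute (attribute names follow
# the model; the syntax and the file path are assumptions).
visit_video = {
    "name":    "V1",                       # unique identifier in the document
    "type":    "O_Video",
    "source":  "file:///media/visit.mpg",  # hypothetical local path
    "format":  "mpeg",
    "durat":   [(55, 60, 65)],             # lower/preferable/higher duration (s)
    "control": False,                      # system may adjust within the range
    "iterat":  0,                          # played once
    "sup_corner": (100, 50),               # upper-left corner (X, Y) of the window
    "width":   640,
    "height":  480,
}
```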
Components of interest and moving objects: Relaxation strategies first appeared in the MPGS system (Bertino et al., 2000). They are used to propose new scenarios when it is not possible to generate the MMD according to the user constraints. Such a situation arises, for example, when a document requires an hour to be presented entirely whereas the viewer has only 25 min. The suggested solution is to zoom over one (or several) item(s) by specifying the part(s) of interest of the component(s) that will be presented if the generation of the document becomes impossible following the first specification.
The specification of a component of interest depends on the object type. For audio or video objects, the temporal dimension is considered and the beginning and ending times of the object of interest must be specified with respect to the temporal attributes of the original object. For static objects (images, text, graphics), the specification is done on the spatial dimension by establishing a new presentation window.
Table 1: O2DM component attributes [table not reproduced in this version]
Its dimensions (Sup_Corner, Height) must be included within those of the window reserved to the original object.
O2DM allows one to specify an object of interest via an optional method named Int_obj. The parameters of this method vary according to the media type:
Int_obj (tmin, tmax) is applied to dynamic objects, where tmin and tmax are respectively the beginning and ending times of the object of interest;
Int_obj (dim_int) is applied to static items, where the parameter dim_int contains the spatial attributes of the new considered object.
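A possible rendering of this overloaded method is sketched below; the method name Int_obj and its parameters come from the model, while the surrounding classes and attribute layout are assumptions.

```python
# Hypothetical sketch of the Int_obj method (name and parameters from the model;
# the classes and the internal representation are assumptions).
class MediaObject:
    def __init__(self, name):
        self.name = name
        self.interest = None   # no component of interest by default

class DynamicObject(MediaObject):          # audio, video, clip
    def Int_obj(self, tmin, tmax):
        # Keep only the [tmin, tmax] excerpt if relaxation becomes necessary;
        # both bounds must respect the temporal attributes of the original object.
        self.interest = ("temporal", tmin, tmax)

class StaticObject(MediaObject):           # text, image, graphics
    def Int_obj(self, dim_int):
        # dim_int: new presentation window (Sup_Corner, Width, Height) that
        # must fit inside the window reserved to the original object.
        self.interest = ("spatial", dim_int)
```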
Another interesting document characteristic is the possibility to insert moving objects, which gives the MMD a new look. For instance, it is more attractive for the text of a motion picture's credits to move from the bottom to the top, or for an image in an advertisement to appear from a corner and move towards its final position.
For this purpose, O2DM provides a method, Displacement, applied to media objects. The displacement of an object is produced within a space (window) and according to the spatial attributes PS (cf. the above paragraph), a moving speed MS and a shift TD (hr, hl, vr, vl) with horizontal moving angles (hr, hl) and vertical ones (vr, vl). Moreover, when the object reaches the terminal position, it can either stop at this position or replay the move from the beginning, according to the value (respectively false and true) of a back attribute.
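The sketch below illustrates one possible shape for this method's parameters; the names Displacement, PS, MS, TD and back come from the text, while the types, units and example values are assumptions.

```python
from dataclasses import dataclass
from typing import Tuple

# Hypothetical sketch of the Displacement method's parameters.
@dataclass
class Displacement:
    ps: Tuple[Tuple[int, int], int, int]   # PS: (Sup-Corner, Width, Height) of the window
    ms: float                              # MS: moving speed (assumed pixels per second)
    td: Tuple[float, float, float, float]  # TD: (hr, hl, vr, vl) moving angles
    back: bool = False                     # False: stop at the end; True: replay the move

# Example: credits text scrolling from bottom to top and looping.
credits_move = Displacement(ps=((0, 400), 320, 200), ms=30.0,
                            td=(0.0, 0.0, 90.0, 0.0), back=True)
```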
Component synchronization: This level concerns the conceptual structure of the model and allows one to specify the temporal and spatial ordering based on a set of relationships implemented in O2DM as methods.
Temporal composition: It was largely shown in ISO/IEC (1997), Little and Ghafoor (1993) and Willrich et al. (2000) that temporal models based on intervals (Allen, 1983) are more appropriate for the temporal specification of MMDs. This assertion is justified by the great number of relations offered by these models and the high abstraction level of the relation descriptions. However, studies carried out on Allen's relationships show that these associations are mainly qualitative. Indeed, the relation Before (Oi, Oj) specifies that the object Oi must be presented before the object Oj. This specification does not state the waiting time that must be respected from the end of Oi (End (Oi)) to the moment when the object Oj begins (Begin (Oj)). As the quantitative aspect is important in MMDs, O2DM extends the temporal methods with a new quantitative parameter, Delay.
Table 2: O2DM temporal relations [table not reproduced in this version]
Notes: A and B are two media objects to synchronize; t is the delay time between two objects; durat specifies an object presentation duration; tmin is the beginning time of an object; tmax is the ending time of an object; "⇒" forces the ending or beginning moments.
Table 2 summarizes the O2DM temporal relations, which can be formulated as:
Re (C1, C2, /Delay)
where (i) Re ∈ {Meets, Before, Starts, During, Overlaps, Finishes, Equals}, the relations introduced by Allen (1983); (ii) C1 and C2 are the objects related according to the method Re and (iii) Delay is a local attribute associated only with the methods Before, During and Overlaps. It represents (in seconds) the waiting time before the start of the second object relative to the first one.
As an example, the specification {Before (P, I, 5); Finishes (M, I); Before (P, M, 12)} represents the set of temporal methods necessary for the orchestration of three objects: a poem (P) followed by an image (I), which is presented in parallel with a background music (M).
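For illustration, this specification could be expressed programmatically as follows; the relation names are those of Table 2, while the Scenario container and its add() method are assumptions.

```python
# Hypothetical encoding of the poem/image/music example (relation names from
# the model; the Scenario class and its API are assumptions).
class Scenario:
    def __init__(self):
        self.relations = []            # list of (relation, obj1, obj2, delay)

    def add(self, relation, c1, c2, delay=None):
        self.relations.append((relation, c1, c2, delay))

scenario = Scenario()
scenario.add("Before",   "P", "I", delay=5)    # image starts 5 s after the poem ends
scenario.add("Finishes", "M", "I")             # music and image end together
scenario.add("Before",   "P", "M", delay=12)   # music starts 12 s after the poem ends
```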
However, even these relations are not sufficient to model all needs. Indeed, some cases are not yet covered, as when two components (O1, O2) must be presented and we only know that one of the items will trigger the execution of the other according to the user interaction. At the specification step, the method Starts seems well adapted, but which of the relations should be used: Starts (O1, O2) or Starts (O2, O1)? The same situation can occur with the methods Meets, Before and Finishes.
For this reason, and to give the user more choice, we extend the model with other methods that allow the specification of interactions, denoted One-Meets, One-Before, One-Starts and One-Finishes (Labed and Boufaida, 2005).
Spatial composition: Each multimedia object displayed on the monitor has its own position specified by the spatial attributes previously presented.
Fig. 1: A spatial orchestration for the Asterix example [figure not reproduced in this version]
These parameters define the object positions and the sizes of their associated windows. Moreover, in order to compose interesting documents, O2DM offers three main methods to specify the positions of objects relative to one another: Disjoint, Superimpose and Replace. Disjoint is used when two objects presented on the monitor at the same time have their associated windows apart from each other, as in the example of texts to be studied side by side. On the contrary, Superimpose (respectively Replace) is used when the window of the first object is partially (respectively totally) covered by the window of the second item.
Indeed, in an image comparison it could be necessary to superimpose the concerned images to highlight the existing differences or similarities. The combination of these relations is judged sufficient to model all possible spatial orchestrations.
Figure 1 is composed of three parts. Part (a) shows the Asterix and companions photo (i1), which replaces another, hidden item (i2). Part (b) is a superposition of three objects: the famous village residents (i3) and a magnifying glass (i4) on a magnified image of Asterix (i5). The third part is constituted by the two parts (a) and (b) placed side by side. The associated O2DM spatial composition could be {Disjoint (a, b); Replace (i1, i2); Superimpose (i5, i3); Superimpose (i4, i5)}.
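This spatial composition could be encoded in the same style as the temporal one; again, the relation names Disjoint, Replace and Superimpose come from the model, while the container API is an assumption.

```python
# Hypothetical encoding of the Asterix spatial composition (relation names from
# the model; the SpatialLayout helper is an assumption).
class SpatialLayout:
    def __init__(self):
        self.relations = []

    def add(self, relation, o1, o2):
        self.relations.append((relation, o1, o2))

layout = SpatialLayout()
layout.add("Disjoint",    "a",  "b")    # parts (a) and (b) shown side by side
layout.add("Replace",     "i1", "i2")   # i1 totally covers the hidden item i2
layout.add("Superimpose", "i5", "i3")   # magnified Asterix over the villagers
layout.add("Superimpose", "i4", "i5")   # magnifying glass over the magnified image
```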
Environment interactions: Consider a situation where a user is enjoying a song. Several actions can be produced by the user, such as going back to a specific part, stopping, etc. In fact, at the run-time of a document, the user can interact with the presented product and modify the execution scenario. In O2DM, these interactions are implemented by methods invoked according to the interaction type. The control admitted on the model components concerns the presentation and environment levels.
Presentation control: The control concerns the interactions on presentation durations for dynamic media.
The implemented methods are Begin, Freeze, Stop, Forward and Rewind. These methods have a direct impact on both the clocks associated with the media objects and the general clock related to the whole document. For example, in the case of a song resumption, the clock value associated with the audio object must be decremented according to the rewind ratio, with zero as the minimal threshold.
Environment control: This concerns interactions having an effect on the used resources, such as moving objects on the screen (up, down, along the principal diagonal, etc.) or managing the sound (increasing/decreasing the volume). The associated O2DM methods are mainly Mov_up, Mov_dow, Mov_left, Mov_rig, Mov_p_diag, Mov_s_dia, Mute, Vol_up and Vol_dow.
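The rewind and volume behaviors described above can be sketched as follows; the method names Rewind, Vol_up and Vol_dow come from the model, while the clock representation, the ratio semantics and the volume scale are assumptions.

```python
# Hypothetical sketch of presentation and environment controls (method names
# from the model; clock and volume handling are assumptions).
class MediaPlayer:
    def __init__(self, duration):
        self.clock = 0.0            # seconds elapsed for this media object
        self.duration = duration
        self.volume = 50            # assumed 0..100 scale

    def Rewind(self, ratio):
        # Decrement the object clock by a fraction of the elapsed time,
        # never going below zero (the minimal threshold).
        self.clock = max(0.0, self.clock - ratio * self.clock)

    def Vol_up(self, step=5):
        self.volume = min(100, self.volume + step)

    def Vol_dow(self, step=5):
        self.volume = max(0, self.volume - step)
```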
THE SYSTEM REFERENCE ARCHITECTURE
Our authoring and presentation system supports both the specification and the generation of interactive multimedia documents.
The system mainly consists of three components: The Editor component, the Executor module and the Validation component (Validator). Figure 2 gives an overview of the system where each module performs particular functions.
Editor component: The rapid and easy editing of documents is an essential goal of our proposal. Thus, the system offers a comfortable user interface that allows incremental editing and facilitates the specification of media items (media composer) with their associated properties, synchronization relations and even user interactions during both the authoring and presentation steps, without programming.
Fig. 2: An overview of the system architecture [figure not reproduced in this version]
According to the user constraints, the editor provides (in the media container part) the classes associated with the specified media items and the relations corresponding to the temporal and spatial compositions.
Let us assume that, for its 25th anniversary, a travel agency decides to offer a very important discount on a weekend holiday to all the persons who have been married for 25 years. The advertisement author wishes to split the advertisement into three parts. The first part, dedicated to the holiday, is subdivided into an introduction presented in textual format T, two pictures I1 and I2 representing views of the castle where the couples will spend the holiday and, finally, a virtual visit recorded as a video sequence V1. The text and images will be presented for 75 sec, directly followed by the video, which can be played for a minute. The second and third parts are respectively a clip C dedicated to the agency and a video V2 associated with an audio A presenting all the trips and holidays proposed by the company. These objects will be played sequentially, each one for 3 min. Figure 3 shows a part of the video class associated with the virtual visit.
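Under the same assumptions as the previous sketches, the temporal skeleton of this advertisement could be written roughly as follows; the object names (T, I1, I2, V1, C, V2, A) and the relation names come from the text, while the dictionaries and the helper structure are hypothetical.

```python
# Hypothetical sketch of the advertisement scenario (object and relation names
# from the text; the attribute dictionaries and relation tuples are assumed).
objects = {
    "T":  {"type": "O_Text",  "durat": [(75, 75, 75)]},    # introduction, 75 s
    "I1": {"type": "O_Image", "durat": [(75, 75, 75)]},    # castle view 1
    "I2": {"type": "O_Image", "durat": [(75, 75, 75)]},    # castle view 2
    "V1": {"type": "O_Video", "durat": [(55, 60, 65)]},    # virtual visit, ~1 min
    "C":  {"type": "O_Clip",  "durat": [(180, 180, 180)]}, # agency clip, 3 min
    "V2": {"type": "O_Video", "durat": [(180, 180, 180)]}, # trips video, 3 min
    "A":  {"type": "O_Audio", "durat": [(180, 180, 180)]}, # trips audio, 3 min
}

relations = [
    ("Equals", "T",  "I1"),   # text and images presented together (75 s)
    ("Equals", "T",  "I2"),
    ("Meets",  "T",  "V1"),   # the video directly follows the text/images
    ("Meets",  "V1", "C"),    # then the agency clip (3 min)
    ("Meets",  "C",  "V2"),   # then the trips video (3 min)...
    ("Equals", "V2", "A"),    # ...played in parallel with its audio track
]
```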
The author does not have to take care of technical details; he only has to concentrate on the result he wants. During the authoring step, the properties checker ensures the integrity of the media specification: all the properties are checked continuously and each new value must be verified before being admitted.
It is also important to note that the system integrates a module (Get_a_Look), which makes it possible to visualize a simulation of the presentation and to get a quick idea of the result. Figure 4 represents a sketch of the example specification.
Fig. 3: Video component class [figure not reproduced in this version]
Fig. 4: A view before execution [figure not reproduced in this version]
Validation component (validator): This component performs validation through a set of mapping rules. It permits the automatic generation of a formal temporal model over which specific verification algorithms are applied to ensure the temporal consistency of the media scenarios. The verification of spatial constraints is not developed in this study.
The validator also includes an estimation module. This component computes cost estimations used by the document author and/or the viewer to decide whether to execute a document presentation or to negotiate modifications according to the computed values (Labed and Boufaida, 2004). At this stage, the relaxation strategies (cf. the preceding sections) are used to reduce the document cost according to the user constraints.
Presentation component (executor): According to the verified specifications and synchronization schemes, media projectors are activated with different features for the various media types and display techniques. The module renders the media data and manages user interactions such as mouse button clicks. The presentation component includes a schedule modifier, activated when a presentation has to be changed. The modified solution is given back to the validator module to check the new constraints once more. This process of solution feedback and validation is repeated until a user-accepted scenario is obtained.
The module also ensures the searching and retrieval of media items. It controls the flow of objects from the local databases and/or the servers. For this purpose, the scheduler provides clocks to deal with inter/intra-stream synchronization, in order to guarantee the direct delivery of dynamic items (video, audio) and therefore a WYSIWYG quality of service.
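The validate-and-modify loop between the executor's schedule modifier and the validator can be pictured as follows; this is a hypothetical control-flow sketch under assumed function names, not the authors' implementation.

```python
# Hypothetical sketch of the feedback loop between the schedule modifier and
# the validator (all parameter names and return values are assumptions).
def produce_presentation(specification, validate, modify_schedule, user_accepts):
    """Repeat validation and schedule modification until the user accepts."""
    schedule = specification
    while True:
        report = validate(schedule)            # temporal consistency + cost estimation
        if report.consistent and user_accepts(report):
            return schedule                    # ready to be played by the executor
        schedule = modify_schedule(schedule, report)   # e.g., apply relaxation strategies
```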
DOCUMENT LIFE-CYCLE
The multimedia authoring systems domain usually lacks a development process and consequently does not guarantee the quality of the produced documents.
Fig. 5: Methodology steps [figure not reproduced in this version]
To ensure a certain quality, we propose a five-step methodology for authoring and presenting multimedia documents (Fig. 5):
Step 1: Document specification using the O2DM document model, which allows unambiguous specification of all the document aspects including the content, the conceptual and the presentation structures. An automatic check of all the properties is performed as the media content specification progresses.
Step 2: Formal temporal specification considering only the temporal constraints. An automatic mapping of the temporal relations into formal representations is performed. The result is used in the next step.
Step 3: Verification of the specification by checking multimedia scenarios against temporal inconsistencies according to the temporal representations. A formal model is required in this step.
Step 4: Estimation of the specification by automatic production of cost estimations concerning the time necessary to present the document as specified by the author (Labed and Boufaida, 2004).
Step 5: Presentation of the specification, where the multimedia document components are played according to the verified constraints.
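These five steps can be pictured as a simple pipeline; the sketch below is a hypothetical rendering in which each step function is a placeholder standing for the corresponding system module, not an actual implementation.

```python
# Hypothetical sketch of the five-step life-cycle as a pipeline; the step
# functions below are placeholders for the system modules described above.
def specify(spec):        return spec             # Step 1: O2DM specification + property checks
def map_to_formal(spec):  return spec             # Step 2: temporal relations -> formal model
def verify(model):        return True             # Step 3: temporal consistency verification
def estimate(model):      return 0.0              # Step 4: cost estimation (presentation time)
def present(spec):        print("playing", spec)  # Step 5: play the verified document

def document_lifecycle(o2dm_spec, viewer_accepts):
    checked = specify(o2dm_spec)
    model = map_to_formal(checked)
    if verify(model) and viewer_accepts(estimate(model)):
        present(checked)

document_lifecycle({"title": "demo"}, viewer_accepts=lambda cost: True)
```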
CONCLUSION
Considering the users' needs, we have designed an interactive tool for the specification and generation of interactive multimedia documents. The main features of our system are an object-oriented document model and a simple methodology including verifications during all the authoring and execution steps. A simple manipulation interface is an important component of the system architecture, ensuring that even persons who are not programmers can author documents.
Future work includes the presentation of the verification model and the development of more powerful tools for checking the established specifications in order to avoid qualitative and quantitative inconsistencies.
REFERENCES
- Allen, J.F., 1983. Maintaining knowledge about temporal intervals. Commun. ACM, 26: 832-843.
- Bertino, E., E. Ferrari and M. Stolf, 2000. MPGS: An interactive tool for the specification and generation of multimedia presentations. IEEE Trans. Knowledge Data Eng., 12: 102-125.
- Buchanan, M.C. and P.T. Zellweger, 2005. Automatic temporal layout mechanisms revisited. ACM Trans. Multimedia Comput. Commun. Appl., 1: 61-88.
- Bulterman, D.C.A. and L. Hardman, 2005. Structured multimedia authoring. ACM Trans. Multimedia Comput. Commun. Appl., 1: 89-109.