### List of technical reports of year 2004

Pisanti, Nadia and Crochemore, Maxime and Grossi, Roberto and Sagot, Marie-France
Motif inference is at the heart of several time-demanding computational tasks, such as in molecular biology, data mining and identification of structured motifs in sequences, and in data compression, to name a few. In this scenario, a motif is a pattern that appears repeated at least a certain number of times (the quorum), to be of interest. The pattern can be approximated in that some of its characters can be left unspecified (the don't cares). Motif inference is not aimed at searching a given pattern but, rather, at discovering all the possible patterns that appear as motifs in the given input string. The combinatorial explosion of these patterns makes their discover an exponential-time computation. For this, the notion of basis has been recently introduced to succinctly represent all of them within reasonable time and space bounds. The goal of the paper is to shed light on the state of the art for this emerging field and to add further properties to what is currently known. |

Ferrari, Gianluigi and Tuosto, Emilio
We introduce a process calculus that contains constructs to express and program resource negotiations in systems of mobile processes. The calculus proposed is an extension of Distributed $\pi$-calculus~ \cite{HR98} and, therefore, it provides process mobility in networks that may dynamically change their topology. Negotiation takes place between processes, asking credits to achieve their purposes, and locations, that release credits depending on their availability. We describe a type system for the calculus which guarantees that well-typed processes cannot fail because of misuse of credits over resources. The safety result is more involved due to process mobility: A migrating process should not access locations that do not guarantee safety. |

Montanari, Ugo and Tuosto, Emilio
\spi\ \cite{ag98} has been presented as a (formal) setting used to describe and analyze protocols. Here we will present a short description of the calculus, introduce a new LTS, proving its equivalence to the LTS defined by Abadi and Gordon, and give a representation of the \spi\ in the Edimburg Logical Framework introduced by Haper, Honsell and Plotkin. This Logical Framework provides a setting used to define formal systems based on a general treatment of syntax, rules and proofs by means of a typed $\lambda$-calculus with dependent types. Finally, we will prove that the LF-semantics and the LTS-semantics are equivalent. |

Attardi, Giuseppe and Esuli, Andrea and Simi, Maria
A number of applications require selecting targets for specific contents on the basis of criteria defined by the contents providers themselves. Performing this task requires inverting the direction of search, retrieving queries from documents rather than vice versa, as in ordinary information retrieval. We present a retrieval system called Best Bets that handles this case. Best Bets generalize Information Filtering and encompass a variety of applications including editorial suggestions, promotional campaigns and targeted advertising. In particular we show how to model Google AdWords™ as a Best Bets system. We discuss how to address performance issues in the implementation of Best Bets, in particular efficient search, incremental updates and dynamic ranking. We report about experiments on an implementation of our techniques and a comparison with previous approaches. |

Leoni, Gualtiero and Perugini, Angela
This paper presents and analyzes a join algorithm for cube-connected multiprocessors. The algorithm, called PB_Cube, was developed to exploit as much as possible the hypercube structural characteristics, enabling us to achieve synchronization between processors, maximum parallelism and a reduction of the traffic on the network. Moreover, it doesn't present any bucket overflow and buffer overflow problems. Experimental results indicate the algorithm has linear speedup and scale-up in a significant range. Furthermore its performance is comparable with that of Hybrid Hash [9] join and better than that of Parallel Distributive algorithm [4]. Our algorithm has the advantage of being able to handle non-equijoin. These results become even more interesting when we consider that the costs for Hybrid Hash and Parallel Distributive algorithms are given under hypothesis stronger than ours. |

Ferragina, Paolo and Gullì, Antonio
Recently there has been a surge of commercial interest about novel IR-tools, like Vivisimo or Groxis, that support the user of a search engine in his/her query formulation and query refinement. The basic idea is that the snippets returned by the search engine are grouped into clusters which are then organized in a hierarchy whose nodes are properly labeled via meaningful sentences. Each sentence must capture the "theme" of the snippets contained into the cluster it labels. This way the user is provided with a small, but intelligible, picture of the query answers at various levels of details. Despite this commercial interest, we found just four scientific papers on this topic. None of them achieved results comparable to Vivisimo, that actually represents the state-of-the-art. In the present paper we address this problem in its full generality: labels of variable length for denoting the clusters, labels drawn from the Web snippets as non contiguous sequences of terms, clusters possibly overlapping and organized within a hierarchy. We achieve this results by means of an algorithmic approach that exploits some innovative ideas, at least from the academic side! |

Micheli, Alessio and Sona, Diego and Sperduti, Alessandro
In this note we consider the Contextual Recursive Cascade Correlation model (CRCC), which is able to learn contextual mappings in structured domains. Specifically, we propose a formal characterization of the "context window", i.e., given a state variable, the "context window" is the set of state variables directly or indirectly contributing to its determination. On the basis of this definition, a formal expression describing the "context windows" for the CRCC is derived. The resulting expression is very compact and describes the context window as a function of the state variables in the model, and some topological properties of the structures in input to the model. Moreover, the expression elucidates how the shape of the context window is modified by the introduction of new state variables, i.e., hidden units. For comparison, a similar expression is also derived for the non-contextual Recursive Cascade Correlation model. |

Bigi, Giancarlo
The paper aims to analyse and compare two different approaches to optimality conditions for multiobjective optimization problems, which involve nonsmooth functions. Both necessary and sufficient first order conditions are presented for the case in which constraint is given just as a set. Finally, the optimality conditions based on these two approaches are compared. |

Aldinucci, Marco and Torquati, Massimo
We present HOC: a fast, scalable objects repository providing programmers with a general storage module. HOC may be used to implement DSMs as well as distributed cache subsystems. HOC is composed by a set of hot-pluggable cooperating processes that may sustain close to optimal network traffic rates. We designed an HOC-based Web cache that extends the Apache Web server and remarkably improves Apache farms performances with no modification to the Apache core code. |

Aldinucci, Marco and Coppola, Massimo and Danelutto, Marco and Vanneschi, Marco and Zoccolo, Corrado
ASSIST (A Software development System based upon Integrated Skeleton Technology) is a programming environment oriented to the development of parallel and distributed high-performance applications according to a unified approach. The language and implementation features of ASSIST are a result of our long-term research in parallel programming models and tools. ASSIST is evolving towards programming environments for high-performance complex enabling platforms, especially Grids. In this paper, we show how ASSIST can act as a valid research vehicle to study, experiment and realize Grid-aware programming environments for high-performance applications. Special emphasis is put on the innovative methodologies, strategies and tools for dynamically adaptive applications, that represent the necessary step for the success of Grid platforms. First we discuss the conceptual framework for Grid-aware programming environments, based upon structured parallel programming and components technology, anticipating how ASSIST possesses the essential features required by such framework. Then we summarize the ASSIST programming model, showing its evolution, along the line of structured parallel programming, to solve critical problems of expressive power, flexibility, interoperability and efficiency; some examples, both of kernels and of complex applications, are used to point out the ASSIST features. The modular compiler model and the current implementation for heterogeneous platforms and Globus-based Grids are illustrated. We show the features that allow ASSIST programs to be used in CORBA infrastructures, that represents our basic starting point towards interoperability in Grid applications. Finally, the presentation of all the previous issues is used to derive an ASSIST-based model for supporting dynamically adaptive applications. |

Baragatti, Alberto and Bruni, Roberto and Melgratti, Hernan and Montanari, Ugo
We present a prototype application for distributed agreements in multi-parties negotiations, where the number of participants is statically unknown, new participants can dynamically join ongoing negotiations and each participant knows only those parties with whom interacted. The application is based on asynchronous communication and it exploits the D2PC protocol for committing or aborting the negotiation. The software architecture consists of three modular layers, with the D2PC compiled in the bottom layer and completely transparent to final users. Our prototype is tailored to the running scenario of Ad-Hoc networks used by rescue teams for disaster recovery. |

Frangioni, Antonio
It is well-known that the Lagrangian dual of an Integer Linear Program (ILP) provides the same bound as a continuous relaxation involving the convex hull of all the optimal solutions of the Lagrangian relaxation. It is less often realized that this equivalence is \emph{effective}, in that basically all known algorithms for solving the Lagrangian dual either naturally compute an (approximate) optimal solution of the "convexified relaxation", or can be modified to do so. After recalling these results we elaborate on the importance of the availability of primal information produced by the Lagrangian dual within both exact and approximate approaches to the original (ILP), using three optimization problems with different structure to illustrate some of the main points. |

Luccio, Fabrizio and Mesa Enriquez, Antonio and Olivares Rieumont, P. and Pagli, Linda
A subtree P of a given tree T is a bottom-up subtree of T if, for each internal vertex u of P, all the children of u in T are also vertices of P. Our problem is deciding if a pattern tree P of m vertices is a bottom-up subtree of a text tree T of n vertices, m <= n, and, in the affirmative case, finding all the occurrences of P in T. If the vertices are labeled, the labels in the corresponding positions in P and T must match. We consider unordered rooted trees with labeled vertices, which are reconducted to simpler ordered trees through an ordering algorithm. Processing time depends on the label format. If the labels are single characters of a constant or of an n-integer alphabet \Sigma (respectively, |\Sigma| = constant, or \Sigma is composed of integers in the range [1 \div n]), the problem is solved in O(m +\log n) time and \Theta(m) additional space, after a preprocessing of T is done in \Theta(n) time and \Theta(n) additional space. The number of occurrences of P in T does not appear in the search time because all such occurrences can be directly reconstructed from a constant output information. For more complex labelsthe running times slightly increase. In particular if the labels are sequences of characters (e.g. as in XML files) the running time becomes a function of the total length of all the labels in T and P. Although directed to the "simple problem" of exact subtree matching, our work is the first one to solve it in linear time for unordered trees. Moreover our approach is the one of dictionary search. Regarding $T$ as a static text on which several queries are made, $P$ as the contents of one such a query, and assuming $m=o(n)$, the response time for each $P$ is sublinear in the size of the overall structure. |

Tulone, D.
We propose a distributed digital time--stamping service (TSS) based on absolute timestamps and resilient to Byzantine failures. In contrast with previous solutions based on linking schemes, our TSS provides \emph{high scalability}, \emph{better response time}, and \emph{fine granularity}. It works in asynchronous systems, |

Flocchini, Paola and Pagli, Linda and Prencipe, Giuseppe and Santoro, Nicola and Widmayer, P. and Zuva, T.
Recently great attention has been given to {\em point-of-failure swap rerouting}, an efficient technique for routing in presence of transient failures. According to this technique, a message follows the normal routing table information unless the next hop has failed; in this case, it is redirected towards a precomputed link, called {\em swap}; once this link has been crossed, normal routing is resumed. The amount of precomputed information required in addition to the routing table is rather small: a single link per each destination. Several efficient serial algorithms have been presented to compute this information; none of them can unfortunately be efficiently implemented in a distributed environment. In this paper we present protocols, based on a new strategy, that allow the efficient computation of all the optimal swap edges under several optimization criteria. |

Luccio, Fabrizio and Pagli, Linda
The restricted rotation distance d_{R}(S,T) between two trees S, T of n vertices is the minimum number of rotations by which S can be transformed into T, where rotations can only take place at the root of the tree, or at the right child of the root. A sharp upper bound d_{R}(S,T)\leq 4n-8 is known, based on the word metric of Thompson's group. We refine this bound to d_{R}(S,T)\leq 4n-8-\rho_{S} \rho_{T}, where \rho_{S} and \rho_{T} are the numbers of vertices in the rightmost vertex chains of the two trees, providing a very simple transformation algorithm based on elementary properties of trees. We also generalize the concept of restricted rotation by allowing rotations at the highest k levels of the tree, and study the new distance for k=2,3. |

Frangioni, Antonio and Gentile, Claudio
Interior Point algorithms for Min-Cost Flow problems have been shown to be competitive with more established "combinatorial" approaches, at least on some problem classes and for very large instances. A fundamental contribution to the efficiency of these approaches is provided by the specialized crossover routines that can be implemented for this problem; they allow early termination of the IP approach, sparing it with the "nasty" iterations where the core linear systems become very difficult to solve by iterative algorithms. Because crossover procedures are nothing but one step of a combinatorial approach to MCF, we propose to extend this concept to that of extended crossover: use the initial part of an Interior Point algorithm to MCF to warm-start a "combinatorial" algorithm. We report our experiments along this line using, as "combinatorial companion" , a primal-dual algorithm for MCF that has a natural way for exploiting the information provided by the IP approach. Our results show that the efficiency of the combined approach critically depends on the accurate selection of a set of parameters among very many possible ones, for which designing accurate guidelines appears not to be an easy task; however, they also show that the combined approach can be competitive with the two original approaches in isolation, at least on some "difficult" instances. |

Dobrev, Stefan and Flocchini, Paola and Prencipe, Giuseppe and Santoro, Nicola
Consider a networked environment, supporting mobile agents, where there is a black hole: a harmful host that disposes of visiting agents upon their arrival, leaving no observable trace of such a destruction. The black hole search problem is the one of assembling a team of asynchronous mobile agents, executing the same protocol and communicating by means of whiteboards, to successfully identify the location of the black hole; we are concerned with solutions that are generic (i.e., topology-independent). We establish tight bounds on the size of the team (i.e., the number of agents), and the cost (i.e., the number of moves) of a size-optimal solution protocol. These bounds depend on the a priori knowledge the agents have about the network, and on the consistency of the local labellings. In particular, we prove that: with topological ignorance $\Delta+1$ agents are needed and suffice, and the cost is $\Theta(n^2)$, where $\Delta$ is the maximal degree of a node and $n$ is the number of nodes in the network; with topological ignorance but in presence of sense of direction only $two$ agents suffice and the cost is $\Theta(n^2)$; and with complete topological knowledge only $two$ agents suffice and the cost is $\Theta(n \log n)$. All the upper-bound proofs are constructive. |

Bevilacqua, Roberto and Bozzo, Enrico and Del Corso, Gianna M.
We study transformations by unitary similarity of a nonderogatory matrix to certain rank structured matrices. This kind of transformations can be described in a unified framework that involves Krylov matrices. The rank structures here addressed are preserved by QR iterations, and every iterate can be associated with a suitable Krylov matrix. |

Ferragina, Paolo and Gullì, Antonio
Search engines provide the view of the Web, and their smart ranking algorithms are their point of view. To offer the best view, personalized ranking algorithms are currently flourishing. They focus on the users rather than on their submitted queries, by taking into account some contextual/profiled information. In this paper we propose a personalized (meta-)search engine based on the web-snippet hierarchical clustering technology (a la Vivisimo) that is fully adaptive and non intrusive both for the user and for the queried search engine(s). It works on the top of 16 commodity search engines and fetches 200 (or more) results from them per user query. Our engine is able to mine on-the-fly the fine and variegate ''themes'' behind these results and then organize them in a hierarchy of folders that offers, at various levels of details, an up-to-date picture of these results. Users can therefore browse the hierarchy, select the themes that best match the ``intention'' behind their query, and ask our engine to personalize on-the-fly those query results according to their choices. In this way lazy users are not limited to look at first ten results, but immediately acquire several points of view on a larger pool (about 200) of them! We claim that it does exist a mutual reinforcement relationship between ranking and web-snippet clustering from which both of them may benefit. Our extensive experiments show that this form of personalization is very effective in informative queries, polysemous queries, and poor queries consisting of at most two terms (more than 80% of the Web queries are of this type!). In these cases, in fact, one theme might be so web-popular to unfortunately monopolize the top-ten results of link-based ranking algorithms. |

Tuosto, Emilio and Vieira, Hugo Torres
Spatiality is an important aspect of distributed systems because their computations depend both on the dynamic behaviour and on the structure of their components. Spatial logics have been proposed as the formal device for expressing spatial properties of systems. We define CCS, a CCS-like calculus whose semantics allows one to observe spatial aspects of systems on the top of which we define models of the spatial logic. Our alternative definition of models is proved equivalent to the standard one. Furthermore, logical equivalence is characterised in terms of the bisimilarity of CCS. |

Felicioli, C.
BpMatch is a new algorithm whose function is to efficiently calculate, from sequences S and T, the maximum coverage of T using only subsequences and complemented reversed subsequences of S, with minimum length l, possibly overlapped, and, in such a maximum coverage, to minimize the number of subsequences used. The problem is solved by executing a preelaboration of S (independently from the sequence on which a maximum coverage will be later looked for, and therefore executed only once for any target sequence T), generating a graph which allows a fast recognizing of S's subsequences. Graphs G and G' must be generated from S and complemented reversed S, then, using G, G' and T, calculus of the maximum coverage can be computed in linear space and time. |

Bernazzani, L. and Duce, C. and Micheli, Alessio and Mollica, V. and Sperduti, Alessandro and Starita, Antonina and Tinè, M.R.
The recursive neural networks deal with prediction tasks for compounds represented in a structured domain. These approaches allow combining, in a learning system, the flexibility and general advantages of a neural network model with the representational power of a structured domain. As a result a completely new approach to QSPR/QSAR analysis is obtained through the adaptive processing of molecular graphs whose performance is even better than that of traditional approaches. In this paper a Recursive Cascade Correlation model (RecCC) has been applied to the analysis of the Gibbs free energies of solvation in water of 179 monofunctional open chain organic compounds. An original representation of the molecules in terms of labeled directed positional acyclic graphs has been developed by individuating a limited set of constituent atomic groups and representation rules. The descriptive and predictive abilities of the RecCC model have been tested and compared with those of a traditional group contributions method. The inherent ability of the RecCC model to abstract chemical knowledge through the adaptive learning process have been investigated by Principal Components Analysis of the internal representations developed by the neural network, finding that the model recognizes the chemical compounds on the basis of a not trivial combination of their chemical structure and target property. |

Bevilacqua, Roberto and Bozzo, Enrico and Menchi, Ornella
In this paper the problem of reconstructing a two-dimensional tomographic medical image is considered. The emission function is defined on a plane circular domain and has to be reconstructed from data collected by SPECT. Through the natural pixel discretization approach the solution is expressed as a linear combination of functions belonging to a suitable basis. We consider here four different bases, all of them giving a highly structured coefficient matrix. By the Fourier transform the linear system thus obtained can be solved efficiently. The computational cost and the performance of the bases are compared. The numerical experimentation shows that when the data are contaminated by Poissonian noise all the bases are substantially equivalent from the point of view of the reconstruction efficiency. |

**NOTE. **Starting January 2015 Technical reports can be inserted in the Open Access repository UnipiEprints.

Once published, they will be visible also in these pages. The old TR service is no longer maintained.

Data imported from UnipiEprints