


Vol 45, No 7 (2019)
- Year: 2019
- Articles: 9
- URL: https://ogarev-online.ru/0361-7688/issue/view/10889
Article
Path Querying with Conjunctive Grammars by Matrix Multiplication
Abstract
Problems in many areas can be reduced to one of the formal-languages-constrained path problems. Conjunctive grammars are more expressive than context-free grammars and are used, for example, in static analysis to describe an interleaved matched-parentheses language, which is not context-free. Path querying with conjunctive grammars is known to be undecidable. There is an algorithm for path querying with linear conjunctive grammars which provides an over-approximation of the result, but there is no algorithm for arbitrary conjunctive grammars. We propose the first algorithm for path querying with arbitrary conjunctive grammars. The proposed algorithm is matrix-based and allows us to efficiently apply GPGPU (General-Purpose computing on Graphics Processing Units) computing techniques and other optimizations for matrix operations.



Dynamic Detection of Use-After-Free Bugs
Abstract
A novel method for detecting use-after-free bugs based on the program dynamic analysis is described. In memory unsafe programming languages, such as C or C++, this class of bugs mainly occurs when the program tries to access an area of dynamically allocated memory that has been already freed. For each program execution path, the method checks the correction of the allocation, deallocation, and access operations. Since the dynamic analysis is used, bugs can be found only in the parts of the code that was actually executed. The symbolic program execution with the help of SMT (Satisfiability Modulo Theories) solvers is used. This allows us to generate data the processing of which produces new execution paths.



Methods for Information Diffusion Analysis
Abstract
Information diffusion is a fundamental process taking place on the Internet. Every day, various pieces of information are published and further disseminated through news articles and user messages. Even though this process can be explicitly observed, it is difficult to determine individual information pathways. The penetration of the global information environment into all spheres of everyday life radically changes the rate and ways of information diffusion. In this review, we discuss models of online information flows while dividing them into two groups: explanatory models (which imply the existence of an underlying network over which information propagates) and predictive models (which are used to investigate the spread of individual information flows). Despite the complexity of analyzing basic characteristics of information diffusion, this analysis is necessary to understand general processes that occur in the information society.



Parallelization of Implementations of Purely Sequential Algorithms
Abstract
The work is dedicated to the parallelization of programs in especially difficult cases when the used algorithm is purely sequential, there are no parallel alternatives to this algorithm, and its execution time is unacceptably high. Various parallelization methods for software implementations of such algorithms and resulting computational load balancing are considered that make it possible to considerably accelerate the execution of application programs using purely sequential algorithms. The proposed methods are illustrated with examples of their application to two algorithms used in a dynamic binary code analysis toolset. The main goal of this paper is to show that the use of a purely sequential algorithm in a software implementation does not necessarily imply that its execution is inevitably sequential. The proposed methods of parallelizing implementations of such algorithms and balancing the resulting computational load can help develop efficient parallel programs that fully utilize the hardware capabilities of modern computers.



Automated Generation of Machine Instruction Decoders
Abstract
A method for the automated generation of machine instruction decoders for a wide range of processor architectures that uses the target architecture assembler is proposed. A software tool based on this method is implemented and tested on a number of microcontrollers, such as PIC16F877A, AVR, Tricore, and H8/300H.



On Effective Scheduling in Computing Clusters
Abstract
Presently, big companies such as Amazon, Google, Facebook, Microsoft, and Yahoo! own huge datacenters with thousands of nodes. These clusters are used simultaneously by many clients. The users submit jobs containing one or more tasks. The task flow is usually a mix of short, long, interactive, and batch tasks with different priorities. The cluster scheduler decides on which server the task should be run as a process, container, or virtual machine. Scheduler optimizations are important as they provide higher server utilization, lower latency, improved load balancing, and fault tolerance. Optimal task placement is a complex problem that has multiple dimensions and requires algorithmically complex optimizations. This increases placement latency and limits cluster scalability. In this paper, we consider different cluster scheduler architectures and optimization problems.



Exploratory Search for Scientific Articles
Abstract
It is intuitively clear that search for scientific publications often has many characteristics of exploratory search. The purpose of this paper is to formalize this intuitive understanding, explore which scientific search tasks can be classified as research search ones, what general approaches to the research search problem exist, and how they are implemented in specialized search engines for scientists. We overview the existing works that address the information-seeking behavior of scientists and a special variant of search called exploratory search. There are several types of search behavior typical for scientists; we show that most of them are exploratory ones. Exploratory search differs from information retrieval and requires special support from the search systems. We analyze seventeen search systems for academicians (from Google Scholar, Scopus, and Web of Science to ResearchGate) from the perspective of exploratory search support. We find that most of them do not go far beyond simple information retrieval, and there is room for further improvements, especially in collaborative search support.



Testing the Conformance of Implementations of the EAP Protocol and Its Methods to Internet Specifications
Abstract
Generation of tests for checking the conformance of implementations of the Extensible Authentication Protocol (EAP) and its methods to Internet specifications is described. The project is based on the UniTESK technology that allows one to automate the verification of network protocols using their formal models and the extension JavaTesK, which implements the UniTESK technology in Java. The additional use of mutation testing techniques makes it possible to test the stability of a protocol implementation to corrupt messages. This approach proved to be effective in finding a number of critical vulnerabilities and other deviations from the EAP in some implementations.



Next-Generation Intermediate Representations for Binary Code Analysis
Abstract
Many binary code analysis tools rely on intermediate representation (IR) derived from a binary code, instead of working directly with machine instructions. In this paper, we first consider binary code analysis problems that benefit from IR and compile a list of requirements that the IR suitable for solving these problems should meet. Generally speaking, a universal binary analysis platform requires two principal components. The first component is a retargetable instruction decoder that utilizes external specifications to describe target instruction sets. External specifications facilitate maintenance and allow one to quickly implement support for new instruction sets. We analyze some of the most popular instruction set architectures (ISAs), including those used in microcontrollers, and from that compile a list of requirements for the retargetable decoder. We then overview existing multi-ISA decoders and propose our vision of a more generic approach, based on a multi-layer directed acyclic graph that describes the decoding process in universal terms. The second component of the analysis platform is the actual architecture-neutral IR. In this paper, we describe such IRs and propose Pivot 2, an IR that is low-level enough to be easily constructed from decoded machine instructions, also being easy to analyze. The main features of Pivot 2 are explicit side effects, SSA variables, simpler alternative to phi-functions, and extensible elementary operation set at the core. This IR also supports machines that have multiple memory address spaces. Finally, we propose a way to tie the decoder and the IR together to fit them to most of the binary code analysis tasks through abstract interpretation on top of the IR. The proposed scheme takes into account various aspects of target architectures that are overlooked in many other works, including pipeline specifics (handling of delay slots, hardware loop support, etc.), exception and interrupt management, and generic address space model, in which accesses may have arbitrary side effects due to memory-mapped devices or other non-trivial behavior of the memory system.


