Research Goals

As criminal activity increasingly involves computers, networks and digital devices, the criminal justice system faces many challenges. The reconstruction of events in modern crimes often involves analysis of digital data in large e-mail collections, network traffic logs and other system logs. Automated tools for finding patterns and trends and for linking data entities are playing an increasingly significant role in investigations. The impact of these modern investigative techniques on privacy and civil liberties, basic notions of which were derived from experiences in a physical rather than digital world, raises many fundamental legal and ethical questions. The widespread availability of sensitive personal and financial information has created unprecedented opportunities for fraud and threatens the economic security of both individuals and organizations. The following projects address each of these concerns.

Clustering and Concept Linkage Tools for Analyzing Structured and Unstructured Forensic Data

This project examines the use of clustering and concept linkage for the analysis of crime data. Clustering tools determine groups of entities (e.g., documents or data records) with similar properties. No prior categorizations are assumed for the groupings. Concept linkage tools establish relationships among entities. Crime analysts now use such tools to uncover criminal associations, determine key players in criminal networks and establish authorship of documents and e-mail. The goals of this project are to 1) ascertain the extent to which clustering and concept linkage tools are employed by law enforcement, 2) determine how effective they are on actual crime data, and 3) establish their limitations due to computational and training requirements. In addition, an educational component of this project will develop course modules that introduce clustering, concept linkage and other knowledge discovery techniques to students in undergraduate and graduate forensic and database courses. Some of the most effective clustering and concept linkage tools require computationally expensive tasks such as computing shortest distances between nodes in a network or performing a matrix decomposition, particularly the singular value decomposition. This project will examine ways to optimize these tools for use in a cluster computing environment.
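
As a rough illustration of the shortest-distance computation mentioned above, the following Python sketch ranks the members of a small network by closeness centrality, one common way to flag potential key players. It is not the project's tooling: the contact network, the function names and the choice of closeness centrality are all assumptions made for the example.

```python
from collections import deque


def shortest_distances(graph, source):
    """Breadth-first search: hop counts from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def closeness(graph):
    """Closeness centrality: reachable nodes divided by total distance to them."""
    scores = {}
    for node in graph:
        dist = shortest_distances(graph, node)
        total = sum(dist.values())
        scores[node] = (len(dist) - 1) / total if total else 0.0
    return scores


# Hypothetical contact network (made-up data): an edge means two people
# exchanged e-mail at least once.
contacts = {
    "A": ["B"],
    "B": ["A", "C"],
    "C": ["B", "D", "E"],
    "D": ["C"],
    "E": ["C"],
}

scores = closeness(contacts)
key_player = max(scores, key=scores.get)  # node with the highest closeness
```

Running one breadth-first search per node costs time proportional to V(V + E) for V nodes and E edges, which suggests why such measures become expensive on large networks and why distributing the per-node searches across a cluster is attractive.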

(NYS Graduate and Research Technology Initiative Project, Clustering tools for use in criminal investigations, with Mark Frankel and Eman Abdu)

Privacy and Security

In a 1967 special report in Communications of the ACM, the noted political scientist and privacy expert Alan Westin wrote "an individual's right to limit the circulation of personal information is a vital ingredient of his right to privacy." Today it is almost impossible for anyone to know who holds his or her personal or financial information and how that information is being used. The Internet, the Web, extensive use of e-commerce, and the wide range of digital devices now in use make it easier and less costly than ever to collect and share personal information. Individual personal data is valuable, so it is often collected for one purpose and then sold to third parties for other purposes. The past 15 years have seen the rise of the so-called data aggregation industry, which collects personal data from various sources such as legal records and commercial transactions and then sells the data or analyses of the data to government agencies, marketing companies and even individuals. The use of wireless networks and portable high-density storage media (e.g., inexpensive jump drives that hold gigabytes of information), together with the outsourcing of critical business functions, also makes it difficult for organizations to secure sensitive information. Furthermore, knowledge discovery tools now make it possible to establish links and associations among entities in terabyte databases. Thus not only is sensitive personal data available, but business and social relationships, personal interests, and patterns of activity are often discernible as well.

Widespread availability of sensitive personal information has created unprecedented opportunities for fraud and other types of crime. In 2005, the Federal Trade Commission reported that fraud related to identity theft cost individuals and organizations over $53 billion. Highly organized cybercriminal groups with access to exceptional technical expertise and computational capabilities present a significant threat to both individuals and organizations. The goals of this effort are the following: 1) Identify both technologies and practices that expose sensitive data to exploitation in criminal activities. 2) Determine how the use of knowledge discovery tools impacts privacy and civil rights. 3) Identify methods and practices that can protect individuals and organizations when sensitive information must be made available.

(NSF project, The Ethics of Identity Management, with John Kleinig, Adina Schwartz, Brian O'Connell, Jamie Levy and Vincent Moldanado)

Network Security and Forensics - ForNet

There is no end in sight to the exploits to which systems connected to a network are exposed. In fact, the risks only grow as networks carry an increasing variety of traffic, more end users are responsible for administration of ever more complex systems, and, most importantly, profit becomes the motivating factor in computer-related crime. Unlike the telephone system, where a billing function requires detailed records to be kept, limited record keeping and logging have been the norm in packet-switched data networks. The high bandwidth of modern data networks makes them difficult and costly to monitor, especially in real time. Two key goals of much network security research are (1) the establishment of forensic capabilities that provide attribution and accountability, and (2) the rapid detection of anomalous traffic in order to thwart misuse and mitigate threats to users and infrastructure. Achieving these goals is difficult enough; at the same time, network costs must be contained, privacy must be protected and network utility must not suffer.

Working with researchers at Polytechnic University, we are helping implement ForNet, a distributed forensic system that employs hierarchical Bloom filters to reduce the footprint of stored network traffic. ForNet is a tool for logging distributed network traffic and provides a distributed query capability. In addition, ForNet provides privacy protection since it allows investigators to query data for items of interest without making available data unrelated to the query.
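
To make the storage and privacy argument concrete, here is a minimal Python sketch of a plain Bloom filter. It is not ForNet's code and does not reproduce the hierarchical structure ForNet uses; the class name, the parameters and the SHA-256-based hashing scheme are assumptions chosen for illustration.

```python
import hashlib


class BloomFilter:
    """Plain (non-hierarchical) Bloom filter over byte strings."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Derive num_hashes bit positions by salting SHA-256 with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(i.to_bytes(4, "big") + item).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        # True means "possibly seen"; False means "definitely not seen".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


# Hypothetical payloads observed on the network (made-up data).
traffic_digest = BloomFilter()
for payload in (b"GET /index.html", b"suspicious-binary-chunk"):
    traffic_digest.add(payload)

seen = b"suspicious-binary-chunk" in traffic_digest
```

The filter occupies a fixed 128 bytes however many payloads are added, at the cost of occasional false positives. It stores only bits, not the payloads themselves, so an investigator can test whether a specific item of interest was seen but cannot read back unrelated traffic from the digest.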

(NSF project, ForNet: A Distributed Forensics System, with Nasir Memon, Joel Wein, Hervé Brönnimann, Kulesh Shanmugasundaram, Miroslav Ponec of Polytechnic University and Adina Schwartz of the Law and Police Science Dept. at John Jay College)

Virtual Security Lab

With the ever-increasing threat to information systems and information infrastructures, there is a critical shortage of computer security and computer forensics experts, particularly in the law enforcement field. One impediment to the development of the needed educational programs is the cost and difficulty of maintaining lab environments to support courses in network forensics and security. Using Polytechnic University's virtual security lab as a model, we have built a virtual environment within John Jay College's Forensic Computing Lab. Here students do not deal with actual machines but instead use VMware instances of hosts and network devices to simulate networked computing environments. Students use the virtual lab facilities for work in operating systems and computer networking and to study the effects of worms, viruses and other forms of malware.

(NSF Project, with Bilal Khan and Joel Sandin of John Jay College)

National Incident-Based Reporting System (NIBRS)

The FBI's National Incident-Based Reporting System (NIBRS) is a crime reporting system that provides detailed information on criminal incidents reported to local, state and national law enforcement agencies. With NIBRS, crime analysts can get information on the offense(s) associated with an incident as well as the characteristics of the arrestee(s), offender(s) and victim(s) and the types and value of stolen or recovered property. In this project we developed a relational database implementation of NIBRS as well as online analysis capabilities to make NIBRS data available to researchers in the social sciences. In addition, we have developed a NIBRS migration package that automates the addition of new releases of NIBRS data to the database.
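
A relational layout of incident-based data might look roughly like the following Python sketch, which uses the standard sqlite3 module. The table names, column names and rows are hypothetical and heavily simplified; they do not reproduce the actual NIBRS record layout or the schema built in this project.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical, simplified schema: one incident has many offenses and
# many property records.
conn.executescript("""
CREATE TABLE incident (
    incident_id   INTEGER PRIMARY KEY,
    agency        TEXT,
    incident_date TEXT
);
CREATE TABLE offense (
    offense_id   INTEGER PRIMARY KEY,
    incident_id  INTEGER REFERENCES incident(incident_id),
    offense_type TEXT
);
CREATE TABLE property (
    property_id INTEGER PRIMARY KEY,
    incident_id INTEGER REFERENCES incident(incident_id),
    description TEXT,
    value_usd   REAL
);
""")

# Made-up rows for illustration only.
conn.executemany("INSERT INTO incident VALUES (?, ?, ?)",
                 [(1, "Agency X", "2005-03-01"), (2, "Agency Y", "2005-03-02")])
conn.executemany("INSERT INTO offense VALUES (?, ?, ?)",
                 [(1, 1, "burglary"), (2, 1, "vandalism"), (3, 2, "burglary")])
conn.executemany("INSERT INTO property VALUES (?, ?, ?, ?)",
                 [(1, 1, "bicycle", 500.0), (2, 2, "laptop", 1200.0),
                  (3, 2, "camera", 300.0)])

# How many offenses of each type were recorded?
offense_counts = conn.execute(
    "SELECT offense_type, COUNT(*) FROM offense "
    "GROUP BY offense_type ORDER BY offense_type"
).fetchall()

# Total property value per incident, restricted to incidents with a burglary.
burglary_losses = conn.execute(
    "SELECT p.incident_id, SUM(p.value_usd) FROM property AS p "
    "WHERE p.incident_id IN "
    "(SELECT incident_id FROM offense WHERE offense_type = 'burglary') "
    "GROUP BY p.incident_id ORDER BY p.incident_id"
).fetchall()
```

Keeping offenses and property in separate tables keyed by incident lets one incident carry any number of each. The second query filters with a subquery rather than joining both child tables at once, which would multiply rows and inflate the sums.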

(NASA CIPA and NYS Graduate and Research Technology Initiative Project, with Boris Bondarenko, Raul Cabrera, Eman Abdu, and Peter Shenkin)