
My general research interest lies in developing algorithms and techniques for the effective and efficient operation of distributed network systems, viewed along multiple dimensions: performance, scalability, security, energy saving, and fault tolerance. Current and past research projects are listed below:

Network Traffic Analysis and Management

This research tackles critical challenges in network traffic analysis, an integral part of network monitoring, including scalability and the multivariate nature of traffic data. We define two components of intelligent traffic analysis: online analysis, which creates high-level summaries of traffic data through network state representation for real-time analysis, and in-depth analysis, which produces indexing and annotation information for the traffic data through deep inspection in a batch-processing manner.
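
As a hypothetical sketch of the online component, flows can be quantized into coarse pattern cells and counted, yielding a compact, incrementally updatable summary of the current network state. The feature choice (bytes, duration) and bin sizes here are illustrative, not the exact method of the papers below:

```python
from collections import Counter

def summarize_flows(flows, byte_bin=1000, dur_bin=10.0):
    """Quantize each flow's (bytes, duration) into a coarse grid cell and
    count occurrences -- a compact summary of the network state that can
    be updated online as flows arrive."""
    cells = Counter()
    for nbytes, duration in flows:
        cells[(nbytes // byte_bin, int(duration // dur_bin))] += 1
    return cells

flows = [(1500, 2.0), (1600, 3.5), (900000, 120.0)]
summary = summarize_flows(flows)  # the two small, short flows share one cell
```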

While network traffic analysis provides tools for network security, network traffic management enables effective network forensic services. Several challenges arise here, including increasing transmission rates, slow storage speeds, and data integrity and privacy. Thus far, traffic archiving has largely been limited to recording captured traffic data to storage with associated indexes. My approach takes a systematic viewpoint, providing core primitives for network forensics, from traffic capturing, indexing, and recording to future access.
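
The record-with-index idea can be illustrated with a minimal sketch, assuming a simple on-disk record format (the layout and function names are hypothetical, not the project's actual archive format):

```python
import os
import struct
import tempfile

def archive_packets(path, packets):
    """Append (timestamp, payload) records to `path` and return an index
    mapping timestamp -> byte offset, so forensic queries can seek
    directly to a record instead of scanning the whole trace."""
    index = {}
    with open(path, "wb") as f:
        for ts, payload in packets:
            index[ts] = f.tell()
            # 8-byte timestamp + 4-byte length header, then the payload
            f.write(struct.pack("!dI", ts, len(payload)))
            f.write(payload)
    return index

def fetch_packet(path, index, ts):
    """Retrieve one payload via the index, without a sequential scan."""
    with open(path, "rb") as f:
        f.seek(index[ts])
        _, length = struct.unpack("!dI", f.read(12))
        return f.read(length)
```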

Related Articles:

  • Jinoh Kim, Alex Sim, Brian Tierney, Sang C. Suh, Ikkyun Kim, "Multivariate Network Traffic Analysis using Clustered Patterns," Computing Journal, Springer, April 2018
  • Donghwoon Kwon, Hyunjoo Kim, Jinoh Kim, Sang C. Suh, Ikkyun Kim, Kuinam J. Kim, "A Survey of Deep Learning-based Network Anomaly Detection," Cluster Computing Journal, Springer, pp. 1-13, 2017
  • Jinoh Kim and Alex Sim, "A New Approach to Online, Multivariate Network Traffic Analysis," The 2nd Workshop on Network Security Analytics and Automation (workshop program of IEEE ICCCN 2017), Vancouver, Canada, July 31-August 3, 2017
  • Sunhee Baek, Donghwoon Kwon, Jinoh Kim, Sang C. Suh, Hyunjoo Kim, Ikkyun Kim, "Unsupervised Labeling for Supervised Anomaly Detection in Enterprise and Cloud Networks," The 4th IEEE International Conference on Cyber Security and Cloud Computing (IEEE CSCloud 2017), New York, NY, June 26-28, 2017
  • Jinoh Kim, Wucheol Yoo, Alex Sim, Sang Suh, Ikkyun Kim, "A Lightweight Network Anomaly Detection Technique," IEEE International Workshop on Computing, Networking and Communications (workshop program of IEEE ICNC 2017), Silicon Valley, CA, January 2017
  • Jinoh Kim, Alex Sim, Sang Suh, Ikkyun Kim, "An Approach to Online Network Monitoring Using Clustered Patterns," IEEE International Conference on Computing, Networking and Communications (ICNC), Silicon Valley, CA, January 2017

Intelligent In-memory Caching in Clouds

Data caching is an integral part of improving data access performance. In a cloud computing environment, individual customers may have different performance requirements, and hence tenant-based caching decisions are essential. This research addresses the challenges of effectively supporting cloud-based caching services, including data access characteristics, cache performance prediction, tenant-optimized cache eviction, and resource sharing across cloud tenants.
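
One simple way to make eviction tenant-aware, sketched here purely for illustration (the partitioned-LRU policy and class names are assumptions, not the project's design), is to give each tenant its own LRU partition with its own capacity, so one tenant's workload cannot evict another tenant's hot entries:

```python
from collections import OrderedDict

class TenantCache:
    """Per-tenant LRU partitions: each tenant gets an isolated capacity,
    a minimal form of tenant-based cache management."""

    def __init__(self, capacities):
        # capacities: {tenant_id: max_entries}
        self.parts = {t: OrderedDict() for t in capacities}
        self.cap = dict(capacities)

    def get(self, tenant, key):
        part = self.parts[tenant]
        if key not in part:
            return None          # cache miss
        part.move_to_end(key)    # mark as most recently used
        return part[key]

    def put(self, tenant, key, value):
        part = self.parts[tenant]
        part[key] = value
        part.move_to_end(key)
        if len(part) > self.cap[tenant]:
            part.popitem(last=False)  # evict this tenant's own LRU entry
```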

This project is at an initial stage, but we have made some interesting observations through simulation conducted on Apache Ignite, an in-memory computing platform. For hit-ratio prediction in the multi-tenant model, this research examined several regression techniques, such as Support Vector Regression (SVR), Gaussian Process Regression (GPR), and fully connected deep learning networks. Initial results show that the deep learning technique works very well, while the predictions of the other techniques are overall unacceptable.
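
To show the shape of the prediction task, here is a deliberately simple sketch using plain linear least squares on made-up training data; the study itself compared SVR, GPR, and deep networks, and both the features and numbers below are hypothetical:

```python
import numpy as np

# Hypothetical training data: (cache size in MB, tenant request rate) -> hit ratio
X = np.array([[64, 100], [128, 100], [256, 200], [512, 200], [1024, 400]], float)
y = np.array([0.35, 0.48, 0.55, 0.70, 0.82])

# Linear least-squares baseline with an intercept column
A = np.hstack([X, np.ones((len(X), 1))])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)

def predict_hit_ratio(size_mb, rate):
    """Predicted hit ratio for a tenant configuration (baseline model)."""
    return float(np.array([size_mb, rate, 1.0]) @ coef)
```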

Network Application Identification

Despite increasing interest in application identification, the traditional approach based on transport-layer port numbers has become less effective for several reasons, including the growing use of random or non-standard port numbers and tunneling (e.g., HTTP tunnels). One alternative is to inspect application payload information; while highly accurate, it is limited and complicated for encrypted or obfuscated packets. Another common approach is to utilize flow statistics, such as flow size and duration, for classifying applications. Since it does not require reading packet contents, this approach is not limited to plaintext flows, but it is known to be relatively less accurate. In this project, we developed a set of algorithms and methods that identify applications accurately with greater flexibility.
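
The flow-statistics approach can be illustrated with a toy nearest-centroid classifier over (flow bytes, duration); the application centroids below are invented for the example and are not from the project's data:

```python
import math

# Hypothetical per-application centroids of (mean flow bytes, mean duration in s),
# as might be learned from labeled training flows.
CENTROIDS = {
    "dns":  (150, 0.05),
    "web":  (20000, 2.0),
    "bulk": (5_000_000, 60.0),
}

def classify_flow(nbytes, duration):
    """Nearest-centroid classification in log scale, so the byte axis
    does not dominate the distance; no packet contents are read."""
    def dist(app):
        cb, cd = CENTROIDS[app]
        return math.hypot(math.log10(nbytes + 1) - math.log10(cb + 1),
                          math.log10(duration + 1) - math.log10(cd + 1))
    return min(CENTROIDS, key=dist)
```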

Related Articles:

  • Anil Kumar, Jinoh Kim, Sang C. Suh, and Ganho Choi, "Incorporating Multiple Cluster Models for Network Traffic Classification," The 40th IEEE Conference on Local Computer Networks (LCN 2015), Clearwater Beach, Florida, October 2015
  • Justin Tharp, Jinoh Kim, Sang C. Suh, Hyeonkoo Cho, "Maximizing True Positives for Signature-based Network Application Identification," SDPS 2014, Kuching, Malaysia, June 2014
  • Justin Tharp, Jinoh Kim, Sang C. Suh, Hyeonkoo Cho, "Reconciling Multiple Matches for the Signature-Based Application Identification," 3rd International Conference on Communication and Network Security (ICCNS), London, UK, November 2013; also in Journal of Communications
  • Ilhwan Moon, Umar Albalawi, Jinoh Kim, Sang C. Suh, Wang-Hwan Lee, "A Hybrid Classifier with a Binning Method for Network Application Identification," Journal of Integrated Design and Process Science, Vol. 18, No. 2, pp. 3-22, September 2014
  • Ilhwan Moon, Umar Albalawi, Jinoh Kim, Sang C. Suh, Wang-Hwan Lee, "A Hybrid Classifier using Payload Encoding and Flow Statistics for Application Identification," SDPS 2013, Campinas, Brazil, October 2013

Big-data Computing for Security Event Analysis

One of the critical challenges for security analytics is the tremendous volume of complex datasets (i.e., big data), which leads to a set of technical challenges including computing infrastructures, storage systems, data representation and management, application interfaces (e.g., for query support), and so forth. An adequate platform providing a large-scale computing infrastructure and software stack is therefore essential for the success of big-data security analytics. In this research, we aim to develop a framework that can efficiently support a broad spectrum of security analytics applications. We examined the applicability of existing big-data computing technologies (such as MapReduce and Hadoop), and also studied data representation and organization to efficiently manage a large and varied set of security data on the platform.
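
The MapReduce pattern behind this kind of event aggregation can be sketched in a few lines; the event schema below is an assumption for illustration, and a real deployment would run the phases on Hadoop rather than in-process:

```python
from collections import defaultdict
from itertools import chain

# Hypothetical event records: (source_ip, event_type)
def map_phase(events):
    """Emit one ((source, type), 1) pair per event, independently per split."""
    return [((src, etype), 1) for src, etype in events]

def reduce_phase(pairs):
    """Sum counts per key, merging the outputs of all map tasks."""
    totals = defaultdict(int)
    for key, count in pairs:
        totals[key] += count
    return dict(totals)

# Two input "splits" processed independently, then merged
split1 = [("10.0.0.1", "scan"), ("10.0.0.1", "scan")]
split2 = [("10.0.0.2", "login_fail"), ("10.0.0.1", "scan")]
counts = reduce_phase(chain(map_phase(split1), map_phase(split2)))
```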

Related Articles:

  • Jinoh Kim, Ilhwan Moon, Kyungil Lee, Sang C. Suh, and Ikkyun Kim, "Scalable Security Event Aggregation for Situation Analysis," IEEE BigDataService 2015, San Francisco, March, 2015
  • Jinoh Kim, Ilhwan Moon, Kyungil Lee, Sang C. Suh, and Ikkyun Kim, "Incremental Processing for Massive Security Event Aggregation," The 2015 Global Conference on Information Technology, Computing, and Applications, Las Vegas, January, 2015

Exa-Scale High-performance Computing

Performance is always important in computing systems. In scientific computing in particular, data sizes can easily reach tera- or peta-bytes, and storage I/O becomes a severe performance bottleneck. While conventional approaches focus on improving I/O performance itself, my approach reduces the frequency of disk accesses. In scientific applications, data blocks are often accessed repeatedly by subsequent queries after they are created, and data access dominates response time; since an index is usually much smaller than the data itself, maintaining indexes benefits both individual and overall system performance. Extensive measurement studies with Pixie3D, an MHD (magnetohydrodynamics) application, were conducted on NERSC clusters for parallel, in-situ indexing based on FastBit, a bitmap-based indexing technology developed by Berkeley Lab. The implementation has been applied to the ADIOS middleware developed by Oak Ridge National Laboratory.
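
The core idea behind FastBit-style indexing is the equality-encoded bitmap: one bit vector per distinct value, so equality queries touch only the (small) index. A minimal sketch, using Python integers as bit vectors (FastBit itself also compresses the bitmaps, which this toy version omits):

```python
def build_bitmap_index(values):
    """Equality-encoded bitmap index: one bit vector per distinct value,
    with bit i set when row i holds that value."""
    index = {}
    for i, v in enumerate(values):
        index[v] = index.get(v, 0) | (1 << i)
    return index

def query_eq(index, v):
    """Row positions where column == v, answered from the bitmap alone."""
    bits = index.get(v, 0)
    return [i for i in range(bits.bit_length()) if (bits >> i) & 1]
```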

Related Articles:

  • Jinoh Kim, Hasan Abbasi, Luis Chacon, Ciprian Docan, Scott Klasky, Qing Liu, Norbert Podhorszki, Arie Shoshani, Kesheng Wu, "Parallel In Situ Indexing for Data-intensive Computing," IEEE Symposium on Large-scale Data Analysis and Visualization (LDAV), October 2011

Energy Proportional Computing

Energy is receiving more attention than ever, and the IT community is no exception. In 2006, datacenters consumed 1.5% of total U.S. energy, and this was expected to reach 3% by 2011. One major contributor to energy waste in a datacenter is idle power, since capacity is often overprovisioned to handle peak load. Energy proportionality is a design concept for building computer systems that consume energy in proportion to the offered load. At Berkeley Lab, I worked on developing energy-saving algorithms that provide energy proportionality for storage and MapReduce cluster systems. My research interests in this topic include data layout and replication, load prediction, power optimization in heterogeneous settings, and performance-energy trade-offs.
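
The concept can be made concrete with a simple power model (the idle fraction and peak power below are illustrative numbers, not measurements from this work): a legacy server draws a large fixed idle power regardless of load, while an ideally proportional one draws power linear in load.

```python
def power(load, idle_frac, peak_power=100.0):
    """Power draw (W) at a given load in [0, 1]: idle_frac of peak is
    consumed even when idle, the remainder scales linearly with load."""
    return peak_power * (idle_frac + (1.0 - idle_frac) * load)

# A server idling at 60% of peak vs. an ideally proportional one, at 20% load
legacy = power(0.2, 0.6)        # 68.0 W
proportional = power(0.2, 0.0)  # 20.0 W
```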

Related Articles:

  • Jerry Chou, Ting-Hsuan Lai, Jinoh Kim, Doron Rotem, "Exploiting Replication for Energy-Aware Scheduling in Disk Storage Systems," IEEE Transactions on Parallel and Distributed Systems, Vol. 26, No. 10, pp. 2734-2749, 2015
  • Jinoh Kim and Doron Rotem, "FREP: Energy Proportionality for Disk Storage Using Replication," Journal of Parallel and Distributed Computing (JPDC), Vol. 72, Issue 8, pp. 960-974, August 2012
  • Jinoh Kim, Jerry Chou, and Doron Rotem, "Energy Proportionality and Performance in Data Parallel Computing Clusters," 23rd Scientific and Statistical Database Management Conference (SSDBM), July 2011
  • Jerry Chou, Jinoh Kim, and Doron Rotem, "Energy-aware Scheduling in Disk Storage Systems," 31st International Conference on Distributed Computing Systems (ICDCS), June 2011
  • Jinoh Kim and Doron Rotem, "Energy Proportionality for Disk Storage Using Replication," 14th International Conference on Extending Database Technology (EDBT), March 2011
  • Jinoh Kim and Doron Rotem, "Using Replication for Energy Conservation in RAID Systems," International Conference on Parallel and Distributed Processing Techniques and Applications (PDPTA), July 2010

Data Dissemination for Distributed Computing

In large-scale distributed computing systems, data access can be a critical bottleneck due to node heterogeneity and end-to-end bandwidth scarcity. For efficient data dissemination, network performance estimation is an essential function, but existing estimation techniques, while accurate, do not scale well to large systems. At the University of Minnesota, I developed OPEN (Overlay Passive Estimation of Network performance), a framework that provides scalable network performance estimation based on sharing measurements between nodes without topological or geographical constraints. OPEN provides a node characterization function to reuse measurements from other nodes, and gossip-based dissemination algorithms for cost-effective measurement dissemination.
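
The reuse idea can be sketched as follows; the similarity weighting below is a simplified stand-in for OPEN's node characterization, and all names and numbers are illustrative:

```python
def estimate_bandwidth(own_obs, peer_obs, target):
    """Estimate bandwidth to `target` by reusing peers' measurements of it,
    weighted by how closely each peer's observations agree with our own on
    servers both have measured (a crude proxy for node similarity).

    own_obs: {server: measured_mbps} for this node
    peer_obs: list of such dicts shared by other nodes
    """
    total, weight = 0.0, 0.0
    for peer in peer_obs:
        common = set(own_obs) & set(peer)
        if target not in peer or not common:
            continue
        # mean relative difference on shared servers -> similarity weight
        diff = sum(abs(own_obs[s] - peer[s]) / max(own_obs[s], peer[s])
                   for s in common) / len(common)
        w = 1.0 - diff
        total += w * peer[target]
        weight += w
    return total / weight if weight else None
```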

The framework has been evaluated with a variety of applications, including Montage in astronomy and BLAST in bioinformatics. It can be applied to various large-scale systems such as desktop grids, peer-to-peer computing systems, and volunteer-based computing systems like BOINC, on which various @home projects run, including SETI@home. I am interested in extending the OPEN framework to future cloud systems in which multiple clouds exchange significant amounts of data with one another, in order to minimize data cost in such a federated cloud environment.

Related Articles:

  • Ph.D. Dissertation: Data Dissemination for Distributed Computing, University of Minnesota, 2010.
  • Jinoh Kim, Abhishek Chandra, and Jon Weissman, "Passive Network Performance Estimation for Large-scale, Data-intensive Computing," IEEE Transactions on Parallel and Distributed Systems (TPDS), Vol. 22 Issue 8, pp. 1365-1373, August 2011
  • Jinoh Kim, "Data Parallelism for Large-scale Distributed Computing," International Journal on Internet and Distributed Computing Systems (IJIDCS), Vol. 1, Issue 1, pp.1-11 June 2011
  • Jinoh Kim, Abhishek Chandra, and Jon Weissman, "Using Data Accessibility for Resource Selection in Large-scale Distributed Systems," IEEE Transactions on Parallel and Distributed Systems (TPDS), Vol. 20, Issue 6, pp.788-801, June 2009
  • Jinoh Kim, Abhishek Chandra, and Jon Weissman, "Accessibility-based Resource Selection in Loosely-coupled Distributed Systems," 28th International Conference on Distributed Computing Systems (ICDCS), June 2008
  • Jinoh Kim, Abhishek Chandra, and Jon Weissman, "Exploiting Heterogeneity for Collective Data Downloading in Volunteer-based Networks," 7th IEEE International Symposium on Cluster Computing and the Grid (CCGrid), May 2007
