
My general research interest lies in developing algorithms and techniques for the effective and efficient operation of distributed network systems, from multiple angles: performance, scalability, security, energy saving, and fault tolerance. My current and past research projects are listed below:

Systems/Network Telemetry and Analytics

Measuring, collecting, and analyzing data are essential for providing visibility into what is happening in the computing infrastructure at any given time. This project has developed various tools and methods to capture potential anomalies and changes that can affect the performance and reliability of systems and services.
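As a minimal illustration of capturing anomalies in multivariate telemetry, the sketch below clusters historical measurement windows (each a vector of hypothetical, already-normalized features such as packet rate and byte rate) with k-means, then flags a new window whose distance to the nearest cluster centroid exceeds a margin over the largest distance seen in training. The features, the deterministic farthest-point initialization, and the `margin` parameter are assumptions for illustration; this is a generic cluster-based detector, not the specific method of the articles listed below.

```python
import math

def dist(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def init_centroids(points, k):
    """Deterministic farthest-point initialization (an assumption; k-means++ is a common alternative)."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        farthest = max(points, key=lambda p: min(dist(p, c) for c in centroids))
        centroids.append(list(farthest))
    return centroids

def kmeans(points, k, iters=20):
    centroids = init_centroids(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        for i, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[i] = [sum(col) / len(members) for col in zip(*members)]
    return centroids

class ClusterAnomalyDetector:
    def __init__(self, k=2, margin=1.5):
        self.k = k
        self.margin = margin  # hypothetical tolerance factor over the training radius

    def fit(self, points):
        self.centroids = kmeans(points, self.k)
        self.threshold = self.margin * max(self.score(p) for p in points)
        return self

    def score(self, point):
        """Distance from a window's features to its nearest 'normal' pattern."""
        return min(dist(point, c) for c in self.centroids)

    def is_anomaly(self, point):
        return self.score(point) > self.threshold

# Hypothetical training windows forming two normal operating patterns.
normal = [(1.0, 1.0), (1.2, 0.9), (0.9, 1.1), (5.0, 5.0), (5.1, 4.9), (4.9, 5.2)]
det = ClusterAnomalyDetector(k=2).fit(normal)
print(det.is_anomaly((1.1, 0.9)))   # False
print(det.is_anomaly((10.0, 0.0)))  # True
```

Because the model only stores a few centroids and a threshold, scoring a new window is cheap, which suits online monitoring; the trade-off is that the choice of `k` and `margin` must be tuned per environment.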

Selected Articles:

  • Jinoh Kim and Alex Sim, "A New Approach to Multivariate Network Traffic Analysis," Journal of Computer Science and Technology (JCST), Springer, Vol. 34, No. 2, pp. 388-402, March 2019
  • Jinoh Kim, Alex Sim, Brian Tierney, Sang C. Suh, Ikkyun Kim, "Multivariate Network Traffic Analysis using Clustered Patterns," Computing Journal, Springer, April 2018
  • Donghwoon Kwon, Hyunjoo Kim, Jinoh Kim, Sang C. Suh, Ikkyun Kim, Kuinam J. Kim, "A Survey of Deep Learning-based Network Anomaly Detection," Cluster Computing Journal, Springer, pp. 1-13, 2017
  • Jinoh Kim and Alex Sim, "A New Approach to Online, Multivariate Network Traffic Analysis," The 2nd Workshop on Network Security Analytics and Automation (Workshop program for IEEE ICCCN 2017), Vancouver, Canada, July 31-August 3, 2017
  • Sunhee Baek, Donghwoon Kwon, Jinoh Kim, Sang C. Suh, Hyunjoo Kim, Ikkyun Kim, "Unsupervised Labeling for Supervised Anomaly Detection in Enterprise and Cloud Networks," The 4th IEEE International Conference on Cyber Security and Cloud Computing (IEEE CSCloud 2017), New York, NY, June 26-28, 2017
  • Jinoh Kim, Wucheol Yoo, Alex Sim, Sang Suh, Ikkyun Kim, "A Lightweight Network Anomaly Detection Technique," IEEE International Workshop on Computing, Networking and Communications (Workshop program for IEEE ICNC 2017), Silicon Valley, CA, January, 2017
  • Jinoh Kim, Alex Sim, Sang Suh, Ikkyun Kim, "An Approach to Online Network Monitoring Using Clustered Patterns," IEEE International Conference on Computing, Networking and Communications (ICNC), Silicon Valley, CA, January, 2017

Intelligent In-memory Caching in a Cloud

Data caching is an integral part of improving data access performance. In a cloud computing environment, individual customers may have different performance requirements, so tenant-aware caching decisions are essential. This research addresses the challenges of effectively supporting cloud-based caching services, including data access characteristics, cache performance prediction, tenant-optimized cache eviction, and resource sharing across cloud tenants.
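Resource sharing across tenants can be framed as an allocation problem: given a predicted hit-ratio curve for each tenant, hand out cache units one at a time to the tenant whose expected additional hits (hit-ratio gain times request rate) are largest. The sketch below assumes such curves are already available, for example from a prediction model, and uses made-up numbers; the function name and inputs are hypothetical, greedy marginal allocation is only guaranteed optimal when each curve is concave, and it is not presented as this project's own policy.

```python
def allocate_cache(curves, rates, total_units):
    """Greedy marginal-gain allocation of cache units across tenants.

    curves[t][s]: predicted hit ratio of tenant t with s cache units (s = 0, 1, ...).
    rates[t]: tenant t's request rate, so gain * rate approximates extra hits per second.
    """
    alloc = {t: 0 for t in curves}
    for _ in range(total_units):
        best, best_gain = None, 0.0
        for t, curve in curves.items():
            s = alloc[t]
            if s + 1 < len(curve):
                gain = (curve[s + 1] - curve[s]) * rates[t]
                if gain > best_gain:
                    best, best_gain = t, gain
        if best is None:  # no tenant benefits from more cache
            break
        alloc[best] += 1
    return alloc

# Hypothetical predicted curves for two tenants with equal request rates.
curves = {"tenant_a": [0.0, 0.5, 0.7, 0.8], "tenant_b": [0.0, 0.3, 0.55, 0.75]}
rates = {"tenant_a": 100.0, "tenant_b": 100.0}
print(allocate_cache(curves, rates, total_units=3))  # {'tenant_a': 1, 'tenant_b': 2}
```

Tenant A gets the first unit because its first unit yields the largest gain, but tenant B's steeper curve wins the next two; a static equal split would ignore these differences.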

This project is at an early stage, but we have made some interesting observations from simulations conducted on Apache Ignite, an in-memory computing platform. For hit-ratio prediction in the multi-tenant model, this research examined several regression techniques, including Support Vector Regression (SVR), Gaussian Process Regression (GPR), and fully connected deep neural networks. Initial results show that the deep learning approach predicts hit ratios well, while the predictions from the other techniques are generally unacceptable.
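Training a regression model (SVR, GPR, or a neural network) for hit-ratio prediction requires ground-truth hit ratios for many combinations of workload and cache size. A minimal sketch of producing such labels, assuming an LRU eviction policy and a request trace given as a list of object keys, replays the trace through a simulated cache. The feature tuple (cache size, distinct keys, trace length) and the function names are hypothetical choices for illustration, not the feature set or tooling used in the study.

```python
from collections import OrderedDict

def lru_hit_ratio(trace, capacity):
    """Replay a request trace through an LRU cache holding `capacity` objects."""
    cache = OrderedDict()
    hits = 0
    for key in trace:
        if key in cache:
            hits += 1
            cache.move_to_end(key)  # most recently used goes to the end
        else:
            cache[key] = True
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict the least recently used object
    return hits / len(trace) if trace else 0.0

def training_samples(trace, capacities):
    """(features, label) pairs for a regression model; the features are hypothetical."""
    distinct = len(set(trace))
    return [((cap, distinct, len(trace)), lru_hit_ratio(trace, cap)) for cap in capacities]

trace = [1, 2, 1, 3, 1, 2]
for features, hit_ratio in training_samples(trace, [1, 2, 3]):
    print(features, round(hit_ratio, 3))
# (1, 3, 6) 0.0
# (2, 3, 6) 0.333
# (3, 3, 6) 0.5
```

The resulting pairs can be fed to any regressor; in a multi-tenant setting, one would generate them per tenant trace.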

Selected Articles:

  • Jinhwan Choi, Yu Gu, Jinoh Kim, "Learning-based dynamic cache management in a cloud," arXiv preprint arXiv:1902.00795 (2019)
  • Taejoon Kim, Yu Gu, Jinoh Kim, "A Hybrid Cache Architecture for Meeting Per-Tenant Performance Goals in a Private Cloud," arXiv preprint arXiv:1906.01260 (2019)

Network Application Identification

Interest in application identification is increasing, but the traditional approach based on transport-layer port numbers has become less effective for several reasons, including the growing use of random or non-standard port numbers and tunneling (e.g., HTTP tunnels). One way to overcome this is to inspect application payloads. While highly accurate, payload inspection is limited and complicated for encrypted or obfuscated packets. Another common approach is to classify applications using flow statistics, such as flow size and duration. Since it does not need to read packet contents, this approach is not limited to plaintext flows, but it is known to be less accurate. In this project, we developed a set of algorithms and methods that identify applications accurately and with greater flexibility.
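The two approaches above can be combined. A minimal sketch, assuming a tiny hypothetical signature table of payload byte prefixes and hypothetical per-application flow-statistics centroids, first tries signature matching (preferring the longest match when several signatures apply) and falls back to nearest-centroid classification on log-scaled flow statistics when no signature matches, as with encrypted payloads. The signatures, centroids, and function names are illustrative assumptions, not the signature sets or classifiers from the articles below.

```python
import math

# Hypothetical payload signatures (byte prefixes) mapped to application labels.
SIGNATURES = {
    b"GET ": "http",
    b"GET /api/": "rest-api",
    b"SSH-": "ssh",
}

# Hypothetical flow-statistics centroids: (mean packet size in bytes, flow duration in seconds).
FLOW_CENTROIDS = {
    "bulk-transfer": (1400.0, 60.0),
    "interactive": (120.0, 300.0),
    "web": (600.0, 5.0),
}

def _log_features(stats):
    # Log scaling keeps packet sizes and durations on comparable scales.
    return tuple(math.log1p(x) for x in stats)

def identify(payload, mean_pkt_size, duration):
    matches = [(sig, app) for sig, app in SIGNATURES.items() if payload.startswith(sig)]
    if matches:
        # Several signatures may match; prefer the longest (most specific) one.
        return max(matches, key=lambda m: len(m[0]))[1]
    # No payload match (e.g., encrypted traffic): use the nearest flow-statistics centroid.
    x = _log_features((mean_pkt_size, duration))
    return min(FLOW_CENTROIDS, key=lambda app: math.dist(_log_features(FLOW_CENTROIDS[app]), x))

print(identify(b"GET /index.html HTTP/1.1", 500, 1))    # http
print(identify(b"GET /api/v1/items HTTP/1.1", 500, 1))  # rest-api
print(identify(b"\x17\x03\x03\x00\x45", 1380, 50))      # bulk-transfer
```

The longest-match rule is only one possible way to reconcile multiple matches; weighting signatures by observed precision is another.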

Selected Articles:

  • Justin Tharp, Sang C. Suh, Hyeonkoo Cho, Jinoh Kim, "Improving signature quality for network application identification," Digital Communications and Networks, in press, available online October 2018
  • Anil Kumar, Jinoh Kim, Sang C. Suh, and Ganho Choi, "Incorporating Multiple Cluster Models for Network Traffic Classification," The 40th IEEE Conference on Local Computer Networks (LCN 2015), Clearwater Beach, Florida, October 2015
  • Justin Tharp, Jinoh Kim, Sang C. Suh, Hyeonkoo Cho, "Reconciling Multiple Matches for the Signature-Based Application Identification," 3rd International Conference on Communication and Network Security (ICCNS), London, UK, November 2013; Also in Journal of Communications
  • Ilhwan Moon, Umar Albalawi, Jinoh Kim, Sang C. Suh, Wang-Hwan Lee, "A Hybrid Classifier with a Binning Method for Network Application Identification," Journal of Integrated Design and Process Science (JIDPS), SDPS, Vol. 18, No. 2, pp. 3-22, 2014

Big-data Computing for Security Event Analysis

One of the critical challenges for security analytics is the tremendous volume of complex data (i.e., big data), which leads to a set of technical challenges involving computing infrastructure, storage systems, data representation and management, application interfaces (e.g., for query support), and so forth. Therefore, an adequate platform that provides a large-scale computing infrastructure and a software stack is essential for the success of big-data security analytics. In this research, we aim to develop a framework that can efficiently support a broad spectrum of security analytics applications. To this end, we examined the applicability of existing big-data computing technologies (such as MapReduce and Hadoop). We have also studied data representation and organization to efficiently manage large and varied security datasets in the platform.
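The MapReduce model mentioned above can be illustrated with a small in-process sketch that aggregates security events by (source address, event type). The event fields `src_ip` and `type` are hypothetical, and plain Python functions stand in for the map, shuffle, and reduce phases that a framework such as Hadoop would distribute across a cluster; this is neither Hadoop's API nor the framework developed in this project.

```python
from collections import defaultdict
from itertools import chain

def map_event(event):
    """Map phase: emit ((src_ip, type), 1) for each event record."""
    yield (event["src_ip"], event["type"]), 1

def shuffle(pairs):
    """Shuffle phase: group intermediate values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_counts(key, values):
    """Reduce phase: sum the counts for one key."""
    return key, sum(values)

def aggregate(partitions):
    """In a real cluster, each partition would be processed by a separate mapper."""
    mapped = chain.from_iterable(map_event(e) for part in partitions for e in part)
    return dict(reduce_counts(k, v) for k, v in shuffle(mapped).items())

partitions = [
    [{"src_ip": "10.0.0.5", "type": "port_scan"}, {"src_ip": "10.0.0.7", "type": "login_fail"}],
    [{"src_ip": "10.0.0.5", "type": "port_scan"}, {"src_ip": "10.0.0.5", "type": "login_fail"}],
]
print(aggregate(partitions))
# {('10.0.0.5', 'port_scan'): 2, ('10.0.0.7', 'login_fail'): 1, ('10.0.0.5', 'login_fail'): 1}
```

Because the reduce step is a sum, it is associative and could also run as a combiner on each mapper, cutting the volume of data shuffled across the network.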

Selected Articles:

  • Jinoh Kim, Ilhwan Moon, Kyungil Lee, Sang C. Suh, and Ikkyun Kim, "Scalable Security Event Aggregation for Situation Analysis," IEEE BigDataService 2015, San Francisco, March, 2015

Exa-Scale High-performance Computing

Performance is always important in computing systems. In scientific computing in particular, data sizes can easily reach terabytes or petabytes, and as a result, storage I/O becomes a severe performance bottleneck. While conventional approaches focus on improving I/O performance, my approach reduces the frequency of disk accesses. In scientific applications, data blocks are often accessed repeatedly by subsequent queries after they are created, and data access dominates the response time. Since an index is in most cases much smaller than the data itself, maintaining one should benefit both individual queries and overall system performance. Extensive measurement studies of parallel, in-situ indexing based on FastBit, a bitmap-based indexing technology developed by Berkeley Lab, have been conducted on NERSC clusters using Pixie3D, a magnetohydrodynamics (MHD) application. The implementation has been applied to the ADIOS middleware developed by Oak Ridge National Laboratory.
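To illustrate why a compact index can avoid reading most of the data, here is a minimal sketch of a binned bitmap index in the spirit of (but much simpler than) FastBit: one bitmap per value bin, with Python integers serving as uncompressed bitsets. Bins lying entirely inside a query range are answered from the index alone, and only rows in partially overlapping edge bins are checked against the raw values. The bin edges and function names are hypothetical; FastBit itself uses compressed (Word-Aligned Hybrid) bitmaps and has its own API.

```python
from bisect import bisect_right

def build_bitmap_index(values, edges):
    """One bitmap per bin [edges[i], edges[i+1]); bit r is set if row r falls in that bin.
    Values outside [edges[0], edges[-1]) are not indexed in this sketch."""
    bitmaps = [0] * (len(edges) - 1)
    for row, v in enumerate(values):
        i = bisect_right(edges, v) - 1
        if 0 <= i < len(bitmaps):
            bitmaps[i] |= 1 << row
    return bitmaps

def rows_of(bitmap):
    """Decode a bitset into the sorted list of row ids whose bits are set."""
    rows, r = [], 0
    while bitmap:
        if bitmap & 1:
            rows.append(r)
        bitmap >>= 1
        r += 1
    return rows

def range_query(values, bitmaps, edges, lo, hi):
    """Row ids with lo <= value < hi."""
    exact, candidates = 0, 0
    for i, bm in enumerate(bitmaps):
        b_lo, b_hi = edges[i], edges[i + 1]
        if lo <= b_lo and b_hi <= hi:
            exact |= bm       # whole bin qualifies: no raw-data access needed
        elif b_lo < hi and lo < b_hi:
            candidates |= bm  # edge bin: raw values must be checked
    checked = [r for r in rows_of(candidates) if lo <= values[r] < hi]
    return sorted(rows_of(exact) + checked)

values = [0.5, 2.5, 1.2, 3.7, 2.9, 0.1]
edges = [0.0, 1.0, 2.0, 3.0, 4.0]
index = build_bitmap_index(values, edges)
print(range_query(values, index, edges, 1.0, 3.0))  # [1, 2, 4]
print(range_query(values, index, edges, 2.6, 4.0))  # [3, 4]
```

In the first query, both overlapping bins lie fully inside the range, so no raw value is read; in the second, only the two rows of the partial bin [2, 3) are checked.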

Selected Articles:

  • Jinoh Kim, Hasan Abbasi, Luis Chacon, Ciprian Docan, Scott Klasky, Qing Liu, Norbert Podhorszki, Arie Shoshani, Kesheng Wu, "Parallel In Situ Indexing for Data-intensive Computing," IEEE Symposium on Large-scale Data Analysis and Visualization (LDAV), October 2011

Energy Proportional Computing

Energy is receiving more attention than ever, and the IT community is no exception. In 2006, datacenters consumed 1.5% of total U.S. electricity, and this share was expected to reach 3% by 2011. One major contributor to energy waste in a datacenter is idle power: datacenters are often overprovisioned to handle peak load, so much of their capacity sits idle while still drawing power. Energy proportionality is a design concept for computer systems that refers to the ability to consume energy in proportion to the load intensity. At Berkeley Lab, I worked on developing energy-saving algorithms for storage and MapReduce cluster systems to provide energy proportionality. My research interests in this topic include data layout and replication, load prediction, power optimization in heterogeneous settings, and performance and energy trade-offs.
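A simplified model shows how replication can make disk storage more energy proportional. Assume, hypothetically, that data are laid out so that a fixed subset of ceil(n/r) disks (for r replicas on n disks) holds one complete copy; those disks must stay on, and the remaining disks are spun up only as the offered load requires. The per-disk capacity and the active and standby power figures below are made-up parameters, and the policy is an illustration rather than the published algorithms.

```python
import math

def active_disks(load, per_disk_capacity, n_disks, replicas):
    """Number of disks to keep spinning for a given load (requests/s).

    Hypothetical layout: ceil(n_disks / replicas) disks hold one full copy and must stay on;
    more disks are activated only when the load exceeds what that set can serve.
    """
    min_on = math.ceil(n_disks / replicas)
    needed = math.ceil(load / per_disk_capacity)
    return min(n_disks, max(min_on, needed))

def power_watts(load, per_disk_capacity, n_disks, replicas, p_active, p_standby):
    on = active_disks(load, per_disk_capacity, n_disks, replicas)
    return on * p_active + (n_disks - on) * p_standby

# Hypothetical parameters: 12 disks, 3 replicas, 50 requests/s per disk, 10 W active, 1 W standby.
for load in (100, 300, 500, 1000):
    print(load, power_watts(load, 50, 12, 3, 10.0, 1.0))
# 100 48.0
# 300 66.0
# 500 102.0
# 1000 120.0
# Always-on baseline for comparison: 12 * 10 W = 120 W regardless of load.
```

Power now tracks load above a floor set by the number of disks holding one full copy; more replicas lower that floor but cost storage capacity, which is one of the performance and energy trade-offs mentioned above.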

Selected Articles:

  • Jerry Chou, Ting-Hsuan Lai, Jinoh Kim, Doron Rotem, "Exploiting Replication for Energy-Aware Scheduling in Disk Storage Systems," IEEE Transactions on Parallel and Distributed Systems, Vol. 26, No. 10, pp. 2734-2749, 2015
  • Jinoh Kim and Doron Rotem, "FREP: Energy Proportionality for Disk Storage Using Replication," Journal of Parallel and Distributed Computing (JPDC), Vol. 72, Issue 8, pp. 960-974, August 2012
  • Jinoh Kim, Jerry Chou, and Doron Rotem, "Energy Proportionality and Performance in Data Parallel Computing Clusters," 23rd Scientific and Statistical Database Management Conference (SSDBM), July 2011
  • Jerry Chou, Jinoh Kim, and Doron Rotem, "Energy-aware Scheduling in Disk Storage Systems," 31st International Conference on Distributed Computing Systems (ICDCS), June 2011
  • Jinoh Kim and Doron Rotem, "Energy Proportionality for Disk Storage Using Replication," 14th International Conference on Extending Database Technology (EDBT), March 2011

Data Dissemination for Distributed Computing

In large-scale distributed computing systems, data access can be a critical bottleneck due to node heterogeneity and scarce end-to-end bandwidth. For efficient data dissemination, network performance estimation is an essential function. Existing estimation techniques are accurate but do not scale well to large systems. At the University of Minnesota, I developed OPEN (Overlay Passive Estimation of Network performance), a framework that provides scalable network performance estimation by sharing measurements between nodes without topological or geographical constraints. OPEN provides a node characterization function to reuse measurements from other nodes, and gossip-based algorithms for cost-effective measurement dissemination.
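The gossip-based dissemination idea can be sketched with a generic push-pull protocol: each node keeps a table of pairwise measurements (for example, observed bandwidth to a data server) tagged with timestamps, and in every round it exchanges its table with one randomly chosen peer, keeping the freshest entry for each pair. This is a textbook gossip scheme over assumed data structures and hypothetical function names, not OPEN's actual dissemination algorithms or its node characterization function.

```python
import random

def merge(mine, theirs):
    """Keep the freshest measurement per (src, dst) pair; values are (measurement, timestamp)."""
    for pair, (value, ts) in theirs.items():
        if pair not in mine or mine[pair][1] < ts:
            mine[pair] = (value, ts)

def gossip_round(tables, rng):
    """Every node performs one push-pull exchange with a random peer."""
    nodes = list(tables)
    for node in nodes:
        peer = rng.choice([n for n in nodes if n != node])
        merge(tables[node], tables[peer])  # pull the peer's measurements
        merge(tables[peer], tables[node])  # push ours back

def rounds_to_converge(tables, rng, max_rounds=100):
    """Run gossip rounds until every node knows every measurement; return the round count."""
    all_pairs = set().union(*tables.values())
    for r in range(1, max_rounds + 1):
        gossip_round(tables, rng)
        if all(set(t) == all_pairs for t in tables.values()):
            return r
    return None

# Hypothetical setup: 8 nodes, each initially knowing only its own bandwidth to a server (MB/s).
tables = {f"n{i}": {(f"n{i}", "server"): (10.0 + i, 0)} for i in range(8)}
rounds = rounds_to_converge(tables, random.Random(42))
print("converged after", rounds, "rounds")
```

Each node contacts only one peer per round, so per-node overhead stays constant while information typically spreads to all nodes in a number of rounds that grows roughly logarithmically with system size.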

The framework has been evaluated with a variety of applications, including Montage in astronomy and BLAST in bioinformatics. It can be applied to various large-scale systems, such as desktop grids, peer-to-peer computing systems, and volunteer-based computing systems like BOINC, on which various @home projects run, including SETI@home. I am interested in extending the OPEN framework to future cloud systems, where multiple clouds exchange significant amounts of data with one another, in order to minimize data costs in such federated cloud environments.

Selected Articles:

  • Ph.D. Dissertation: Data Dissemination for Distributed Computing, University of Minnesota, 2010.
  • Jinoh Kim, Abhishek Chandra, and Jon Weissman, "Passive Network Performance Estimation for Large-scale, Data-intensive Computing," IEEE Transactions on Parallel and Distributed Systems (TPDS), Vol. 22, Issue 8, pp. 1365-1373, August 2011
  • Jinoh Kim, "Data Parallelism for Large-scale Distributed Computing," International Journal on Internet and Distributed Computing Systems (IJIDCS), Vol. 1, Issue 1, pp. 1-11, June 2011
  • Jinoh Kim, Abhishek Chandra, and Jon Weissman, "Using Data Accessibility for Resource Selection in Large-scale Distributed Systems," IEEE Transactions on Parallel and Distributed Systems (TPDS), Vol. 20, Issue 6, pp. 788-801, June 2009
  • Jinoh Kim, Abhishek Chandra, and Jon Weissman, "Accessibility-based Resource Selection in Loosely-coupled Distributed Systems," 28th International Conference on Distributed Computing Systems (ICDCS), June 2008
  • Jinoh Kim, Abhishek Chandra, and Jon Weissman, "Exploiting Heterogeneity for Collective Data Downloading in Volunteer-based Networks," 7th IEEE International Symposium on Cluster Computing and the Grid (CCGrid), May 2007
