Data mining tools for malware detection
Table of contents:
ch. 1 Introduction
1.1.Trends
1.2.Data Mining and Security Technologies
1.3.Data Mining for Email Worm Detection
1.4.Data Mining for Malicious Code Detection
1.5.Data Mining for Detecting Remote Exploits
1.6.Data Mining for Botnet Detection
1.7.Stream Data Mining
1.8.Emerging Data Mining Tools for Cyber Security Applications
1.9.Organization of This Book
1.10.Next Steps
Introduction to Part I: Data Mining and Security
ch. 2 Data Mining Techniques
2.1.Introduction
2.2.Overview of Data Mining Tasks and Techniques
2.3.Artificial Neural Network
2.4.Support Vector Machines
2.5.Markov Model
2.6.Association Rule Mining (ARM)
2.7.Multi-Class Problem
2.7.1.One-vs-One
2.7.2.One-vs-All
2.8.Image Mining
2.8.1.Feature Selection
2.8.2.Automatic Image Annotation
2.8.3.Image Classification
2.9.Summary
References
ch. 3 Malware
3.1.Introduction
3.2.Viruses
3.3.Worms
3.4.Trojan Horses
3.5.Time and Logic Bombs
3.6.Botnet
3.7.Spyware
3.8.Summary
ch. 4 Data Mining For Security Applications
4.1.Introduction
4.2.Data Mining for Cyber Security
4.2.1.Overview
4.2.2.Cyber-Terrorism, Insider Threats, and External Attacks
4.2.3.Malicious Intrusions
4.2.4.Credit Card Fraud and Identity Theft
4.2.5.Attacks on Critical Infrastructures
4.2.6.Data Mining for Cyber Security
4.3.Current Research and Development
4.4.Summary
ch. 5 Design And Implementation Of Data Mining Tools
5.1.Introduction
5.2.Intrusion Detection
5.3.Web Page Surfing Prediction
5.4.Image Classification
5.5.Summary
Conclusion To Part I
Introduction To Part II
ch. 6 Email Worm Detection
6.1.Introduction
6.2.Architecture
6.3.Related Work
6.4.Overview of Our Approach
6.5.Summary
ch. 7 Design Of The Data Mining Tool
7.1.Introduction
7.2.Architecture
7.3.Feature Description
7.3.1.Per-Email Features
7.3.2.Per-Window Features
7.4.Feature Reduction Techniques
7.4.1.Dimension Reduction
7.4.2.Two-Phase Feature Selection (TPS)
7.4.2.1.Phase I
7.4.2.2.Phase II
7.5.Classification Techniques
7.6.Summary
ch. 8 Evaluation And Results
8.1.Introduction
8.2.Dataset
8.3.Experimental Setup
8.4.Results
8.4.1.Results from Unreduced Data
8.4.2.Results from PCA-Reduced Data
8.4.3.Results from Two-Phase Selection
8.5.Summary
Conclusion To Part II
Introduction to Part III
ch. 9 Malicious Executables
9.1.Introduction
9.2.Architecture
9.3.Related Work
9.4.Hybrid Feature Retrieval (HFR) Model
9.5.Summary
ch. 10 Design Of The Data Mining Tool
10.1.Introduction
10.2.Feature Extraction Using n-Gram Analysis
10.2.1.Binary n-Gram Feature
10.2.2.Feature Collection
10.2.3.Feature Selection
10.2.4.Assembly n-Gram Feature
10.2.5.DLL Function Call Feature
10.3.The Hybrid Feature Retrieval Model
10.3.1.Description of the Model
10.3.2.The Assembly Feature Retrieval (AFR) Algorithm
10.3.3.Feature Vector Computation and Classification
10.4.Summary
ch. 11 Evaluation And Results
11.1.Introduction
11.2.Experiments
11.3.Dataset
11.4.Experimental Setup
11.5.Results
11.5.1.Accuracy
11.5.1.1.Dataset1
11.5.1.2.Dataset2
11.5.1.3.Statistical Significance Test
11.5.1.4.DLL Call Feature
11.5.2.ROC Curves
11.5.3.False Positive and False Negative
11.5.4.Running Time
11.5.5.Training and Testing with Boosted J48
11.6.Example Run
11.7.Summary
Conclusion To Part III
Introduction to Part IV
ch. 12 Detecting Remote Exploits
12.1.Introduction
12.2.Architecture
12.3.Related Work
12.4.Overview of Our Approach
12.5.Summary
ch. 13 Design Of The Data Mining Tool
13.1.Introduction
13.2.DExtor Architecture
13.3.Disassembly
13.4.Feature Extraction
13.4.1.Useful Instruction Count (UIC)
13.4.2.Instruction Usage Frequencies (IUF)
13.4.3.Code vs. Data Length (CDL)
13.5.Combining Features and Compute Combined Feature Vector
13.6.Classification
13.7.Summary
ch. 14 Evaluation And Results
14.1.Introduction
14.2.Dataset
14.3.Experimental Setup
14.3.1.Parameter Settings
14.3.2.Baseline Techniques
14.4.Results
14.4.1.Running Time
14.5.Analysis
14.6.Robustness and Limitations
14.6.1.Robustness against Obfuscations
14.6.2.Limitations
14.7.Summary
Conclusion To Part IV
ch. 15 Detecting Botnets
15.1.Introduction
15.2.Botnet Architecture
15.3.Related Work
15.4.Our Approach
15.5.Summary
ch. 16 Design Of The Data Mining Tool
16.1.Introduction
16.2.Architecture
16.3.System Setup
16.4.Data Collection
16.5.Bot Command Categorization
16.6.Feature Extraction
16.6.1.Packet-Level Features
16.6.2.Flow-Level Features
16.7.Log File Correlation
16.8.Classification
16.9.Packet Filtering
16.10.Summary
ch. 17 Evaluation And Results
17.1.Introduction
17.1.1.Baseline Techniques
17.1.2.Classifiers
17.2.Performance on Different Datasets
17.3.Comparison with Other Techniques
17.4.Further Analysis
17.5.Summary
Conclusion To Part V
Introduction to Part VI
ch. 18 Stream Mining
18.1.Introduction
18.2.Architecture
18.3.Related Work
18.4.Our Approach
18.5.Overview of the Novel Class Detection Algorithm
18.6.Classifiers Used
18.7.Security Applications
18.8.Summary
ch. 19 Design Of The Data Mining Tool
19.1.Introduction
19.2.Definitions
19.3.Novel Class Detection
19.3.1.Saving the Inventory of Used Spaces during Training
19.3.1.1.Clustering
19.3.1.2.Storing the Cluster Summary Information
19.3.2.Outlier Detection and Filtering
19.3.2.1.Filtering
19.3.3.Detecting Novel Class
19.3.3.1.Computing the Set of Novel Class Instances
19.3.3.2.Speeding up the Computation
19.3.3.3.Time Complexity
19.3.3.4.Impact of Evolving Class Labels on Ensemble Classification
19.4.Security Applications
19.5.Summary
Reference
ch. 20 Evaluation And Results
20.1.Introduction
20.2.Datasets
20.2.1.Synthetic Data with Only Concept-Drift (SynC)
20.2.2.Synthetic Data with Concept-Drift and Novel Class (SynCN)
20.2.3.Real Data-KDD Cup 99 Network Intrusion Detection
20.2.4.Real Data-Forest Cover (UCI Repository)
20.3.Experimental Setup
20.3.1.Baseline Method
20.4.Performance Study
20.4.1.Evaluation Approach
20.4.2.Results
20.4.3.Running Time
20.5.Summary
Conclusion For Part VI
Introduction to Part VII
ch. 21 Data Mining For Active Defense
21.1.Introduction
21.2.Related Work
21.3.Architecture
21.4.A Data Mining-Based Malware Detection Model
21.4.1.Our Framework
21.4.2.Feature Extraction
21.4.2.1.Binary n-Gram Feature Extraction
21.4.2.2.Feature Selection
21.4.2.3.Feature Vector Computation
21.4.3.Training
21.4.4.Testing
21.5.Model-Reversing Obfuscations
21.5.1.Path Selection
21.5.2.Feature Insertion
21.5.3.Feature Removal
21.6.Experiments
21.7.Summary
ch. 22 Data Mining For Insider Threat Detection
22.1.Introduction
22.2.The Challenges, Related Work, and Our Approach
22.3.Data Mining for Insider Threat Detection
22.3.1.Our Solution Architecture
22.3.2.Feature Extraction and Compact Representation
22.3.3.RDF Repository Architecture
22.3.4.Data Storage
22.3.4.1.File Organization
22.3.4.2.Predicate Split (PS)
22.3.4.3.Predicate Object Split (POS)
22.3.5.Answering Queries Using Hadoop MapReduce
22.3.6.Data Mining Applications
22.4.Comprehensive Framework
22.5.Summary
ch. 23 Dependable Real-Time Data Mining
23.1.Introduction
23.2.Issues in Real-Time Data Mining
23.3.Real-Time Data Mining Techniques
23.4.Parallel, Distributed, Real Time Data Mining
23.5.Dependable Data Mining
23.6.Mining Data Streams
23.7.Summary
ch. 24 Firewall Policy Analysis
24.1.Introduction
24.2.Related Work
24.3.Firewall Concepts
24.3.1.Representation of Rules
24.3.2.Relationship between Two Rules
24.3.3.Possible Anomalies between Two Rules
24.4.Anomaly Resolution Algorithms
24.4.1.Algorithms for Finding and Resolving Anomalies
24.4.1.1.Illustrative Example
24.4.2.Algorithms for Merging Rules
24.4.2.1.Illustrative Example of the Merge Algorithm
24.5.Summary
Conclusion To Part VII
ch. 25 Summary And Directions
25.1.Introduction
25.2.Summary of This Book
25.3.Directions for Data Mining Tools for Malware Detection
25.4.Where Do We Go from Here?
A.1.Introduction
A.2.Developments in Database Systems
A.3.Status, Vision, and Issues
A.4.Data Management Systems Framework
A.5.Building Information Systems from the Framework
A.6.Relationship between the Texts
A.7.Summary
B.1.Introduction
B.2.Secure Systems
B.2.1.Introduction
B.2.2.Access Control and Other Security Concepts
B.2.3.Types of Secure Systems
B.2.4.Secure Operating Systems
B.2.5.Secure Database Systems
B.2.6.Secure Networks
B.2.7.Emerging Trends
B.2.8.Impact of the Web
B.2.9.Steps to Building Secure Systems
B.3.Web Security
B.4.Building Trusted Systems from Untrusted Components
B.5.Dependable Systems
B.5.1.Introduction
B.5.2.Trust Management
B.5.3.Digital Rights Management
B.5.4.Privacy
B.5.5.Integrity, Data Quality, and High Assurance
B.6.Other Security Concerns
B.6.1.Risk Analysis
B.6.2.Biometrics, Forensics, and Other Solutions
B.7.Summary
C.1.Introduction
C.2.Secure Data Management
C.2.1.Introduction
C.2.2.Database Management
C.2.2.1.Data Model
C.2.2.2.Functions
C.2.2.3.Data Distribution
C.2.3.Heterogeneous Data Integration
C.2.4.Data Warehousing and Data Mining
C.2.5.Web Data Management
C.2.6.Security Impact
C.3.Secure Information Management
C.3.1.Introduction
C.3.2.Information Retrieval
C.3.3.Multimedia Information Management
C.3.4.Collaboration and Data Management
C.3.5.Digital Libraries
C.3.6.E-Business
C.3.7.Security Impact
C.4.Secure Knowledge Management
C.4.1.Knowledge Management
C.4.2.Security Impact
C.5.Summary
D.1.Introduction
D.2.Layered Technology Stack
D.3.XML
D.3.1.XML Statement and Elements
D.3.2.XML Attributes
D.3.3.XML DTDs
D.3.4.XML Schemas
D.3.5.XML Namespaces
D.3.6.XML Federations/Distribution
D.3.7.XML-QL, XQuery, XPath, XSLT
D.4.RDF
D.4.1.RDF Basics
D.4.2.RDF Container Model
D.4.3.RDF Specification
D.4.4.RDF Schemas
D.4.5.RDF Axiomatic Semantics
D.4.6.RDF Inferencing
D.4.7.RDF Query
D.4.8.SPARQL
D.5.Ontologies
D.6.Web Rules and SWRL
D.6.1.Web Rules
D.6.2.SWRL
D.7.Semantic Web Services
D.8.Summary