Insider Threat Detection with AI Using Tensorflow and RapidMiner Studio

This technical article will teach you how to pre-process data, create your own neural networks, and train and evaluate models using US-CERT's simulated insider threat dataset. The methods and solutions are designed for non-domain experts, particularly cyber security professionals. We will start our journey with the raw data provided by the dataset and provide examples of different pre-processing methods to get it "ready" for the AI solution to ingest. We will ultimately create models that can be re-used for additional predictions based on security events. Throughout the article, I will also point out the applicability and return on investment depending on the existing Information Security program in your enterprise.

Note: To use and replicate the pre-processed data and steps we use, plan to spend 1-2 hours on this page. Stay with me and try not to fall asleep during the data pre-processing portion. What many tutorials don't state is that if you're starting from scratch, data pre-processing takes up to 90% of your time on projects like these.

At the end of this hybrid article and tutorial, you should be able to:
Pre-process the data provided by US-CERT into an AI-solution-ready format (for Tensorflow in particular).
Use RapidMiner Studio and Tensorflow 2.0 + Keras to create and train a model using a pre-processed sample CSV dataset.
Perform basic analysis of your data and the fields chosen for AI evaluation, and understand their practicality for your organization using the methods described.

The author provides these methods, insights, and recommendations *as is* and makes no claim of warranty. Please do not use the models you create in this tutorial in a production environment without sufficient tuning and analysis before making them part of your security program.

If you wish to follow along and perform these activities yourself, please download and install the following tools from their respective locations:
Choose: To be hands-on from scratch and experiment with your own variations of the data, download the full dataset. *Caution: it is very large. Please plan to have several hundred gigabytes of free space.
Choose: If you just want to follow along and execute what I've done, you can download the pre-processed data, Python, and solution files from my GitHub (click Repositories and find tensorflow-insiderthreat).
Optional: If you want a nice IDE for Python, install Visual Studio 2019 Community Edition with the applicable Python extensions.
Required: RapidMiner Studio Trial (or an educational license if it applies to you).
Required: A Python environment; use the Python 3.8.3 64-bit release.
Required: Install the Python packages numpy, pandas, tensorflow, and sklearn via "pip install <package>" from the command line (sklearn is published on PyPI under the name scikit-learn).

It's important for newcomers to any data science discipline to know that the majority of your time will be spent on data pre-processing and analyzing what you have, which includes cleaning up the data, normalizing it, extracting any additional meta insights, and then encoding the data so that it is ready for an AI solution to ingest. We need to extract and process the dataset so that it is structured with the fields we may need as 'features', which are simply the fields we choose to include in the AI model we create. We will need to ensure all text strings are encoded into numbers so the engine we use can ingest them. We will also have to mark which rows are insider threat and which are non-threat (true positives and true negatives). Next, after data pre-processing, we'll need to select, set up, and create the functions we will use to build the model and the neural network layers themselves. Finally, we generate the model, examine its accuracy and applicability, and identify any additional modifications or tuning needed in any part of the data pipeline.

Examining the Dataset: Hands-On and Manual Pre-Processing

Examining the raw US-CERT data requires you to download compressed files that must be extracted. Note just how large the sets are compared to how much we will actually use after reducing them at the end of data pre-processing. In our article, we saved a bunch of time by going directly to the archive that contains the insiders.csv file, which we used to match which datasets and individual extracted records are of value.
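The raw US-CERT files arrive compressed and must be extracted before you can examine them. As a minimal sketch, assuming the downloads are bzip2-compressed tarballs (.tar.bz2), Python's standard-library tarfile module can list and extract an archive. The archive and folder names in the usage comment are hypothetical placeholders.

```python
import tarfile
from pathlib import Path
from typing import List


def extract_archive(archive_path: str, dest_dir: str) -> List[str]:
    """Extract a bzip2-compressed tarball into dest_dir and return its member names.

    Assumes the download is a .tar.bz2 file. Only extract archives from a
    source you trust, because tar members can contain unexpected paths.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive_path, mode="r:bz2") as tar:
        names = tar.getnames()  # list the contents before extracting
        tar.extractall(path=dest)
    return names


# Hypothetical usage; substitute the archive you actually downloaded.
# members = extract_archive("downloads/example.tar.bz2", "data/raw")
```

Listing the member names first is a cheap way to see what an archive holds before committing disk space to extracting all of it.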
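Because the raw sets are far larger than what we end up keeping, it can help to measure a file without loading all of it into memory. The following is a minimal pandas sketch that assumes a CSV file with a header row. The file path and column names in the usage comment are hypothetical.

```python
from typing import List, Optional

import pandas as pd


def count_rows_chunked(csv_path: str, usecols: Optional[List[str]] = None,
                       chunksize: int = 100_000) -> int:
    """Count rows in a large CSV without loading the whole file into memory.

    Reading only the columns you need (usecols) and streaming the file in
    chunks means only about one chunk is held in memory at a time.
    """
    total = 0
    for chunk in pd.read_csv(csv_path, usecols=usecols, chunksize=chunksize):
        total += len(chunk)
    return total


# Hypothetical usage; the path and column names are placeholders.
# n = count_rows_chunked("data/raw/logon.csv", usecols=["user", "activity"])
```

You can use the same chunked-reading pattern later to filter rows or keep only the columns you plan to use as features before writing a much smaller working file.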
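Marking each row as insider threat or non-threat amounts to checking extracted records against the entries in insiders.csv. Here is a minimal pandas sketch under two assumptions. First, both tables share a column called "user". Second, the label goes into a new column called "insiderthreat". Both names are hypothetical, so adjust them to match your extracted files.

```python
import pandas as pd


def label_insider_rows(events: pd.DataFrame, insiders: pd.DataFrame,
                       key: str = "user") -> pd.DataFrame:
    """Return a copy of events with a hypothetical 'insiderthreat' column.

    The label is 1 when the row's key value appears in the insiders table
    and 0 otherwise. The default key name 'user' is an assumption.
    """
    labeled = events.copy()  # leave the caller's DataFrame untouched
    labeled["insiderthreat"] = labeled[key].isin(insiders[key]).astype(int)
    return labeled


# Hypothetical usage with tiny in-memory tables.
# events = pd.DataFrame({"user": ["AAA0001", "BBB0002"], "activity": ["Logon", "Logoff"]})
# insiders = pd.DataFrame({"user": ["AAA0001"]})
# label_insider_rows(events, insiders)["insiderthreat"].tolist()  # [1, 0]
```

Matching on user alone labels every event by a flagged user as a threat. A finer-grained match, for example on individual record identifiers or the insider's active time window, would produce fewer false "threat" labels. Treat this sketch as a starting point only.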
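Text strings must be turned into numbers before a neural network can ingest them. One simple option is integer (label) encoding with pandas.factorize, shown below as a minimal sketch. The column names in the usage comment are hypothetical, and this is not necessarily the exact encoding used elsewhere in this tutorial.

```python
from typing import Dict, List, Tuple

import pandas as pd


def encode_text_columns(df: pd.DataFrame, columns: List[str]
                        ) -> Tuple[pd.DataFrame, Dict[str, Dict[str, int]]]:
    """Replace each listed text column with integer codes and return the mappings.

    pd.factorize assigns codes in order of first appearance, and missing
    values become -1. Keep the mappings so new data can be encoded the same way.
    """
    encoded = df.copy()
    mappings: Dict[str, Dict[str, int]] = {}
    for col in columns:
        codes, uniques = pd.factorize(encoded[col])
        encoded[col] = codes
        mappings[col] = {value: code for code, value in enumerate(uniques)}
    return encoded, mappings


# Hypothetical usage; "activity" is a placeholder column name.
# encoded, maps = encode_text_columns(df, ["activity"])
```

Integer codes imply an ordering that the categories do not really have. For a small number of distinct values, one-hot encoding with pd.get_dummies avoids that at the cost of more columns.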
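Once the features are numeric and each row carries a threat/non-threat label, the model itself can be a small feed-forward network with a single sigmoid output. Below is a minimal Tensorflow 2.x / tf.keras sketch. The layer sizes, the optimizer, and the synthetic training data are assumptions chosen for illustration, not the architecture used in this tutorial. In practice the inputs would be your encoded feature columns and the label column.

```python
import numpy as np
import tensorflow as tf


def build_model(num_features: int) -> tf.keras.Model:
    """Build a small binary classifier; the layer sizes are arbitrary assumptions."""
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(num_features,)),
        tf.keras.layers.Dense(16, activation="relu"),
        tf.keras.layers.Dense(8, activation="relu"),
        tf.keras.layers.Dense(1, activation="sigmoid"),  # probability of "insider threat"
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    return model


def train_on_synthetic(num_rows: int = 64, num_features: int = 4,
                       epochs: int = 2, seed: int = 0):
    """Train on random synthetic data just to show the fit() call end to end."""
    rng = np.random.default_rng(seed)
    x = rng.random((num_rows, num_features), dtype=np.float32)
    # Synthetic label: 1 when the first feature exceeds 0.5 (illustration only).
    y = (x[:, 0] > 0.5).astype(np.float32).reshape(-1, 1)
    model = build_model(num_features)
    history = model.fit(x, y, epochs=epochs, batch_size=16, verbose=0)
    return model, history
```

A binary cross-entropy loss with one sigmoid unit is the conventional pairing for a two-class problem like threat versus non-threat. Start small and add capacity only if evaluation shows the model is underfitting.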
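To examine accuracy, the sigmoid outputs need to be turned into 0/1 predictions and compared with the labels. The following minimal scikit-learn sketch reports accuracy alongside the true/false positive and negative counts. The 0.5 threshold is an assumption you may want to tune.

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix


def summarize_predictions(y_true, y_prob, threshold: float = 0.5) -> dict:
    """Threshold predicted probabilities and report accuracy plus confusion counts."""
    y_pred = (np.asarray(y_prob).ravel() >= threshold).astype(int)
    # With labels=[0, 1], the matrix is [[tn, fp], [fn, tp]].
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    return {
        "accuracy": float(accuracy_score(y_true, y_pred)),
        "tp": int(tp), "tn": int(tn), "fp": int(fp), "fn": int(fn),
    }


# Example: one of each outcome gives accuracy 0.5.
# summarize_predictions([1, 0, 1, 0], [0.9, 0.2, 0.4, 0.6])
```

Insider rows are likely to be a small minority of the data. A model that predicts "non-threat" for everything can therefore still score a high accuracy, which is why the false negative count deserves as much attention as the accuracy figure.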