CSPM SOTU 2022 RSA

I was looking forward to attending the latest RSA Conference in the first week of June 2022, to meeting friends, and to learning about all the new ideas various vendors were showcasing on the exposition floor. This year is the first time in the last few years that we are back in person after the pandemic. Almost everyone is shaking hands and hugging and acting like every…

Campus Network Design

This is an example network drawing for a typical campus location with a main data center hosting hundreds of rack-mounted servers (LAN) connected to many buildings with long-range fiber optic cables (MAN).

Uplink to the WAN supports BGP with multiple links up to 100 Gbps.

The data center switches are low latency, with 10 Gbps ports for servers and VXLAN protocol support to work with VMware NSX SDN network overlays.

At the edge there are thousands of 2.5 Gbps ports with Power over Ethernet, using 802.1X with EAP-PEAP and EAP-TLS for certificate-based Network Access Control.

Enterprise RADIUS with guest access is supported for the high-speed wireless network using 802.11ax (Wi-Fi 6) APs.

Using a design from 2018, before Brocade was sold off and broken apart, here is a parts list to consider. We are looking for similar solutions from major network suppliers today.

This drawing has the following sections:

  • External WAN connections with multiple 1, 10, 40, and 100 Gbps uplinks
  • 1 pair of MLXe-4 switches for Border Router cluster
  • 1 pair of PAN PA-7080 firewall HA cluster
  • 1 pair of MLXe-32 switches for Internal Router cluster
  • 1 pair of RUCKUS ICX 7850 switches for core distribution to each area using 100 GbE links
  • 10 pairs of VDX 6740 data center top of rack switches with 10GbE on all ports and multiple 40GbE uplinks
  • 1 pair of RUCKUS ICX 7850 aggregation switches for each area using 100 GbE links
  • 10 stacks of RUCKUS ICX 7550 access switches for the edge with PoE and 2.5GbE on all ports. Each stack has 2 to 12 units.
  • Over 100 Wi-Fi access points would be needed with multiple gigabit connections.
  • All devices have redundant power supplies and fans with 40 Gbps QSFP interconnections.
  • Network setup, automation, and monitoring would be considered part of the solution.
  • Controller machines would be deployed with Cloud based or Virtual Machines minimizing dependencies on physical hardware deployment.

PAN FW example OSPF with Active/Active High Availability 

In this scenario, the firewalls are deployed in Active/Active HA. This design supports asymmetric traffic, traffic engineering, and consistent deterministic failover behavior. In testing, this design proved to be highly resilient and fast to recover. This design can tolerate the loss of any two network connections without degrading performance or availability. 

Following is a diagram of what will be implemented: 

From Palo Alto Networks OSPF guide

Set the link costs such that certain routes will be preferred over other routes. The link costs are specified to keep the traffic routing symmetric. This also simplifies troubleshooting, packet captures, and firewall log monitoring. 
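To illustrate how link costs steer route selection, here is a minimal Python sketch of the lowest-cost path calculation (Dijkstra's algorithm) that OSPF routers perform. The topology, the node names (R1, FW-A, FW-B, R2), and the cost values are hypothetical and are not taken from the Palo Alto Networks guide. For simplicity the sketch assumes each link has the same cost in both directions, whereas real OSPF costs are set per interface.

```python
import heapq


def shortest_path(links, source, target):
    """Return (total_cost, path) for the lowest-cost route, or None if unreachable."""
    # Build an adjacency map; every link is usable in both directions
    # with the same cost (a simplifying assumption).
    graph = {}
    for a, b, cost in links:
        graph.setdefault(a, []).append((b, cost))
        graph.setdefault(b, []).append((a, cost))

    queue = [(0, source, [source])]
    visited = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == target:
            return cost, path
        if node in visited:
            continue
        visited.add(node)
        for neighbor, link_cost in graph.get(node, []):
            if neighbor not in visited:
                heapq.heappush(queue, (cost + link_cost, neighbor, path + [neighbor]))
    return None


# Hypothetical topology: two routers connected through two firewalls.
# FW-A links cost less, so FW-A is preferred in both directions.
LINKS = [
    ("R1", "FW-A", 10),
    ("FW-A", "R2", 10),
    ("R1", "FW-B", 20),
    ("FW-B", "R2", 20),
]

if __name__ == "__main__":
    print(shortest_path(LINKS, "R1", "R2"))
    print(shortest_path(LINKS, "R2", "R1"))
```

Because the lower costs sit on the same firewall in both directions, the forward and return paths both cross FW-A, which is what keeps the routing symmetric. If the FW-A links are removed from the list, the calculation falls back to the FW-B path.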

Note: Floating IP addresses (“Virtual Address”) are typically used when the firewall is adjacent to end hosts. In this scenario, the firewall is directly connected to routers, so floating IP addresses are not used. 

Configure HA as Active/Active. For details on the meanings of the settings, refer to the following article on Active/Active HA in the Palo Alto Networks Knowledge base: https://live.paloaltonetworks.com/docs/DOC-1765 

Note: The path monitoring and link monitoring configurations are not shown below. Make sure that you configure those appropriately. Refer to the document above for help on configuring those settings. 

Dev Test Prod Multiple Environments for a more Secure SDLC

Introduction

A small team of developers writing software for web-based application services might not appreciate the need for having their code published to multiple environments. It does take some extra tooling to set up, and this effort might slow down the schedule, adding a few days to get that new feature out to the production go-live environment.

It’s business as usual for large enterprises with different teams responsible for code, operations, networking, security, etc. to have several Dev, Test, Verify, and Prod environments for a more resilient and secure software development lifecycle (SDLC).

As an extreme example of what can go wrong when a software mistake makes it to production, this is the story of how a company with nearly $400 million in assets went bankrupt in 45 minutes because of a failed deployment.

SDLC Security Environment Release Tool Mapping

SDLC Stages – Feature Promotion Process

Software goes through a number of stages on its way from development to production. Each organization will need to adapt these concepts to its own capabilities and requirements. This drawing shows an example of how features are promoted, with code going through various stages of the SDLC, from development through testing, verification, integration, and deployment, before finally being released into the live production environment. At each stage it is helpful to stand up a running version of the application to facilitate the functional and security testing. Some applications have SLAs that require elaborate performance and load testing to be done in a verification environment that mirrors production. Kubernetes facilitates this, as we can simply create an alternate namespace using ACLs for authentication and global load balancer rules to ensure only authorized testers have access.
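The promotion idea can be sketched in a few lines of Python. This is only an illustration, not a real release tool: the stage names follow the Dev, Test, Verify, and Prod environments mentioned earlier, while the check names and the build fields (unit_tests_passed, high_findings, load_test_passed) are hypothetical.

```python
# Ordered list of environments a build moves through.
STAGES = ["dev", "test", "verify", "prod"]

# Hypothetical gates: each stage lists (name, check) pairs that must pass
# before a build may leave that stage.
CHECKS = {
    "dev": [("unit tests", lambda b: b["unit_tests_passed"])],
    "test": [("security scan", lambda b: b["high_findings"] == 0)],
    "verify": [("load test", lambda b: b["load_test_passed"])],
}


def promote(build, checks):
    """Return a copy of the build advanced one stage, or raise if a gate fails."""
    current = build["stage"]
    if current == STAGES[-1]:
        return build  # already in production, nothing to promote
    failed = [name for name, check in checks.get(current, []) if not check(build)]
    if failed:
        raise RuntimeError(f"{build['version']} blocked in {current}: {failed}")
    next_stage = STAGES[STAGES.index(current) + 1]
    return {**build, "stage": next_stage}


if __name__ == "__main__":
    build = {
        "version": "1.4.0",
        "stage": "dev",
        "unit_tests_passed": True,
        "high_findings": 0,
        "load_test_passed": True,
    }
    while build["stage"] != "prod":
        build = promote(build, CHECKS)
        print("promoted to", build["stage"])
```

The point of the design is that a build can only move one stage at a time, and only after the gates for its current stage pass, so a mistake is caught in an earlier environment rather than in production.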

Microsoft Example from 2010

This is not a new concept. Here’s an article Microsoft published 10 years ago:

At a high level, the application goes through these stages as part of the development and deployment process:

  1. A developer checks some code into Team Foundation Server (TFS).
  2. TFS builds the code and runs any unit tests associated with the team project.
  3. TFS deploys the solution to the test environment.
  4. The developer team verifies and validates the solution in the test environment.
  5. The staging environment administrator performs a “what if” deployment to the staging environment, to establish whether the deployment will cause any problems.
  6. The staging environment administrator performs a live deployment to the staging environment.
  7. The solution undergoes user acceptance testing in the staging environment.
  8. The web deployment packages are manually imported into the production environment.

These stages form part of a continuous development cycle.

Microsoft drawing SDLC 2010

Oracle example 2015

2.1.1 Definition of Development Environment

An Oracle development environment is typically an installation on a single host computer (such as a Microsoft Windows desktop or laptop computer or a Linux computer). The requirements for a development environment are very different from the requirements for a production environment.

There is no need for high availability in a development environment, and the number of components and products that can be installed is typically limited to those required by the software engineer or the application that the software engineer is developing.

2.2.1 Planning for a Production Environment

An Oracle production environment is an installation where the products have been configured to deploy production-ready applications and features to your application users.

Unlike a development environment, a production system typically takes advantage of more advanced features, such as server clusters, and is deployed to multiple regions.

Gigaom Key Criteria Report on Vulnerability Management

  • https://gigaom.com/report/key-criteria-for-evaluating-vulnerability-management-tools
  • Vulnerability management tools scan your IT estate to help identify and mitigate security risks and weaknesses. These tools can facilitate the development of a more comprehensive vulnerability management program. Leveraging people, processes, and technologies, successful initiatives effectively identify, classify, prioritize, and remediate security threats.

    A security vulnerability is a weakness that can compromise the confidentiality, integrity, and availability (CIA) of information. Attackers are constantly looking to exploit defects in software code or insecure configurations. Vulnerabilities can exist anywhere in the software stack, from web applications and databases to infrastructure components such as load balancers, firewalls, machine and container images, operating systems, and libraries. This includes code used in the CI/CD pipeline as well as the infrastructure-as-code (IAC) that defines the compute, network, and storage infrastructure.

    Recent cybersecurity events have exposed widespread vulnerabilities involving the exploitation of zero-day malware and unknown weaknesses. Threat actors continually discover new exploitation tactics, techniques, and procedures (TTPs) to take advantage of weaknesses throughout integrated systems. Moreover, identifying breach paths is increasingly complicated due to the widespread adoption of ephemeral services.

    Vulnerability management solutions should provide end-to-end visibility of the protect-surface by aggregating both platform and application risks in a single pane of glass, while leveraging prioritized remediation based on business risk and threat context for efficiency. Containerized workloads deployed via DevOps pipelines have unique security requirements that demand a fully integrated vulnerability assessment to be automated into cloud platform services running containerized workloads.

    The path to a mature security posture starts with the ability to identify vulnerabilities in software code, third-party libraries, and at runtime. In addition, the cloud platform used to host your applications should be scanned for misconfigurations. This requires the use of policy configuration baselines, benchmarks, and compliance standards that apply to both the infrastructure and the code used to build it. As organizations implement security guardrails early in the software development lifecycle (SDLC), they can take advantage of cloud-native culture to ensure network and security tools are used throughout all phases of the SDLC.

    This GigaOm report explores the key criteria and emerging technologies that IT decision makers should evaluate when choosing a vulnerability management solution. The key criteria report, together with the GigaOm radar report that evaluates relevant products, provides a framework to help organizations assess the solutions currently available on the market and how these tools fit with their requirements.

    Links

    Here are some links to review for more info…

    Training Video Portal Concept

    How to set up a portal that allows prospects to browse and purchase videos with training material.

    1. Build a web page with a catalog of videos. This is a starting point. https://github.com/GoogleCloudPlatform/microservices-demo
    2. Allow guests to view short clips of the videos and a description with short text extract and reviews – short versions of the videos are available from MS Stream – https://www.microsoft.com/en-us/microsoft-365/microsoft-stream?rtc=1
    3. Guests select a training video to purchase – they add it to the cart
    4. Upon checkout they go to Stripe for payment processing: https://stripe.com/docs/payments/checkout/client#create-products
    5. Once the purchase is made, an email is sent with the link to view the full video, which is available from Microsoft Stream: https://www.microsoft.com/en-us/microsoft-365/microsoft-stream?rtc=1
    6. A log of all activity will be available for uploading to the remote logging server.
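The cart, checkout, and logging steps above can be sketched roughly as follows. This is a simplified illustration only: the catalog entries, prices, and function names are hypothetical, and the `charge` callable is a stand-in for the real payment processor rather than the Stripe API.

```python
from datetime import datetime, timezone

# Hypothetical catalog; prices are kept in cents to avoid float rounding.
CATALOG = {
    "net-101": {"title": "Campus Networking Basics", "price_cents": 4900},
    "sec-201": {"title": "Cloud Security Posture", "price_cents": 7900},
}

# In-memory activity log; a real portal would ship these entries
# to the remote logging server.
activity_log = []


def log(event, **details):
    """Record one activity entry with a UTC timestamp."""
    entry = {"time": datetime.now(timezone.utc).isoformat(), "event": event, **details}
    activity_log.append(entry)


def add_to_cart(cart, video_id):
    """Add a catalog video to the guest's cart."""
    if video_id not in CATALOG:
        raise KeyError(f"unknown video: {video_id}")
    cart.append(video_id)
    log("cart_add", video_id=video_id)


def checkout(cart, email, charge):
    """Total the cart and call `charge(total_cents)`, a stand-in for the payment step."""
    total = sum(CATALOG[video_id]["price_cents"] for video_id in cart)
    if not charge(total):
        log("payment_failed", email=email, total_cents=total)
        return None
    log("purchase", email=email, videos=list(cart), total_cents=total)
    # The returned order is what the "email the viewing link" step would use.
    return {"email": email, "videos": list(cart), "total_cents": total}
```

Passing the payment step in as a function keeps the sketch testable without a payment account; in a real build that callable would hand off to the hosted checkout page.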

    Troubleshooting Technical Problems

    Have you ever done an Operating System update only to have something break like your wireless networking? Follow these steps to resolve the issue quickly and without a lot of drama. Let’s see how it works in practice since sometimes these things can get complicated. One problem might have a simple fix but sometimes you will discover another issue that makes the solution more complicated.

    Here’s an example using Cisco’s eight-step troubleshooting method to fix your network:

    1. Define the problem.
    2. Gather detailed information.
    3. Consider probable cause for the failure.
    4. Devise a plan to solve the problem.
    5. Implement the plan.
    6. Observe the results of the implementation.
    7. Repeat the process if the plan does not resolve the problem.
    8. Document the changes made to solve the problem.

    For our example, the wireless network wouldn’t get an IP address after the OS upgrade. To resolve this we could simply revert to the previous build. Or we can see if the issue can be resolved correctly by updating the wireless drivers. Here’s the updated plan…

    Troubleshooting Wireless on Acer Laptop after Windows 10 upgrade

    1. Define the problem.
      – wireless network wouldn’t get an IP address after the OS upgrade
    2. Gather detailed information.
      – Windows 10 Insider Build 21292 rs prerelease 210108-1514
      – Laptop is Acer Aspire A715-71G
    3. Consider probable cause for the failure.
      – drivers need to be updated for new OS install
    4. Devise a plan to solve the problem.
      – download new drivers from Acer: https://www.acer.com/ac/en/US/content/support-product/7296?b=1&pn=NX.GP8SV.005
      – since wireless is down, we need to try an Ethernet cable to download the drivers
      – install new drivers
    5. Implement the plan.
      – plug in ethernet cable to router
      – get an ip address
    6. Observe the results of the implementation.
      – during this process we got stuck because another problem was encountered. Now we need to decide whether to investigate and resolve this issue, or revert to the previous build and leave both issues unresolved
    7. Repeat the process if the plan does not resolve the problem.
      – now we have a new issue to work on and can follow this exact same process to see if we can fix the ethernet issue
    8. Document the changes made to solve the problem. – This should be step one. Start by copying this block of text and editing it for your specific use case…
      – open a support case
      – capture notes as you go along to have accurate records about exactly what steps were taken
      – since we’re using an insider build we agreed to share feedback about what issues we encountered and help make the next version better for others

    Troubleshooting Ethernet on Acer Laptop after Windows 10 upgrade

    1. Define the problem.
      – ethernet network wouldn’t get an IP address after the OS upgrade
    2. Gather detailed information.
      – Windows 10 Insider Build 21292 rs prerelease 210108-1514
      – Laptop is Acer Aspire A715-71G
    3. Consider probable cause for the failure.
      – drivers need to be updated for new OS install
    4. Devise a plan to solve the problem.
      – download new drivers from Acer: https://www.acer.com/ac/en/US/content/support-product/7296?b=1&pn=NX.GP8SV.005
      – since our wireless networking isn’t working and we now know the Ethernet isn’t working either, we need to use another computer to download the drivers and see if installing them resolves this issue
      – once the drivers are downloaded on another computer we can copy them to a USB Flash Drive (UFD) and then use that UFD to move them to the device having the network issues
      – install new drivers
    5. Implement the plan.
      – update drivers
    6. Observe the results of the implementation.
      – get an ip address
    7. Repeat the process if the plan does not resolve the problem.
      – now that we solved this issue we can follow this same process to see if we can fix the wireless issue
    8. Document the changes made to solve the problem. – This should be step one. Start by copying this block of text and editing it for your specific use case…
      – capture notes as you go along to have accurate records about exactly what steps were taken
      – since we’re using an insider build we agreed to share feedback about what issues we encountered and help make the next version better for others
      – to prepare for future update problems with drivers it is suggested that we keep a copy of all drivers needed on the local hard drive or on a USB drive handy for this type of situation
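For the "get an ip address" check in step 6, it helps to know what a failed result looks like. When no DHCP server answers, Windows assigns itself an automatic private address in the 169.254.x.x range, so an address alone does not prove the fix worked. Here is a small Python sketch of that check; the function name and the sample addresses are made up for illustration.

```python
import ipaddress


def dhcp_lease_looks_valid(address):
    """Return True if the address looks like a real lease, not a fallback value."""
    ip = ipaddress.ip_address(address)
    # 169.254.0.0/16 is the self-assigned (link-local) range used when DHCP fails;
    # loopback and 0.0.0.0 also mean the adapter has no usable address.
    return not (ip.is_link_local or ip.is_loopback or ip.is_unspecified)


if __name__ == "__main__":
    for sample in ["192.168.1.25", "169.254.10.20", "0.0.0.0"]:
        print(sample, dhcp_lease_looks_valid(sample))
```

The address to test would come from the adapter details shown by `ipconfig` on the affected laptop.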

    MITRE ATT&CK for the Global Security Operations Center GSOC CyberSec

    The ATT&CK framework from MITRE is focused on techniques used to compromise client operating systems such as Microsoft Windows, Linux, and Apple’s macOS, and mobile operating systems like Apple iOS and Google Android.

    Adversarial
    Tactics,
    Techniques,
    &
    Common
    Knowledge

    But as we’ve seen recently, a lateral attack from one of these client OS devices can also be used against servers and cloud resources, for example by stealing an OAuth token that allows admin access to the SAML SSO solution and so gaining access to pretty much any SaaS tool used at the organization.

    MITRE allows external contributors but this process needs to be enhanced to more easily allow vendors and subject matter experts to update content and provide feedback.

    If you work in or are building a SOC then this is for you. MITRE has a book published in 2014 by Carson Zimmerman. Download the PDF file here: Ten Strategies of a World-Class Cybersecurity Operations Center

    Table of Contents
    Executive Summary

    WISB WIRI and CMDB Oh My

    Do you really know what’s going on with your network assets?
    When you make a planned change how can you be sure it was successful?
    What about when something breaks unexpectedly – maybe someone made an unplanned change or didn’t communicate it to the team?

    WISB and WIRI models can help detect changes by performing regular queries against your CMDB. What does this all mean? How does it work?

    What
    It
    Should
    Be

    What
    It
    Really
    Is

    Configuration
    Management
    Data
    Base

    It’s quite simple when you think about it. But there will be some work required to set it all up and fine tune the reporting. Getting access to the data and making sense of it is another matter. ITIL describes this process as DIKW. The idea is that you can’t make a Wise decision without the Knowledge that comes from Information derived from Data.
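As a rough sketch of the idea, the comparison between WISB (the expected state recorded in the CMDB) and WIRI (the state actually discovered on the network) is a dictionary diff. The asset names, attributes, and function name below are hypothetical examples, not from any particular CMDB product.

```python
def find_drift(wisb, wiri):
    """Compare expected state (WISB) with discovered state (WIRI), asset by asset."""
    drift = {}
    for asset in sorted(set(wisb) | set(wiri)):
        expected = wisb.get(asset)
        actual = wiri.get(asset)
        if expected is None:
            drift[asset] = "unexpected asset (not in WISB)"
        elif actual is None:
            drift[asset] = "missing asset (not found in WIRI)"
        else:
            # Record every attribute whose expected and actual values differ.
            changed = {
                key: (expected.get(key), actual.get(key))
                for key in sorted(set(expected) | set(actual))
                if expected.get(key) != actual.get(key)
            }
            if changed:
                drift[asset] = changed
    return drift


if __name__ == "__main__":
    # Hypothetical data: what the CMDB says versus what a scan found.
    wisb = {
        "core-sw-1": {"os": "9.0.10", "vlan": 100},
        "edge-sw-7": {"os": "9.0.10", "vlan": 200},
    }
    wiri = {
        "core-sw-1": {"os": "9.0.10", "vlan": 100},
        "edge-sw-7": {"os": "8.0.95", "vlan": 200},
        "lab-ap-3": {"os": "1.2.0"},
    }
    for asset, detail in find_drift(wisb, wiri).items():
        print(asset, detail)
```

Run on a schedule, an empty result confirms that a planned change landed as intended, while any entry points to an unplanned or failed change worth investigating.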


    I first learned about this topic while working with eBay on a huge project to assist with a technology modernization initiative in 2008. You can read a bit about how they use WISB and WIRI here:

    Clean boot Windows 10 to determine problem cause

    How to determine what is causing an issue with Windows 10 after you do a clean boot


    When having a system performance or network issue with your Windows 10 PC you can follow these steps to determine the cause. Open these instructions on a mobile device or another computer so they are available while troubleshooting the PC with the problem.

    MSCONFIG System Configuration Screenshot Steps

    First of all perform a clean boot. After this is done proceed to the following steps.

    If the problem does not occur while the computer is in a clean boot environment, then you can determine which startup application or service is causing the problem by systematically turning them on or off and restarting the computer.  While turning on a single service or startup item and rebooting each time will eventually find the problematic service or application, the most efficient way to do this is to test half of them at a time, thus eliminating half of the items as the potential cause with each reboot of the computer.  You can then repeat this process until you’ve isolated the problem.  Here’s how:

    1. Sign in to the computer by using an account that has administrator rights. If you don’t have an administrator account, you can create one. Create a local user or administrator account in Windows 10
    2. For Windows 10, in the search box on the taskbar, type msconfig.  (In Windows 8 or 8.1, swipe in from the right edge of the screen, and then select Search. Or, if you’re using a mouse, point to the lower-right corner of the screen, and then select Search.  In the search box, type msconfig.)
    3. Select msconfig or System Configuration from the search results.
    4. Select Services, and then select Hide all Microsoft services.
    5. Select each of the check boxes in the upper half of the Service list.
    6. Select OK, and then select Restart.
    7. After the computer restarts, determine whether the problem still occurs.
      • If the problem still occurs, one of the checked items is the problematic service.  Repeat steps 1 through 6, but in Step 5, clear the lower half of the boxes in the Service list that you selected in your last test.  
      • If the problem doesn’t occur, the checked items are not the cause of the problem. Repeat steps 1 through 6, but in Step 5, turn on the upper half of the boxes that you cleared in the Service list in the last test. 
      • Repeat these steps until you’ve either isolated the problem to a single service, or until you’ve determined that none of the services are the cause of the problem.  If you experience the problem when only one service is selected in the Service list, go to step 10. If none of the services cause the problem, go to step 8.
    8. Select Startup, and then select the upper half of the check boxes in the Startup Item list.
    9. Select OK, and then select Restart.
      • If the problem still occurs, repeat step 8, but this time clear the lower half of the boxes in the Startup Item list that you selected in your last test.  
      • If the problem does not occur, repeat step 8, and turn on the upper half of the boxes that you cleared in the Startup Item list in the last test. 
      • If you still experience the problem after only one Startup Item is selected in the Startup Item list, this means that the selected Startup Item causes the problem, and you should go to step 10. If no Startup Item causes this problem, there might be a problem with a Microsoft service.  Repair the service, reset, or reinstall Windows
    10. After you determine the startup item or the service that causes the problem, contact the program manufacturer to determine whether the problem can be resolved. Or, run Windows with the problem item disabled.  To do this, run the System Configuration utility and enable your Services and Startup Items, but clear the check box for the problem item.
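The "test half at a time" approach is a binary search. Here is a minimal Python sketch of the logic, assuming exactly one service or startup item is the culprit. The item names are invented, and `problem_occurs` stands in for the manual step of enabling only those items, restarting, and checking whether the problem shows up.

```python
def isolate_culprit(items, problem_occurs):
    """Find the single problem item by enabling half of the candidates per test.

    `problem_occurs(enabled_items)` represents one reboot-and-check cycle.
    Assumes exactly one item causes the problem.
    """
    candidates = list(items)
    if not candidates or not problem_occurs(candidates):
        return None  # the problem is not caused by any of these items
    while len(candidates) > 1:
        middle = len(candidates) // 2
        upper_half = candidates[:middle]
        if problem_occurs(upper_half):
            candidates = upper_half           # culprit is in the enabled half
        else:
            candidates = candidates[middle:]  # culprit is in the disabled half
    return candidates[0]


if __name__ == "__main__":
    # Hypothetical non-Microsoft services left after "Hide all Microsoft services".
    services = ["AudioHelper", "CloudBackup", "GpuTray", "PrintSpool2",
                "UpdaterX", "VendorSync", "VpnAgent", "WifiAssist"]
    reboots = []

    def problem_occurs(enabled):
        reboots.append(list(enabled))  # count each simulated restart
        return "VendorSync" in enabled

    print(isolate_culprit(services, problem_occurs), "found in", len(reboots), "tests")
```

With eight items this takes four tests (one with everything enabled, then three halvings) instead of up to eight when enabling one item per reboot, and the saving grows with longer lists.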


    CIS AWS Benchmark Drawings and Updates

    Center for Internet Security Amazon Web Services Cloud Benchmark Drawings and Updates

    Here is a copy of the drawing we used for the original AWS CIS Hardening Benchmark in 2016. Many things have changed since then and we are calling on the community to help us with a thorough review and update to this security guide. This was created with an open source SVG tool and will be available to participants of the CIS Community. Please join the effort here: https://workbench.cisecurity.org/community/18/discussions/5829

    AWS-CIS-3-tier-arch.svg