CWN #3 - 2023 Week 12
GPT-4 is still here! More details about it, a French initiative for training cybersecurity professionals, the Red Canary 2023 Threat Detection Report and more.
Highlight of the week
After last week's teaser, what is ChatGPT? It is a Large Language Model (LLM), not a general AI! Fundamentally, it is always trying to produce a "reasonable continuation" of whatever text it has received so far, i.e. the continuation most consistent with the billions of texts used in training. In LLMs, pieces of text are called "tokens", and this particular model is a neural network with roughly 175 billion parameters. It specifically uses a Transformer architecture to better deal with text, notably through the concept of "attention": the model weighs all the tokens seen so far while generating new ones, which makes the output "make more sense".
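To make the "reasonable continuation" idea concrete, here is a minimal sketch using the small, downloadable GPT-2 model (ChatGPT itself cannot be run locally); it prints the five tokens the model considers the most likely continuations of a prompt. It assumes the `transformers` and `torch` Python packages are installed.
```python
# Minimal next-token prediction demo with GPT-2 (a small, open LLM).
# Illustrates the "reasonable continuation" idea: the model scores every
# token in its vocabulary and we look at the most likely ones.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

prompt = "The best way to detect phishing emails is"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids

with torch.no_grad():
    logits = model(input_ids).logits[0, -1]  # scores for the next token only
probs = torch.softmax(logits, dim=-1)

top = torch.topk(probs, 5)
for p, idx in zip(top.values, top.indices):
    print(f"{tokenizer.decode([int(idx)])!r}: {float(p):.3f}")
```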
What's new in GPT-4 (only for "Plus" subscribers for now):
- It takes images as input, can handle longer context and, of course, was trained with more parameters (up from GPT-3's 175 billion; OpenAI has not disclosed GPT-4's exact size)!
- A larger and more diverse dataset enables better scores on a wide range of practice exams (legal, medical…).
- OpenAI also introduces the "system" message to steer the model's answer style and tone (e.g. "You are a tutor that always responds in the Socratic style"), which is probably exploited in a few prompt jailbreaks you may have seen… (https://openai.com/research/gpt-4)
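As an illustration, here is a minimal sketch of the "system" message through the OpenAI Python client (the pre-1.0 API current at the time of writing); it assumes an OPENAI_API_KEY environment variable and API access to the gpt-4 model.
```python
# Steering answer style and tone with a "system" message.
import openai  # pre-1.0 client; reads OPENAI_API_KEY from the environment

response = openai.ChatCompletion.create(
    model="gpt-4",
    messages=[
        {"role": "system",
         "content": "You are a tutor that always responds in the Socratic style."},
        {"role": "user",
         "content": "How does Kerberos authentication work?"},
    ],
)
print(response["choices"][0]["message"]["content"])
```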
Potential uses for cybersecurity:
- Analyse command lines and emails to spot malicious ones, with a ~66% detection rate reported in early experiments (a minimal triage sketch after this list; https://www.signalblur.io/using-limacharlie-and-chatgpt-to-perform-malware-anomaly-detection/ & https://ai.sophos.com/2022/12/15/gpt-3-and-cybersecurity/ & https://github.com/sophos/gpt3-and-cybersecurity)
- Query the multitude of cybersecurity tools and consoles, or even security policies, in natural language, making relevant information easier to find and understand for non-cybersecurity practitioners
- Help to prioritize alerts (https://www.elastic.co/fr/security-labs/exploring-applications-of-chatgpt-to-improve-detection-response-and-understanding)
- It could also help criminals generate "cleaner" phishing templates, although that is only a small part of their activities (they have plenty more to do keeping their infrastructure up and running)
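A hedged sketch of the first use case, loosely inspired by the LimaCharlie and Sophos experiments linked above (their exact prompts and models differ): ask the model for a verdict on a suspicious command line. Treat the output as a triage hint, not ground truth; the detection rates above show it is far from perfect.
```python
# LLM-assisted triage of a command line. The encoded payload below is a
# truncated placeholder, not a real sample.
import openai

suspect = "powershell.exe -nop -w hidden -enc SQBFAFgAIAAoAE4AZQB3AC0A..."

response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    temperature=0,  # keep the verdict as reproducible as possible
    messages=[
        {"role": "system",
         "content": "You are a SOC analyst. Answer MALICIOUS or BENIGN, "
                    "then give one sentence of reasoning."},
        {"role": "user", "content": f"Classify this command line: {suspect}"},
    ],
)
print(response["choices"][0]["message"]["content"])
```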
A few risks to keep in mind:
- Privacy risk: prompts and responses are added to OpenAI's dataset along the way, so the service should not be used to handle sensitive data! Training corporate users is crucial, and blocking access to the service should be considered (if a strong DLP culture exists).
- It can (still!) "generate harmful advice, buggy code, or inaccurate information". Review any code, advice or text it produces before using it as-is… Today, complex domain knowledge is largely absent from generic pre-training corpora.
- False statements and data can be injected into the model to steer its behavior. OpenAI ran an "adversarial testing program" against the model for six months to improve its alignment, assessing its robustness to incorrect statements and its refusal to answer prompts about high-risk areas. Even though the model scores better on these robustness tests than GPT-3.5, the risk is still real.
So what’s next?
- Other models are emerging in an attempt to break OpenAI's current dominance, like Stanford's instruction-following model Alpaca: behavior similar to GPT-3.5, obtained for under $600 by fine-tuning Meta's LLaMA 7B on 52k instruction-following examples generated from text-davinci-003 via self-instruct (a sketch of that idea after this list). https://crfm.stanford.edu/2023/03/13/alpaca.html
- Models could be run in a P2P fashion to lower their cost to individuals (https://petals.ml/), although privacy could be far worse than with the current "big" hosted models…
- We could see the emergence of local, dedicated and private models based on an existing LLM with further training on a specific domain (like Codex or Copilot). These models could also be smaller and trained directly on domain-specific, curated data (like Galactica or PubMedGPT).
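A heavily simplified sketch of the self-instruct idea behind Alpaca (the Stanford pipeline uses 175 seed tasks, a much richer prompt template and output filtering): use a strong model to generate new instruction/answer pairs, which then serve as fine-tuning data for a smaller model.
```python
# Generating instruction-following examples with text-davinci-003,
# the same "teacher" model Alpaca used. Prompt is illustrative only.
import openai

seed = "Instruction: List three signs of a phishing email.\n"
prompt = seed + ("Write 5 more diverse instructions with their answers, "
                 "in the same format.")

completion = openai.Completion.create(
    model="text-davinci-003",
    prompt=prompt,
    max_tokens=512,
    temperature=0.7,  # some randomness helps instruction diversity
)
print(completion["choices"][0]["text"])
```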
With the announcement of plugins in ChatGPT Plus to query data and third-party services, some limitations around data freshness and accuracy will most probably be quickly overcome. There is a concern that threat actors will leverage this plugin capability to insert incorrect statements into the model… (https://openai.com/blog/chatgpt-plugins)
In the future, with LLMs specialized in cybersecurity domains, possibilities will grow and answers will be more accurate.
Selection of the week
An initiative from AFNOR, Le Campus régional de Cybersécurité et de Confiance numérique (C3NA), Le Centre de Formation de l'ANSSI (CFSSI) and Cybermalveillance.gouv.fr has led to the creation of a repository of 50 skill domains, aligned with the NIST framework, to guide the training of cybersecurity staff.
Each domain comes with a short description of the expected activities and a skill level (junior, experienced, confirmed, expert). For example: "Backup: being able to define a backup plan according to data criticality; level: confirmed".
This should help the creation of practical training paths (and certifications?) and increase the number of cybersecurity practitioners.
📰 https://www.cybermalveillance.gouv.fr/medias/2023/03/ReferentielCompetences.pdf
The data-leak forum BreachForums has been taken down! Less than a year after the seizure of RaidForums, the FBI has arrested BreachForums' alleged administrator, "Pompompurin". Another admin, nicknamed "Baphomet", quickly took over but aborted his plans after discovering signs that Pompompurin's machine had been accessed. He announced he would keep the community's online presence and find a way to build something new from scratch.
📝 http://www.documentcloud.org/documents/23713130-pompourin-affidavit-govuscourts
An OPSEC failure by a threat actor allowed Zscaler ThreatLabz to take a deep dive into its TTPs. APT37 (also known as ScarCruft or Temp.Reaper, https://attack.mitre.org/groups/G0067/) is a North Korea-based threat actor running cyber-espionage campaigns, often distributing the PowerShell-based Chinotto backdoor.
Zscaler ThreatLabz uncovered a GitHub repository with resources used by APT37, detailed in their report. On top of the usual macro-based MS Office, LNK and CHM files, a new vector through Excel add-ins (XLL) was discovered. A set of file IOCs is shared and can be used for retro-hunting purposes (a lookup sketch after the link below).
📝 https://www.zscaler.com/blogs/security-research/unintentional-leak-glimpse-attack-vectors-apt37
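A hedged sketch of one way to use the shared IOCs: look up file hashes from the report against VirusTotal's v3 API. It assumes the `requests` package and a VT_API_KEY environment variable; the hash below is a placeholder to replace with values from the report.
```python
# Retro-hunting helper: check report IOCs against VirusTotal.
import os
import requests

iocs = ["<sha256-from-the-zscaler-report>"]  # placeholder, not a real hash

for file_hash in iocs:
    r = requests.get(
        f"https://www.virustotal.com/api/v3/files/{file_hash}",
        headers={"x-apikey": os.environ["VT_API_KEY"]},
    )
    if r.ok:
        stats = r.json()["data"]["attributes"]["last_analysis_stats"]
        print(file_hash, stats)  # e.g. {'malicious': 42, 'undetected': 20, ...}
    else:
        print(file_hash, "not found or quota exceeded")
```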
After reviewing 40k threats that their customers' (other) tools did not block in 2022, Red Canary analysed all their data. They warn that it "skews heavily towards Initial Access, Privilege Escalation and Lateral Movement" due to the nature of their visibility into customer environments.
The report is 111 pages long but well worth a read, especially with the links to Atomic Red Team tests where available. Three highlights:
- Ransomware is shifting: still exfiltration and extortion, but often without encryption;
- Phishing is shifting too (Microsoft having finally blocked VBA macros by default): threat actors now use file formats (like compressed archives or container files) that bypass the Mark of the Web (a quick MotW check is sketched after the links below);
- Cobalt Strike is still in the top 10 most-used C2 frameworks, but Brute Ratel, Sliver and Mythic grew in popularity. If it is not already the case, tailor your detections around behaviors, not tools.
📝 https://redcanary.com/threat-detection-report/
🧠 The full report https://resource.redcanary.com/rs/003-YRU-314/images/2023_ThreatDetectionReport_RedCanary.pdf
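On the Mark of the Web point above, here is a quick, hedged sketch of how the mark can be checked: on NTFS it lives in a Zone.Identifier alternate data stream, which files extracted from ISO/IMG containers typically lack (hence the bypass). Windows-only; the path is illustrative.
```python
# Check whether a file carries the Mark of the Web (Zone.Identifier ADS).
def has_motw(path: str) -> bool:
    try:
        # NTFS alternate data streams are addressed with a ":" suffix.
        with open(path + ":Zone.Identifier", "r") as ads:
            return "ZoneId=3" in ads.read()  # ZoneId 3 = Internet zone
    except OSError:
        return False  # no stream: the file was never marked (or it was stripped)

print(has_motw(r"C:\Users\me\Downloads\invoice.docx"))
```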
A deep dive by Charlie Clark into external non-transitive trusts in Active Directory: they are not really "non-transitive"…
🧠 https://www.semperis.com/blog/ad-security-research-breaking-trust-transitivity/
Emotet is back again! On March 7th, its operators resumed their campaigns with the usual stolen email threads, first using padded Word documents but quickly switching to OneNote files.
📝 https://blog.talosintelligence.com/emotet-switches-to-onenote/
Parting thoughts
This post was not written by ChatGPT (maybe it should have been?). Have a good week.
Comments and feedback are welcome: contact@cyberwhatnow.com