GitHub - CERTCC/Vulnerability-Data-Archive: With the hope that someone finds the data useful, we used to periodically publish an archive of almost all of the non-sensitive vulnerability information in our vulnerability reports database. See also https://github.com/CERTCC/Vulnerability-Data-Archive-Tools
This repository has been archived by the owner on May 14, 2024. It is now read-only.

CERT Coordination Center Vulnerability Data Archive

Important

This data archive has been supplanted by the VINCE API and is no longer being maintained. We are leaving the repository here for archive purposes. Details about vulnerabilities coordinated by CERT/CC since June 2020 are available via VINCE.

Release 2020-06-03

Change Log

2020-06-03 Updated data

2019-05-14 Updated data

2018-11-06 Updated data

2018-01-30 Updated data

2017-11-08 Updated data. Sorted JSON keys so future updates should diff more cleanly in git commit logs.

2017-03-30 Updated data. Some more CVSS scores backfilled.

2016-11-03 Updated data. Backfilled "Not Defined" values for many vulnerabilities' CVSS scores.

2016-06-24 Changed search URL, sponsorship, added GitHub migration, no change to data

2016-03-30 Updated data

2014-12-08 Updated data. Note the significant number of nearly identical reports associated with automated Android SSL testing.

2014-05-22 Updated data

2013-11-20 Updated data

2013-04-03 Updated data, added field definition for [DateCreated], fixed 8-bit characters in section 8

2012-07-10 Initial release

Description

This data archive contains nearly all of the non-sensitive vulnerability data collected by CERT, from the inception of the vulnerability notes database (approximately May 1998) to the date the archive was prepared, as noted above in the Change Log.

Since roughly 2004, the United States Department of Homeland Security (DHS) United States Computer Emergency Readiness Team (US-CERT) and other government sponsors have funded the vulnerability analysis and coordination work that includes this vulnerability data and the publication of Vulnerability Notes.

This data is incomplete. All records (reports) should have an ID, title, and creation date. Only some (~6%) of the reports have been analyzed, coordinated, written up, and published as Vulnerability Notes.

Most of the reports are in a preliminary state, with blank or default field values. Few fields are consistently entered across the entire data set. It is generally inappropriate from an analysis perspective to draw conclusions from incomplete and inconsistent data. You have been warned.
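As a rough illustration of the completeness caveat above, the following sketch checks a parsed report for the three fields that every record should have. It assumes the JSON keys are the field names without the square brackets ("ID", "Title", "DateCreated"); the function name missing_required is hypothetical and not part of any CERT tooling.

```python
# Assumption: JSON keys match the documented field names without brackets.
REQUIRED_FIELDS = ("ID", "Title", "DateCreated")


def missing_required(report: dict) -> list:
    """Return the required field names that are absent or blank in a report."""
    return [
        name
        for name in REQUIRED_FIELDS
        if not str(report.get(name) or "").strip()
    ]
```

A report for which this returns a non-empty list is probably best excluded from any analysis or at least flagged.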

There are two sets of data: vulnerability reports and vendor records. A published Vulnerability Note is made up of one vulnerability report and one or more vendor records.

In this document, field names are enclosed in square brackets, like this: [FieldName].

Vulnerability reports describe information about a reported vulnerability. A report may contain 0 or more vulnerabilities. CERT typically attempts to have one vulnerability report per vulnerability, but this isn't always a practical level of abstraction. [VulnerabilityCount] is the number of vulnerabilities (per CERT's definition) in a vulnerability report.

Vulnerability reports have 0 or more associated vendor records. Vendor records describe vendor information related to a vulnerability report. A record is typically created when CERT notifies a vendor about a vulnerability, reasonably believes the vendor may be affected, becomes aware of information about the vendor related to the vulnerability, or otherwise feels that there is relevant information about the vendor related to the vulnerability.

Tools for working with the data

See https://github.com/CERTCC-Vulnerability-Analysis/Vulnerability-Data-Archive-Tools for some Python tools to get you started with using this data.

Tell Us About It

Did you find something interesting in the data? Did you come up with some cool way of slicing it or remixing it and you want to share? You can tweet us @certcc.

If you find a problem with the data or the tools, please create an issue report in the appropriate repository. Please be aware that we offer no formal support, though we may respond to questions and feedback submitted as issues in this project.

Data Format

The data in this repository has been transformed from the data found at http://www.cert.org/download/vul_data_archive/. That data set contains archived raw exports of the CERT Vulnerability Notes Database. The Vulnerability Notes Database is a Lotus Notes application, and the raw JSON and XML exports in the original archive can be difficult to work with. In this repository we've converted the JSON data to more conventional key-value pairs to make it easier to use.

The directory structure is as follows:

  • ./data/ contains the entire data set.
  • Below that, ./data/0/ through ./data/99/ contain subdirectories for individual vulnerability reports. To find a report, take the numeric part of its VU# ID modulo 100 (i.e., the last two digits of the VU#, so VU#123456 is in ./data/56/).
  • Individual vulnerability report directories (e.g., ./data/56/vu_123456/) should contain exactly one vu_*.json file and 0 or more vendor_*.json files.
  • vu_*.json files contain data as described in Field Definitions for Vulnerability Reports
  • vendor_*.json files contain data as described in Field Definitions for Vendor Records
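The directory layout above can be sketched in Python as follows. This is a minimal example, not the official tooling: the function names report_dir and load_report are hypothetical, the bucket directory is assumed to be unpadded (matching ./data/0/), and the report directory is assumed to use the ID digits exactly as written.

```python
import json
from pathlib import Path


def report_dir(vu_id: str, root: str = "data") -> Path:
    """Return the expected directory for an ID such as 'VU#123456'."""
    digits = vu_id.strip().upper().replace("VU#", "")
    # Bucket is the numeric ID modulo 100 (assumed unpadded, as in ./data/0/).
    bucket = int(digits) % 100
    return Path(root) / str(bucket) / f"vu_{digits}"


def load_report(directory: Path):
    """Load the single vu_*.json file and any vendor_*.json files."""
    vu_files = sorted(directory.glob("vu_*.json"))
    if len(vu_files) != 1:
        raise ValueError(
            f"expected exactly one vu_*.json in {directory}, "
            f"found {len(vu_files)}"
        )
    report = json.loads(vu_files[0].read_text(encoding="utf-8"))
    vendors = [
        json.loads(path.read_text(encoding="utf-8"))
        for path in sorted(directory.glob("vendor_*.json"))
    ]
    return report, vendors
```

For example, load_report(report_dir("VU#123456")) would read ./data/56/vu_123456/ relative to the current directory and return the report together with a possibly empty list of vendor records.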

In the Lotus Notes application from which this data was exported, some fields are initially created as text, and the field type is updated only once data has been entered. For example, a datetime field that is blank may appear as a text field.
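Because a blank datetime field may show up as text, code that reads dates should tolerate empty values. The sketch below shows one way to do that. It assumes populated values are ISO 8601 strings, which this document does not confirm, so check a few records before relying on it; parse_archive_date is a hypothetical helper name.

```python
from datetime import datetime
from typing import Optional


def parse_archive_date(value) -> Optional[datetime]:
    """Return a datetime, or None when the field is blank or unparseable."""
    if not isinstance(value, str) or not value.strip():
        return None  # blank datetime fields may surface as empty text
    try:
        # Assumption: populated values are ISO 8601 strings.
        return datetime.fromisoformat(value.strip())
    except ValueError:
        return None
```

Returning None rather than raising keeps a bulk scan of the archive from stopping at the first preliminary record.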

Field Definitions for Vulnerability Reports

Descriptions for fields included in published Vulnerability Notes are also available here.

Some of the fields in this archive are not included in published Vulnerability Notes.

Field Name Data Type Description
[ID] text Vulnerability report ID, the format is VU#nnnnnn (six digits), with some older records having fewer than six digits. This is the associated vulnerability report for the vendor record.
[IDNumber] text The numerical portion of the ID (nnnnnn, six or sometimes fewer digits).
[Title] text Title of the vulnerability report. May be HTML-escaped.
[Keywords] textlist List of keywords. These become HTML meta keywords in a published Vulnerability Note.
[Overview] text Overview/summary of the vulnerability report.
[Description] text A detailed description of the vulnerability report.
[Impact] text The impact of the vulnerability.
[Resolution] text A solution to the vulnerability, typically a complete solution, such as a patch or update.
[Workarounds] text Workarounds or mitigations for the vulnerability, typically something less than a complete solution, but still effective at mitigating the vulnerability.
[SystemsAffectedPreamble] text A preamble to the list of vendors.
[ThanksAndCredit] text Acknowledgement, credit, and possibly thanks to the person(s) or organization(s) who discovered or reported the vulnerability or who contributed to the coordination or analysis effort or information used in the Vulnerability Note.
[Author] text This field is set to the name of the analyst who is first assigned the vulnerability report. When a report is published as a Vulnerability Note, this field is the author.
[References] text or textlist URLs to reference information about the vulnerability report.
[CVEIDs] text or textlist List of one or more related CVE IDs.
[CERTAdvisory] text or textlist References to one or more CERT Advisories.
[US-CERTTechnicalAlert] text or textlist References to one or more US-CERT Technical Alerts.
[VulnerabilityCount] number The number of unique vulnerabilities in a vulnerability report.
[DateCreated] datetime Date the vulnerability report was created. This closely corresponds to the date CERT was first aware of the vulnerability.
[DatePublic] datetime Date the vulnerability is known to be public. This may be the date a Vulnerability Note is published. This field may contain only date information, with no time component.
[DateFirstPublished] text or datetime Date the Vulnerability Note was first published. This field is a text type if it is blank, datetime when it is populated.
[DateLastUpdated] datetime Date the vulnerability report was last updated.
[Revision] number Number of times the vulnerability report was revised, with "1" being the initial creation of the report.
Note on fields beginning with VRDA_D1_ text (integers) "VRDA_D1_" in the field name indicates that the field is used in the first round of decision support (triage, surface analysis, or D1) to determine how to handle the vulnerability report. Although the "VRDA_D1" component fields are text type, they can be safely treated as numbers (integers). "VRDA_D1" fields were added to the database in 2007.
More information about VRDA (Vulnerability Response Decision Assistance) is available here: http://search.cert.org/search?q=vrda
[VRDA_D1_DirectReport] text If this field is set to "1" then the vulnerability was directly reported to or found by CERT. If this field is "0" then the vulnerability was not a direct report or internal discovery. If this field is blank then the report may or may not have been a direct report or internal discovery.
[VRDA_D1_Population] text This field answers the question "What is the population of affected systems?" [VRDA_D1_Population] maps to [CVSS_TargetDistribution].
"1" - Low
"2" - Low-Medium
"3" - Medium-High
"4" - High (e.g., Microsoft Windows, Adobe Flash, TCP, DNS, core UNIX/Linux)
[VRDA_D1_Impact] text This field answers the question "What is the impact of the vulnerability?"
"1" - Low (e.g., nuisance DoS/resource consumption, limited information disclosure)
"2" - Low-Medium
"3" - Medium-High
"4" - High (e.g., execute arbitrary code with elevated privileges, take control of target)
Notes on fields beginning with CAM_ text (integer) "CAM_" in the field name stands for "CERT Advisory Metric." More information is available here: http://www.kb.cert.org/vuls/html/fieldhelp#metric
If all of the "CAM_" fields in a vulnerability report are "0" then it is most likely that the vulnerability report has not been analyzed beyond initial creation and surface analysis. Although the "CAM" component fields are text type, they can be safely treated as numbers (integers from 0-20). The calculated "CAM" fields are number type.
[CAM_WidelyKnown] text This field answers the question "Is information about the vulnerability widely available or known?"
[CAM_Exploitation] text This field answers the question "Is the vulnerability being exploited?"
[CAM_InternetInfrastructure] text This field answers the question "Is internet infrastructure at risk because of this vulnerability?"
[CAM_Population] text This field answers the question "How many systems on the internet are at risk from this vulnerability?"
[CAM_Impact] text This field answers the question "What is the impact of exploiting the vulnerability?"
[CAM_EaseOfExploitation] text This field answers the question "How easy is it to exploit the vulnerability?"
[CAM_AttackerAccessRequired] text This field answers the question "What preconditions does an attacker require to exploit the vulnerability?"
[CAM_ScoreCurrent] number Calculated CERT Advisory Metric score, decimal number from 0-180.
[CAM_ScoreCurrentWidelyKnown] number Calculated CERT Advisory Metric score with [CAM_WidelyKnown] set to 20.
[CAM_ScoreCurrentWidelyKnownExploited] number Calculated CERT Advisory Metric score with [CAM_WidelyKnown] and [CAM_Exploitation] both set to 20.
[IPProtocol] text IP protocol information related to the vulnerability, e.g., 80/tcp, 161/udp.
Notes on fields beginning with CVSS_ text "CVSS_" in the field name stands for "Common Vulnerability Scoring System." More information is available here: http://www.first.org/cvss/cvss-guide and in Vulnerability Severity Using CVSS.

Vulnerability Reports started including CVSS v2 metrics in March 2012. Older reports will have empty CVSS_ values. Reports that have not been scored will have default CVSS_ values, including "--" for some fields.
[CVSS_AccessVector] text See http://www.first.org/cvss/cvss-guide#i2.1.1
[CVSS_AccessComplexity] text See http://www.first.org/cvss/cvss-guide#i2.1.2
[CVSS_Authenication] text See http://www.first.org/cvss/cvss-guide#i2.1.3
[CVSS_ConfidentialityImpact] text See http://www.first.org/cvss/cvss-guide#i2.1.4
[CVSS_IntegrityImpact] text See http://www.first.org/cvss/cvss-guide#i2.1.5
[CVSS_AvailabilityImpact] text See http://www.first.org/cvss/cvss-guide#i2.1.6
[CVSS_Exploitability] text See http://www.first.org/cvss/cvss-guide#i2.2.1
[CVSS_RemediationLevel] text See http://www.first.org/cvss/cvss-guide#i2.2.2
[CVSS_ReportConfidence] text See http://www.first.org/cvss/cvss-guide#i2.2.3
[CVSS_CollateralDamagePotential] text See http://www.first.org/cvss/cvss-guide#i2.3.1
[CVSS_TargetDistribution] text See http://www.first.org/cvss/cvss-guide#i2.3.2
[CVSS_TargetDistribution] maps to [VRDA_D1_Population].
[CVSS_SecurityRequirementsCR] text See http://www.first.org/cvss/cvss-guide#i2.3.3
[CVSS_SecurityRequirementsIR] text See http://www.first.org/cvss/cvss-guide#i2.3.3
[CVSS_SecurityRequirementsAR] text See http://www.first.org/cvss/cvss-guide#i2.3.3
[CVSS_BaseScore] text See http://www.first.org/cvss/cvss-guide#i3.2.1
[CVSS_BaseVector] text See http://www.first.org/cvss/cvss-guide#i2.4
[CVSS_TemporalScore] text See http://www.first.org/cvss/cvss-guide#i3.2.2
[CVSS_TemporalVector] text See http://www.first.org/cvss/cvss-guide#i2.4
[CVSS_EnvironmentalScore] text See http://www.first.org/cvss/cvss-guide#i3.2.3
[CVSS_EnvironmentalVector] text See http://www.first.org/cvss/cvss-guide#i2.4
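
Several conventions in the table above lend themselves to small helpers: fields typed "text or textlist" can be normalized to lists, the text-typed VRDA_D1_ and CAM_ component fields can be read as integers, and a report whose CAM_ component fields are all "0" has most likely not been analyzed. The following is a minimal sketch of those three ideas. The helper names are hypothetical, and the JSON keys are assumed to be the field names without square brackets.

```python
from typing import List, Optional

# The seven CAM_ component fields (the calculated CAM_Score* fields are excluded).
CAM_INPUT_FIELDS = [
    "CAM_WidelyKnown", "CAM_Exploitation", "CAM_InternetInfrastructure",
    "CAM_Population", "CAM_Impact", "CAM_EaseOfExploitation",
    "CAM_AttackerAccessRequired",
]


def as_list(value) -> List[str]:
    """Normalize a 'text or textlist' field to a list of non-empty strings."""
    if value is None:
        return []
    items = value if isinstance(value, list) else [value]
    return [str(item) for item in items if str(item).strip()]


def as_int(value) -> Optional[int]:
    """Read a text-typed VRDA_D1_ or CAM_ value as an int; None if not an integer."""
    try:
        return int(str(value).strip())
    except ValueError:
        return None


def looks_unanalyzed(report: dict) -> bool:
    """True when every CAM_ component field is present and equal to 0."""
    return all(as_int(report.get(name)) == 0 for name in CAM_INPUT_FIELDS)
```

For instance, as_list(report.get("CVEIDs")) yields a list whether the field was stored as a single string or as a list of strings.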

Field Definitions for Vendor Records

Field Name Data Type Description
[ID] text Vulnerability report ID, the format is VU#nnnnnn (six digits), with some older records having fewer than six digits. This is the associated vulnerability report for the vendor record.
[VendorRecordID] text Lotus Notes unique ID for a vendor record
[Vendor] text Name of the vendor.
[Status] text Status of the vendor with regard to the vulnerability. By default, the status is "Unknown." If we believe a vendor is affected, the status is "Affected" or "Vulnerable". If we believe a vendor is not affected, the status is "Not Affected" or "Not Vulnerable".
[VendorStatement] text A statement about the vulnerability by the vendor. The statement in this field is authenticated -- usually the text has been cryptographically signed by the vendor or verified by CERT out-of-band.
[VendorInformation] text Information about the vendor related to the vulnerability. This is more loosely vetted information, for example something available on the vendor's web site.
[VendorReferences] text or textlist URLs to vendor reference information about the vulnerability.
[Addendum] text Additional comments or rebuttal from CERT to
[DateNotified] datetime Date the vendor was notified by CERT. Notification implies at least that CERT sent email to a last-known good contact at the vendor.
[DateResponded] datetime Date the vendor responded.
[DateLastUpdated] datetime Date the vendor record was last updated.
[Revision] number Number of times the vendor record was revised, with "1" being the initial creation of the record.
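
Since [Status] uses two spellings for each outcome ("Affected" or "Vulnerable", "Not Affected" or "Not Vulnerable"), it can help to fold them together before counting. The sketch below is one way to do so. The grouping labels and function names are illustrative choices, and the "Status" key is assumed to appear without square brackets in the JSON.

```python
from collections import Counter

# Map each documented status spelling to a single label.
_STATUS_GROUPS = {
    "affected": "affected",
    "vulnerable": "affected",
    "not affected": "not affected",
    "not vulnerable": "not affected",
}


def normalize_status(status) -> str:
    """Fold status variants together; anything else counts as 'unknown'."""
    key = str(status or "").strip().lower()
    return _STATUS_GROUPS.get(key, "unknown")


def status_summary(vendor_records) -> Counter:
    """Count normalized statuses across a list of vendor record dicts."""
    return Counter(
        normalize_status(record.get("Status")) for record in vendor_records
    )
```

Blank or missing statuses are counted as "unknown", which matches the documented default.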