DBCarver
DBCarver is a research tool that performs database management system (DBMS) forensic analysis. It allows users to inspect DBMS storage beyond what DBMSes expose, including unallocated disk space and RAM contents. DBCarver runs independently of the DBMS, applies to all major relational row-store DBMSes (see Supported DBMSes), and does not require a live system.
DBCarver was inspired by the forensic technique called file carving, which reconstructs files without using the file system or any of its metadata. However, file carving cannot be used for database files, for a number of reasons that we discuss in our papers. As a solution, we introduced database page carving. Page carving works not only for queryable database contents but also for deleted data, unallocated storage, and RAM contents. Our current implementation of database page carving is called DBCarver.
Overview: How Does DBCarver Work?
DBCarver consists of two main components: the parameter detector (A) and the carver (F). The parameter detector is essentially the calibration part of DBCarver. It typically needs to be run only once for each DBMS (or new DBMS version) to generate a configuration file. For a new version of an already-supported DBMS, parameter detection typically just verifies that the storage layout has not changed; a new DBMS produces a new configuration file that reflects its own layout. Once you have created a configuration file, you can use the carver to extract all database contents from any type of file, including raw disk images, individual DBMS files, and RAM snapshots.
To determine the DBMS storage configuration, the parameter detector (A) loads synthetic data into a DBMS (B), captures storage (C), finds pages in storage, and records page layout parameters in a configuration file (E) -- a text file describing the page-level layout of that particular DBMS. The parameters include those described in DFRWS 2015 and have since been expanded to support other metadata. DBCarver automatically generates parameter values for new DBMSes and new DBMS versions. While most DBMSes retain the same page layout across versions, we observed different parameter values between PostgreSQL versions 7.3 and 8.4.
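The idea behind parameter detection can be illustrated with a minimal sketch. It is not DBCarver's actual algorithm: the function name `detect_parameters`, the marker value, and the assumptions that the page size is already known and that signature bytes sit in the first `header_len` bytes of a page are all hypothetical. It splits captured storage into pages, keeps the pages that contain the known synthetic marker, and records the header offsets whose byte value is identical in all of them.

```python
from typing import Dict, Union

Config = Dict[str, Union[int, Dict[int, int]]]


def detect_parameters(storage: bytes, page_size: int, marker: bytes,
                      header_len: int = 32) -> Config:
    """Hypothetical sketch of parameter detection (not DBCarver's code).

    Infers candidate page-signature bytes: header offsets whose value is
    identical in every page that contains the known synthetic marker.
    """
    # Split captured storage into fixed-size candidate pages (page size assumed known).
    pages = [storage[i:i + page_size]
             for i in range(0, len(storage) - page_size + 1, page_size)]
    # Keep only the pages that hold the synthetic data we loaded.
    data_pages = [p for p in pages if marker in p]
    if not data_pages:
        raise ValueError("marker not found: wrong page size or marker")
    signature: Dict[int, int] = {}
    for offset in range(min(header_len, page_size)):
        values = {p[offset] for p in data_pages}
        if len(values) == 1:  # same byte value in every data page
            signature[offset] = values.pop()
    return {"page_size": page_size, "signature": signature}
```

With only a few pages, bytes that match by coincidence (such as zero padding) also look like signature bytes; loading more, and more varied, synthetic data narrows the result.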
The carver (F) uses the configuration files to reconstruct any database content from disk images, RAM snapshots, or any other input file (G). The carver returns storage artifacts (H), such as user records, metadata describing user data, deleted data, and system catalogs.
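A carver can then use such parameters to locate pages in arbitrary input. The sketch below is hypothetical (`PageConfig` and `carve_pages` are illustrative names, not part of DBCarver): it assumes the configuration reduces to a page size plus a set of (offset, byte) signature pairs, and that pages start at multiples of `step` bytes (512 by default, a common sector size). Real page carving also parses row directories, records, and deletion markers inside each page, which this sketch omits.

```python
from dataclasses import dataclass
from typing import Dict, Iterator, Tuple


@dataclass
class PageConfig:
    """Hypothetical, simplified page-layout configuration."""
    page_size: int
    signature: Dict[int, int]  # offset within page -> expected byte value


def carve_pages(image: bytes, cfg: PageConfig,
                step: int = 512) -> Iterator[Tuple[int, bytes]]:
    """Yield (offset, page bytes) for every position whose bytes match the signature."""
    last_start = len(image) - cfg.page_size
    for start in range(0, last_start + 1, step):
        # A candidate page must match every signature byte at its offset.
        if all(image[start + off] == val for off, val in cfg.signature.items()):
            yield start, image[start:start + cfg.page_size]
```

Scanning at a fixed step rather than byte by byte keeps the sketch simple; an input such as a RAM snapshot may call for a smaller step.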
Supported DBMSes
DBCarver has been tested on the following DBMSes, although it is expected to support others by default or with minimal tweaking. Because DBCarver has already been evaluated on these DBMSes, new parameter files do not need to be generated for them.
Commercial DBMSes
- IBM DB2
- Microsoft SQL Server
- Oracle
Open-source DBMSes
- Apache Derby
- Firebird
- MariaDB
- MySQL
- PostgreSQL
- SQLite
Examples
A copy of the DBCarver code is freely available upon request.
Use Cases
We have worked with a number of forensic consultants and law enforcement agencies to evaluate DBCarver. These include the Regional Chicago Forensic Lab (a lab managed by the FBI), Mandiant/FireEye, the Royal Canadian Mounted Police, and the Virginia Department of Forensic Science. Through these collaborations, we have helped find new evidence for active investigations. In addition to gathering raw evidence, we have considered two cybersecurity threat detection approaches that leverage forensic artifacts extracted by DBCarver:
Database Administrator (DBA) threat
Consider a scenario where a DBA (i.e., a person tasked with managing and maintaining the DBMS) performs a malicious action against a database. For our purposes, it does not matter whether the DBA or an attacker who has taken over the DBA's account performs the tampering in question. DBA accounts can temporarily suspend audit logging (legitimate reasons include improving the performance of bulk data loads). Thus, a DBA account is capable of disabling the audit log, executing malicious command(s), and then re-enabling the audit log.
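Detecting this scenario amounts to checking carved evidence against logged activity. The minimal sketch below is hypothetical and much simpler than the algorithms in our paper: it assumes each logged DELETE has already been reduced to a row predicate (`LoggedDelete` and `unexplained_deletions` are illustrative names), and it reports every carved deleted record that no logged DELETE could have removed. A fuller check would also account for other operations that leave deleted-looking data behind.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Record = Dict[str, object]  # a carved row: column name -> value


@dataclass
class LoggedDelete:
    """Hypothetical: a DELETE from the audit log, reduced to a row predicate."""
    sql: str
    matches: Callable[[Record], bool]


def unexplained_deletions(deleted: List[Record],
                          log: List[LoggedDelete]) -> List[Record]:
    """Return carved deleted records that no logged DELETE accounts for."""
    return [rec for rec in deleted
            if not any(cmd.matches(rec) for cmd in log)]
```

For example, if the log holds `DELETE ... WHERE city = 'Chicago'` and `DELETE ... WHERE id = 3`, a carved deleted record with `city = 'Seattle'` and `id = 4` is the only one reported.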
In our paper, we explain algorithms that can identify such actions independently of the database by matching the audit log contents against the forensic artifacts discovered by DBCarver. For example, in the following figure, we can see two deleted records that are attributed to a command (deleted records #1 and #3), and one suspicious record (#4) that could not be explained by any logged command.
System Administrator (SysAdmin) threat
A second type of extra-privileged attack that we are able to thwart is described in our EDBT 2018 paper. In that case, we consider malicious actions by a SysAdmin, an account with administrative privileges at the OS level but no particular capabilities within the DBMS. Such a user cannot affect the audit log (in a secure environment, where the audit log is protected from tampering). However, a SysAdmin is able to find and directly edit the DBMS files on disk. This attack is particularly difficult to counter because the edit happens in place, completely bypassing DBMS access control and logging and leaving no direct trace.
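For the SysAdmin scenario, the core comparison between index and table contents can be sketched as a multiset difference. This is a simplified, hypothetical illustration rather than the EDBT 2018 method: it assumes the carved index values and the carved table values (including records flagged as deleted, since a delete through the DBMS leaves them in place) are already available as lists of keys. Any value the index holds more often than the table points to a record that was removed without going through the DBMS.

```python
from collections import Counter
from typing import Iterable, List


def missing_from_table(index_values: Iterable[str],
                       table_values: Iterable[str]) -> List[str]:
    """Hypothetical check: values the carved index holds more often than
    the carved table (active plus deleted-flagged records) does."""
    # Counter subtraction keeps only positive counts, so duplicates are handled.
    surplus = Counter(index_values) - Counter(table_values)
    return sorted(surplus.elements())
```

Detecting a wiped index entry (the gap case) would additionally require inspecting the index page structure itself, which this sketch does not attempt.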
Due to the nature of the attack, our solution relies on a somewhat more proactive approach: comparing the state of database indexes (which often exist by default and are created to improve performance) with the state of the database table. The example below illustrates an attack that wipes out a record (#4) in place. Note that any delete performed through the DBMS itself will look like records #2 and #6 (i.e., flagged as deleted, not erased). In this example, the index still contains the value of the deleted record. Even if the SysAdmin also wiped the "Seattle" entry from the index, the remaining gap in the index would reveal that a value is missing, because values are never physically wiped from indexes.
Literature
- Detecting Database File Tampering through Page Carving, EDBT 2018.
Database Management Systems (DBMSes) secure data against regular users through defensive mechanisms such as access control, and against privileged users with detection mechanisms such as audit logging. Interestingly, these security mechanisms are built into the DBMS and are thus only useful for monitoring or stopping operations that are executed through the DBMS API. Any access that involves directly modifying database files (at file system level) would, by definition, bypass any and all security layers built into the DBMS itself.
In this paper, we propose and evaluate an approach that detects direct modifications to database files that have already bypassed the DBMS and its internal security mechanisms. Our approach applies forensic analysis to first validate database indexes and then compares index state with data in the DBMS tables. We show that indexes are much more difficult to modify and can be further fortified with hashing. Our approach supports most relational DBMSes by leveraging index structures that are already built into the system to detect database storage tampering that would currently remain undetectable.
- PLI: Augmenting Live Databases with Custom Clustered Indexes, SSDBM 2017.
RDBMSes support only one clustered index per database table, and that index can speed up query processing. Database applications that continually ingest large amounts of data experience anything from slow query response times to long downtimes, because the clustered index ordering must be strictly maintained. In this paper, we show that such slowdowns or downtime can often be avoided if database systems expose the physical location of attributes that are completely or approximately clustered.
Towards this, we propose PLI, a physical location index, constructed by determining the physical ordering of an attribute and creating approximately sorted buckets that map physical ordering to attribute values in a live database. To use a PLI, incoming SQL queries are simply rewritten with physical ordering information for that particular database. Experiments show that queries with the PLI index significantly outperform queries using native unclustered (secondary) indexes, while the index itself requires much lower maintenance overhead than native clustered indexes.
- Carving Database Storage to Detect and Trace Security Breaches, DFRWS 2017.
Database Management Systems (DBMS) are routinely used to store and process sensitive enterprise data. However, it is not possible to secure data by relying on the access control and security mechanisms (e.g., audit logs) of such systems alone; users may abuse their privileges (whether granted or gained illegally) or circumvent security mechanisms to maliciously alter and access data. Thus, in addition to taking preventive measures, the major goals of database security are to 1) detect breaches and 2) gather evidence about attacks in order to devise countermeasures. We present an approach that evaluates the integrity of a live database, identifying and reporting evidence of log tampering. Our approach is based on forensic analysis of database storage and detection of inconsistencies between database logs and physical storage state (disk and RAM). We apply our approach to multiple DBMSes to demonstrate its effectiveness in discovering malicious operations and providing detailed information about the data that was illegally accessed or modified.
- Database Forensic Analysis with DBCarver, CIDR 2017.
The increasing use of databases to store critical and sensitive information in many organizations has led to an increase in the rate at which databases are exploited in computer crimes. While several techniques and tools are available for database forensics, they mostly assume a priori database preparation, such as tamper-detection software being in place or detailed logging being enabled. Investigators, by contrast, need forensic tools and techniques that work on poorly configured databases and make no assumptions about the extent of damage to a database.
In this paper, we present DBCarver, a tool for reconstructing database content from a database image without using any log or system metadata. The tool uses page carving to reconstruct both queryable data and non-queryable data (deleted data). We describe how the two kinds of data can be combined to enable a variety of forensic analysis questions hitherto unavailable to forensic investigators. We show the generality and efficiency of our tool across several databases through a set of robust experiments.
- Database Image Content Explorer: Carving Data That Does not Officially Exist, DFRWS 2016.
When a file is deleted, the storage it occupies is de-allocated but the contents of the file are not erased. An extensive selection of file carving tools and techniques is available to forensic analysts – and yet existing file carving techniques cannot recover database storage because all database storage engines use proprietary and unique storage formats. Database systems are widely used to store and process data – both on a large scale (e.g., enterprise networks) and for personal use (e.g., SQLite in mobile devices or Firefox). For some databases, users can purchase specialized recovery tools capable of discovering valid rows in storage, and yet there are no tools that can recover deleted rows or make analysts aware of the “unseen” database content.
Deletion is just one of the many operations that create de-allocated data in database storage. We use our Database Image Content Explorer tool, based on a universal database storage model, to recover a variety of phantom data: a) data that was actually deleted by a user, b) data that is marked as deleted, but was never explicitly deleted by any user and c) data that is not marked as deleted and had been de-allocated without anyone’s knowledge. Data persists in active database tables, in memory, in auxiliary structures or in discarded pages. Strikingly, our tool can even recover data from inserts that were canceled, and thus never officially existed in a data table, which may be of immeasurable value to investigation of financial crimes. In this paper, we describe many recoverable database storage artifacts, investigate survival of data and empirically demonstrate across different databases what our universal, multi-database tool can recover.
- Database Forensic Analysis through Internal Structure Carving, DFRWS 2015.
Forensic tools assist analysts with recovery of both data and system events, even from corrupted storage. These tools typically rely on “file carving” techniques to restore files after metadata loss by analyzing the remaining raw file content. A significant amount of sensitive data is stored and processed in relational databases, creating the need for database forensic tools that extend file carving solutions to the database realm. Raw database storage is partitioned into individual “pages” that cannot be read or presented to the analyst without the help of the database itself. Furthermore, by directly accessing raw database storage, we can reveal things that are normally hidden from database users.
There are a number of database-specific tools developed for emergency database recovery, though usually not for forensic analysis of a database. In this paper, we present a universal tool that seamlessly supports many different databases, rebuilding table and other data content from any remaining storage fragments on disk or in memory. We define an approach for automatically (with minimal user intervention) reverse engineering storage in new databases, for detecting volatile data changes and discovering user action artifacts. Finally, we empirically verify our tool’s ability to recover both deleted and partially corrupted data directly from the internal storage of different databases.
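The PLI idea from the SSDBM 2017 paper, buckets that map physical row order to approximately sorted attribute values, can be sketched as follows. The names (`Bucket`, `build_pli`, `positions_for_range`), the fixed bucket size, and the use of integer positions and values are assumptions for illustration, not the paper's implementation. Each bucket stores only the minimum and maximum value of a run of physically adjacent rows, so a range query is answered with the physical position ranges whose buckets overlap it; the closer the attribute is to being clustered, the narrower each bucket's value spread and the fewer rows need to be scanned.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Bucket:
    first_pos: int  # physical position of the first row in the bucket
    last_pos: int   # physical position of the last row in the bucket
    min_val: int
    max_val: int


def build_pli(rows: Sequence[Tuple[int, int]], bucket_size: int) -> List[Bucket]:
    """Hypothetical PLI build: rows are (physical position, value) pairs in physical order."""
    buckets = []
    for i in range(0, len(rows), bucket_size):
        chunk = rows[i:i + bucket_size]
        values = [v for _, v in chunk]
        buckets.append(Bucket(chunk[0][0], chunk[-1][0], min(values), max(values)))
    return buckets


def positions_for_range(pli: List[Bucket], lo: int, hi: int) -> List[Tuple[int, int]]:
    """Physical position ranges whose buckets may contain values in [lo, hi]."""
    return [(b.first_pos, b.last_pos) for b in pli
            if b.max_val >= lo and b.min_val <= hi]
```

How the returned ranges become a rewritten SQL predicate depends on how the target DBMS exposes physical row location, which this sketch leaves open.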
Team
- Alexander Rasin
- James Wagner
- Karen Heart
- Tanu Malik
- Jonathan Grier