
Project Final Report

Course: CSL3050-Database Systems
Instructor: Nitin Awathare
Team: Devesh Bhardwaj (B23CS1014) and Diwanshu Yadav (B23CS1017)


Abstract

This report presents the implementation and evaluation of (1) a buffered Paged File (PF) layer with selectable page-replacement strategies (LRU / MRU), (2) a Slotted Page (SP) layer for variable-length records on top of PF, and (3) index-construction experiments using the AM layer (bulk vs incremental). The PF layer tracks dirty pages and records I/O statistics; the SP layer supports insertion, deletion, compaction and sequential scanning. We compare replacement strategies under mixed read/write workloads, show space-utilization benefits of slotted pages over static record allocation, and evaluate three index-construction methods. The assignment specification and testbed are those supplied with the course materials.


Table of Contents

  1. Objective Summary
  2. Design & Implementation
    • PF Layer: Buffer Manager (LRU / MRU)
    • Dirty-flag & Write-back policy
    • Statistics collection
    • SP Layer: Slotted Page design & API
    • AM Layer: Index construction methods
  3. Testbed & Methodology
  4. Results
    • PF Layer: LRU vs MRU (I/O & hit rate)
    • SP Layer: Space-utilization comparison
    • AM Layer: Index construction timings & page accesses
  5. Plots & Python code to reproduce visualizations
  6. Analysis & Interpretation
  7. Conclusion & Future Work
  8. Appendix: Key logs / test outputs

1. Objective Summary

The assignment required:

  • Implementing a PF buffer manager with selectable LRU/MRU replacement strategies and configurable buffer-pool size; page-level dirty flags and write-back-on-eviction; and collection/reporting of logical/physical I/O counters and buffer hit rate. The system must produce statistics for different mixtures of read/write workloads and visualize them.
  • Implementing an SP layer that stores variable-length student records (insert/delete/scan/compaction) and compares space utilization to static record management (different maximum record sizes).
  • Using the AM layer to construct indexes in three ways (bulk from file/unsorted, incremental inserts, and efficient bulk-loading when pre-sorted) and compare performance.

2. Design & Implementation

2.1 PF Layer: Buffer Manager

Selectable policies: PF_OpenFile(..., policy) accepts an integer flag selecting the replacement policy. The flag is stored in a global bufferPolicy used by the allocator.

Victim selection: The internal allocator (PFbufInternalAlloc) traverses the buffer list:

  • LRU: start from tail (PFlastbpage) and choose the first unpinned page found (oldest).
  • MRU: start from head (PFfirstbpage) and choose the first unpinned page (most recently used).

The traversal is safeguarded against the all-pages-pinned case and returns an error if no eviction candidate is found.

2.2 Dirty Flag & Write-back

  • Each buffer-frame maintains a dirty boolean.
  • PF_UnfixPage allows marking pages dirty.
  • On victim selection, if dirty == true then the page is written to disk before reuse; physical_writes++ is incremented.
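To make the eviction logic of Sections 2.1–2.2 concrete, here is a minimal Python sketch of victim selection under both policies, with write-back of dirty victims. This is an illustrative toy model, not the C implementation; the names `Frame` and `BufferPool` are invented for the example, while the real logic lives in PFbufInternalAlloc operating on a doubly linked list of buffer frames.

```python
from dataclasses import dataclass

@dataclass
class Frame:
    page_num: int
    pinned: bool = False
    dirty: bool = False

class BufferPool:
    def __init__(self, policy="LRU"):
        self.policy = policy        # stands in for the global bufferPolicy flag
        self.frames = []            # index 0 = head (most recent), -1 = tail (oldest)
        self.physical_writes = 0

    def pick_victim(self):
        # LRU scans from the tail (oldest first); MRU scans from the head (newest first).
        order = reversed(self.frames) if self.policy == "LRU" else iter(self.frames)
        for frame in order:
            if not frame.pinned:
                return frame
        raise RuntimeError("no unpinned frame available")  # all pages pinned

    def evict(self):
        victim = self.pick_victim()
        if victim.dirty:            # write-back-on-eviction
            self.physical_writes += 1   # stands in for the actual disk write
        self.frames.remove(victim)
        return victim.page_num
```

With frames ordered most-recent-first, `BufferPool("LRU")` evicts the oldest unpinned frame and `BufferPool("MRU")` the newest, skipping pinned frames in both cases.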

2.3 Statistics Collection

Counters implemented and exposed:

  • logical_reads — incremented on every PF_GetThisPage call (a request for a page).
  • physical_reads — incremented when a requested page misses buffer and must be read from disk.
  • physical_writes — incremented on writes to disk (evictions of dirty pages).
  • Buffer hit rate = 1 - (physical_reads / logical_reads), reported as a percentage.

API:

  • PF_InitStats() — zero counters
  • PF_PrintStats() — output current counters & hit rate
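A quick sketch of how the counters combine into the reported hit rate (illustrative Python; the actual counters live in the C PF layer, and the `PFStats` class here is hypothetical):

```python
class PFStats:
    def __init__(self):
        self.init()

    def init(self):                     # analogue of PF_InitStats(): zero the counters
        self.logical_reads = 0
        self.physical_reads = 0
        self.physical_writes = 0

    def hit_rate(self):
        # Hit rate = 1 - physical_reads / logical_reads, as a percentage.
        if self.logical_reads == 0:
            return 0.0
        return 100.0 * (1 - self.physical_reads / self.logical_reads)

# Example with Method 1's measured counters from Section 4.3:
s = PFStats()
s.logical_reads, s.physical_reads, s.physical_writes = 66826, 558, 483
print(f"Buffer Hit Rate = {s.hit_rate():.2f}%")   # close to the 99.17% reported below
```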

2.4 SP Layer: Slotted Page design

  • Page layout: Page header (slot count, free offset), slot directory growing forward, and record area growing backward.
  • API:
    • SP_InsertRec: find page with contiguous free space; if none, allocate new page. Slot directory updated with record offset & length.
    • SP_DeleteRec: mark slot deleted (e.g., set length = -1 or a deleted flag).
    • SP_CompactPage: called automatically when fragmentation exceeds threshold (e.g., >50% fragmented), compacts records, rebuilds slot directory, and recovers contiguous free space.
    • SP_GetFirstRec / SP_GetNextRec: sequential scanning that skips deleted slots.
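To make the layout concrete, here is a small self-contained Python model of one slotted page: slot directory at the front, records packed from the page end backward, tombstone deletion, and compaction. The class and field names are hypothetical, and (unlike SP_CompactPage, which rebuilds the slot directory) this sketch keeps tombstoned slots in place; the real SP layer operates on raw PF pages in C.

```python
PAGE_SIZE = 4096
HEADER = 8                       # assumed header size: slot count + free-space offset

class SlottedPage:
    SLOT = 8                     # assumed slot-entry size: (offset, length) pair

    def __init__(self):
        self.slots = []          # slot directory: (offset, length); length == -1 => deleted
        self.data = bytearray(PAGE_SIZE)
        self.free_end = PAGE_SIZE    # record area grows backward from the page end

    def free_space(self):
        return self.free_end - (HEADER + len(self.slots) * self.SLOT)

    def insert(self, rec: bytes):
        if len(rec) + self.SLOT > self.free_space():
            return None                       # caller would allocate a new page
        self.free_end -= len(rec)
        self.data[self.free_end:self.free_end + len(rec)] = rec
        self.slots.append((self.free_end, len(rec)))
        return len(self.slots) - 1            # slot id stays stable across compaction

    def delete(self, slot_id):
        off, _ = self.slots[slot_id]
        self.slots[slot_id] = (off, -1)       # tombstone, as in SP_DeleteRec

    def compact(self):
        # Copy out live records, then rewrite them contiguously from the end.
        live = [(i, bytes(self.data[o:o + l]))
                for i, (o, l) in enumerate(self.slots) if l != -1]
        self.free_end = PAGE_SIZE
        for i, rec in live:
            self.free_end -= len(rec)
            self.data[self.free_end:self.free_end + len(rec)] = rec
            self.slots[i] = (self.free_end, len(rec))

    def scan(self):
        # Sequential scan that skips deleted slots (cf. SP_GetFirstRec/SP_GetNextRec).
        for off, length in self.slots:
            if length != -1:
                yield bytes(self.data[off:off + length])
```

Deleting a record leaves a hole; compaction moves the surviving records together and recovers the hole as contiguous free space, which is what allows large inserts after deletions.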

2.5 AM Layer: Index Construction Methods

We implemented and evaluated three methods, using the roll number as the key:

  1. Method 1 — Bulk from existing data file (unsorted): Scan data file, insert keys into B+ tree (bulk creation by scanning & inserting).
  2. Method 2 — Incremental index construction (empty file): Start with empty index and insert records one-by-one as they come.
  3. Method 3 — Efficient bulk-loading from pre-sorted data: Sort records by key first and then build the B+ tree leaf nodes sequentially (faster as nodes are filled once).

3. Testbed & Methodology

  • Test programs used: test_stats, test_linear_scan, test_sp, testbl, test_indexing (provided/extended in assignment package).
  • Workloads: Randomized read/write mixes from 10% reads to 90% reads (in 10% increments) were run and averaged across multiple runs for LRU & MRU.
  • SP tests: Insertion of 1000 variable-length records, space utilization measurement; correctness tests (insert/retrieve/delete/compaction) via test_sp.
  • Index tests: Index creation on student data with N=17814 records; timings measured for Method 1 (bulk from file), Method 2 (incremental), and Method 3 (sorted bulk).

4. Results

The following results are from the test runs recorded while developing and evaluating the implementation (raw test output excerpt in Appendix).

4.1 PF Layer — LRU vs MRU (averaged over 5 runs each)

LRU Strategy Performance

| Read % | Avg. Phys. Reads | Avg. Phys. Writes | Avg. Hit Rate % |
|-------:|-----------------:|------------------:|----------------:|
| 10%    | 672.00           | 621.80            | 32.80%          |
| 20%    | 674.00           | 576.60            | 32.60%          |
| 30%    | 675.40           | 531.60            | 32.46%          |
| 40%    | 665.40           | 470.20            | 33.46%          |
| 50%    | 664.80           | 401.80            | 33.52%          |
| 60%    | 663.40           | 344.40            | 33.66%          |
| 70%    | 668.00           | 271.00            | 33.20%          |
| 80%    | 663.00           | 199.00            | 33.70%          |
| 90%    | 678.80           | 109.00            | 32.12%          |

MRU Strategy Performance

| Read % | Avg. Phys. Reads | Avg. Phys. Writes | Avg. Hit Rate % |
|-------:|-----------------:|------------------:|----------------:|
| 10%    | 675.00           | 628.40            | 32.50%          |
| 20%    | 677.60           | 572.20            | 32.24%          |
| 30%    | 666.20           | 520.60            | 33.38%          |
| 40%    | 670.00           | 465.00            | 33.00%          |
| 50%    | 674.00           | 403.20            | 32.60%          |
| 60%    | 670.40           | 335.60            | 32.96%          |
| 70%    | 666.20           | 263.80            | 33.38%          |
| 80%    | 665.00           | 189.40            | 33.50%          |
| 90%    | 669.40           | 111.80            | 33.06%          |

Observation: Under random access, both LRU and MRU show very similar hit rates (~32–34%) and similar physical I/O counts. This is expected for a workload with low temporal locality.


4.2 SP Layer — Space Utilization Comparison

From a test_stats run (1,000 variable-length records; the measured data below are used for the comparison):

| Management Strategy       | Total Pages | Space Utilization |
|---------------------------|------------:|------------------:|
| Slotted Page (Our Impl.)  | 9           | 99.66%            |
| Static (Max Rec = 50)     | 13          | 53.64%            |
| Static (Max Rec = 100)    | 25          | 27.89%            |
| Static (Max Rec = 200)    | 50          | 13.95%            |
| Static (Max Rec = 500)    | 125         | 5.58%             |
| Static (Max Rec = 1000)   | 250         | 2.79%             |
| Static (Max Rec = 2000)   | 500         | 1.39%             |

Observation: The slotted-page approach dramatically improves space efficiency for variable-length data compared with static fixed-record layouts, which reserve a fixed (large) slot per record.


4.3 AM Layer — Index Construction Summary

Method 1 (Bulk from existing file / linear scan):

  • Processed: 17814 records
  • Index construction time: 0.013526 sec
  • PF Layer statistics (during index construction): Logical Reads = 66826, Physical Reads = 558, Physical Writes = 483, Buffer Hit Rate = 99.17%.
  • Query test: 100 random roll-number queries completed in 0.000068 sec, 0 records missing.

Method 2 (Incremental):

  • Construction time: 3.249306 sec
  • PF stats: Logical Reads = 4,317,997, Physical Reads = 4,309,889, Physical Writes = 35,141, Buffer Hit Rate = 0.19%. (Note: the huge cost comes from repeated page churn during incremental inserts.)
  • Query test: 100 queries completed in 0.000113 sec, with 10 records not found (likely due to partial/incomplete indexing run or dataset handling — see Appendix).

Method 3 (Efficient sorted bulk-load):

  • Tested with 200 pre-sorted records in this run: construction time: 0.000071 sec
  • PF stats: Logical Reads = 903, Physical Reads = 1, Physical Writes = 0, Buffer Hit Rate = 99.89%.

Observation: Sorted bulk-loading (Method 3) is far more efficient in terms of I/O and time than incremental insertion. Building from an existing file (Method 1) is efficient if implemented to avoid repetitive I/O. Incremental construction is expensive in I/O and time.


5. Plots & Python Code to Reproduce Visualizations

Below are two visualization scripts (self-contained). Run them in Python 3 with matplotlib and pandas installed. They generate:

  • performance_graph.png — LRU vs MRU: physical reads & writes (bars) with the hit rate as a line.
  • space_utilization.png — Slotted page vs static-managed strategies.

5.1 Plot: LRU vs MRU (I/O & Hit Rate)

(Figure: performance_graph.png — Performance of LRU vs. MRU under Randomized Workloads)

5.2 Plot: Space Utilization — Slotted vs Static

(Figure: space_utilization.png — Slotted page vs. static record management)

6. Analysis & Interpretation

PF Layer

  • Hit Rate & I/O: For the randomized workloads used, LRU and MRU show nearly identical hit rates (~32–34%). Physical read counts are also similar across read-mix values. This indicates that under low temporal locality (random access) replacement-policy choice is not the dominating performance factor.
  • Physical Writes trend: As workload read% increases, physical writes decrease (fewer updates), matching expectations: fewer writes → fewer dirty-page evictions.
  • When policy matters: Replacement policy differences become meaningful for workloads with structure:
    • Sequential scan / streaming: MRU can be advantageous because recently used pages are unlikely to be reused in a forward-only scan, allowing MRU to evict them without harming future accesses.
    • Hot-set workloads: LRU tends to preserve recently and frequently used pages, benefiting workloads with a small hot set accessed repeatedly.
  • Takeaway: Use policy selection as an adaptive knob based on workload characteristics rather than a one-size-fits-all choice.

SP Layer

  • Space utilization: The slotted-page implementation achieves near-optimal packing for variable-length records (≈99% in our measured run), dramatically outperforming static fixed-slot layouts when record size varies widely.
  • Fragmentation & compaction: The automatic compaction mechanism successfully reclaimed space when fragmentation exceeded the threshold, enabling insertion of large records post-deletion. This confirms the correctness and practicality of the compaction policy.
  • Correctness: Scanning, deletion, and per-record fetch semantics are correct (deleted records are skipped / inaccessible), as validated by the test_sp correctness suite.

AM Layer & Indexing

  • Method comparison:
    • Method 1 (bulk from file / linear scan): Efficient when implemented to minimize repeated I/O; fast for building indexes from an existing data dump.
    • Method 2 (incremental): Significantly more expensive in I/O and time due to frequent node splits and page churn; not recommended for large-scale initial construction.
    • Method 3 (sorted bulk-load): Best performance when input is pre-sorted—low I/O and minimal node splitting by construction.
  • Practical advice: For large datasets, prefer a two-phase approach: sort (or external sort) by key and bulk-load, or build incrementally only for small update batches.

7. Conclusion & Future Work

Conclusions

  • The PF buffer manager correctly implements selectable LRU/MRU policies, dirty-flag write-backs, and accurate I/O statistics collection.
  • For randomized workloads with low locality, LRU and MRU produced similar performance; policy choice has bigger impact on non-random workloads.
  • The SP layer is highly effective for variable-length records; it outperforms static allocation in space efficiency and supports compaction to mitigate fragmentation.
  • Efficient index construction (sorted bulk-load) dramatically reduces build time and I/O compared to incremental insertion.

Future Work

  1. Workload-driven policy selection: Implement heuristics to detect workload type (sequential vs. random vs. hotspot) and dynamically switch replacement policy (LRU/MRU/CLOCK).
  2. Additional replacement policies: Add CLOCK and LFU to evaluate middle-ground behaviors between LRU and MRU.
  3. Per-page metrics: Track per-page fragmentation, average lifetime, and reuse patterns to guide compaction thresholds and prefetch strategies.
  4. Concurrency & recovery: Extend PF/SP layers for concurrent access (locks/latches) and integrate simple logging/recovery for crash consistency (WAL).
  5. Performance regression tests: Automate end-to-end test harnesses to run standardized workloads and report results for CI-based performance tracking.

8. Appendix — Key Test Output Excerpts


A. Slotted Page / Storage Statistics

Test Summary

  • Total Records Inserted: 1000
  • Total Bytes of Actual Data: 28,560

Page Utilization Comparison

| Management Strategy      | Total Pages | Space Utilization |
|--------------------------|------------:|------------------:|
| Slotted Page (Our Impl.) | 9           | 99.66%            |
| Static (Max Rec = 50)    | 13          | 53.64%            |
| ...                      | ...         | ...               |

B. Indexing — Method 1 (Bulk Index Construction)


Execution Log

  • Records inserted into data file: 17,814
  • Index construction time: 0.013526 s
  • Index creation status: Successful

PF Layer Statistics

  • Logical Reads: 66,826
  • Physical Reads: 558
  • Physical Writes: 483
  • Buffer Hit Rate: 99.17%
  • ...

C. Indexing — Method 2 (Incremental Insertion)

Execution Log

  • Index construction time: 3.249306 s

PF Layer Statistics

  • Logical Reads: 4,317,997
  • Physical Reads: 4,309,889
  • Physical Writes: 35,141
  • Buffer Hit Rate: 0.19%
  • ...


How to Run

This project is built using make. The build process requires compiling several dependency layers in a specific order before building the final executable.

Prerequisites

  • git
  • make
  • A C/C++ compiler toolchain (e.g., GCC/g++)

1. Clone the Repository

First, clone the project repository to your local machine.

git clone https://github.com/bhardwajdevesh092005/Database_Assignment.git
cd Database_Assignment

2. Compile Dependencies

You must compile each layer sequentially to create the necessary object files (.o).

  • Compile pflayer:
    cd pflayer
    make
    cd ..
  • Compile splayer:
    cd splayer
    make
    cd ..
  • Compile bulklayer:
    cd bulklayer
    make
    cd ..

3. Compile the Main Application

Once all dependencies are built, run make from the root directory to build the amlayer and the final test_indexing executable.

make

4. Run the Main Application

You can now execute the main performance test.

./test_indexing

Optional: Running Individual Layer Tests

After compiling a specific layer (as shown in Step 2), you can run its standalone tests to verify its functionality.

Slotted Page Layer (splayer) Tests

cd splayer
./test_sp
./test_stats
./test_linear_scan
cd ..

Bulk-Load Layer (bulklayer) Test

cd bulklayer
./testbl
cd ..

Clean Up

To remove all compiled object files and executables from all directories, run the following command from the project's root directory.

make clean
(cd pflayer && make clean)
(cd splayer && make clean)
(cd bulklayer && make clean)
