Initialize CBDD solution and add a .NET-focused gitignore for generated artifacts.
C-BSON.md
# C-BSON: Compressed BSON Format

## What is C-BSON?

**C-BSON** (Compressed BSON) is CBDD's optimized wire format. It keeps full BSON type compatibility while saving space through **field name compression**. Documents can shrink by as much as 30-60% when field names are long relative to their values, which improves both storage efficiency and I/O performance.

### The Problem with Standard BSON

Standard BSON stores field names as **null-terminated UTF-8 strings** in every document. Consider a typical user document:

```javascript
{
  "_id": ObjectId("..."),
  "email": "user@example.com",
  "created_at": ISODate("2026-02-12"),
  "last_login": ISODate("2026-02-12")
}
```

**Field Name Overhead:**
- `_id` → 4 bytes (3 chars + null terminator)
- `email` → 6 bytes
- `created_at` → 11 bytes
- `last_login` → 11 bytes

**Total overhead: 32 bytes** just for field names in a 4-field document.

### The C-BSON Solution: Key Compression

C-BSON replaces field names with **2-byte numeric IDs** via a schema-based dictionary:

```
Standard BSON: [type][field_name\0][value]
C-BSON:        [type][field_id: ushort][value]
```

**Space Savings:**

| Field Name     | Standard BSON | C-BSON  | Savings |
|:---------------|:--------------|:--------|:--------|
| `_id`          | 4 bytes       | 2 bytes | 50%     |
| `email`        | 6 bytes       | 2 bytes | 67%     |
| `created_at`   | 11 bytes      | 2 bytes | 82%     |
| `last_login`   | 11 bytes      | 2 bytes | 82%     |

**Result:** The same 4-field document saves **24 bytes** per instance (32 bytes of names become 8 bytes of IDs). For 1 million documents, that's **24 MB (about 23 MiB) saved**.
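
The arithmetic above can be reproduced with a few lines of C#. This is a minimal sketch, not part of CBDD: the `FieldNameOverhead` class and its method names are hypothetical and exist only to illustrate the calculation.

```csharp
using System;
using System.Linq;
using System.Text;

// Hypothetical helper, for illustration only.
public static class FieldNameOverhead
{
    // Standard BSON: UTF-8 bytes of each name plus one null terminator.
    public static int StandardBson(params string[] names) =>
        names.Sum(n => Encoding.UTF8.GetByteCount(n) + 1);

    // C-BSON: a fixed 2-byte field ID per field.
    public static int CBson(params string[] names) => names.Length * 2;
}
```

For the four field names in the table, `StandardBson` gives 32 and `CBson` gives 8, matching the 24-byte saving.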

---

## Wire Format Specification

### Document Structure

```
┌────────────────────────────────────────────────┐
│ [4 bytes] Document Size (int32 little-endian)  │
├────────────────────────────────────────────────┤
│ [Elements...]                                  │
│   ┌──────────────────────────────────────┐     │
│   │ [1 byte]  Type Code                  │     │
│   │ [2 bytes] Field ID (ushort)          │     │
│   │ [N bytes] Value (type-dependent)     │     │
│   └──────────────────────────────────────┘     │
│   [Repeat for each field]                      │
├────────────────────────────────────────────────┤
│ [1 byte] End of Document (0x00)                │
└────────────────────────────────────────────────┘
```

### Element Header Comparison

**Standard BSON Element Header:**
```
[1 byte: type code][N bytes: null-terminated UTF-8 string]
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
                   Variable-length header: min 2 bytes, no max
```

**C-BSON Element Header:**
```
[1 byte: type code][2 bytes: field ID as ushort little-endian]
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
                   Fixed-length header: exactly 3 bytes (1 type + 2 ID)
```

### Type Codes

C-BSON uses **standard BSON type codes** for full compatibility:

| Code | Type        | Description                                     |
|:-----|:------------|:------------------------------------------------|
| 0x01 | Double      | 64-bit IEEE 754 floating point                  |
| 0x02 | String      | UTF-8 string (int32 length + data + null)       |
| 0x03 | Document    | Embedded document                               |
| 0x04 | Array       | Embedded array                                  |
| 0x05 | Binary      | Binary data (int32 length + subtype + data)     |
| 0x07 | ObjectId    | 12-byte MongoDB-compatible ObjectId             |
| 0x08 | Boolean     | 1 byte (0x00 or 0x01)                           |
| 0x09 | DateTime    | UTC milliseconds (int64)                        |
| 0x10 | Int32       | 32-bit signed integer                           |
| 0x12 | Int64       | 64-bit signed integer                           |
| 0x13 | Decimal128  | 128-bit decimal (IEEE 754-2008)                 |

---

## Schema-Based Key Mapping

### Bidirectional Dictionary

C-BSON requires a **schema-driven key mapping** maintained in memory:

**Writer Side:**
```csharp
ConcurrentDictionary<string, ushort> _keyMap;
// Example:
// "_id" → 1
// "email" → 2
// "created_at" → 3
```

**Reader Side:**
```csharp
ConcurrentDictionary<ushort, string> _keys;
// Example:
// 1 → "_id"
// 2 → "email"
// 3 → "created_at"
```
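
The two dictionaries have to stay in sync, so it can help to build them together. The following is a sketch under assumptions: `KeyMapBuilder` is a hypothetical helper (not a CBDD type), IDs are handed out sequentially, and ID 0 is skipped because the format reserves it.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;

// Hypothetical helper, for illustration only.
public static class KeyMapBuilder
{
    public static (ConcurrentDictionary<string, ushort> KeyMap,
                   ConcurrentDictionary<ushort, string> Keys)
        Build(IEnumerable<string> fieldNames)
    {
        var keyMap = new ConcurrentDictionary<string, ushort>(StringComparer.Ordinal);
        var keys = new ConcurrentDictionary<ushort, string>();
        ushort nextId = 1; // ID 0 is reserved

        foreach (var name in fieldNames)
        {
            if (keyMap.ContainsKey(name)) continue; // already registered
            if (nextId == 0)
                throw new InvalidOperationException("Field ID space exhausted");

            keyMap[name] = nextId; // writer side: name → ID
            keys[nextId] = name;   // reader side: ID → name
            unchecked { nextId++; } // wraps to 0 after 65,535
        }

        return (keyMap, keys);
    }
}
```

Calling `KeyMapBuilder.Build(new[] { "_id", "email", "created_at" })` yields the mappings shown in the comments above.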

### Schema Generation

CBDD automatically generates schemas from C# types using reflection:

```csharp
public class User
{
    public ObjectId Id { get; set; }
    public string Email { get; set; }
    public DateTime CreatedAt { get; set; }
}

// Generated schema:
// Field 1: "_id" (ObjectId)
// Field 2: "email" (String)
// Field 3: "created_at" (DateTime)
```
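
How the property names turn into field names is not spelled out here, so the following is only a guess at the idea: it assumes a snake_case convention with `Id` special-cased to `_id`, which matches the generated schema above. `SchemaNames` is a hypothetical class, and the real generator may differ (for example in how it handles acronyms or attributes).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Reflection;
using System.Text;

// Hypothetical helper, for illustration only.
public static class SchemaNames
{
    // "CreatedAt" → "created_at"; "Id" → "_id" (assumed convention).
    // Acronyms such as "HTMLBody" are not handled specially.
    public static string ToFieldName(string propertyName)
    {
        if (propertyName == "Id") return "_id";

        var sb = new StringBuilder();
        for (int i = 0; i < propertyName.Length; i++)
        {
            char c = propertyName[i];
            if (char.IsUpper(c) && i > 0) sb.Append('_');
            sb.Append(char.ToLowerInvariant(c));
        }
        return sb.ToString();
    }

    // Assigns sequential IDs starting at 1 (0 is reserved).
    public static IReadOnlyList<(ushort Id, string Name)> FromType(Type type)
    {
        var fields = new List<(ushort Id, string Name)>();
        ushort nextId = 1;
        var properties = type
            .GetProperties(BindingFlags.Public | BindingFlags.Instance)
            .OrderBy(p => p.MetadataToken); // approximates declaration order
        foreach (PropertyInfo p in properties)
            fields.Add((nextId++, ToFieldName(p.Name)));
        return fields;
    }
}
```

Reflection does not guarantee property order, which is why the sketch sorts by metadata token. A production generator would more likely persist the assigned IDs so they stay stable across builds.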

### Schema Storage

Schemas are stored in **Page 1 (Collection Metadata)** and loaded into memory when the database is opened:

```
┌─────────────────────────────────────────┐
│ [Schema Hash (long)]                    │
│ [Schema Version (int)]                  │
│ [Field Count (ushort)]                  │
├─────────────────────────────────────────┤
│ For each field:                         │
│   [Field ID (ushort)]                   │
│   [Field Name Length (byte)]            │
│   [Field Name UTF-8 bytes]              │
│   [BSON Type Code (byte)]               │
└─────────────────────────────────────────┘
```
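
The layout above can be sketched with `BinaryWriter`. This is an illustration under assumptions, not CBDD's code: `SchemaField` and `SchemaPageCodec` are hypothetical names, and little-endian byte order is assumed because the document format uses it (the schema layout itself does not state an endianness).

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

// Hypothetical types, for illustration only.
public readonly record struct SchemaField(ushort Id, string Name, byte TypeCode);

public static class SchemaPageCodec
{
    public static byte[] Write(long hash, int version, IReadOnlyList<SchemaField> fields)
    {
        using var ms = new MemoryStream();
        using var w = new BinaryWriter(ms); // BinaryWriter emits little-endian values

        w.Write(hash);                  // [Schema Hash (long)]
        w.Write(version);               // [Schema Version (int)]
        w.Write((ushort)fields.Count);  // [Field Count (ushort)]

        foreach (var f in fields)
        {
            byte[] name = Encoding.UTF8.GetBytes(f.Name);
            if (name.Length > byte.MaxValue)
                throw new ArgumentException($"Field name '{f.Name}' exceeds 255 UTF-8 bytes");

            w.Write(f.Id);              // [Field ID (ushort)]
            w.Write((byte)name.Length); // [Field Name Length (byte)]
            w.Write(name);              // [Field Name UTF-8 bytes]
            w.Write(f.TypeCode);        // [BSON Type Code (byte)]
        }

        w.Flush();
        return ms.ToArray();
    }
}
```

One consequence of the single-byte length prefix is that a field name cannot exceed 255 UTF-8 bytes in this layout.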

---

## Implementation Details

### BsonSpanWriter (Serialization)

Zero-allocation writer using `Span<byte>`:

```csharp
using System;
using System.Buffers.Binary;
using System.Collections.Concurrent;

public ref struct BsonSpanWriter
{
    private Span<byte> _buffer;
    private int _position;
    private readonly ConcurrentDictionary<string, ushort> _keyMap;

    public void WriteElementHeader(BsonType type, string name)
    {
        // Write type code
        _buffer[_position++] = (byte)type;

        // Lookup field ID in dictionary
        if (!_keyMap.TryGetValue(name, out var id))
            throw new InvalidOperationException($"Field '{name}' not in schema");

        // Write field ID (2 bytes, little-endian)
        BinaryPrimitives.WriteUInt16LittleEndian(_buffer.Slice(_position, 2), id);
        _position += 2;
    }
}
```

**Usage:**

```csharp
var keyMap = new ConcurrentDictionary<string, ushort>();
keyMap["_id"] = 1;
keyMap["name"] = 2;

Span<byte> buffer = stackalloc byte[1024];
var writer = new BsonSpanWriter(buffer, keyMap);

writer.WriteObjectId("_id", user.Id);
writer.WriteString("name", user.Name);
```

### BsonSpanReader (Deserialization)

Zero-allocation reader using `ReadOnlySpan<byte>`:

```csharp
using System;
using System.Buffers.Binary;
using System.Collections.Concurrent;

public ref struct BsonSpanReader
{
    private ReadOnlySpan<byte> _buffer;
    private int _position;
    private readonly ConcurrentDictionary<ushort, string> _keys;

    // Call after ReadBsonType(), which consumes the 1-byte type code.
    public string ReadElementHeader()
    {
        // Read field ID (2 bytes, little-endian)
        var id = BinaryPrimitives.ReadUInt16LittleEndian(_buffer.Slice(_position, 2));
        _position += 2;

        // Reverse lookup in dictionary
        if (!_keys.TryGetValue(id, out var name))
            throw new InvalidOperationException($"Field ID {id} not in schema");

        return name;
    }
}
```

**Usage:**

```csharp
var keys = new ConcurrentDictionary<ushort, string>();
keys[1] = "_id";
keys[2] = "name";

var reader = new BsonSpanReader(bsonData, keys);
reader.ReadDocumentSize();

while (reader.Remaining > 0)
{
    var type = reader.ReadBsonType();
    if (type == BsonType.EndOfDocument) break;

    var fieldName = reader.ReadElementHeader(); // Resolves the field ID to its name
    // ... read value based on type
}
```

---

## Advanced Features

### Nested Documents

Nested documents recursively use the same C-BSON format, with field IDs drawn from the same schema-wide dictionary as the parent:

```csharp
public class User
{
    public ObjectId Id { get; set; }
    public Address HomeAddress { get; set; } // Nested
}

public class Address
{
    public string Street { get; set; }
    public string City { get; set; }
}

// Schema:
// User fields: 1="_id", 2="home_address"
// Address fields: 3="street", 4="city"
```

**Wire format for nested document:**
```
[0x03: Document]["home_address": 2]
  [nested_doc_size: 4 bytes]
  [0x02: String]["street": 3][value]
  [0x02: String]["city": 4][value]
  [0x00: End]
```

### Arrays

Arrays use numeric indices as field names, still compressed to 2-byte IDs:

```csharp
public class User
{
    public string[] Tags { get; set; }
}

// Schema includes numeric keys:
// "0" → 5, "1" → 6, "2" → 7, ...
```

**Wire format:**
```
[0x04: Array]["tags": 2]
  [array_size: 4 bytes]
  [0x02: String]["0": 5]["design"]
  [0x02: String]["1": 6]["dotnet"]
  [0x00: End]
```

### Geospatial Coordinates

C-BSON supports zero-allocation coordinate tuples via `[Column(TypeName="geopoint")]`:

```csharp
[Column(TypeName = "geopoint")]
public (double Lat, double Lon) Location { get; set; }
```

**Wire format:**
```
[0x04: Array]["location": field_id]
  [array_size: 4 bytes]
  [0x01: Double]["0": coord_0_id][8 bytes: latitude]
  [0x01: Double]["1": coord_1_id][8 bytes: longitude]
  [0x00: End]
```

This maps directly to R-Tree index structures without deserialization overhead.

---

## Performance Benefits

### Storage Efficiency

**Real-world example:** E-commerce product catalog

```csharp
public class Product
{
    public ObjectId Id { get; set; }          // "_id": 4 bytes → 2 bytes
    public string Name { get; set; }          // "name": 5 bytes → 2 bytes
    public decimal Price { get; set; }        // "price": 6 bytes → 2 bytes
    public string Description { get; set; }   // "description": 12 bytes → 2 bytes
    public string Category { get; set; }      // "category": 9 bytes → 2 bytes
    public string[] Tags { get; set; }        // "tags": 5 bytes → 2 bytes
    public DateTime CreatedAt { get; set; }   // "created_at": 11 bytes → 2 bytes
    public DateTime UpdatedAt { get; set; }   // "updated_at": 11 bytes → 2 bytes
}
```

**Field name overhead:**
- Standard BSON: 4+5+6+12+9+5+11+11 = **63 bytes**
- C-BSON: 2×8 = **16 bytes**
- **Savings: 47 bytes per document**

For 1 million products: **47 MB (about 45 MiB) saved** in field names alone.

### CPU Cache Efficiency

Smaller documents mean:
- **More documents fit in L1/L2/L3 cache**
- **Fewer cache misses during sequential scans**
- **Better prefetching** for range queries

### I/O Reduction

**Disk I/O:**
- A 16 KB page holds **more documents** → fewer page reads
- **Faster bulk inserts** → less data to write
- **Faster bulk reads** → less data to transfer from disk

**Network (future):**
- Smaller wire transfer for client/server scenarios
- Better replication throughput

---

## Hex Dump Examples

### Example 1: Simple User Document

**C# Object:**
```csharp
var user = new User
{
    Id = new ObjectId("65d3c2a1f4b8e9a2c3d4e5f6"),
    Name = "Alice",
    Age = 30
};
```

**C-BSON Wire Format (hex):**
```
28 00 00 00                // Document size: 40 bytes
07 01 00                   // ObjectId, field 1 (_id)
65 d3 c2 a1 f4 b8 e9 a2    // ObjectId bytes (12 total)
c3 d4 e5 f6
02 02 00                   // String, field 2 (name)
06 00 00 00                // String length: 6
41 6c 69 63 65 00          // "Alice\0"
10 03 00                   // Int32, field 3 (age)
1e 00 00 00                // Value: 30
00                         // End of document
```
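
The byte layout of this document can be checked by encoding it by hand. The sketch below is not CBDD's `BsonSpanWriter`; `Example1Encoder` is a hypothetical stand-alone helper that hard-codes the three fields and field IDs of this example and back-patches the size prefix at the end.

```csharp
using System;
using System.Buffers.Binary;
using System.Text;

// Hypothetical helper, for illustration only. Suitable for short values:
// the 256-byte stack buffer is not bounds-managed beyond Span's own checks.
public static class Example1Encoder
{
    public static byte[] Encode(byte[] objectId, string name, int age)
    {
        Span<byte> buf = stackalloc byte[256];
        int pos = 4; // reserve 4 bytes for the document size

        // ObjectId, field 1
        buf[pos++] = 0x07;
        BinaryPrimitives.WriteUInt16LittleEndian(buf.Slice(pos, 2), 1); pos += 2;
        objectId.AsSpan(0, 12).CopyTo(buf.Slice(pos)); pos += 12;

        // String, field 2: int32 length (including terminator) + bytes + 0x00
        buf[pos++] = 0x02;
        BinaryPrimitives.WriteUInt16LittleEndian(buf.Slice(pos, 2), 2); pos += 2;
        int byteCount = Encoding.UTF8.GetByteCount(name);
        BinaryPrimitives.WriteInt32LittleEndian(buf.Slice(pos, 4), byteCount + 1); pos += 4;
        pos += Encoding.UTF8.GetBytes(name, buf.Slice(pos));
        buf[pos++] = 0x00;

        // Int32, field 3
        buf[pos++] = 0x10;
        BinaryPrimitives.WriteUInt16LittleEndian(buf.Slice(pos, 2), 3); pos += 2;
        BinaryPrimitives.WriteInt32LittleEndian(buf.Slice(pos, 4), age); pos += 4;

        buf[pos++] = 0x00; // end of document
        BinaryPrimitives.WriteInt32LittleEndian(buf.Slice(0, 4), pos); // back-patch size
        return buf.Slice(0, pos).ToArray();
    }
}
```

Encoding the example values with `Example1Encoder.Encode(Convert.FromHexString("65d3c2a1f4b8e9a2c3d4e5f6"), "Alice", 30)` (`Convert.FromHexString` needs .NET 5 or later) produces 40 bytes: 4 for the size prefix, 15 for the ObjectId element, 13 for the string element, 7 for the Int32 element, and 1 for the terminator.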

### Example 2: Standard BSON Comparison

**Same document in standard BSON:**
```
2f 00 00 00                // Document size: 47 bytes (+7 bytes)
07 5f 69 64 00             // ObjectId, "_id\0" (4 bytes)
65 d3 c2 a1 f4 b8 e9 a2
c3 d4 e5 f6
02 6e 61 6d 65 00          // String, "name\0" (5 bytes)
06 00 00 00
41 6c 69 63 65 00
10 61 67 65 00             // Int32, "age\0" (4 bytes)
1e 00 00 00
00
```

**Comparison:**
- Standard BSON: 47 bytes
- C-BSON: 40 bytes
- **Reduction: 7 bytes (about 15% smaller)**

### Example 3: Nested Document

**C# Object:**
```csharp
var user = new User
{
    Id = ObjectId.NewObjectId(),
    Address = new Address
    {
        Street = "123 Main St",
        City = "Springfield"
    }
};
```

**C-BSON Wire Format (partial, showing nested doc):**
```
...                        // document header
03 02 00                   // Document, field 2 (address)
2b 00 00 00                // Nested doc size: 43 bytes
02 03 00                   // String, field 3 (street)
0c 00 00 00                // Length: 12
31 32 33 20 4d 61 69 6e    // "123 Main St\0"
20 53 74 00
02 04 00                   // String, field 4 (city)
0c 00 00 00                // Length: 12
53 70 72 69 6e 67 66 69    // "Springfield\0"
65 6c 64 00
00                         // End of nested doc
...
```

---

## Technical Constraints

### Field ID Space

- **Type:** `ushort` (16-bit unsigned integer)
- **Range:** 0 to 65,535
- **Theoretical max:** 65,535 unique field names per schema hierarchy
- **Practical limit:** ~1,000 fields for optimal performance
- **Reserved IDs:** 0 is reserved (not used)

### Dictionary Overhead

**Memory footprint:**
- ~16 bytes per entry in `ConcurrentDictionary<string, ushort>`
- ~16 bytes per entry in `ConcurrentDictionary<ushort, string>`
- **Total:** ~32 bytes per unique field name

**Example:** A schema with 50 fields → **~1.6 KB** in-memory overhead (negligible).

### Schema Versioning

When a schema evolves (fields added/removed/renamed):

1. **New schema version** is created with incremented version number
2. **New field IDs** are assigned to new fields
3. **Old documents remain readable** with old schema
4. **Migration** can be applied lazily during read-modify-write cycles

**Schema hash** ensures consistency:
```csharp
long schemaHash = schema.GetHash(); // Hash of all field names and types
```
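
The hash function behind `GetHash()` is not specified here. As one possible approach, the sketch below uses 64-bit FNV-1a over each field's UTF-8 name and type code. `SchemaHash` is a hypothetical class and the choice of FNV-1a is an assumption made for illustration, not a statement about CBDD's implementation.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper, for illustration only.
public static class SchemaHash
{
    private const ulong OffsetBasis = 0xcbf29ce484222325UL; // FNV-1a 64-bit offset basis
    private const ulong Prime = 0x100000001b3UL;            // FNV-1a 64-bit prime

    public static long Compute(IEnumerable<(string Name, byte TypeCode)> fields)
    {
        ulong h = OffsetBasis;
        foreach (var (name, typeCode) in fields)
        {
            foreach (byte b in Encoding.UTF8.GetBytes(name))
                h = unchecked((h ^ b) * Prime);

            // Feed a 0x00 separator so ("ab", "c") and ("a", "bc") hash differently.
            h = unchecked((h ^ 0x00) * Prime);
            h = unchecked((h ^ typeCode) * Prime);
        }
        return unchecked((long)h);
    }
}
```

The result depends on field order, so two schemas with the same fields in a different order produce different hashes. Whether that is desirable depends on whether field IDs are tied to order.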

---

## Compatibility

### BSON Type Compatibility

C-BSON is **type-compatible** with standard BSON:
- ✅ Same type codes (0x01-0x13)
- ✅ Same value encoding (little-endian, IEEE 754, UTF-8)
- ✅ Same document structure (size prefix + elements + 0x00 terminator)
- ❌ **Different element header format** (field ID vs. field name)

### Migration from Standard BSON

**Strategy:**
1. Read standard BSON document
2. Extract field names and build schema
3. Assign field IDs based on schema
4. Re-serialize as C-BSON

**Future enhancement:** Hybrid reader capable of auto-detecting and reading both formats.

### Export to Standard BSON

For external tool compatibility (e.g., MongoDB Compass, Studio 3T):

```csharp
// Convert C-BSON → Standard BSON
public byte[] ToStandardBson(byte[] cbson, BsonSchema schema)
{
    var reader = new BsonSpanReader(cbson, schema.GetReverseKeyMap());
    var writer = new StandardBsonWriter(); // Uses string field names

    // Copy document element-by-element
    while (...)
    {
        var type = reader.ReadBsonType();
        var fieldName = reader.ReadElementHeader(); // ID → Name
        writer.WriteElementHeader(type, fieldName); // Write name directly
        // ... copy value
    }
}
```

---

## Schema Evolution Strategies

### Adding Fields

**Backward compatible:** New fields get new IDs, old documents remain valid.

```csharp
// Version 1: User schema
// 1: "_id", 2: "name", 3: "email"

// Version 2: Add "phone"
// 1: "_id", 2: "name", 3: "email", 4: "phone"
```

Old documents:
- Missing field 4 → treated as `null` or default value
- No re-serialization required

### Removing Fields

**Forward compatible:** Removed field IDs are marked as deprecated.

```csharp
// Version 3: Remove "email" (field 3)
// Mark field 3 as deprecated in schema
```

New code:
- Ignores field 3 during deserialization
- Old documents with field 3 remain valid (data is skipped)
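
Skipping a deprecated field means knowing how many bytes its value occupies without decoding it. The sketch below derives that length from the type code, assuming values are laid out exactly as in standard BSON (including binary as int32 length, then subtype, then data). `CBsonValue` is a hypothetical helper, not a CBDD type.

```csharp
using System;
using System.Buffers.Binary;

// Hypothetical helper, for illustration only.
public static class CBsonValue
{
    // Returns the encoded length of a value, given its type code and a span
    // that starts at the first byte of the value.
    public static int GetLength(byte typeCode, ReadOnlySpan<byte> value) => typeCode switch
    {
        0x01 or 0x09 or 0x12 => 8,                                     // Double, DateTime, Int64
        0x02 => 4 + BinaryPrimitives.ReadInt32LittleEndian(value),     // String: length prefix + bytes incl. null
        0x03 or 0x04 => BinaryPrimitives.ReadInt32LittleEndian(value), // Document/Array: size covers itself
        0x05 => 4 + 1 + BinaryPrimitives.ReadInt32LittleEndian(value), // Binary: length + subtype + data
        0x07 => 12,                                                    // ObjectId
        0x08 => 1,                                                     // Boolean
        0x10 => 4,                                                     // Int32
        0x13 => 16,                                                    // Decimal128
        _ => throw new NotSupportedException($"Unknown type code 0x{typeCode:X2}")
    };
}
```

A reader that meets a deprecated field ID can then advance its position by `GetLength(...)` and continue with the next element.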

### Renaming Fields

**Breaking change:** Requires migration.

```csharp
// Version 4: Rename "phone" → "mobile_phone"

// Option 1: Lazy migration on read
if (doc.ContainsKey("phone"))
{
    doc["mobile_phone"] = doc["phone"];
    doc.Remove("phone");
    UpdateDocument(doc);
}

// Option 2: Batch migration script
foreach (var doc in collection.FindAll())
{
    if (doc.ContainsKey("phone"))
    {
        doc["mobile_phone"] = doc["phone"];
        doc.Remove("phone");
        collection.Update(doc);
    }
}
```

---

## Future Enhancements

### 1. Adaptive Key Width

Use **1 byte for field IDs** when schema has <256 fields:

```
Small schema flag: [1 bit in document header]
  If set: field IDs are 1 byte (0-255)
  Else:   field IDs are 2 bytes (0-65535)
```

**Potential savings:** Additional 1 byte per field for small schemas.

### 2. Delta Compression

Store only **changed fields** in updates:

```
┌──────────────────────────────────────┐
│ [Base Document ID]                   │
│ [Changed Field IDs bitmap]           │
│ [Changed Field Values]               │
└──────────────────────────────────────┘
```

### 3. Column-Oriented Storage

Separate storage for each field:

```
Field 1 file: [all _id values]
Field 2 file: [all name values]
Field 3 file: [all email values]
```

Benefits:
- **Faster analytics** (read only needed columns)
- **Better compression** (similar data together)
- **Efficient projections** (SELECT name, email FROM ...)

### 4. Hybrid Format Support

Reader auto-detects C-BSON vs. Standard BSON:

```csharp
// Heuristic detection on the first element: [type][byte 1][byte 2]...
// C-BSON: byte 2 is the high byte of the field ID, which is 0x00 for IDs below 256.
// Standard BSON: byte 2 is 0x00 only when the field name is a single character,
// so the check is ambiguous for such documents.
if (firstElement[2] == 0x00)
    return ReadCBSON();
else
    return ReadStandardBSON();
```

---

## Conclusion

C-BSON achieves **significant storage and performance improvements** while maintaining BSON's type system and flexibility:

- **Up to 30-60% smaller documents** via key compression, depending on how long field names are relative to values
- **Zero-allocation** I/O with `Span<byte>`
- **Full BSON type compatibility**
- **Schema-based** for type safety and evolution

This format is the foundation of CBDD's high-performance embedded database engine, enabling millions of documents to fit in memory and cache while minimizing disk I/O.

---

## References

- [BSON Specification v1.1](http://bsonspec.org/)
- [MongoDB BSON Types](https://www.mongodb.com/docs/manual/reference/bson-types/)
- [IEEE 754 Floating Point Standard](https://standards.ieee.org/standard/754-2019.html)
- [UTF-8 Encoding (RFC 3629)](https://tools.ietf.org/html/rfc3629)