# Design Decisions

This document records current v1 choices for the MXAccess gateway design. These
decisions can change, but implementation should follow them until a later design
update says otherwise.

## Source References

Use these local analysis sources when answering MXAccess-specific design or
implementation questions:

```text
C:\Users\dohertj2\Desktop\mxaccess
C:\Users\dohertj2\Desktop\mxaccess\docs\MXAccess-Public-API.md
C:\Users\dohertj2\Desktop\mxaccess\docs\MXAccess-Reverse-Engineering.md
```

Use these local notes for Galaxy Repository SQL metadata:

```text
C:\Users\dohertj2\Desktop\lmxopcua\gr
```

## MXAccess COM Target

Decision: target the installed MXAccess COM interop surface directly from the
x86 worker.

Concrete COM details from the MXAccess analysis:

- Interop assembly:
  `C:\Program Files (x86)\ArchestrA\Framework\Bin\ArchestrA.MXAccess.dll`
- Assembly identity:
  `ArchestrA.MxAccess, Version=3.2.0.0, PublicKeyToken=23106a86e706d0ae`
- COM class:
  `ArchestrA.MxAccess.LMXProxyServerClass`
- CLSID:
  `{C30B52F5-2CB5-4760-AF0A-3A344A7EB5DC}`
- ProgID:
  `LMXProxy.LMXProxyServer.1`
- Version-independent ProgID:
  `LMXProxy.LMXProxyServer`
- Registered server:
  `C:\Program Files (x86)\ArchestrA\Framework\Bin\LmxProxy.dll`
- Registry view:
  `HKCR\Wow6432Node\CLSID\{C30B52F5-2CB5-4760-AF0A-3A344A7EB5DC}`
- Threading model:
  `Apartment`

Rationale: `LMXProxyServer` is a 32-bit in-process COM server, so a .NET 10 x64
gateway cannot instantiate it directly. The x86 sidecar worker is the reliable
parity path.

Implementation guidance:

- Worker should reference `ArchestrA.MXAccess.dll`.
- Worker should instantiate `new LMXProxyServerClass()` on the dedicated STA.
- Worker should expose the resolved class, ProgID, CLSID, interop assembly
  version, and `LmxProxy.dll` path through `GetWorkerInfo` / `WorkerReady`.
- Keep the ProgID/path configurable for diagnostics, but the default should be
  the installed MXAccess class above.

## Session Reconnect

Decision: no reconnectable sessions for v1.

One `OpenSession` creates one gateway session and one worker process. The
session ends on `CloseSession`, client disconnect policy, lease expiry, worker
fault, or gateway shutdown.

Rationale: reconnectable sessions require event replay, orphan ownership,
security checks, and more complicated worker lifetime rules. They are not needed
for the first parity slice.

## Event Subscribers

Decision: one active `StreamEvents` subscriber per session for v1.

A second subscriber should be rejected with a clear session error. Multi-client
fan-out may be added later with explicit backpressure semantics.

Rationale: one subscriber preserves simple event ordering and failure behavior
while parity is being proven.

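
The single-subscriber rule can be sketched as a guard on the session object. This is a minimal Python illustration, not the gateway's C# code; the `Session` and `SessionError` names are hypothetical and only the "reject the second subscriber" behavior comes from the decision above.

```python
from typing import Optional


class SessionError(Exception):
    """Hypothetical error type for session-level protocol violations."""


class Session:
    """Hypothetical gateway session that allows one StreamEvents subscriber."""

    def __init__(self, session_id: str) -> None:
        self.session_id = session_id
        self._subscriber: Optional[str] = None

    def attach_subscriber(self, subscriber_id: str) -> None:
        # Reject a second subscriber instead of fanning events out.
        if self._subscriber is not None:
            raise SessionError(
                f"session {self.session_id} already has an active "
                "StreamEvents subscriber"
            )
        self._subscriber = subscriber_id

    def detach_subscriber(self, subscriber_id: str) -> None:
        # Only the current subscriber frees the slot.
        if self._subscriber == subscriber_id:
            self._subscriber = None
```

Rejecting at attach time keeps event ordering tied to exactly one consumer, which is the property the rationale above relies on.
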
### Alarms — superseded for the alarm subsystem

The single-subscriber rule above no longer applies to alarms. The gateway runs
an always-on central alarm monitor (`GatewayAlarmMonitor`) that owns one
gateway-managed worker session, caches the active-alarm set, and fans it out to
any number of clients through the session-less `StreamAlarms` RPC. Per-session
alarm auto-subscribe is removed; `AcknowledgeAlarm` is session-less and routes
through the monitor. Data-side `StreamEvents` remains one subscriber per
session. Rationale: alarm state is gateway-wide, not session-scoped. Every
client wants the same current set plus updates, and forcing each to own a
worker would multiply AVEVA polling load for no benefit.

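
The cache-and-fan-out shape described above can be sketched as follows. This is an illustrative Python model only: the class name, the listener signature, and the `"active"` / `"cleared"` state strings are assumptions, not the real `GatewayAlarmMonitor` contract.

```python
from typing import Callable, Dict, List

# Hypothetical listener signature: (alarm_id, state).
AlarmListener = Callable[[str, str], None]


class AlarmMonitorSketch:
    """Caches the active-alarm set and fans updates out to every listener."""

    def __init__(self) -> None:
        self._active: Dict[str, str] = {}
        self._listeners: List[AlarmListener] = []

    def subscribe(self, listener: AlarmListener) -> None:
        # A new client first receives the cached active-alarm set.
        for alarm_id, state in self._active.items():
            listener(alarm_id, state)
        self._listeners.append(listener)

    def update(self, alarm_id: str, state: str) -> None:
        # "cleared" is an assumed state name that removes the alarm.
        if state == "cleared":
            self._active.pop(alarm_id, None)
        else:
            self._active[alarm_id] = state
        for listener in self._listeners:
            listener(alarm_id, state)
```

A late subscriber sees the same current set as an early one because the replay comes from the monitor's cache rather than from a per-client worker session.
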
## Authentication

Decision: API key authentication for the public gateway.

API keys are stored in a gateway-owned SQLite database. Store hashed API key
secrets only; never store raw key material.

Recommended client format:

```text
authorization: Bearer mxgw_<key-id>_<secret>
```

Recommended SQLite tables:

```sql
CREATE TABLE api_keys (
    key_id TEXT PRIMARY KEY,
    key_prefix TEXT NOT NULL,
    secret_hash BLOB NOT NULL,
    display_name TEXT NOT NULL,
    scopes TEXT NOT NULL,
    created_utc TEXT NOT NULL,
    last_used_utc TEXT NULL,
    revoked_utc TEXT NULL
);

CREATE TABLE api_key_audit (
    audit_id INTEGER PRIMARY KEY AUTOINCREMENT,
    key_id TEXT NULL,
    event_type TEXT NOT NULL,
    remote_address TEXT NULL,
    created_utc TEXT NOT NULL,
    details TEXT NULL
);
```

Recommended scopes:

- `session:open`
- `session:close`
- `invoke:read`
- `invoke:write`
- `invoke:secure`
- `events:read`
- `metadata:read`
- `admin`

Hashing recommendation:

- Use HMAC-SHA256 with a gateway-local secret/pepper stored outside SQLite, or
  use Argon2id if a suitable dependency is already accepted.
- Compare hashes using constant-time comparison.
- Log only the key id or prefix, not the raw key.

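
The token format and hashing recommendation can be sketched together. This is a minimal Python illustration using only the standard library; the function names are hypothetical, and the parser assumes the key id itself contains no underscore, which the format above does not state.

```python
import hashlib
import hmac
from typing import Tuple


def parse_bearer_token(header_value: str) -> Tuple[str, str]:
    """Split 'Bearer mxgw_<key-id>_<secret>' into (key_id, secret)."""
    scheme, _, token = header_value.partition(" ")
    if scheme.lower() != "bearer" or not token.startswith("mxgw_"):
        raise ValueError("unsupported authorization header")
    # Assumption: the key id has no underscore, so the first one separates it.
    key_id, sep, secret = token[len("mxgw_"):].partition("_")
    if not sep or not key_id or not secret:
        raise ValueError("malformed API key")
    return key_id, secret


def hash_secret(secret: str, pepper: bytes) -> bytes:
    """HMAC-SHA256 of the secret, keyed with the gateway-local pepper."""
    return hmac.new(pepper, secret.encode("utf-8"), hashlib.sha256).digest()


def verify_secret(secret: str, pepper: bytes, stored_hash: bytes) -> bool:
    """Constant-time comparison against the stored secret_hash value."""
    return hmac.compare_digest(hash_secret(secret, pepper), stored_hash)
```

Only `key_id` would be safe to log from this path; the secret and the full header value should stay out of logs, as the last bullet above requires.
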
Storage recommendation:

- Default SQLite path should be under `ProgramData` or another configured
  gateway data directory.
- Apply restrictive filesystem ACLs for the gateway service identity and
  administrators.
- Require TLS when the gateway is reachable off-machine.

## Authorization

Decision: start with scope checks by command category.

Suggested mapping:

- `OpenSession`: `session:open`
- `CloseSession`: `session:close`
- `Register`, `Unregister`, `AddItem`, `AddItem2`, `RemoveItem`, `Advise`,
  `UnAdvise`, `AdviseSupervisory`, `AddBufferedItem`,
  `SetBufferedUpdateInterval`, `Suspend`, `Activate`: `invoke:read`
- `Write`, `Write2`: `invoke:write`
- `WriteSecured`, `WriteSecured2`, `AuthenticateUser`,
  `ArchestrAUserToId`: `invoke:secure`
- `StreamEvents`: `events:read`
- Galaxy SQL metadata endpoints if added: `metadata:read`
- worker shutdown diagnostics and key management: `admin`

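
The mapping above can be expressed as a lookup table plus a single check. This Python sketch is illustrative; the `COMMAND_SCOPES` and `is_authorized` names are hypothetical, and denying unknown commands is an assumption rather than a documented rule. It also does not treat `admin` as implying other scopes, since the document does not say it does.

```python
from typing import Dict, Iterable

# Command name -> required scope, copied from the suggested mapping.
COMMAND_SCOPES: Dict[str, str] = {
    "OpenSession": "session:open",
    "CloseSession": "session:close",
    "StreamEvents": "events:read",
}
for _name in ("Register", "Unregister", "AddItem", "AddItem2", "RemoveItem",
              "Advise", "UnAdvise", "AdviseSupervisory", "AddBufferedItem",
              "SetBufferedUpdateInterval", "Suspend", "Activate"):
    COMMAND_SCOPES[_name] = "invoke:read"
for _name in ("Write", "Write2"):
    COMMAND_SCOPES[_name] = "invoke:write"
for _name in ("WriteSecured", "WriteSecured2", "AuthenticateUser",
              "ArchestrAUserToId"):
    COMMAND_SCOPES[_name] = "invoke:secure"


def is_authorized(command: str, granted_scopes: Iterable[str]) -> bool:
    """Return True when the key's scopes include the command's scope."""
    required = COMMAND_SCOPES.get(command)
    if required is None:
        # Assumption: commands without a mapping are denied by default.
        return False
    return required in set(granted_scopes)
```
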
## Worker Process Identity

Decision: run workers as the gateway service identity for v1.

Rationale: this avoids early COM/DCOM permission failures and keeps the first
implementation focused on MXAccess parity. The worker launcher should keep an
extension point for a restricted service account later.

## Event Backpressure

Decision: fail-fast bounded queues for v1 and parity testing.

If worker or gateway event queues fill, fault the session. Do not silently drop
or coalesce events in parity mode.

Rationale: event drops would hide parity defects. Production coalescing by item
handle can be added later as an explicit opt-in mode once event rates are
measured.

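
A fail-fast bounded queue can be sketched in a few lines. This Python model is illustrative only; `EventQueue` and `SessionFaulted` are hypothetical names, and the real queues live in the worker and gateway processes.

```python
import queue
from typing import Any


class SessionFaulted(Exception):
    """Hypothetical signal that the owning session must be torn down."""


class EventQueue:
    """Bounded queue that faults on overflow instead of dropping events."""

    def __init__(self, capacity: int) -> None:
        self._queue: "queue.Queue[Any]" = queue.Queue(maxsize=capacity)
        self.faulted = False

    def publish(self, event: Any) -> None:
        if self.faulted:
            raise SessionFaulted("session already faulted")
        try:
            # Never block and never discard: a full queue is a fault.
            self._queue.put_nowait(event)
        except queue.Full:
            self.faulted = True
            raise SessionFaulted("event queue overflow") from None

    def take(self) -> Any:
        # Raises queue.Empty when nothing is pending.
        return self._queue.get_nowait()
```

Once `faulted` is set, every later publish fails too, so an overflow cannot be masked by the consumer catching up afterwards.
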
## Event-Rate Target

Decision: do not set a production event-rate target before measurement.

For v1, expose queue depth, event rate, stream send latency, and overflow
metrics. Keep bounded queues and fail-fast behavior. Use observed load from live
systems to set a later coalescing or scaling target.

## Command Batching

Decision: no public command batching for v1.

Use one command per request so replies, HRESULTs, status arrays, event ordering,
and failure behavior are easy to compare against direct MXAccess.

Batch tag registration can be added later if measured setup latency requires it.

## Bulk Command Family

Decision: the gateway exposes a fixed set of *bulk* command kinds
(`AddItemBulk`, `AdviseItemBulk`, `RemoveItemBulk`, `UnAdviseItemBulk`,
`SubscribeBulk`, `UnsubscribeBulk`, `WriteBulk`, `Write2Bulk`,
`WriteSecuredBulk`, `WriteSecured2Bulk`, `ReadBulk`) that carry a list of
entries in one round-trip and return one result per entry. Each command kind
runs the corresponding single-item MXAccess COM call sequentially on the
worker STA; per-entry failures populate `was_successful = false` with the
underlying HRESULT and never throw. There are no transactional or fail-fast
semantics: bulk here means "one round-trip, per-entry results", not
"atomic".

Rationale: MXAccess COM itself has no native bulk API for any of these
operations. Surfacing the per-entry result list keeps parity transparent (the
caller sees the same per-item HRESULT they would see calling MXAccess N times
directly), while the bulk shape collapses the gateway/IPC overhead to one
round-trip per batch and lets the worker keep the STA hot.

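
The "one round-trip, per-entry results" shape can be sketched as a loop that converts each failure into a result row. This is a Python illustration under assumptions: `ComError`, `BulkEntryResult`, and the `add_item` callback are hypothetical stand-ins for the worker's COM call, and only the `was_successful` field name comes from the text above.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


class ComError(Exception):
    """Hypothetical wrapper for a failed COM call carrying its HRESULT."""

    def __init__(self, hresult: int) -> None:
        super().__init__(f"HRESULT 0x{hresult & 0xFFFFFFFF:08X}")
        self.hresult = hresult


@dataclass
class BulkEntryResult:
    tag: str
    was_successful: bool
    hresult: int
    item_handle: int = 0


def add_item_bulk(tags: Sequence[str],
                  add_item: Callable[[str], int]) -> List[BulkEntryResult]:
    """Run the single-item call once per entry, in order, never raising."""
    results: List[BulkEntryResult] = []
    for tag in tags:
        try:
            handle = add_item(tag)
            results.append(BulkEntryResult(tag, True, 0, handle))
        except ComError as error:
            # A failed entry is recorded; later entries still run.
            results.append(BulkEntryResult(tag, False, error.hresult))
    return results
```

Because the loop never aborts, a caller can compare each row against what a direct single-item call would have returned for the same tag.
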
`ReadBulk` is the only bulk command without a 1:1 MXAccess analogue. Two
choices were considered:

1. **Cache-then-snapshot** (chosen): when a requested tag is already in the
   session's item registry AND advised, the worker returns the last cached
   `OnDataChange` value without touching the subscription
   (`was_cached = true`). Otherwise it takes the full
   `AddItem + Advise + wait-for-first-OnDataChange + UnAdvise + RemoveItem`
   lifecycle itself (`was_cached = false`) and leaves the session exactly as
   it was before the call. The cache lives on a per-session
   `MxAccessValueCache`, populated by `MxAccessBaseEventSink` on every
   `OnDataChange` after the event clears the outbound queue.

2. **Always-snapshot**: take the AddItem-through-RemoveItem lifecycle for
   every requested tag. Cleaner conceptually but pays the full lifecycle
   cost on every call and would interfere with existing subscriptions if
   MXAccess reuses item handles.

The chosen behavior matches what callers actually want from "current
value" (a free read of an already-streaming tag, and a one-shot snapshot
otherwise) and never disturbs subscriptions the caller did not create.
The decision intentionally does NOT synthesize an `OnDataChange` event
from the snapshot path: the snapshot value reaches the caller through
`ReadBulk`'s reply payload only, not through the event stream. This
preserves the "Don't synthesize events" rule that scopes the rest of the
worker.

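
The cache-then-snapshot branch can be sketched as below. This Python model is illustrative: `SessionState`, `ReadResult`, and the `snapshot` callback (standing in for the full AddItem-through-RemoveItem lifecycle) are hypothetical. Treating an advised tag with no cached value yet as a snapshot read is an assumption of this sketch; the text above does not describe that case.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Sequence, Set


@dataclass
class ReadResult:
    tag: str
    value: Any
    was_cached: bool


@dataclass
class SessionState:
    advised: Set[str] = field(default_factory=set)
    value_cache: Dict[str, Any] = field(default_factory=dict)


def read_bulk(tags: Sequence[str], session: SessionState,
              snapshot: Callable[[str], Any]) -> List[ReadResult]:
    """Serve advised tags from the cache, everything else via snapshot."""
    results: List[ReadResult] = []
    for tag in tags:
        if tag in session.advised and tag in session.value_cache:
            # Free read: the existing subscription is left untouched.
            results.append(ReadResult(tag, session.value_cache[tag], True))
        else:
            # One-shot read: the callback owns its own setup and teardown,
            # and nothing is written to the session or the event stream.
            results.append(ReadResult(tag, snapshot(tag), False))
    return results
```

Note that the snapshot path returns its value only in the reply list, mirroring the rule that no `OnDataChange` event is synthesized.
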
`ReadBulk`'s wait loop pumps Windows messages on the worker STA
(`StaRuntime.PumpPendingMessages`) on every poll iteration so the inbound
MXAccess COM event can dispatch while the bulk executor still holds the
thread. Without the pump, the `OnDataChange` would never be delivered.

## Graceful Worker Shutdown

Decision: best-effort cleanup before COM release.

During graceful shutdown, the worker should attempt:

1. `UnAdvise` for advised items.
2. `RemoveItem` for active item handles.
3. `Unregister` for active server handles.
4. Event detach.
5. COM release.

Failures during cleanup should be logged and preserved diagnostically, but the
gateway may still kill the worker after shutdown timeout.

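
The best-effort ordering above can be sketched as a step runner that records failures and keeps going. This is a Python illustration; `run_cleanup` and the step callables are hypothetical, and the real worker would pass in the actual MXAccess calls in the listed order.

```python
from typing import Callable, List, Sequence, Tuple

CleanupStep = Tuple[str, Callable[[], None]]


def run_cleanup(steps: Sequence[CleanupStep]) -> List[str]:
    """Run every step in order; collect failures instead of stopping."""
    failures: List[str] = []
    for name, action in steps:
        try:
            action()
        except Exception as error:
            # Preserve the failure for diagnostics, then continue so that
            # later steps (including COM release) still get their attempt.
            failures.append(f"{name}: {error}")
    return failures
```

The returned list is what would be logged; the shutdown timeout and the gateway's decision to kill the worker sit outside this function.
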
## OperationComplete

Decision: model and forward `OperationComplete` only when native MXAccess fires
it. Do not synthesize `OperationComplete` from writes, command replies, ASB
completion queues, or other status frames.

Rationale: the event signature is known, but the MXAccess analysis has not yet
captured the runtime condition that triggers the public event. Synthesizing it
would risk breaking parity.

## Buffered Data Change

Decision: include `OnBufferedDataChange` in the protocol and worker event
model, but treat multi-sample payload conversion as capture-validated work.

The event signature and native path are known. A live buffered sample batch has
not yet been observed. Until then, preserve raw value, quality, timestamp, data
type, and status metadata whenever conversion is incomplete.

## Completion-Only Status Mapping

Decision: preserve completion-only operation-status bytes as raw diagnostic
metadata unless native MXAccess raises a public event or the MXAccess analysis
proves an exact `MXSTATUS_PROXY[]` mapping.

Do not guess status category/source/detail values for frames that MXAccess does
not expose through its public COM events.

## API Key Administration

Decision: v1 API key management is a local administrative CLI/tool, not a
public admin API.

The tool should support:

- initialize auth database,
- create key,
- list keys without showing secrets,
- revoke key,
- rotate key,
- print the raw secret exactly once at creation.

Public gRPC key-management endpoints can be added later only behind `admin`
scope and TLS.

## SQLite Migrations

Decision: use simple startup migrations with a `schema_version` table.

Recommended table:

```sql
CREATE TABLE schema_version (
    id INTEGER PRIMARY KEY CHECK (id = 1),
    version INTEGER NOT NULL,
    applied_utc TEXT NOT NULL
);
```

Migrations should be idempotent, run inside transactions, and fail gateway
startup if the database is newer than the running binary understands.

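
A startup migration runner with these three properties can be sketched with Python's standard `sqlite3` module. This is an illustration, not the gateway's implementation: the `migrate` function and the two placeholder entries in `MIGRATIONS` are hypothetical, and the placeholder tables are deliberately simplified versions of the ones recommended earlier.

```python
import sqlite3
from datetime import datetime, timezone
from typing import List

SCHEMA_VERSION_DDL = """
CREATE TABLE IF NOT EXISTS schema_version (
    id INTEGER PRIMARY KEY CHECK (id = 1),
    version INTEGER NOT NULL,
    applied_utc TEXT NOT NULL
)
"""

# Placeholder migrations; entry N upgrades the schema to version N + 1.
MIGRATIONS: List[str] = [
    "CREATE TABLE api_keys (key_id TEXT PRIMARY KEY)",
    "CREATE TABLE api_key_audit (audit_id INTEGER PRIMARY KEY AUTOINCREMENT)",
]


def current_version(conn: sqlite3.Connection) -> int:
    row = conn.execute(
        "SELECT version FROM schema_version WHERE id = 1").fetchone()
    return row[0] if row else 0


def migrate(conn: sqlite3.Connection, migrations: List[str]) -> int:
    conn.isolation_level = None  # manage transactions explicitly
    conn.execute(SCHEMA_VERSION_DDL)
    version = current_version(conn)
    if version > len(migrations):
        # The database was written by a newer binary: refuse to start.
        raise RuntimeError("database schema is newer than this binary")
    for target in range(version + 1, len(migrations) + 1):
        conn.execute("BEGIN")
        try:
            conn.execute(migrations[target - 1])
            conn.execute(
                "INSERT OR REPLACE INTO schema_version "
                "(id, version, applied_utc) VALUES (1, ?, ?)",
                (target, datetime.now(timezone.utc).isoformat()),
            )
            conn.execute("COMMIT")
        except Exception:
            conn.execute("ROLLBACK")
            raise
    return current_version(conn)
```

Running `migrate` a second time against the same database applies nothing, because the loop starts at the stored version plus one. Each schema change and its version bump share one transaction, so a failed step leaves the previous version intact.
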
## Web Dashboard

Decision: host a basic gateway dashboard with Blazor Server and Bootstrap
CSS/JS.

The dashboard should show gateway health, active sessions, worker instances,
basic metrics, queue depths, and recent faults. It should update in real time
through Blazor Server component updates.

Allowed UI stack:

- Blazor Server,
- Bootstrap CSS,
- Bootstrap JavaScript,
- small local CSS.

Do not use MudBlazor or other Blazor UI component libraries for v1.

Dashboard access should require API-key-backed dashboard authentication with
`admin` scope when enabled. For local development, anonymous localhost access
is enabled by default through `Dashboard:AllowAnonymousLocalhost`; the bypass is
limited to loopback requests.

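
The loopback-only bypass can be sketched as a small predicate. This Python illustration is an assumption about how the check might look; `allow_anonymous` is a hypothetical name, and its second parameter stands in for the `Dashboard:AllowAnonymousLocalhost` setting.

```python
import ipaddress


def allow_anonymous(remote_address: str,
                    allow_anonymous_localhost: bool) -> bool:
    """Permit anonymous dashboard access only for loopback callers."""
    if not allow_anonymous_localhost:
        return False
    try:
        return ipaddress.ip_address(remote_address).is_loopback
    except ValueError:
        # Unparseable addresses never qualify for the bypass.
        return False
```

Any request that fails this predicate would fall through to the API-key-backed authentication path with `admin` scope.
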
## Later Revisit Items

These are explicit post-v1 revisit items, not open blockers:

- reconnectable sessions,
- multiple event subscribers per session,
- restricted worker service account,
- production coalescing by item handle,
- command batching for high-volume tag setup.

## Related Documentation

- [Gateway Process Detailed Design](./GatewayProcessDesign.md)
- [MXAccess Worker Instance Detailed Design](./MxAccessWorkerInstanceDesign.md)
- [Authentication](./Authentication.md)
- [Authorization](./Authorization.md)
- [Galaxy Repository](./GalaxyRepository.md)