Mailing List Archive

[master] 6e616f98d Chapter 5 & 6 in the CHERI saga
commit 6e616f98d83cced716fd300388fca86a00f24076
Author: Poul-Henning Kamp <phk@FreeBSD.org>
Date: Wed Nov 30 14:38:34 2022 +0000

Chapter 5 & 6 in the CHERI saga

diff --git a/doc/sphinx/phk/cheri5.rst b/doc/sphinx/phk/cheri5.rst
new file mode 100644
index 000000000..762d72c0a
--- /dev/null
+++ b/doc/sphinx/phk/cheri5.rst
@@ -0,0 +1,197 @@
+.. _phk_cheri_5:
+
+How Varnish met CHERI 5/N
+=========================
+
+Varnish Workspaces
+------------------
+
+To process a HTTP request or response, varnish must allocate bits
+of memory which will only be used for the duration of that processing,
+and all of it can be released back at the same time.
+
+To avoid calling ``malloc(3)`` a lot, which comes with a locking
+overhead in a heavily multithreaded process, but even more to
+avoid having to keep track of all these allocations in order to be able
+to ``free(3)`` them all, varnish has "workspaces":
+
+.. code-block:: none
+
+ struct ws {
+ […]
+ char *s; /* (S)tart of buffer */
+ char *f; /* (F)ree/front pointer */
+ char *r; /* (R)eserved length */
+ char *e; /* (E)nd of buffer */
+ };
+
+The ``s`` pointer points at the start of a slab of memory, owned
+exclusively by the current thread and ``e`` points to the end.
+
+Initially ``f`` is the same as ``s``, but as allocations are made
+from the workspace, it moves towards ``e``. The ``r`` pointer is
+used to make "reservations", we will ignore that for now.
+
+Workspaces look easy to create:
+
+.. code-block:: none
+
+ ws->s = space;
+ ws->e = ws->s + len;
+ ws->f = ws->s;
+ ws->r = NULL;
+
+… only, given the foot-shooting-abetting nature of the C language,
+we have bolted on a lot of seat-belts:
+
+.. code-block:: none
+
+ #define WS_ID_SIZE 4
+
+ struct ws {
+ unsigned magic;
+ #define WS_MAGIC 0x35fac554
+ char id[WS_ID_SIZE]; /* identity */
+ char *s; /* (S)tart of buffer */
+ char *f; /* (F)ree/front pointer */
+ char *r; /* (R)eserved length */
+ char *e; /* (E)nd of buffer */
+ };
+
+ void
+ WS_Init(struct ws *ws, const char *id, void *space, unsigned len)
+ {
+ unsigned l;
+
+ DSLb(DBG_WORKSPACE,
+ "WS_Init(%s, %p, %p, %u)", id, ws, space, len);
+ assert(space != NULL);
+ assert(PAOK(space));
+ INIT_OBJ(ws, WS_MAGIC);
+ ws->s = space;
+ l = PRNDDN(len - 1);
+ ws->e = ws->s + l;
+ memset(ws->e, WS_REDZONE_END, len - l);
+ ws->f = ws->s;
+ assert(id[0] & 0x20); // cheesy islower()
+ bstrcpy(ws->id, id);
+ WS_Assert(ws);
+ }
+
+Let me walk you through that:
+
+The ``DSLb()`` call can be used to trace all operations on the
+workspace, so we can see what actually goes on.
+
+(Hint: Your ``malloc(3)`` may have something similar,
+look for ``utrace`` in the manual page.)
+
+Next we check the provided space pointer is not NULL, and
+that it is properly aligned, these are both following
+a varnish style-pattern, to sprinkle asserts liberally,
+both as code documentation, but also because it allows
+the compiler to optimize things better.
+
+The ``INIT_OBJ() and ``magic`` field is a style-pattern
+we use throughout varnish: Each structure is tagged with
+a unique magic, which can be used to ensure that pointers
+are what we are told, when they get passed through a ``void*``.
+
+We set the ``s`` pointer.
+
+We calculate a length at least one byte shorter than what
+we were provided, align it, and point ``e`` at that.
+
+We fill that extraspace at and past ``e``, with a "canary" to
+stochastically detect overruns. It catches most but not
+all overruns.
+
+We set the name of the workspace, ensuring it is not already
+marked as overflowed.
+
+And finally check that the resulting workspace complies with
+the defined invariants, as captured in the ``WS_Assert()``
+function.
+
+With CHERI, it looks like this:
+
+.. code-block:: none
+
+ void
+ WS_Init(struct ws *ws, const char *id, void *space, unsigned len)
+ {
+ unsigned l;
+
+ DSLb(DBG_WORKSPACE,
+ "WS_Init(%s, %p, %p, %u)", id, ws, space, len);
+ assert(space != NULL);
+ INIT_OBJ(ws, WS_MAGIC);
+ assert(PAOK(space));
+ ws->s = cheri_bounds_set(space, len);
+ ws->e = ws->s + len
+ ws->f = ws->s;
+ assert(id[0] & 0x20); // cheesy islower()
+ bstrcpy(ws->id, id);
+ WS_Assert(ws);
+ }
+
+All the gunk to implement a canary to detect overruns went
+away, because with CHERI we can restrict the ``s`` pointer so writing
+outside the workspace is *by definition* impossible, as long as your
+pointer is derived from ``s``.
+
+Less memory wasted, much stronger check and more readable source-code,
+what's not to like ?
+
+When an allocation is made from the workspace, CHERI makes it possible
+to restrict the returned pointer to just the allocated space:
+
+.. code-block:: none
+
+ void *
+ WS_Alloc(struct ws *ws, unsigned bytes)
+ {
+ char *r;
+
+ […]
+ r = ws->f;
+ ws->f += bytes;
+ return(cheri_bounds_set(r, bytes));
+ }
+
+Varnish String Buffers
+----------------------
+
+Back in the mists of time, Dag-Erling Smørgrav and I designed a
+safe string API called ``sbuf`` for the FreeBSD kernel.
+
+The basic idea is you set up your buffer, you call functions to stuff
+text into it, and those functions do all the hard work to ensure
+you do not overrun the buffer. When the string is complete, you
+call a function to "finish" the buffer, and if returns a flag which
+tells you if overrun (or other problems) happened, and then you can
+get a pointer to the resulting string from another function.
+
+Varnish has adopted sbuf's under the name ``vsb``. This should
+really not surprise anybody: Dag-Erling was also involved
+in the birth of varnish.
+
+It should be obvious that internally ``vsb`` almost always operate
+on a bigger buffer than the result, so this is another obvious
+place to have CHERI cut a pointer down to size:
+
+.. code-block:: none
+
+ char *
+ VSB_data(const struct vsb *s)
+ {
+
+ assert_VSB_integrity(s);
+ assert_VSB_state(s, VSB_FINISHED);
+
+ return (cheri_bounds_set(s->s_buf, s->s_len + 1));
+ }
+
+Still no bugs though.
+
+*/phk*
diff --git a/doc/sphinx/phk/cheri6.rst b/doc/sphinx/phk/cheri6.rst
new file mode 100644
index 000000000..adf5d83cb
--- /dev/null
+++ b/doc/sphinx/phk/cheri6.rst
@@ -0,0 +1,179 @@
+.. _phk_cheri_6:
+
+How Varnish met CHERI 6/N
+=========================
+
+Varnish Socket Addresses
+------------------------
+
+Socket addresses are a bit of a mess, in particular because nobody
+dared shake up all the IPv4 legacy code when IPv6 was introduced.
+
+In varnish we encapsulate all that uglyness in a ``struct suckaddr``,
+so named because it sucks that we have to spend time and code on this.
+
+In a case like this, it makes sense to make the internals strictly
+read-only, to ensure nobody gets sneaky ideas:
+
+.. code-block:: none
+
+ struct suckaddr *
+ VSA_Build(void *d, const void *s, unsigned sal)
+ {
+ struct suckaddr *sua;
+
+ [… lots of ugly stuff …]
+
+ return (RO(sua));
+ }
+
+It would then seem logical to use C's ``const`` to signal this fact,
+but since the current VSA api is currently defined such that users
+call ``free(3)`` on the suckaddrs when they are done with them, that does
+not work, because the prototype for ``free(3)`` is:
+
+.. code-block:: none
+
+ void free(void *ptr);
+
+So you cannot call it with a ``const`` pointer.
+
+All hail the ISO-C standards committee!
+
+This brings me to a soft point with CHERI: Allocators.
+
+How to free things with CHERI
+-----------------------------
+
+A very common design-pattern in encapsulating classes look something
+like this:
+
+.. code-block:: none
+
+ struct real_foo {
+ struct foo foo;
+ [some metadata about foo]
+ };
+
+ const struct foo *
+ Make_Foo([arguments])
+ {
+ struct real_foo *rf;
+
+ rf = calloc(1, sizeof *rf);
+ if (rf == NULL)
+ return (rf);
+ [fill in rf]
+ return (&rf->foo);
+ }
+
+ void
+ Free_Foo(const struct foo **ptr)
+ {
+ const struct foo *fp;
+ struct real_foo *rfp;
+
+ assert(ptr != NULL);
+ fp = *ptr;
+ assert(fp != NULL);
+ *ptr = NULL;
+
+ rfp = (struct real_foo*)((uintptr_t)fp);
+ [clean stuff up]
+ }
+
+We pass a ``**ptr`` to ``Free_Foo()``, another varnish style-pattern,
+so we can NULL out the holding variable in the calling function,
+to avoid a dangling pointer to the now destroyed object from
+causing any kind of trouble later.
+
+In the calling function this looks like:
+
+.. code-block:: none
+
+ const struct foo *foo_ptr;
+ […]
+ Free_Foo(&foo_ptr);
+
+If we use CHERI to make the foo truly ``const`` for the users of
+the API, we cannot, as above, wash the ``const`` away with a trip through
+``uintptr_t`` and then write to the metadata.
+
+The CHERI C/C++ manual, a terse academic tome, laconically mention that:
+
+*»Therefore, some additional work may be required to derive
+a pointer to the allocation’s metadata via another global capability,
+rather than the one that has been passed to free().«*
+
+Despite the understatement, I am very much in favour of this, because
+this is *precisely* why my own
+`phkmalloc <https://papers.freebsd.org/1998/phk-malloc/>`_
+became a big hit twenty years ago: By firmly separating the metadata
+from the allocated space, several new classes of mistakes using the
+``malloc(3)`` API could, and was, detected.
+
+But this *is* going to be an embuggerance for CHERI users, because
+with CHERI getting from one pointer to different one is actual work.
+
+The only "proper" solution is to build some kind of datastructure:
+List, tree, hash, DB2 database, pick any poison you prefer, and
+search out the metadata pointer using the impotent pointer as key.
+Given that CHERI pointers are huge, it may be a better idea to embed
+a numeric index in the object and use that the key,
+
+An important benefit of this »additional work« is that if your
+free-function get passed a pointer to something else, you will
+find out, because it is not in your data-structure.
+
+It would be a good idea if CHERI came with a quality implementation
+of "Find my other pointer", so that nobody is forced to crack The
+Art of Computer Programming open for this.
+
+When the API is "fire-and-forget" like VSA, in the sense that there
+is no metadata to clean up, we can leave the hard work to the
+``malloc(3)`` implementation.
+
+Ever since ``phkmalloc`` no relevant implementation of ``malloc(3)``
+has dereferenced the freed pointer, in order to find the metadata
+for the allocation. Despite its non-const C prototype ``free(3)``,
+will therefore happily handle a ``const`` or even CHERIed read-only
+pointer.
+
+But you *will* have to scrub the ``const`` off with a ``uintptr_t``
+to satisfy the C-compiler:
+
+.. code-block:: none
+
+ void
+ VSA_free(const struct suckaddr **vsap)
+ {
+ const struct suckaddr *vsa;
+
+ AN(vsap);
+ vsa = *vsap;
+ *vsap = NULL;
+ free((char *)(uintptr_t)vsa);
+ }
+
+Or in varnish style:
+
+.. code-block:: none
+
+ void
+ VSA_free(const struct suckaddr **vsap)
+ {
+ const struct suckaddr *vsa;
+
+ TAKE_OBJ_NOTNULL(vsa, vsap, SUCKADDR_MAGIC);
+ free(TRUST_ME(vsa));
+ }
+
+
+Having been all over this code now, I have decided to constify ``struct
+suckaddr`` in varnish, even without CHERI, it is sounder that way.
+
+It is not bug, but CHERI made it a lot simpler and faster for me
+to explore the consequences of this change, so I will forfeit
+a score of "half a bug" to CHERI at this point.
+
+*/phk*
diff --git a/doc/sphinx/phk/index.rst b/doc/sphinx/phk/index.rst
index 5d33fef4e..4b792547e 100644
--- a/doc/sphinx/phk/index.rst
+++ b/doc/sphinx/phk/index.rst
@@ -13,6 +13,8 @@ You may or may not want to know what Poul-Henning thinks.
.. toctree::
:maxdepth: 1

+ cheri6.rst
+ cheri5.rst
cheri4.rst
cheri3.rst
cheri2.rst
_______________________________________________
varnish-commit mailing list
varnish-commit@varnish-cache.org
https://www.varnish-cache.org/lists/mailman/listinfo/varnish-commit