Vincent Zimmer's blog

Sunday, March 24, 2024

Sneers, CNAs, licenses, and fuzzing

Let's start off with something I occasionally see in industry, namely 'the grand sneer' mentioned in https://buttondown.email/hillelwayne/archive/know-of-the-right-tool-for-the-job/. I find that the 'sneering' is often a sign of youth, narrow experience, or not exploring outside of your domain https://twitter.com/vincentzimmer/status/1762972464169296002...

Knowing more often leads to greater humility, born of realizing how much knowledge there is in the world that you don't know.

Another interesting posting of late was the fact that the Linux kernel is now a CNA https://amanitasecurity.com/posts/dear-linux-kernel-cna-what-have-you-done/ https://news.ycombinator.com/item?id=39627302. I noted that there are similar challenges in other open source infrastructure like https://github.com/tianocore/tianocore.github.io/wiki/Reporting-Security-Issues in https://twitter.com/vincentzimmer/status/1768351312205484380.

Another posting in that thread clicked into the SBOM topic with an advocacy for the VEX format. Some work in this space can be found in https://github.com/hughsie/uefi-sbom-best-practices/blob/main/index.rst, too.

So a lot of these thoughts are born of experience. Amazon has a famous quote that goes something like "there is no compression algorithm for experience," but I'd have to say things are getting pretty good with LLMs. In fact, I am glad that my longer form works were published prior to ChatGPT. Maybe the world of text will be bifurcated into BG and PG - "Before GPT" and "Post GPT."

I don't subscribe to the dystopian 'paperclip' https://cepr.org/voxeu/columns/ai-and-paperclip-problem style apocalypse of AI, but I do admire the foundations upon which these large foundation models are built, namely the sum of human knowledge, or the internet. From the hockey-stick style growth of the net in '97 from the Metacrawler era http://vzimmer.blogspot.com/2021/01/memories-from-uw-and-cornell.html to today's corpus of information on the web, it's truly staggering.

Some examples of oopsies around folks leveraging ChatGPT a little too much include https://www.sciencedirect.com/science/article/abs/pii/S2468023024002402 https://simonwillison.net/2024/Mar/15/certainly-here-is-google-scholar/ and https://news.ycombinator.com/item?id=39733605.

Speaking of experience, Subrata made a nice posting https://twitter.com/abarjodi/status/1771948383529247011 namely the "FSP Customization - Remove non-mandatory components in the Intel FSP" talk for the Open Source Firmware Foundation (OSFF) Byte Talks - volume 1, March 8, 2024 https://opensourcefirmware.foundation/events/bytetalks-vol.-1/. The video is now posted at https://www.youtube.com/watch?v=0ciYjPSu56A. This builds on work trying to help the various communities https://www.phoronix.com/news/Google-Intel-More-FSP-Flexible https://blog.osfw.foundation/breaking-the-boundary-a-way-to-create-your-own-fsp-binary/. In the past, we responded to the concerns about FSP licensing described in https://www.phoronix.com/news/Intel-Better-FSP-License https://mail.coreboot.org/pipermail/coreboot/2018-August/087220.html.

It's hard to 'sneer' when the community is seeing problem statements not necessarily experienced in your own environment or workflow.

Sometimes folks don't sneer but ignore. For example, the work on using SIMICS https://github.com/intel/tsffs for fuzzing firmware mentioned in https://twitter.com/jerry_Intel/status/1762220373503005056 regrettably didn't cite https://ieeexplore.ieee.org/document/9218694 in the team's blog https://community.intel.com/t5/Blogs/Products-and-Solutions/Security/Chips-Salsa-This-Hardware-Does-Not-Exist/post/1572067. I ordinarily wouldn't call folks out, but in an internal presentation of their work I mentioned the preceding development on UEFI SIMICS fuzzing and the ensuing paper to the TSFFS folks, and the TSFFS lead responded, "Oh yes, we leveraged that work. We were disappointed that you published first so that we couldn't." So at least not a sneer :)

On a more positive note, the team did some great evolution, including extending to 'beyond BIOS' use cases, getting it open sourced, and finally, against many odds within large companies enamored of Python et al these days, evolving the feature to use the Rust language.

And additional props go out to my former software division that delivered TSFFS to open source for their work in evolving HBFA https://github.com/tianocore/tianocore.github.io/wiki/Host-Based-Firmware-Analyzer with their https://github.com/intel/HBFA-FL project. They did a nice job on ack'ing the earlier work, too https://www.intel.com/content/dam/develop/external/us/en/documents/intel-usinghbfatoimproveplatformresiliency-820238.pdf.

Although a lot of the constituent elements like https://github.com/S2E are in the open, I wasn't able to get the symbolic execution work described in https://www.usenix.org/conference/woot15/workshop-program/presentation/bazhaniuk across the open source finish line. The lure of retirement, Amazon, and Eclypsium ended up disbanding that team over time and no new team emerged from the ashes to carry it forward.

Tuesday, March 29, 2022

Synthesize it?

I was happy to see the public SIMICS announcement https://community.intel.com/t5/Blogs/Products-and-Solutions/Software/The-Public-Release-of-Intel-Simics-and-More/post/1372402, including mention of the UEFI boot based upon the QSP work I got started https://github.com/tianocore/edk2-platforms/tree/master/Platform/Intel/SimicsOpenBoardPkg. Also a good time to revisit what it will take to get https://ieeexplore.ieee.org/document/9218694 into SIMICS.

The posting also provided details on the SIMICS Device Modeling Language (DML) https://github.com/intel/device-modeling-language. Now that DML is open, perhaps I can explore releasing the DML-to-TSL models from Termite2 https://github.com/termite2/Termite described in the https://www.intel.com/content/dam/www/public/us/en/documents/research/2013-vol17-iss-2-intel-technology-journal.pdf article. At the time of the Intel Technology Journal article we were prohibited from releasing the DML, which hampered a public demonstration of the full path from DML + UEFI device specification to working EDKII source code.

I have to admit that opacity of the information wasn't the biggest problem, though. The real issue was in the readability of the machine-generated code.

There are similar issues with 'certified code' like CertiKOS https://github.com/npe9/certikos, which goes from a Coq proof to hard-to-read C code. And even if you can read the code, the idea is to do maintenance on the proof and not the auto-generated C code. seL4 tries to do proofs on C code, which is more aligned with today's development process. And given that works like below were published in 1979, it's apparent that these issues are not trivial to solve.

Perhaps the same semantic gap exists with hardware design language innovation. The Scala-based Chisel looks promising, but the SiFive folks mentioned in a Seattle training that their cores are optimized Verilog. This is an instance of the broader question of using High Level Synthesis (HLS) to get more efficiency. At some point maybe economics will win. Good enough will prevail in the same fashion that we see the prevalence of Python even though it is wildly inefficient relative to compiled languages like C, C++, and Rust.

Synthesis from specification is definitely at the other extreme of today's copy-paste-modify approach to software development. I recall pushing the min-core, a stripped set of EDKII packages. The subset of sources helped with cognitive complexity, but unless delivered alongside some additional business value, such as unit-test coverage, the effort was deemed worse than the existing code since the latter had years of evidence-of-use.

I should add the copy-paste-modify development (CPMD) acronym to my other sarcastic takes on Test Driven Development (TDD), such as Promotion Driven Development (PDD), Fear Driven Development (FDD), or....

Regarding such development anti-patterns, I still recall a comment that writing firmware in Rust will be 'too hard.' The trade-off seems to be between accepting more difficulty in the initial creation of critical code, or writing it in an 'easy' language like C and then mitigating the lack of rigor at compile time with field patching & updates. I look forward to when Rust practices like https://highassurance.rs/ can reach the same level of assurance and provability as Ada SPARK.
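To make the rigor trade-off a bit more concrete, here is a minimal sketch in Rust, not taken from any real firmware code base. The record layout (one tag byte, one length byte, then the payload) and the names parse_record, Record, and ParseError are all hypothetical and invented for illustration. The point is only that the 'too short' and 'length field lies' cases have to be handled before the code compiles cleanly and returns a payload, rather than being discovered later in the field.

```rust
// Hypothetical record layout, invented for this sketch only:
// byte 0 = tag, byte 1 = payload length, then the payload bytes.

#[derive(Debug, PartialEq)]
enum ParseError {
    /// Fewer than the two header bytes were supplied.
    TruncatedHeader,
    /// The header claims more payload bytes than the buffer holds.
    TruncatedPayload { claimed: usize, available: usize },
}

#[derive(Debug, PartialEq)]
struct Record<'a> {
    tag: u8,
    payload: &'a [u8],
}

fn parse_record(buf: &[u8]) -> Result<Record<'_>, ParseError> {
    // The slice pattern forces the short-buffer case to be spelled out.
    let (tag, len, rest) = match buf {
        [tag, len, rest @ ..] => (*tag, *len as usize, rest),
        _ => return Err(ParseError::TruncatedHeader),
    };
    // `get` yields None instead of reading past the end of the buffer,
    // so a lying length field becomes an error value, not an overrun.
    let payload = rest.get(..len).ok_or(ParseError::TruncatedPayload {
        claimed: len,
        available: rest.len(),
    })?;
    Ok(Record { tag, payload })
}

fn main() {
    // Well-formed: tag 1, three payload bytes, one trailing byte ignored.
    let good = [0x01, 0x03, 0xAA, 0xBB, 0xCC, 0xFF];
    // Malformed: claims 16 payload bytes but only one follows.
    let bad = [0x01, 0x10, 0xAA];
    println!("{:?}", parse_record(&good));
    println!("{:?}", parse_record(&bad));
}
```

The equivalent C would typically index the buffer directly and rely on code review, fuzzing, or a later patch to catch the missing length check. None of this approaches the provability of SPARK, but it does move a class of mistakes from field updates back to the initial creation of the code.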