Very good idea. I mean, there are frameworks for programmers that do exactly that, like LangChain. But I also end up doing this manually. I use Kobold.cpp, and most of the time I just switch it to Story mode so I get one large notebook / text area. I’ll put in the questions, prompts, and special tokens if it’s an instruct-tuned variant, and start the bullet point list for it. Or I click generate after I’ve already typed in the chapter names or a table of contents. Or I open a code block with the proper markdown. So pretty much what you laid out. It’s super useful for guiding the LLM in the proper direction, or for steering it back on track with a small edit to its output and a subsequent call to generate from there.
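The trick of typing the opening of the desired output yourself (a bullet, a heading, an open code fence) works just as well when calling a model programmatically. A minimal sketch, assuming you build the prompt string yourself before handing it to whatever backend you use (the helper name and the "Answer:" framing are made up for illustration):

```python
# Sketch of output "prefilling": end the prompt with the opening of the
# format you want, so the model continues inside it instead of free-forming.
FENCE = "`" * 3  # markdown code-fence delimiter, built to avoid nesting issues

def build_prefilled_prompt(question: str) -> str:
    """Append an opened markdown code fence so the model's continuation
    lands inside a code block (hypothetical helper, not a Kobold.cpp API)."""
    return f"{question}\n\nAnswer:\n{FENCE}python\n"

prompt = build_prefilled_prompt("Write a function that reverses a string.")
# The prompt now ends mid-structure; send it to your backend of choice
# and the model is strongly biased to emit Python code next.
print(prompt)
```

The same idea covers the other cases from above: end the prompt with `- ` to force a bullet list, or with a chapter heading to get prose for that chapter.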
Kobold.cpp uses llama.cpp under the hood. It just adds a few extras, a web server, and a user interface, plus some backwards compatibility for older model file formats, and it’s relatively easy to install. But the project builds on llama.cpp and uses that same code for inference.