As someone who takes notes constantly while reading PDF books, I kept trying to annotate them.
Several markup languages are already in heavy use; Markdown and XML are the most popular. XML is data-oriented but inefficient in several ways.
Markdown is a presentational markup. It has a lot of what I want, but it is not a semantic one. Its primary limitation is that it is tied to HTML, which is far from perfect and, since HTML5, serves purposes other than representing text. The CommonMark specification, although a better-specified version of Markdown, is tied to HTML in exactly the same way. Unlinking the markup from HTML removes many constraints and allows better data representation and linking within a text.
The goal of the IDK markup is to get the best of both worlds: a markup that balances a plain-text format with data representation.
I write down everything I find interesting, on my smartphone, my computer or on paper, so I really need to be able to search for specific notes when I am working on or reading about a subject. Sometimes a note is written directly by me, but sometimes it comes from a source I am reading. In that case I need an unobtrusive way to quickly pick the text and move on, so as to interrupt the reading flow as little as possible.
Text extraction software is clunky, slow, or both
Digital book data extraction:
None of them supports text highlighting properly: selection is clunky, data extraction is poorly designed, or the feature is missing entirely. For example, Microsoft Edge recently dropped these features. They fail to provide them because they are built for other uses, which made me think that a specialized tool is needed, and none exists on the market yet.
Note taking:
They use a subset of Markdown, which is a great markup for adding structure to a text, but it lacks some features I want.
For my taste, it is not as readable as it could be. Because these apps provide visuals and a user experience on top of it, that is acceptable for note taking, but Markdown is a very bad markup for data representation. They work around this through the software itself, but that means the whole application is built around these workarounds, which makes it hard to implement innovative features.
A feature I want badly, and which they lack, is the ability to highlight chunks of text in a digital file and attach details to them, so that they can be linked to other notes and searched. Without it, if we want to add text taken from a source, we must write all the details ourselves. That wouldn't be the end of the world if we didn't lose a very important piece of data: its context.
Any written text has a specific context: its date of creation, which tells us the state of the world when it was written, the whole text the excerpt was taken from, and much more.
The data is partly there, though. When we read an interesting text, some parts of it are more important than others; if we extract only those and leave the rest unsaved, a lot of information is lost. The excerpt may make sense on the day we save it, but read it again a few years later and the context is completely gone. It is not rare to read something you wrote and have no clue why you wrote it. This happens a lot with quick notes, but being able to dive back into the subject in enough depth is sometimes key.
These are clearly not the right tools for my use case, and if I don't create one myself, there is little chance I will find one later. Because this matters so much to me, I need to build it.
If you want a software
How to fix it:
The markup must provide ways to search text and to mark it as data: numbers, tokens, or text. Each of these types should be able to encode fine details.
The markup should be usable on the first try, and added features should not break that. If a newcomer has a hard time understanding why what they wrote fails, or gets many errors while writing simple text, the markup most likely needs to change.
It should be able to encode complex writing and relations between parts of a text.
It should keep basic writing features very easy to read; for comparison, it should be a bit more readable than Markdown.
Very detailed logs are not a goal yet, but the parser must report errors with their precise location and avoid Java-style stack traces. It should also recover from as many errors as it can, so the writer does not have to think too much about their writing.
It should be a tool with two personalities: one side allows very easy writing, the other is highly capable of transforming the text and representing its meaning.
No compromise should be made on performance: the parser should be able to parse hundreds of files and check their links in a very short time. The purpose is to make it usable for computation; as a bonus, it will give a responsive UX.
One thing holds for the adoption of every file format: it must be easy to implement. There are several ways to achieve that, such as providing tools that are easy to use and integrate, and being open. The markup should do all of them.
Offering a way to extend the markup will let writers alter it for their own needs. The way I see it, the markup has a standard set of rules, and anyone can add features that never break the standard, perhaps through a file that specifies additional rules for processing the markup. Never allowing the standard itself to be modified is very important, though.
Most IDK features are available everywhere it makes sense; for example, a table cell supports formatting, footnote references, and more. So if you see a feature listed below, it is very likely available for your exotic use case too.
IDK must produce spec-compliant HTML, and the output is checked regularly to make sure it does.
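A minimal sketch, assuming the parser records errors as 0-based character offsets into the source (an assumption, not something IDK specifies), of how such an offset can be turned into a precise line and column and shown to the writer with a caret instead of a stack trace. `Diagnostic`, `locate` and `render` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Diagnostic:
    message: str
    line: int    # 1-based line number
    column: int  # 1-based column number


def locate(source: str, offset: int) -> tuple[int, int]:
    """Convert a 0-based character offset into a 1-based (line, column) pair."""
    line = source.count("\n", 0, offset) + 1
    # rfind returns -1 when there is no newline before offset, so line_start becomes 0
    line_start = source.rfind("\n", 0, offset) + 1
    return line, offset - line_start + 1


def render(source: str, diag: Diagnostic) -> str:
    """Format the error as 'line:column: message', the offending line, and a caret."""
    text = source.splitlines()[diag.line - 1]
    caret = " " * (diag.column - 1) + "^"
    return f"{diag.line}:{diag.column}: {diag.message}\n{text}\n{caret}"
```

For the source `"first line\nsecond *line\n"` and an error at offset 18 (the `*`), `locate` gives line 2, column 8, and `render` prints that position followed by `second *line` with a caret under the `*`.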
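One way to enforce "extensions may add rules but never modify the standard" is a registry that refuses any extension reusing a standard rule name. The sketch below is hypothetical: `RuleRegistry`, rules modeled as plain string-to-string functions, and the rule names `emphasis` and `highlight` are illustrative assumptions, not the IDK design.

```python
from typing import Callable

# Hypothetical: a rule turns the content of a marked span into output text.
Rule = Callable[[str], str]


class RuleRegistry:
    def __init__(self, standard: dict[str, Rule]):
        # Copy so later changes to the caller's dict cannot alter the standard.
        self._standard = dict(standard)
        self._extensions: dict[str, Rule] = {}

    def extend(self, name: str, rule: Rule) -> None:
        """Add an extension rule; refuse anything that would shadow a standard rule."""
        if name in self._standard:
            raise ValueError(f"extension cannot override standard rule {name!r}")
        if name in self._extensions:
            raise ValueError(f"extension rule {name!r} is already registered")
        self._extensions[name] = rule

    def get(self, name: str) -> Rule:
        """Standard rules always win; extensions are only consulted afterwards."""
        if name in self._standard:
            return self._standard[name]
        return self._extensions[name]


registry = RuleRegistry({"emphasis": lambda s: f"<em>{s}</em>"})
registry.extend("highlight", lambda s: f"<mark>{s}</mark>")
```

Because standard rules are looked up first and can never be re-registered, a document that only uses standard features renders the same way no matter which extensions are loaded.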