IDK Markup goals

2024/10/13

From the 1970s and the new age of computing until today, text markups have emerged naturally. Today only a few of them are widely used. Because computing is still a young discipline, these markups are not perfect and lack some features. In the coming years we will surely find better ones that will replace them entirely. IDK is an attempt to be one of them.

Why another markup?

As someone who takes notes all the time, I was trying to annotate PDF books while reading them.

We already have several heavily used markups out there; Markdown and XML are the most popular. XML is data oriented but inefficient in several ways.

Markdown is a presentational markup. It has a lot of what I want, but it is not quite a semantic one. Its primary limitation is that it is tied to HTML, which is far from perfect and, since HTML5, serves purposes other than representing text. The CommonMark spec, which is a better specification, does exactly the same. Unlinking the markup from HTML removes a lot of constraints and permits better data representation and links inside a text.

The goal of the IDK markup is to have the best of both worlds: a markup balanced between a plain-text format and data representation.

How it started

As someone who writes down everything I find interesting, on my smartphone, computer or paper, I really need to be able to search for specific notes when I am working or reading on a subject. Sometimes a note is written directly by me, but sometimes it comes from a source I am reading. In that case I need an unobtrusive way to quickly pick the text and move on, so that the reading flow is interrupted as little as possible.

Text extraction software is clunky, slow, or both

I have tried several programs for that.

Digital book data extraction:

  a. Web Browsers
  b. SumatraPDF
  c. Calibre
  d. Adobe Acrobat
  e. Foxit

some more ...

None of them supports text highlighting properly: selection is clunky, and data extraction is poorly designed or missing entirely. Microsoft Edge, for example, dropped these features recently. They fail to provide these features because they are built for other uses, which made me think that a specialized tool is needed (but is not present on the market).

Note taking:

  a. Obsidian
  b. Logseq

They use a subset of Markdown, which is a great markup for giving semantics to a text. But it lacks some features I want.

For my taste it is not as readable as it could be. Because these tools provide visuals and a user experience on top, it is fine to use it that way, but Markdown is a very bad markup for data representation. They work around that in the software, but it still means that the whole software is built around Markdown, which makes it hard to implement innovative features.

A highly wanted feature would be the ability to highlight chunks of text in a digital file and add details to them, so that they can be linked to other notes and searched. They don't have it. This means that if we want to add text taken from a source, we must write all the details ourselves. It wouldn't be the end of the world if we didn't lose a very important piece of data: its context.

Any written text has a specific context: its date of creation, which tells us the state of the world when it was written, the whole text the chunk was taken from, and much more.

The data is there, though. When we read an interesting text, some pieces of it are more important than others; if we extract only those and leave the rest unsaved, a lot of information is lost. That may make sense on the day we write the note, but read it back a few years later and the context is completely gone. It is not rare to read something you wrote and have no clue why you wrote it. This happens a lot with quick notes, but still, being able to dive back into a subject in enough depth is sometimes key.
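To make the idea of keeping context concrete, here is a minimal Python sketch of a highlighted chunk that keeps a pointer back to its source instead of copying the chunk alone. The names `Excerpt`, `surrounding` and `search` are hypothetical and only illustrate the idea; they are not part of IDK or of any existing tool.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Excerpt:
    """A highlighted chunk of text that keeps its context (hypothetical model)."""
    source_title: str   # document the chunk was taken from
    source_date: date   # date of creation of the source
    full_text: str      # the whole text the chunk belongs to
    start: int          # offset of the chunk inside full_text
    end: int            # end offset (exclusive)
    note: str = ""      # details added by the reader

    @property
    def text(self) -> str:
        """The highlighted chunk itself."""
        return self.full_text[self.start:self.end]

    def surrounding(self, width: int = 40) -> str:
        """The chunk plus up to `width` characters of context on each side."""
        lo = max(0, self.start - width)
        hi = min(len(self.full_text), self.end + width)
        return self.full_text[lo:hi]


def search(excerpts: list[Excerpt], term: str) -> list[Excerpt]:
    """Case-insensitive search over the chunk text and the reader's note."""
    needle = term.lower()
    return [
        e for e in excerpts
        if needle in e.text.lower() or needle in e.note.lower()
    ]
```

Because the chunk is stored as offsets into the full text, the surrounding passage and the source date stay available years later, even if only a few words were highlighted.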

They don't fit

We can see that they are not the tools for my use case, which means that if I don't create one myself, there is little chance of finding one later that is. Because this is very important to me and I can't live without it, I need to build one.

Why?

If you want a software

How to fix it:

  1. Seamless UX.
  2. Performant.

What would we want as a text markup?

  1. Semantic
  2. Presentational
  3. Programmable
  4. Automatic data extraction
  5. Meta capacity

Data oriented

The markup must provide ways to search text and mark it as data: numeric, token, text. Each kind should have ways to encode fine details.

Simple writing

The markup should be usable on the first try, and added features should not break that. If someone new to it has a hard time understanding why the text they wrote fails, or gets a lot of errors while writing simple text, there is a high probability that the markup should change.

While allowing complex ones

It should be possible to encode complex writing and relations between parts of a text.

A breeze to read

It should keep the basic writing features very easy to read; for comparison, it should be a bit more readable than the Markdown file format.

Good enough error/warning logs

Having very detailed logs is not the goal yet, but the parser must report errors with their precise location and avoid Java-like stack traces. The parser should fix as many errors as it can, so that writers do not have to think much about how they write.

Having two faces

It should be a tool with two personalities: one side permits very easy writing, the other is highly capable of transforming the text and representing its meaning.
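As an illustration of "precise location" and of recovering instead of aborting, here is a minimal Python sketch. The bracket rule it checks is only a stand-in for a real markup rule; it is not IDK syntax, and `Diagnostic` and `check_brackets` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Diagnostic:
    line: int      # 1-based line number
    column: int    # 1-based column number
    message: str

    def __str__(self) -> str:
        # One short line per problem, no stack trace.
        return f"{self.line}:{self.column}: {self.message}"


def check_brackets(text: str) -> list[Diagnostic]:
    """Report every unbalanced '[' or ']' and keep going after each one.

    Assumption: brackets must be balanced within a single line.
    """
    diagnostics: list[Diagnostic] = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        open_columns: list[int] = []
        for col, char in enumerate(line, start=1):
            if char == "[":
                open_columns.append(col)
            elif char == "]":
                if open_columns:
                    open_columns.pop()
                else:
                    diagnostics.append(
                        Diagnostic(line_no, col, "']' without a matching '['")
                    )
        # Recovery: treat each unclosed '[' as closed at the end of the line,
        # so one mistake does not hide the problems on later lines.
        for col in open_columns:
            diagnostics.append(
                Diagnostic(line_no, col, "'[' is never closed on this line")
            )
    return diagnostics
```

For the input `"ok [a]\nbad [b\nalso bad c]"` this reports `2:5: '[' is never closed on this line` and `3:11: ']' without a matching '['`, and the first error does not stop the check of the third line.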

Highly performant

No compromise should be made on performance: the parser should be able to parse hundreds of files and check their links in a very short time. The purpose is to be able to use it for computation; as a bonus, the UX will be responsive.

Simple to reproduce

One thing holds true for the adoption of every file format: it must be easy to implement. There are several ways to achieve that: provide tools that are easy to use and integrate, and be open. The markup should do all of them.
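One way to keep link checking cheap across many files is to index all known targets once and then test each link with a set lookup, so the total work grows linearly with the amount of text. The Python sketch below assumes a placeholder `[[target]]` link syntax and an in-memory mapping of file names to contents; neither is taken from the IDK specification, and `broken_links` is a hypothetical name.

```python
import re

# Placeholder link syntax for this sketch only: [[target]]
LINK = re.compile(r"\[\[([^\[\]]+)\]\]")


def broken_links(files: dict[str, str]) -> dict[str, list[str]]:
    """Map each file name to the link targets that match no known file."""
    known = set(files)  # built once; each membership test is O(1) on average
    report: dict[str, list[str]] = {}
    for name, text in files.items():
        missing = [target for target in LINK.findall(text) if target not in known]
        if missing:
            report[name] = missing
    return report
```

With `{"a": "see [[b]] and [[c]]", "b": "back to [[a]]"}` the result is `{"a": ["c"]}`: only the link to the missing file `c` is reported.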

Roadmap

  1. Experiment with the current markup design.
  2. Fix the current specification (bugs, imperfect design).
  3. Add new features and specs to the markup.
  4. Finish the HTML/JavaScript conversion (IDK to HTML and HTML to IDK).
  5. Create tools for code editor integration.
  6. Create a standalone software with a UI that uses this markup with enhanced writing and data visualization (graphs of data, saving chunks of text from PDFs and adding details to them, and more).

Having metawriting

Having this possibility will let writers alter the markup for their own needs. The way I see it, there is a standard set of rules for the markup, and anyone is allowed to add features as long as they never break the standard. Maybe this takes the form of a file that specifies additional markup rules and computations. Never allowing the standard to be modified is very important, though.

Most features are available everywhere

Most of IDK's capabilities are available everywhere it makes sense. That is, a table cell must support formatting, footnote references and more. So keep in mind that if you see a feature listed below, there is a high probability that it is available for your exotic use case.
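The "extend but never modify the standard" idea can be sketched as a rule registry that accepts new rules but refuses to redefine standard ones. This is a minimal Python illustration under assumed names (`RuleSet`, `extend`, `apply`); it does not describe how IDK actually implements metawriting.

```python
from typing import Callable

# A rule takes text and returns transformed text (assumption for this sketch).
Rule = Callable[[str], str]


class RuleSet:
    """Standard rules are fixed at creation; extensions live beside them."""

    def __init__(self, standard: dict[str, Rule]) -> None:
        self._standard = dict(standard)       # private copy, never mutated later
        self._extensions: dict[str, Rule] = {}

    def extend(self, name: str, rule: Rule) -> None:
        """Add a user rule, refusing any name that belongs to the standard."""
        if name in self._standard:
            raise ValueError(f"'{name}' is a standard rule and cannot be redefined")
        self._extensions[name] = rule

    def apply(self, name: str, text: str) -> str:
        """Run a rule by name; standard rules always take precedence."""
        rule = self._standard.get(name) or self._extensions.get(name)
        if rule is None:
            raise KeyError(name)
        return rule(text)
```

A writer could then call `rules.extend("shout", lambda t: t.upper() + "!")` to add a feature, while an attempt to redefine a standard name raises `ValueError` and leaves the standard untouched.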

HTML conversion

Be compliant with the standard

IDK must produce HTML that complies with the specs; the output is checked regularly to make sure it does.