#2957 Issue closed: Improve documentation for AI tools?

Labels: documentation, no-issue-activity

schlomo opened issue at 2023-03-15 16:00:

In the age of AI tools it becomes more important to also be visible to the AI tools that users will use to find solutions to their problems. To see where we stand I asked ChatGPT (GPT-4) for help with Linux disaster recovery. It produced an impressively detailed howto for Clonezilla, and it even knows about ReaR when asked - but the quality of its ReaR howto is much lower (and it is actually partially wrong).

I can only surmise that this has to be because there is more Internet content about Clonezilla than about ReaR.

Maybe somebody will feel motivated to write more about ReaR to fix this problem; otherwise this issue is just for information/documentation.
[screenshot of the ChatGPT exchange]

jsmeix commented at 2023-03-17 07:29:

When googling for 'clonezilla' I get
"Ungefähr 1.500.000 Ergebnisse (0,28 Sekunden)"
i.e. "About 1.500.000 results (0.28 seconds)"

When googling for 'relax and recover' or 'relax-and-recover' I get
"Ungefähr 7 Ergebnisse (0,32 Sekunden)"
i.e. "About 7 results (0.32 seconds)"

When googling for 'linux disaster recovery' I get
"Ungefähr 14.100.000 Ergebnisse (0,38 Sekunden)"
i.e. "About 14.100.000 results (0.38 seconds)"
with only entries about Relax-and-Recover
on the first page of the Google results
(nothing about Clonezilla there)

When googling for plain 'rear' I get
"Ungefähr 1.300.000.000 Ergebnisse (0,37 Sekunden)"
i.e. "About 1.300.000.000 results (0.37 seconds)"
;-)

jsmeix commented at 2023-03-17 07:47:

I guess the high number of results when
googling for 'clonezilla' is because Clonezilla
also works to clone Windows disks, as far as I read at
https://www.chip.de/downloads/Clonezilla-BIOS-Version-64-Bit_32145513.html
which states (excerpts, translated into English here):

Clonezilla (BIOS-Version) 64 Bit
Version 3.0.3-22
Ranking 4 / 208 at CHIP in category: Backup-Software
...
Compatible with Windows 10
and 7 more systems.
    Windows 10
    Windows 8
    Windows 7
    Windows Vista
    Win 2003 Server
    Win XP
    Win 2008 Server
    Windows 11

Linux is not even mentioned there, which says a lot!

So in practice Clonezilla is perhaps much more often
used and publicly discussed by Windows users
than by Linux users - the base rate is crucial,
so perhaps there is just a base rate fallacy?
Cf.
https://en.wikipedia.org/wiki/Base_rate_fallacy

I mean: perhaps ChatGPT is subject to a base rate fallacy?

schlomo commented at 2023-03-17 09:20:

Yes, totally agree with you. Two things really impressed me in this ChatGPT exchange:

  1. it produced a really well-written Clonezilla how-to that was even specific
  2. it understood my vague prompt "what about rear" to mean ReaR, which shows a lot of contextualised knowledge

Anyway, I thought I'd share this with you ReaR developers and users here, even if it is just for your amusement.

And if somebody has any practical idea for increasing ReaR's ranking - please go for it.

jsmeix commented at 2023-03-17 11:18:

Only FYI
regarding "really well-written" texts from ChatGPT:

As far as I know the core of ChatGPT (and likely of other chat bots)
is essentially a Markov chain, for example see
https://blog.ephorie.de/create-texts-with-a-markov-chain-text-generator-and-what-this-has-to-do-with-chatgpt?utm_source=rss&utm_medium=rss&utm_campaign=create-texts-with-a-markov-chain-text-generator-and-what-this-has-to-do-with-chatgpt
and
https://news.ycombinator.com/item?id=33939805
where the latter reads (excerpt)

jameshart 3 months ago

It’s not ‘more complexity’ than a Markov chain - it essentially
is a Markov chain, just looking at a really deep sequence of
preceding tokens to decide the probabilities for what comes next.

And it’s not just looking that up in a state machine,
it’s ‘calculating’ it based on weights.

But in terms of
‘take sequence of input tokens; use them to decide probable next token’,
it’s functionally indistinguishable from a Markov chain.
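
As an aside, here is my own toy sketch (Python) of that distinction - not
anything from the quoted comment or from ChatGPT itself, with made-up tokens
and scores: both variants map "preceding tokens -> probabilities for the
next token", the only difference is whether the distribution is stored in
a table or calculated from weights:

    # Hypothetical toy example - a real model computes the scores from the
    # whole context with learned weights, here everything is just made up.
    import math

    # (a) "state machine" style: the next-token distribution is literally stored
    stored = {("the", "cat"): {"sat": 0.5, "ran": 0.3, "slept": 0.2}}

    def next_probs_lookup(context):
        return stored[tuple(context)]

    # (b) "calculated from weights" style: per-token scores are turned into
    # probabilities with a softmax - functionally the same kind of mapping
    def next_probs_softmax(scores):
        z = sum(math.exp(s) for s in scores.values())
        return {tok: math.exp(s) / z for tok, s in scores.items()}

    print(next_probs_lookup(["the", "cat"]))
    print(next_probs_softmax({"sat": 1.0, "ran": 0.5, "slept": 0.1}))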

Unfortunately neither
https://en.wikipedia.org/wiki/ChatGPT
nor
https://en.wikipedia.org/wiki/Generative_pre-trained_transformer
tells what the core of ChatGPT and the like is.

But
https://en.wikipedia.org/wiki/Markov_chain
tells what the core of a Markov chain is:

A Markov process is a stochastic process

Accordingly, at its core, ChatGPT is a clueless "empty talk" bot.

This matches what I have seen so far of how ChatGPT
sometimes talks plain nonsense semantically
but always in syntactically "really well-written" text
(what is really evil looks good).

Essentially ChatGPT parrots how others have previously
continued some prefix text snippet, but ChatGPT has
no "understanding" of what it is talking about.

E.g. when others had previously continued the prefix text
snippets 'ham tastes' and 'Spam tastes' like this:

ham tastes -> ham tastes very good (with 30 % probability)
ham tastes -> ham tastes well (with 40 % probability)
ham tastes -> ham tastes bad (with 20 % probability)
ham tastes -> ham tastes rotten (with 10 % probability)

Spam tastes -> Spam tastes good (with 10 % probability)
Spam tastes -> Spam tastes well (with 20 % probability)
Spam tastes -> Spam tastes odd (with 40 % probability)
Spam tastes -> Spam tastes bad (with 30 % probability)

Then ChatGPT will continue the prefix texts
'ham tastes' and 'Spam tastes' in the same way.

So ChatGPT may generate a "well-written" text like
"usually ham tastes well
but sometimes ham tastes rotten
while often Spam tastes odd"
even though ChatGPT has no "knowledge" about ham and Spam.
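
To make this "parroting" a bit more concrete, here is a minimal sketch
(my own illustration, using only the hypothetical probabilities from the
example above, not anything from the actual ChatGPT implementation):
it continues a prefix by sampling the next words with exactly the
probabilities that were observed before:

    # Toy "parrot": continue a prefix by sampling the next words
    # with the same (hypothetical) probabilities as in the example above.
    import random

    continuations = {
        "ham tastes": (["very good", "well", "bad", "rotten"], [0.30, 0.40, 0.20, 0.10]),
        "Spam tastes": (["good", "well", "odd", "bad"], [0.10, 0.20, 0.40, 0.30]),
    }

    def continue_prefix(prefix):
        words, probs = continuations[prefix]
        return prefix + " " + random.choices(words, weights=probs, k=1)[0]

    print(continue_prefix("ham tastes"))   # e.g. "ham tastes well"
    print(continue_prefix("Spam tastes"))  # e.g. "Spam tastes odd"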

jsmeix commented at 2023-03-23 15:24:

An addendum only FYI
regarding whether or not ChatGPT actually is (only) a Markov chain
(versus "at its core ChatGPT essentially is a Markov chain"):

I found this YouTube video that explains in a bit more detail
"How ChatGPT's AI works behind the scenes":
https://www.youtube.com/watch?v=-9SdOPe294w

So I think it would be better phrased as something like:
"Essentially ChatGPT behaves like a Markov chain (on steroids)"

github-actions commented at 2023-05-23 02:22:

Stale issue message


[Export of Github issue for rear/rear.]