The Genetic Blueprint and the Internet as a Messy Library

1. The Genetic Blueprint

A decade after the invention of the World Wide Web, Tim Berners-Lee is promoting the "Semantic Web". The Internet has so far been a repository of digital content. It has a rudimentary inventory system and very rudimentary data location services. As a result, most of the content is invisible and inaccessible. Moreover, the Internet manipulates sequences of symbols, not logical or semantic propositions. In other words, the Net compares values but does not know the meaning of the values it manipulates. It cannot interpret strings, infer new facts, deduce, induce, derive, or otherwise understand what it is doing. In short, it does not understand language. Run an ambiguous term through any search engine and these shortcomings become painfully obvious. This lack of understanding of the semantic underpinnings of its raw material (data, information) prevents applications and databases from sharing resources and feeding each other. The Internet is discrete and discontinuous. It resembles an archipelago, with users hopping from island to island in a frantic search for relevance.

Even visionaries like Berners-Lee do not envision an "intelligent Web". They merely propose to let users, content creators, and web developers assign descriptive meta tags ("name of hotel") to fields or strings of symbols ("Hilton"). These meta tags (arranged in semantic and relational "ontologies" - lists of meta tags, their meanings, and how they relate to each other) will be read by various applications and enable them to process the associated strings of symbols correctly (place the word "Hilton" in your address book under "hotels"). This will make information retrieval more efficient and reliable, and the information retrieved is bound to be more relevant and amenable to higher-level processing (statistics, the development of heuristic rules, and so on). The shift is from HTML (whose tags are concerned with visual appearance and content indexing) to languages such as the DARPA Agent Markup Language, OIL (Ontology Inference Layer or Ontology Interchange Language), or even XML (whose tags are concerned with content classification, document structure, and semantics). This would bring the Internet closer to the classic library card catalogue.
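To make the "Hilton under hotels" example concrete, here is a minimal sketch: a fragment of semantically tagged content and a tiny "agent" that uses the tags to file the string correctly. The tag names and the miniature "ontology" dictionary are hypothetical illustrations, not an actual DAML or OIL vocabulary.

```python
# Hypothetical illustration of semantic tagging: the tag vocabulary below is
# invented for this sketch and does not follow any real DAML/OIL ontology.
import xml.etree.ElementTree as ET

document = """
<listing>
  <business category="hotel">
    <name>Hilton</name>
    <city>Paris</city>
  </business>
</listing>
"""

# A crude stand-in for an ontology: it tells the agent which categories exist
# and under which address-book section entries of each category belong.
ontology = {"hotel": "hotels", "airline": "airlines"}

address_book = {}
root = ET.fromstring(document)
for business in root.findall("business"):
    category = business.get("category")       # "hotel"
    name = business.findtext("name")           # "Hilton"
    section = ontology.get(category, "uncategorized")
    address_book.setdefault(section, []).append(name)

print(address_book)  # {'hotels': ['Hilton']}
```

Because the category travels with the data, any application that shares the same ontology can process the string "Hilton" correctly, without guessing its meaning from context.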

Even the present, pre-semantic, hyperlink-dependent Internet is reminiscent of Richard Dawkins' seminal work "The Selfish Gene" (OUP, 1976). This would be doubly true of the Semantic Web.

Dawkins suggested generalizing the principle of natural selection to a law of the survival of the stable: "A stable thing is a collection of atoms that is permanent enough or common enough to deserve a name." He then described the emergence of "replicators" - molecules that created copies of themselves. The replicators that survived the competition for scarce raw materials were characterized by longevity, fecundity, and copying-fidelity. Replicators (now known as "genes") constructed "survival machines" (organisms) to shield them from the vagaries of an ever-harsher environment.

All of this is very reminiscent of the Internet. The "stable things" are HTML-coded web pages. They are replicators - they make copies of themselves whenever their "web address" (URL) is accessed. The HTML code of a web page can be thought of as its "genetic material". It contains all the information needed to reproduce the page. And, just as in nature, the longer-lived the page, the more fecund it is (fecundity measured in links to the page from other sites), and the more faithfully its HTML code is copied - the higher its chances of survival (as a web page).

The replicator molecule (DNA) and the replicator HTML have one thing in common: they are both packaged information. In the appropriate context (the right biochemical "soup" in the case of DNA, the right software application in the case of HTML code), this information generates a "survival machine" (an organism, or a web page).

The Semantic Web will only increase the longevity, fecundity, and copying-fidelity of the underlying code (in this case, OIL or XML instead of HTML). By facilitating many more interactions with many other websites and databases, the underlying "replicator" code will ensure the "survival" of "its" site (= its survival machine). In this analogy, the site's "DNA" (its OIL or XML code) contains "single genes" (semantic meta tags). The whole process of life is the unfolding of a kind of Semantic Web.

In a prophetic passage, Dawkins described the Internet:

"The first thing to grasp about a modern replicator is that it is highly gregarious. A survival machine is a vehicle containing not just one gene but many thousands. The manufacture of a body is a cooperative venture of such intricacy that it is almost impossible to disentangle the contribution of one gene from that of another. A given gene will have many different effects on quite different parts of the body. A given part of the body will be influenced by many genes, and the effect of any one gene depends on interaction with many others... In the same way, any given page of the blueprint makes reference to many different parts of the building; and each page makes sense only through cross-references to numerous other pages."

What Dawkins neglected in his important work is the concept of the network. People congregate in cities, mate, and reproduce, thus providing genes with new "survival machines". But Dawkins himself suggested a new replicator - the "meme": an idea, belief, technique, technology, work of art, or piece of information. Memes use the human brain as their "survival machine", and they jump from brain to brain, across time and space ("communication"), in the course of cultural (as opposed to biological) evolution. The Internet is a latter-day playground for memes. But, more importantly, it is a network. Genes move from one container to another through a linear, serial, and tedious process that involves prolonged periods of one-on-one gene shuffling ("sex") and gestation. Memes use networks. Their propagation is therefore parallel, fast, and pervasive. The Internet is a sign of the growing dominance of memes over genes. And the Semantic Web may be to the Internet what artificial intelligence is to classical computing. We may be on the threshold of a self-aware Web.

2. The Internet as a Messy Library

A. Cataloging Problem

The Internet is a collection of billions of pages containing information. Some of them are visible; others are generated on demand from hidden databases (the "Invisible Internet").

The Internet has no clear order, classification, or categorization. Surprisingly, and unlike in "classic" libraries, no one has yet invented a (much-needed) standard for cataloguing the Internet (remember Dewey?). Some websites do apply the Dewey Decimal System to their content (Suite101). Others default to a directory structure (Open Directory, Yahoo!, Look Smart, and others).

If such a standard existed (an agreed-upon method of cataloguing digital content), each site could self-classify. Websites would have an incentive to do so in order to increase their visibility. This, of course, would eliminate the need for today's clumsy, incomplete, and (highly) inefficient search engines.

A site whose number starts with 900 would be immediately identified as dealing with history, and multiple classifications would be encouraged to allow finer cross-sections. One example of such emerging "self-classification" and "self-publication" technology (though limited to academic resources) is the "Academic Resource Channel" by Sciindex.
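As an illustration of what such self-classification could look like in practice, here is a small, hypothetical sketch: a page embeds a Dewey-style code in a meta tag (the tag name "dewey" is invented for this example, not an existing standard), and an indexer maps the code to a subject without having to guess from the page's text.

```python
# Hypothetical sketch of "self-classification": the <meta name="dewey"> tag is
# invented for illustration; no such cataloguing standard currently exists.
import re

DEWEY_CLASSES = {           # coarse top-level Dewey classes (the "hundreds")
    "000": "Computer science & general works",
    "500": "Science",
    "900": "History & geography",
}

def classify(html: str) -> str:
    """Return the top-level subject declared by the page's own meta tag."""
    match = re.search(r'<meta\s+name="dewey"\s+content="(\d+(?:\.\d+)?)"', html)
    if not match:
        return "unclassified"
    top_level = match.group(1)[0] + "00"   # e.g. "944.05" -> "900"
    return DEWEY_CLASSES.get(top_level, "unknown class")

page = '<html><head><meta name="dewey" content="944.05"></head></html>'
print(classify(page))  # History & geography
```

An indexer reading such tags would never have to infer a site's subject from keywords or opening sentences; the site declares it itself.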

Moreover, users would not be required to remember strings of numbers. Future browsers would act like catalogues, much like the applications used in modern libraries. Compare this utopia with the present dystopia. Users grapple with heaps of irrelevant documents only to arrive at a partial and disappointing destination. At the same time, there may well be websites that exactly match the poor user's needs. Yet what currently determines the chances of a happy encounter between user and content are the whims of the particular search engine used and things like meta tags, titles, fees paid, or a suitable opening sentence.

B. Screen Versus Page

Computer monitors, due to their physical limitations (size, the fact that the screen must be scrolled), cannot compete effectively with printed pages. The latter remain the most ingenious medium yet invented for storing and disseminating textual information. Granted:

Computer screens are better at highlighting discrete units of information. These different capabilities draw the battle lines:

structure (the printed page) versus units of information (the screen); continuous and easily reversible (print) versus discrete (screen).

The solution lies in finding an efficient way to translate a computer screen into a printed document. It is hard to believe, but no such thing exists. Computer screens are still hostile to off-line printing. In other words:

if a user copies information from the Internet into a word processor (or vice versa, for that matter), they end up with a fragmented, cluttered, and unsightly document.

Very few website developers try to do anything about it - even fewer succeed.

C. Dynamic and Static Interaction

One of the biggest mistakes content providers make is that they don't provide "static-dynamic interaction".

Internet-based content can now easily interact with other media (e.g. CD-ROM) and with non-PC platforms (PDA, mobile phones). Examples abound:


The CD-ROM shopping catalog interacts with the website to allow users to order products. The catalog can also be updated via the website (as is customary with CD-ROM encyclopedias). The advantages of the CD-ROM are obvious:

extremely fast access times (tens of times faster than accessing a website over a dial-up connection) and data storage capacity hundreds of times larger than that of a regular web page.

Another example:


A disposable smart PDA plug-in contains hundreds of advertisements or a "yellow pages" directory. The consumer selects the ad or item she wants to see and connects to the Internet to view the relevant video. She can then also have an interactive chat (or webinar) with the seller, get information about the company, the ad, the advertising agency that created it, and more.

Encyclopedias on CD-ROM (such as Britannica or Encarta) already contain hyperlinks that direct users to web pages selected by the editor.

Note

The CD-ROM is probably a doomed medium. Storage capacity continues to grow exponentially, and within a year desktops with 80 GB hard drives will be commonplace. In addition, the much-publicized Network Computer - a simplified version of the personal computer - will provide the average user with terabytes of storage and the processing power of a supercomputer. What separates computer users from this utopia is communication bandwidth. With the advent of broadband satellite and wireless services, DSL and ADSL, cable modems, and advanced compression standards, video (on demand), audio, and data will be available speedily and in abundance.

CD-ROMs, on the other hand, are not portable. They require the installation and use of complex hardware and software. The CD-ROM is not a user-friendly push technology; it is nerd-oriented. As a result, it is not an immediate medium: there is a long lag between purchase and data access. Compare that to a book or a magazine. Data in these older media is immediately available to the user, and they allow simple and precise "backward" and "forward" functions.

Perhaps the biggest mistake CD-ROM manufacturers make is that they do not offer an integrated hardware and software package. CD-ROMs are not compact. The Walkman is a compact hardware-and-software package: easily transportable, thin, packed with sophisticated functions, user-friendly, and offering immediate access to data. The same goes for the Discman, the MP3-man, or the new generation of e-books (e.g., E-Ink's). The same cannot be said of the CD-ROM. By tying its future to the outdated concept of the self-contained, expensive, inefficient, and technologically unreliable personal computer, the CD-ROM has condemned itself to oblivion (with the possible exception of reference material).

D. Online Reference

A visit to Encyclopaedia Britannica online reveals some of the great and amazing possibilities of online reference - as well as a number of obstacles.

Each entry in this massive reference work is hyperlinked to relevant web pages. The sites are carefully screened. Links are available to data in a variety of formats, including audio and video. Everything can be copied to a hard disk or to a CD-R/W.

This is a new conception of a knowledge center - not just a heap of material. The content is flexible and endlessly enriched. It can be connected to a voice question-and-answer center. Subscribers' queries could be answered by e-mail or fax, posted on the website, or mailed as hard copies. This "trivia tracing" or "homework" service could be very popular; there is enormous demand for "just-in-time" information. The Library of Congress, along with a few other libraries, is in the process of making such a service available to the public (CDRS - Collaborative Digital Reference Service).

E. Derivative Content

The Internet is a vast store of information that is freely accessible - much of it in the public domain.

With minimal investment, this information can be assembled into coherent, thematic, and inexpensive compilations (on CD-ROM, print, e-book, or other media).

F. Electronic Publications

The Internet is by far the largest publishing platform in the world. It encompasses FAQs (questions and answers on nearly every technical subject in the world), e-zines (electronic magazines), the electronic editions of print newspapers and periodicals (together with online news and information services), reference works, e-books, monographs, articles, discussion "threads", conference proceedings, and much more.

The Internet represents a major asset for publishers. Consider the electronic version of a print periodical.

Publishing an e-magazine promotes sales of the print edition, helps attract subscribers, and leads to the sale of advertising space. Electronic archiving (see the next section) eliminates the need to reprint back issues, the physical space required to store them, and the tedious searching for data items.

The future trend is a combined subscription to both the electronic edition (mainly for its archival value and the ability to hyperlink to additional information) and the print edition (which is easier to browse for the current issue). The Economist has provided free access to its electronic archive as an incentive to print subscribers.

Electronic newspapers have other advantages:


They allow for immediate feedback and for smooth, almost real-time communication between editors and readers. The electronic version therefore acts as a gyroscope:

a navigation instrument, always indicating deviations from the "right" heading. Content can be updated instantly, and breaking news integrated into older content. Dedicated mobile devices already make it possible to download and store large amounts of data (up to 4,000 printed pages). Users can access libraries containing hundreds of texts, adapted for downloading, storage, and reading on these devices. Here, too, a convergence of standards is to be expected (the leading contenders being Adobe's PDF and Microsoft's MS-Reader).

Currently, e-books are viewed dichotomously, as either:


A continuation of the printed book (p-book) by other means, or as a whole new publishing universe.

Since p-books are a more convenient medium than e-books, p-books will prevail in any straightforward battle of "media substitution" or "media displacement". In other words, if publishers continue simply to convert p-books into e-books, e-books are doomed. They are simply inferior and cannot offer the comfort, tactile pleasure, navigability, and readability of p-books.

But e-books - being digital - open up a vista of previously neglected possibilities. These will only be enhanced and enriched by the introduction of e-paper and e-ink. Among them:

Hyperlinks within the e-book and beyond it - to web content, reference works, etc.;

Integrated instant purchase and order link;

Divergent, user-interactive, decision-driven plotlines;

Interaction with other e-books (using wireless standards) - collaborative authoring or reading groups;

Interaction with other e-books - gaming and community activities;

Content is updated automatically or periodically;

Multimedia;

Database maintenance, bookmarks, annotations, and history (archived records of reading habits, purchasing habits, interactions with other readers, plot decisions, etc.);

Built-in, automatic audio translation and conversion capabilities;

Full wireless piconet and distributed network capabilities.

The technology is not yet perfect. Wars are raging in both the wireless and the e-book arenas. Platforms compete. Standards clash. Gurus argue. But convergence is inevitable, and with it will come the e-book of the future.


G. Storage Function

The Internet is also the largest cemetery in the world:

tens of thousands of defunct yet still accessible websites - the "ghost sites" of this electronic frontier.

In a way, this is collective memory. One of the Internet's main functions is to preserve and transfer knowledge through time. It is called "memory" in biology - and "storage" in library management. The Internet's history is being recorded by search engines (Google) as well as by dedicated services (Alexa).

