Archive for the ‘Data mining’ Category

The pandemic plateau remains

06 Jul

The data on deaths from the coronavirus observed over the last week indicate that the plateau remains and that the pandemic is moving into the interior of Brazil (see graph). Deaths are the reliable series here, since the curve of infected people depends on testing, which is done by companies and is still low. We have already stressed the importance of taking the logarithm to better visualize the slope of the curve, which is exponential.
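Why the logarithm helps can be shown in a few lines of Python. This is a minimal sketch with made-up numbers, not the Brazilian data behind the graph: on a log scale an exponential curve becomes a straight line, so the day-to-day differences of the log stay constant while deaths grow exponentially and fall towards zero on a plateau.

```python
import math

def log_slopes(series):
    """Return day-to-day differences of ln(values).

    For exponential growth y(t) = y0 * exp(r * t), ln(y) is a straight line
    with slope r, so these differences are constant; on a plateau they
    hover around zero.
    """
    logs = [math.log(v) for v in series]
    return [b - a for a, b in zip(logs, logs[1:])]

# Hypothetical daily-death series (NOT real data):
growth = [100 * 1.1 ** t for t in range(5)]  # 10% daily growth
plateau = [1000, 1010, 995, 1005, 1000]      # roughly flat around a thousand

print([round(s, 3) for s in log_slopes(growth)])   # constant, equal to ln(1.1)
print([round(s, 3) for s in log_slopes(plateau)])  # small, around zero
```

A constant positive slope on the log plot means exponential growth; a slope near zero is exactly what a plateau looks like.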

The appropriate policy for this moment is to keep up social isolation, personal hygiene and social distancing habits, in addition to following the precautions set by municipal policies.

Any prospect of a peak, at least as far as the data indicate, seems meaningless: deaths remain at around a thousand per day, and a #lockdown is no longer viable, since the virus has already spread and regional isolation does not amount to control of the pandemic.

We will navigate through uncertainty, already tired of a long period of isolation, and with an open-and-close policy that has had little effect beyond containing wider contagion, without achieving real control of the pandemic at the national level.

The economic costs, which would have been high during a #lockdown period, would now be even higher, because keeping closed the commerce and services that genuinely require face-to-face contact would no longer be justified, and few services are non-essential.

The plan is to continue the so-called “social isolation”, which, as we have already said, is more accurately called “social distancing” in the Brazilian case, since it is compatible with some services remaining open.

What is essential, therefore, is to keep up personal care and hope that the curve falls “naturally”.


Towards serverless computing

27 Jan

Among the trends pointed out by Nasdaq, the electronic stock exchange, is so-called serverless computing, in which functions are transferred to the cloud.

Clouds begin to manage the functions and storage previously handled by servers; computing becomes more agile and less dependent on mobile devices, which are also beginning to migrate to the IoT (Internet of Things). The general trend may therefore be a digital transformation, not as a fashionable buzzword, but in the very structure of the digital universe.

Another consequence will be the transfer and simplification of many functions to the Web, which is often confused with the Internet but is only a thin layer on top of it, built on HTTP (Hypertext Transfer Protocol), the protocol through which browsers and servers exchange content.

Creating and running applications thus becomes simpler; this is not exactly serverless computing, as some superficial literature in the area suggests, but one of its important consequences.

Function as a Service (FaaS) differs from the earlier cloud models (IaaS, Infrastructure as a Service, and PaaS, Platform as a Service): in FaaS, developers write individual functions without having to know on which server they will be executed.
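To illustrate the idea, here is a minimal sketch of what a developer actually writes in FaaS: a single stateless function that receives an event and returns a response. The `handler(event, context)` shape follows the convention of AWS Lambda's Python runtime; the `name` field and the returned dictionary layout are assumptions for illustration, and the function is simply called locally here instead of being deployed.

```python
import json

def handler(event, context=None):
    """A hypothetical FaaS function: event in, response out, no server code.

    The platform decides which machine runs it, starts it on demand and
    scales it; the developer never configures a server.
    """
    name = event.get("name", "world")  # "name" is an illustrative field
    return {
        "statusCode": 200,
        "body": json.dumps({"message": f"Hello, {name}!"}),
    }

if __name__ == "__main__":
    # Local simulation of an invocation the cloud platform would normally trigger.
    print(handler({"name": "FaaS"}))
```

Because the function keeps no state between calls, the platform is free to run each invocation on whichever server is available, which is the essence of "serverless".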


We’re getting close, but what?

17 Sep

At age 20, Carl Sagan’s book “Contact” (1985) impressed me so much that it never left my imagination. It spoke of wormholes (possible paths through the fourth dimension), theology and the search for life on other planets, at a time when I was on a road to materialism that lasted 20 years, without any illusions.

Film Contact, wormholes and AI detection.

At 42, the film Contact (1997) impressed me once again. Its protagonist, Ellie Arroway (Jodie Foster), works in the fictional world of SETI (Search for Extraterrestrial Intelligence); I now discover that a SETI research center exists at the University of California, Berkeley, and that there they are picking up signals coming from a distant galaxy.

Curiously and thought-provokingly, this comes precisely at the phase in which I am returning to the study of Teilhard de Chardin’s Noosphere and the search for the fourth dimension; we are preparing a hologram and an Ode to Christus Hypercubus in Lisbon, a reference to Salvador Dalí’s fourth dimension.

SETI researchers from Berkeley, led by student Gerry Zhang, used machine learning to build a new algorithm for analyzing radio signals recorded during a 5-hour period on August 26, 2017 (my birthday, as it happens, though that must be just a coincidence).

Using the new algorithm, Zhang and his colleagues reanalyzed the 2017 data and found 72 additional bursts. The signals do not resemble communications as we know them, but are real bursts, and Zhang and his colleagues foresee a new future for the analysis of radio astronomy signals with machine learning.

As in the film, where the signal took a long time to decode, Turing, who broke the Enigma cipher of the German army during World War II, would love to study these signals today. The universe of codes is therefore not only a human artifact: space is full of signals, which is not to say they come from any civilization, but they are there. The cosmic background radiation, for example, discovered in 1964 by Penzias and Wilson, confirmed the Big Bang and earned them the 1978 Nobel Prize in Physics.

The new results will be published this month in The Astrophysical Journal and are available on the Breakthrough Listen website.



Basic Questions of Semantic Web and Ontologies

05 Jul

We are always faced with concepts that seem like common sense but are not. There are many examples: social networks (confused with the media), fractals (still too generic to be used in everyday life, but important), artificial intelligence and countless other cases, down to the virtual (which is not the unreal), ontologies, etc.

This is the case of the Semantic Web and ontologies, where every simplification leads to error. Probably for this reason, one of the forerunners of the Semantic Web, James Hendler, wrote with Dean Allemang the book Semantic Web for the Working Ontologist (Allemang, Hendler, 2008).

The authors explain in Chapter 3 that when we speak of the semantics “of a programming language, we usually refer to the mapping of the language’s syntax to some formalism that expresses the ‘meaning’ of that language.

Now when we speak of the ‘semantics’ of natural language, we often refer to something about what it means to understand an utterance: how to go from the structured letters or sounds of a language to some kind of meaning behind them.

Perhaps the most primitive part of this notion of semantics is a representation of the connection of a term in a statement to the entity in the world to which the term refers.” (Allemang, Hendler, 2008).

When we talk about things in the world, in the Semantic Web we talk about resources; as the authors say, this is perhaps the most unusual use of the word “resource”. To describe them, a language called RDF (Resource Description Framework) was created, and on the Web each resource has a basic unit of identification called a URI (Uniform Resource Identifier).
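To make this vocabulary concrete, here is a minimal sketch in plain Python (no RDF library assumed) of how statements about resources become subject-predicate-object triples whose terms are URIs. The example.org URIs and the `coauthorOf` property are hypothetical; only the `rdf:type` URI comes from the RDF standard.

```python
# The real rdf:type term from the RDF vocabulary.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
# Hypothetical namespace for illustration only.
EX = "http://example.org/"

# Each statement is a (subject, predicate, object) triple of URIs.
triples = {
    (EX + "Hendler", RDF_TYPE, EX + "Person"),
    (EX + "Hendler", EX + "coauthorOf", EX + "WorkingOntologistBook"),
    (EX + "Allemang", EX + "coauthorOf", EX + "WorkingOntologistBook"),
}

def to_ntriples(triples):
    """Serialize URI-only triples in N-Triples style: <s> <p> <o> ."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in sorted(triples))

print(to_ntriples(triples))
```

Because every term is a global URI, triples written by different people about the same resource can simply be merged, which is what makes RDF suited to distributed information.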

In the book the authors develop an extended form of RDF called RDFS-Plus, which already has many users and developers, and then model ontologies with OWL (Web Ontology Language). The first application presented is SKOS (Simple Knowledge Organization System), which proposes organizing concepts such as thesauri, taxonomies and controlled vocabularies in RDF.
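A taxonomy in SKOS is itself just a set of triples. The sketch below, a simplified illustration rather than a SKOS library, stores hypothetical concepts linked by the real SKOS terms `skos:broader` and `skos:prefLabel`, then follows `skos:broader` links transitively, which is the idea behind SKOS's `skos:broaderTransitive` property.

```python
# Real SKOS namespace; the concepts below are hypothetical.
SKOS = "http://www.w3.org/2004/02/skos/core#"
EX = "http://example.org/concept/"

triples = [
    (EX + "rdf", SKOS + "broader", EX + "semanticWeb"),
    (EX + "owl", SKOS + "broader", EX + "semanticWeb"),
    (EX + "semanticWeb", SKOS + "broader", EX + "web"),
    (EX + "rdf", SKOS + "prefLabel", "RDF"),
]

def ancestors(concept, triples):
    """Collect every broader concept reachable from `concept`.

    skos:broader itself is not transitive; following it repeatedly gives
    what skos:broaderTransitive expresses. The `not in found` check also
    protects against cycles in badly built taxonomies.
    """
    found, frontier = [], [concept]
    while frontier:
        current = frontier.pop()
        for s, p, o in triples:
            if s == current and p == SKOS + "broader" and o not in found:
                found.append(o)
                frontier.append(o)
    return found

print(ancestors(EX + "rdf", triples))  # semanticWeb, then web
```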

Because RDFS-Plus is a modeling system that provides considerable support for distributed and federated information, it introduces the use of ontologies in the Semantic Web in a clear and rigorous, though complex, way.

Allemang, D.; Hendler, J. (2008) Semantic Web for the Working Ontologist: Effective Modeling in RDFS and OWL. Morgan Kaufmann.



Trends in Artificial Intelligence

10 May

By the late 1980s the promises and challenges of artificial intelligence seemed to crumble, as captured in Hans Moravec’s phrase: “It is easy to make computers display adult-level performance on intelligence tests or play checkers, and difficult or impossible to give them the skills of a one-year-old when it comes to perception and mobility,” in his 1988 book Mind Children.
Marvin Minsky, one of the greatest precursors of AI (Artificial Intelligence) and co-founder of the MIT Artificial Intelligence Laboratory, also declared in the late 1990s: “The history of AI is funny, because the first real achievements were beautiful things: a machine that did proofs in logic and did well in a calculus course. But then, trying to make machines capable of answering questions about simple stories, machine … 1st year of basic education. Today there is no machine that can achieve this.” (KAKU, 2008, p. 131)
Minsky, together with another AI forerunner, Seymour Papert, came to outline a theory called The Society of Mind, which sought to explain how what we call intelligence could be the product of interactions among non-intelligent parts. But AI took another path; both died in 2016, having witnessed AI’s turn without seeing the “society of mind” emerge.
Driven by demand from the nascent Web, whose data lacked “meaning”, AI research joined the efforts of Web designers to develop the so-called Semantic Web.
There were already softbots, or simply bots: software robots that navigated raw data looking “to capture some information”. In practice these were scripts written for the Web or the Internet, which could now have a nobler function than stealing data.
The idea of intelligent agents was revived. Built from fragments of code, agents took on a different function on the Web: tracking semi-structured data and storing them in different kinds of databases, which no longer rely on Structured Query Language (SQL) but look for questions within the questions and answers made on the Web. These databases are therefore called NoSQL, and they would also serve as a basis for Big Data.
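The difference from a fixed SQL schema can be sketched in a few lines. This is a minimal illustration of the document-store idea behind many NoSQL databases, not the API of any real product: records are schemaless dictionaries, each may carry different fields, and a query simply skips records that lack a field instead of failing. The records and field names are hypothetical.

```python
# Hypothetical semi-structured records: no two share exactly the same fields.
documents = [
    {"id": 1, "question": "What is RDF?", "tags": ["semantic-web"]},
    {"id": 2, "question": "Is NoSQL faster?", "answers": 3},
    {"id": 3, "text": "Ontologies for the Web", "tags": ["ontology", "semantic-web"]},
]

def find(docs, **criteria):
    """Return documents whose fields match every criterion.

    A list-valued field matches if it contains the wanted value; documents
    missing a field simply do not match (no schema error is raised).
    """
    def matches(doc, key, wanted):
        value = doc.get(key)
        return wanted in value if isinstance(value, list) else value == wanted

    return [d for d in docs if all(matches(d, k, v) for k, v in criteria.items())]

print([d["id"] for d in find(documents, tags="semantic-web")])
```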
The emerging challenge now is to build taxonomies and ontologies from this scattered, semi-structured Web information, which does not always respond to a well-crafted questionnaire or to logical reasoning within a clear formal construction.
In this context linked data emerged: the idea of linking data about resources on the Web through their URIs (Uniform Resource Identifiers), which serve as the identifiers and locations of data on the Web.

The troubled scenario of the late 1990s took a semantic turn in the 2000s.

KAKU, M. (2008) Physics of the Impossible: A Scientific Exploration into the World of Phasers, Force Fields, Teleportation, and Time Travel. NY: Doubleday.



AI can detect hate speech

17 Oct

Hate speech is growing on social media, and identifying it from a single source can be dangerous and biased. For this reason, researchers from Finland trained a machine learning algorithm to identify hate speech by computationally comparing texts with what distinguishes those placed in the “hateful” category of a classification system.
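The post does not say which learning algorithm the Finnish team used. As an assumption, a multinomial naive Bayes classifier, a classic baseline for text categorization, illustrates the general idea of learning from labeled messages what distinguishes those marked “hateful”. The training messages below are hypothetical and deliberately tiny; real systems use thousands.

```python
import math
from collections import Counter

def train(examples):
    """Fit a multinomial naive Bayes model from (text, label) pairs."""
    word_counts = {}          # label -> Counter of words seen with that label
    label_counts = Counter()  # label -> number of training messages
    for text, label in examples:
        label_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(text.lower().split())
    vocab = {w for counts in word_counts.values() for w in counts}
    return word_counts, label_counts, vocab

def classify(model, text):
    """Return the label with the highest log-probability (Laplace smoothing)."""
    word_counts, label_counts, vocab = model
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, counts in word_counts.items():
        denom = sum(counts.values()) + len(vocab)
        score = math.log(label_counts[label] / total)  # prior
        for w in text.lower().split():
            score += math.log((counts[w] + 1) / denom)  # smoothed likelihood
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical labeled messages.
training = [
    ("you people are vermin", "hateful"),
    ("get out vermin", "hateful"),
    ("great debate tonight", "ok"),
    ("thanks for the debate", "ok"),
]
model = train(training)
print(classify(model, "vermin out"))
```

The model only knows the words its annotators labeled, which is why the quality and balance of the human categorization matters so much.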
The researchers ran the algorithm daily over all the public content that candidates in the municipal elections posted on Facebook and Twitter.
The algorithm was taught using thousands of messages, which were cross-checked to confirm scientific validity. According to Salla-Maaria Laaksonen of the University of Helsinki, “When categorizing messages, the researcher must take a position on language and context, and therefore it is important that several people participate in the interpretation of the training material.” Otherwise hatred may be identified only from one side; for example, someone may use hateful speech to defend themselves from an odious action.
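Cross-checking by several annotators is usually quantified with an agreement statistic. The post does not say which measure the Helsinki team used; Cohen's kappa for two annotators is a standard choice and is used here as an assumption, with hypothetical labels.

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Agreement between two annotators, corrected for chance.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and
    p_e the agreement expected if each annotator labeled at random with
    their own label frequencies.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(freq_a[k] * freq_b[k] for k in freq_a) / (n * n)
    if p_e == 1:
        return 1.0  # both annotators used one and the same label throughout
    return (p_o - p_e) / (1 - p_e)

# Hypothetical labels for six messages from two annotators.
annotator_1 = ["hateful", "ok", "ok", "hateful", "ok", "ok"]
annotator_2 = ["hateful", "ok", "hateful", "hateful", "ok", "ok"]
print(round(cohens_kappa(annotator_1, annotator_2), 3))
```

Here the annotators agree on 5 of 6 messages, but half of that agreement would be expected by chance, so kappa is about 0.67 rather than 0.83; a low kappa would signal that the categories need discussion before training.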
She says social media services and platforms could identify hate speech if they chose to, and thus influence the activities of Internet users. “There is no other way to extend it to the level of individual citizens,” says Laaksonen. In other words, such tools are semi-automatic, because they rely on human participation in the categorization.
The full article can be read on the Aalto University website.


Is Web 4.0 emerging?

31 Oct

Tim Berners-Lee’s initial impulse in creating, in the early 1990s, another protocol on top of the Internet (the Web and the Internet are different) was to spread scientific information more quickly, so we can say it was an information-centered Web.
The Web quickly became popular, and with growing concern for meaning, Berners-Lee, James Hendler and Ora Lassila published the inaugural paper on the Semantic Web (2001), subtitled “A new form of Web content that is meaningful to computers will unleash a revolution of new possibilities”. In it, further developments were envisioned, such as knowledge representation, ontologies, intelligent agents and, finally, an “evolution of knowledge.”
Web 2.0 had interactivity as its initial feature (O’Reilly, 2005): users became freer to interact on web pages and could tag, comment on and share documents found online.
The article pointed to ontologies as the “natural” way to develop the Semantic Web and add meaning to its information, using methodologies from Artificial Intelligence, a field which, in the eyes of James Hendler (Web 3.0), had gone through a creative “winter”.
But three integrated tools ended up indicating a new path: ontologies helped build simple knowledge organization schemes (SKOS, Simple Knowledge Organization System); a query language for databases, called SPARQL; and what was already the basis of the Semantic Web, RDF (Resource Description Framework), with its simple descriptive syntax, XML.
The first major project was DBpedia, a database proposed in 2007 by the Free University of Berlin and the University of Leipzig in collaboration with OpenLink Software. It was structured around Wikipedia, using 3.4 billion concepts to form 2:46 RDF triples (resource, property and value), or more simply subject-predicate-object, each indicating a semantic relationship.
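SPARQL queries work by matching triple patterns containing variables against the data. The sketch below is a simplified illustration in plain Python, not a SPARQL engine: it joins basic graph patterns the way a `WHERE` clause does. The `dbr:`/`dbo:` prefixes imitate DBpedia's naming style, but the triples are hypothetical examples.

```python
# Hypothetical triples written with short prefixed names for readability.
triples = [
    ("dbr:Berlin", "dbo:country", "dbr:Germany"),
    ("dbr:Leipzig", "dbo:country", "dbr:Germany"),
    ("dbr:Lisbon", "dbo:country", "dbr:Portugal"),
    ("dbr:Germany", "dbo:capital", "dbr:Berlin"),
]

def match(pattern, triple, binding):
    """Extend `binding` so that `pattern` matches `triple`, or return None.

    Terms starting with "?" are variables; a variable already bound must
    keep the same value (this is what joins the patterns together).
    """
    new = dict(binding)
    for p, t in zip(pattern, triple):
        if p.startswith("?"):
            if new.setdefault(p, t) != t:
                return None
        elif p != t:
            return None
    return new

def query(patterns, triples):
    """Return every variable binding that satisfies all patterns."""
    bindings = [{}]
    for pattern in patterns:
        next_bindings = []
        for b in bindings:
            for t in triples:
                extended = match(pattern, t, b)
                if extended is not None:
                    next_bindings.append(extended)
        bindings = next_bindings
    return bindings

# Roughly: SELECT ?city WHERE { ?city dbo:country ?c . ?c dbo:capital ?city }
print(query([("?city", "dbo:country", "?c"), ("?c", "dbo:capital", "?city")], triples))
```

Only Berlin survives the join, because it is the only city that is both in a country and that country's capital.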
Several types of intelligent agents are in development, but they make little or no use of the “intelligence” of Web 3.0; will there be new developments in the future? In a recent post we pointed to the Semantic Scholar tool from Paul Allen’s Allen Institute for Artificial Intelligence, but its connection to Web 3.0 (projects related to linked data) is not clear either.
2016 has definitely not been the year of the Smart Web, or, if you prefer, Web 4.0, but we are getting closer: personal assistants (Siri, Cortana, Facebook’s “M”), home automation (Apple HomeKit, Nest), image recognition and driverless cars are just around the corner.
Home automation brings smart features into the home, and this field of AI is growing fast.


(Portuguese) Is Semantic Scholar something new?

26 Oct

Sorry, this entry is only available in Brazilian Portuguese.


Significant technologies for Big Data

20 Sep

Big Data is still an emerging technology. If we look at Gartner’s hype cycle curve, which traces a technology from its emergence to its maturity, we see Big Data moving from its appearance to disillusionment, after which comes the cycle of maturity.


To answer the questions posed in TechRadar: Big Data, Q1 2017, a new report was produced describing the maturity of 22 possible technologies over the next life cycle, including 10 steps to “mature” Big Data technologies.


In view of this research, the ten points that can help Big Data mature are:


  1. Predictive analytics: software and/or hardware solutions that enable companies to discover, evaluate, optimize and deploy predictive models by analyzing large data sources, to improve business performance and mitigate risk.
  2. NoSQL databases: key-value, document and graph databases.
  3. Search and knowledge discovery: tools and technologies that support self-service extraction of information and new insights from large repositories of unstructured and structured data residing in multiple sources, such as file systems, databases, streams, APIs and other platforms and applications.
  4. Stream analytics: software that can filter, aggregate, enrich and analyze high-throughput data from multiple disparate live data sources and in any data format (including semi-structured).
  5. In-memory data fabric: enables low-latency access to, and processing of, large amounts of data by distributing the data across the dynamic random access memory (DRAM), Flash or SSD of a distributed computer system.
  6. Distributed file stores: a network of computers in which data are stored on more than one node, often in replicated form, for both redundancy and performance.
  7. Data virtualization: a technology that delivers information from various data sources, including big data sources such as Hadoop and distributed data stores, in real time or near-real time (with small delays).

The research suggests three final steps that will also be required:

  8. Data integration: tools for data orchestration (Amazon Elastic MapReduce (EMR), Apache Hive, Apache Pig, Apache Spark, MapReduce, Couchbase, Hadoop, MongoDB).
  9. Data preparation: modeling, cleaning and sharing data.
  10. Data quality: enriching and cleansing data at high speed.

Once this is done, Big Data can become productive, “providing value and growth through a phase of balance”.
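Stream analytics (item 4 above) can be illustrated with its two most basic operations, filtering and aggregating over a moving window. This is a minimal single-process sketch with a hypothetical stream; real stream platforms distribute the same idea across many machines.

```python
from collections import deque

class SlidingWindowAverage:
    """Running average over the last `size` values of a stream.

    Each update costs O(1): the new value is added to a running sum and,
    once the window is full, the oldest value is subtracted from it.
    """

    def __init__(self, size):
        if size <= 0:
            raise ValueError("size must be positive")
        self.size = size
        self.window = deque()
        self.total = 0.0

    def update(self, value):
        self.window.append(value)
        self.total += value
        if len(self.window) > self.size:
            self.total -= self.window.popleft()
        return self.total / len(self.window)

# Hypothetical stream: filter out invalid readings, then aggregate.
stream = [10, -1, 20, 30, None, 40]
avg = SlidingWindowAverage(size=3)
results = [avg.update(x) for x in stream if isinstance(x, (int, float)) and x >= 0]
print(results)
```

The data are never stored in full: only the last three values are kept, which is what lets stream analytics keep up with high-throughput sources.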


Aiding transparency on the Web

19 Oct

A second-generation tool called Sunlight is being developed to enable greater transparency in the use of personal data on the Web.

As Web browsing becomes easier every day, and corporate monitoring of our emails and habits is exploited more and more, the tool, developed at the engineering school of Columbia University, USA, aims to warn us how and when our data are being used.

According to Roxana Geambasu, a computer scientist at Columbia and its Data Science Institute, “The Web is like the Wild West,” where “there is no oversight of how our data is being collected, exchanged and used,” and data that fall into the wrong hands can be used against us.

According to Daniel Hsu, another researcher in the group, the tool is the first to analyze numerous inputs and outputs together to form hypotheses, which are then tested on a separate set of data split off from the original. In the end, each hypothesis, linking an input to an output, is ranked by its statistical confidence. According to Hsu, “We are trying to find a balance between statistical confidence and scale so that we can begin to see what is going on across the Web as a whole.”
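The "form hypotheses, then test them on separate data" idea can be sketched as follows. Sunlight's real pipeline is more elaborate; this sketch assumes a simplified version in which a "keyword leads to ad" hypothesis is chosen on one half of hypothetical accounts and tested on the held-out half with a one-sided Fisher exact test, which is our choice of test and not necessarily Sunlight's.

```python
import random
from math import comb

def fisher_one_sided(a, b, c, d):
    """One-sided Fisher exact test for the 2x2 table [[a, b], [c, d]].

    Rows: accounts with / without the keyword.
    Columns: ad shown / ad not shown.
    Returns P(X >= a) under the hypergeometric distribution fixed by the
    row and column totals: how likely this much overlap is by chance.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    upper = min(row1, col1)
    tail = sum(comb(row1, k) * comb(n - row1, col1 - k) for k in range(a, upper + 1))
    return tail / comb(n, col1)

def holdout_split(accounts, seed=0):
    """Shuffle reproducibly and split into generation and testing halves."""
    shuffled = list(accounts)
    random.Random(seed).shuffle(shuffled)
    mid = len(shuffled) // 2
    return shuffled[:mid], shuffled[mid:]

def pick_keyword(accounts, keywords):
    """Hypothesis generation: the keyword whose accounts saw the ad most often."""
    def ad_rate(keyword):
        with_kw = [x for x in accounts if keyword in x["keywords"]]
        return sum(x["ad"] for x in with_kw) / len(with_kw) if with_kw else 0.0
    return max(keywords, key=ad_rate)

def p_value(accounts, keyword):
    """Build the 2x2 table for 'keyword -> ad' and test it."""
    a = sum(1 for x in accounts if keyword in x["keywords"] and x["ad"])
    b = sum(1 for x in accounts if keyword in x["keywords"] and not x["ad"])
    c = sum(1 for x in accounts if keyword not in x["keywords"] and x["ad"])
    d = len(accounts) - a - b - c
    return fisher_one_sided(a, b, c, d)

def sunlight_style_test(accounts, keywords, seed=0):
    """Generate the hypothesis on one half, test it on the untouched other half."""
    generation, testing = holdout_split(accounts, seed)
    keyword = pick_keyword(generation, keywords)
    return keyword, p_value(testing, keyword)

# Hypothetical accounts: sensitive words received, and whether the ad appeared.
accounts = ([{"keywords": {"unemployed"}, "ad": True} for _ in range(10)]
            + [{"keywords": {"travel"}, "ad": False} for _ in range(10)])
print(sunlight_style_test(accounts, ["unemployed", "travel"]))
```

Keeping the testing half untouched during hypothesis generation is what prevents the tool from "confirming" patterns it found by chance in the same data.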

The researchers set up 119 Gmail accounts and, over more than a month last year, sent 300 messages with sensitive words in the subject line and body. They found, for example, that the words “unemployed”, “Jew” and “depressed” were used to trigger ads for “easy auto financing.”

The tool, of course, does not solve these problems; it only exposes the existence of such cases.