Archive for the ‘Databases’ Category

The pandemic plateau remains

06 Jul

Last week's figures for coronavirus deaths indicate that the plateau remains and that the pandemic has spread into the interior of Brazil (see graph). Death counts are the more reliable data, since the curve of infections depends on testing, which is done by companies and is still low. We have already stressed the importance of taking the logarithm to better visualize the slope of the curve, which is exponential.
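
The point about the logarithm can be shown with a minimal sketch. The two series below are invented for illustration and are not real data, and the function name `log_slopes` is our own. On a logarithmic scale, exponential growth has a constant positive slope, while a plateau has a slope of zero.

```python
import math

def log_slopes(daily_counts):
    """Differences between consecutive values of log(count)."""
    logs = [math.log(count) for count in daily_counts]
    return [later - earlier for earlier, later in zip(logs, logs[1:])]

# Hypothetical series for illustration only, not real data.
growing = [10 * 2 ** day for day in range(6)]   # doubles every day
plateau = [1000] * 6                            # flat at a thousand a day

growing_slopes = log_slopes(growing)   # each value is close to log(2)
plateau_slopes = log_slopes(plateau)   # each value is 0.0
```

A slope that stays near zero for many days in a row is what "the plateau remains" looks like in the data.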

The policy for this moment should be to continue social isolation, personal hygiene and social distancing habits, while also paying attention to municipal policies.

Any prospect of a peak seems meaningless, at least according to the data: the number of deaths remains around a thousand a day, and a #lockdown is no longer viable, since the virus has already spread and regional isolation does not amount to control of the pandemic.

We will navigate through uncertainties, already tired from a long period of isolation and with an open-and-close policy that has produced few effective results beyond containing a greater contagion, and none in terms of controlling the pandemic at the federal level.

The economic costs, which would have been great in the case of a #lockdown period, will now be higher, because keeping closed the commerce and services that truly need face-to-face contact can no longer be justified, and few services are non-essential.

The plan is to continue the so-called “social isolation”, whose more accurate name in the Brazilian case, as we have already said, is “social distancing”, which is compatible with some services being open.

The essential thing, therefore, is to maintain personal care and hope that the curve falls “naturally”.

 

Four IT buzzwords for 2020

20 Jan

Some words have already been used excessively and mistakenly. We can mention three: disruptive technologies, taken to mean any technology that has an impact on the market, when what matters is the scale of production and consumption; data lakes, used to store raw data, which does not mean that the data is or can be handled easily (there are specific environments and tools for this); and DevOps, also not a new term, which is the rapid deployment of code with facilities for finding and correcting possible bugs (errors in the code).

The four buzzwords that are expected to grow in 2020, and which carry risks both in their use and in their implementation, are Big Data (yes, it already existed in 2019, but a large expansion is forecast for 2020); AI (ditto); Agile, which means responding rapidly to market change with corporate strategy and which, if misused, will be a failure; and lastly, and no less essential, what has been called “digital transformation”.

Let’s start with the last one, which encompasses the previous ones, including the three excluded from the analysis. Digital transformation does not necessarily mean that “everything now changes with digital processes”, and of course it does not mean that nothing changes. Depending on the area and on the disruption (in scale), it is clear that the impact can and should happen, but be careful with Agile.

Agile is the process of responding quickly to changes, but that does not mean reacting to every situation. The vast majority deserve analysis first: transient market situations, seasonal processes, responses to competition and, in particular, changes in “fashion”.

AI can be an answer for many businesses, but the term “intelligence” itself is questioned. In fact it draws a little on each of the previous processes, including Big Data, Agile and data lakes; that is, there must be tools such as analytics and machine learning that assist the process.

Gartner detected an increase from 25% to 37% between 2018 and 2019 in the use of AI for business, but effectiveness is not guaranteed, just as the mere use of IT does not mean the modernization of a company.

 

Free Databases

15 Feb

As open-source databases grow, more companies covet this market. Three competitors have great products: MySQL, Firebird and PostgreSQL.
An ever-present obstacle is gaining the trust of developers at independent software producers in these products, where a false idea prevails: that the products we pay for are the best.

On the other hand, the developers of paid products, called proprietary software, try to use the advantages of this “free” market in their favor. For example, Microsoft’s SQL Server Express makes it easy to migrate to its paid version, SQL Server. Likewise, a database created in a Microsoft Excel spreadsheet, which is easy to do, can be easily converted to an Access database, which comes bundled with some editions of Microsoft Office.

But the Big Data concept, which calls for NoSQL databases, applies where a collection of data sets becomes so complex and voluminous that it is very difficult (in many cases impossible) to do simple operations such as removal, sorting and summarization using traditional database management systems.

Big Data also refers to the unstructured data found in social media. It is a direct consequence of Web 2.0, which brought in millions of users as producers of information, and it relies on NoSQL (Not only SQL) applications. NoSQL promotes a number of innovative solutions for storing high volumes of information.

These diverse solutions are used very frequently in countless companies, such as IBM, Twitter, Facebook, Google and Yahoo!, for the analytical processing of Web log data and conventional transactions, among many other tasks.

A traditional DBMS has a consistency model strongly based on ACID (Atomicity, Consistency, Isolation and Durability) transactional control, but this model becomes hard to sustain when the data is distributed over several nodes, a typical case of networks (what matters here is networks, not media).
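
To make the ACID idea concrete, here is a minimal single-node sketch using Python's built-in `sqlite3` module. The `accounts` table, the names and the amounts are hypothetical, chosen only for illustration. It shows atomicity: when the second step of a transfer violates a constraint, the first step is rolled back too.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.execute("INSERT INTO accounts VALUES ('alice', 100), ('bob', 50)")
conn.commit()

def transfer(conn, src, dst, amount):
    """Move amount from src to dst as one atomic transaction."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False

def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM accounts"))

failed = transfer(conn, "alice", "bob", 500)   # debit breaks the CHECK, so the credit is undone
after_failure = balances(conn)
ok = transfer(conn, "alice", "bob", 30)        # both updates are committed together
final = balances(conn)
```

On a single node this guarantee is cheap. Keeping the same all-or-nothing behavior across several machines requires coordination between them, which is where the difficulty described above begins.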

The model to consider must then be another: CAP (Consistency, Availability and Partition tolerance), in which generally only two of these three properties can be guaranteed simultaneously. This makes processing more difficult, but less so if the database is “semi-structured”, that is, if it works on the principle that the data is not structured in the conventional formats of SQL databases.

Among the various existing NoSQL products, we can consider Apache Hadoop the most representative; today there is a version adapted for the Web, called Hadoop 2.0.
There are other products, among them HBase, an open-source, distributed, column-oriented database modeled on Google BigTable and written in Java, and Apache Cassandra, also open source (originally developed at Facebook).

There are simple alternatives to SQL interfaces, such as associative arrays or key-value pairs, as well as native XML databases supported by the XQuery standard.
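
As a rough sketch of the key-value idea, the snippet below uses a plain Python dictionary as the associative array and JSON text as the stored value. The `put` and `get` helpers and the sample records are hypothetical and are not the API of any particular NoSQL product. Note that the two documents do not share the same fields, which is what “semi-structured” means in practice.

```python
import json

store = {}  # associative array: key -> serialized document

def put(key, document):
    """Store a document (a dict) under a key, serialized as JSON text."""
    store[key] = json.dumps(document)

def get(key):
    """Return the document stored under key, or None if the key is absent."""
    raw = store.get(key)
    return None if raw is None else json.loads(raw)

# Hypothetical records: no fixed schema, each document carries its own fields.
put("user:1", {"name": "Ana", "followers": 120})
put("user:2", {"name": "Rui", "tags": ["sql", "nosql"], "city": "Lisboa"})

# Collect every field name that appears in at least one document.
fields = sorted({field for key in store for field in get(key)})
```

Lookups by key are simple and fast, but a question such as "which users live in Lisboa" means scanning every document, a trade-off a relational table with an index would not have.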

A language developed for the Semantic Web is SPARQL (SPARQL Protocol and RDF Query Language), which has aided the growth of linked data clusters.
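
SPARQL queries work by matching patterns against subject-predicate-object triples. The sketch below is not SPARQL itself and uses no RDF library; it is a toy matcher in plain Python, with a made-up `ex:` vocabulary, that mimics what a single triple pattern such as `?db ex:writtenIn ex:Java` does.

```python
# Toy triple store; the ex: names are invented for this illustration.
triples = [
    ("ex:HBase", "ex:writtenIn", "ex:Java"),
    ("ex:Cassandra", "ex:writtenIn", "ex:Java"),
    ("ex:HBase", "ex:modeledOn", "ex:BigTable"),
]

def match(pattern, triples):
    """Yield variable bindings for one triple pattern.

    Terms that start with '?' are variables; all others must match exactly.
    """
    for triple in triples:
        bindings = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # A repeated variable must bind to the same value each time.
                if bindings.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            yield bindings

written_in_java = [b["?db"] for b in match(("?db", "ex:writtenIn", "ex:Java"), triples)]
modeled_on = list(match(("ex:HBase", "ex:modeledOn", "?model"), triples))
```

A real SPARQL engine combines many such patterns, joins their bindings and runs over data published at different sites, which is what makes linked data practical.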

 

(Portuguese) Eternal 5D storage

23 Feb

Sorry, this entry is only available in Brazilian Portuguese.

 

(Portuguese) Google Maps offline available

12 Nov

Sorry, this entry is only available in Brazilian Portuguese.

 

(Portuguese) Mapping Big Data

22 Sep

Sorry, this entry is only available in Brazilian Portuguese.

 

Snowden makes a virtual appearance in the USA

17 Mar

On Monday the American festival SXSW begins, and Edward Snowden will hold a virtual conversation via Hangout in which he can talk with audience members.

Anticipating his speech, he said he had no regrets and that everything he did was to protect the majority of ordinary citizens, who have had their privacy “violated on a massive scale.”

As a solution to the privacy issue, he said the programming community must develop new mechanisms to ensure data security and encryption for privacy.

SXSW takes place in Austin (USA), and the connection will be made from Russia, since he cannot enter the United States because of his disclosures. His location remains secret, and the transmission is protected by seven proxies (intermediary servers) to mislead any search for his location.

The event is called “A conversation with Edward Snowden”, and he said he would reaffirm “an oath to support and defend the U.S. Constitution.”

 

A curious problem of Big Data

10 Oct

Simon DeDeo, a researcher in applied mathematics and complex systems at the Santa Fe Institute, had a problem, as reported in Wired magazine.

He was collaborating on a new project to analyze 300 years of data from the archives of the Old Bailey in London, the central criminal court of England and Wales.

But there was no clean (that is, structured) data, as in a simple Excel spreadsheet with variables such as prosecution, trial and sentence for each case. Instead there were about 10 million words recorded over just under 200,000 trials.

How could this data be analyzed? According to DeDeo, it was not the size of the data set that was difficult; by big data standards, the size was quite manageable. It was the enormous complexity and lack of formal structure that posed a problem for this “big data” and troubled him.

The traditional research paradigm involves forming a hypothesis, deciding precisely what is to be measured, and then building a device to make that measurement as accurately as possible. This case is not exactly like physics, where you control the variables and have a limited amount of data.

Alessandro Vespignani, a physicist at Northeastern University who specializes in harnessing the power of social networks to model disease outbreaks, stock market behavior, collective social dynamics and election results, has collected many terabytes of data from social networks such as Twitter. This approach can also help in treating texts written outside social networks.

Scientists like DeDeo and Vespignani make good use of this piecemeal approach to the analysis of large data, but Yale University mathematician Ronald Coifman says that what is really needed is the big data equivalent of a Newtonian revolution, comparable to the invention of calculus in the 17th century, which he believes is already underway.

Coifman says, “We have all the pieces of the puzzle – now how do we actually assemble them”; that is, we still have to move forward in dealing with scattered data.

 

Cloud service provides anonymity

28 Nov

Amazon's cloud service, known as EC2 (Elastic Compute Cloud), offers virtual computing capacity with promises of confidentiality through a type of structure that is being called Onion, in an open source project called Tor.

Tor is short for The Onion Router, so called because of the multi-layered nature of the way it runs. It is also known as the “dark net”.

On the project's blog, the developers stated: “By setting up a bridge, you donate bandwidth to the Tor network and help improve the safety and speed at which users can access the internet”, revealing a new form of collaboration, which is the sharing of bandwidth.

The service normally costs on average £19 (close to R$ 30) per month, but Amazon is offering a year of free storage as part of its promotion, which means that the service should grow.

The service is particularly praised under closed regimes; in Iran and in countries of the Arab world it has been used and highly praised.

The services can be accessed on Android through an app called Orbot, and earlier this week Apple approved Covert Browser for iPad, which went on sale in the App Store as the first official iOS app that lets users route their online communications through Tor.

 

Electronic Document Management (GED)

29 Oct

Electronic document management systems (GED in Portuguese; ECM, Enterprise Content Management, in English) allow a company, an organization or even an individual to manage documents even when they are unstructured. That is, they involve the strategies, methods and tools used to capture, manage, store, preserve and distribute content and documents related to the organization's workflow processes.

In this sense they are broader than CMSs (Content Management Systems) such as Drupal, Plone and WordPress, which manage content “loaded” into the platform and are therefore limited, since it is not enough to “manage” the content.

Two of the most widespread GED platforms are Alfresco and KnowledgeTree (KT).

The main motivations for having a GED are file sharing and better collaboration and auditing of organizational documents. Six points should be taken into consideration: methods for organizing and storing documents simply; security and protection (this is critical and not always taken seriously); the ability to add metadata; search options (another critical point); version control and transaction tracking; and document workflow (road map).

Both tools do this, but KT is paid. There is another paid tool called Dokmee, simpler but in our view more limited; still, many companies prefer “simple” tools, to keep training simple and guarantee the “service”.

Both Alfresco and KT offer all the features suggested above, with small differences. Both have the concepts of users, groups and roles, but KT also provides the option of units. All users have access to documents, and that access can be controlled on a scale from simple to complex protection options.

Metadata and global options for internal document searches are available in both, but in KT they are enabled by default, while in Alfresco they can be added more easily by defining inheritance aspects according to locations. Finally, both have continuous workflow systems.