Home/ SCSU Analytics/ Contents contributed and discussions participated by Chris Stanley

Contents contributed and discussions participated by Chris Stanley

Chris Stanley

How to Ensure Data Lakes Success | SmartData Collective - 0 views

  • it enables businesses to have a more unlimited view of data
  • Data lakes are defined as "a massive, easily accessible, centralized repository of large volumes of structured and unstructured data".
  • businesses must have some use cases in mind before constructing a data lake.
  • Oliver likewise suggests that businesses work with data scientists. Data scientists and engineers provide the necessary expertise required to make the data lake a successful data and analytics tool.
  • Configurable Ingestion Workflows: New sources of external information will continuously be available. Make sure to have an easy, secure and trackable content ingestion workflow mechanism that can rapidly add this new information into the data lake.
  • Knowledgent states that "without a high-degree of automated and mandatory metadata management, a Data Lake will rapidly become a Data Swamp" and that "attributes like data lineage, data quality, and usage history are vital to usability".
  • Data lakes must be industry-specific to cater to the industry's unique needs.
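The points above about trackable ingestion and mandatory metadata can be illustrated with a small sketch. It is a toy in-memory model, not any vendor's product: the names MetadataRecord and DataLake, and every field on them, are hypothetical and chosen only to mirror the attributes the annotations mention (lineage, quality, usage history).

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class MetadataRecord:
    """Hypothetical catalog entry; field names are illustrative only."""
    source: str              # lineage: where the data came from
    quality_checked: bool    # data quality: did validation pass?
    ingested_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )
    usage_history: list = field(default_factory=list)  # (user, timestamp) pairs


class DataLake:
    """Toy in-memory lake that refuses content without metadata."""

    def __init__(self):
        self._objects = {}
        self._catalog = {}

    def ingest(self, key, payload, metadata):
        # Mandatory metadata: reject anything without a usable lineage record.
        if not isinstance(metadata, MetadataRecord) or not metadata.source:
            raise ValueError(f"refusing to ingest {key!r}: metadata is required")
        self._objects[key] = payload
        self._catalog[key] = metadata

    def read(self, key, user):
        # Record usage history on every read.
        self._catalog[key].usage_history.append(
            (user, datetime.now(timezone.utc))
        )
        return self._objects[key]

    def describe(self, key):
        return self._catalog[key]
```

Making the metadata argument required at ingestion time, rather than something filled in later, is one way to read the "automated and mandatory" requirement quoted above.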
Chris Stanley

Moving Your SQL Databases to Azure - Things to Know - The Microsoft MVP Award Program B... - 0 views

  • COMPARISON WITH ON-PREM: Azure SQL is SQL Server behind the scenes, so most of the functionality is already there including tables, views, stored procedures, triggers, functions, primary and foreign keys and clustered indexes.
  • Of course there is no Windows authentication, and it currently uses SQL authentication only.
  • There is no need to maintain, balance, upgrade or patch the server as this is all done by Microsoft.
  • You also can't reboot the server, so if you end up with a runaway query you may have to open a support ticket.
  • There are always 3 copies of the database for high availability during disaster recovery.
  • There is a requirement for tables in a SQL Azure database to have a clustered index. This is necessary to keep the 3 copies of the database in sync.
  • The maximum SQL Azure database size is currently 500GB, but you can get around this using SQL federations and partitioning your data across multiple nodes.
  • There are a number of partially supported and unsupported features. A few of the ones I run into regularly are:
      • You cannot use the USE [databasename] sql statement. You must physically switch between databases in your application.
      • Remove from indexes - NOT FOR REPLICATION
      • Remove from your tables - WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
    You can review a full list of unsupported features here: https://azure.microsoft.com/en-us/documentation/articles/sql-database-transact-sql-information/
  • When I migrate a database from SQL to SQL Azure, I typically follow this process using SSMS:
      • Create a blank database on the SQL Azure database server
      • Generate the scripts from the original database to create the database objects, excluding users
      • Do a find and replace to remove any unsupported features such as the two mentioned above
      • Run the create database object scripts against the new SQL Azure database
      • Create the users and apply permissions for the new database
      • Use SSMS or SSIS to copy the data over to the new database.
  • The SQL Database Management Portal is a web based, scaled down version of SSMS. You can create objects, and run queries and execution plans. But there is no GUI interface for some of the security features like creating users and logins. I find that it's a friendlier experience to create the database server in the portal, and do everything else using SSMS.
  • SQL Azure databases are protected by an automatic backup system.
  • The length of time the backups are retained depends on what tier you buy – 7 days for Basic, 14 days for Standard and 35 days for Premium.
  • The point-in-time restore is a self-service feature that costs you nothing unless you use it. If you use it, you pay regular rates for the new database that gets restored. You get all of the protection without any additional cost.
  • SECURITY: You are in complete control of the IP-specific access to SQL Azure Database, at both the server AND database level. No one has access by default.
  • every time your IP changes, you have to update your firewall rules.
  • SERVICE TIERS AND PERFORMANCE LEVELS: There are three tiers, with several levels of performance within them. I will summarize the Microsoft definitions.
      • Basic: Best suited for a small size database, supporting typically one single active operation at a given time.
      • Standard: The go-to option for most cloud applications, supporting multiple concurrent queries.
      • Premium: Designed for high transactional volume, supporting a large number of concurrent users and requiring the highest level of business continuity capabilities.
  • Costs can range anywhere from $7 per month for the Basic tier, $19 - $183 per month for a 250GB database in the Standard tier, to $566 to $8500 per month in the Premium tier.
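The find-and-replace step in the migration process above could be scripted roughly as follows. This is a minimal Python sketch, not the author's tooling: the function name clean_script and the regular expressions are assumptions, and it only handles USE statements plus the two constructs quoted above, each written on a single line of the generated script.

```python
import re

# Table/index options quoted in the annotation above. Matched loosely so the
# exact ON/OFF values do not matter; assumes the clause sits on one line and
# starts with PAD_INDEX, as in the quoted example.
_WITH_OPTIONS = re.compile(
    r"\s*WITH\s*\(\s*PAD_INDEX\b[^)]*\)(\s*ON\s*\[PRIMARY\])?",
    re.IGNORECASE,
)
_NOT_FOR_REPLICATION = re.compile(r"\s+NOT\s+FOR\s+REPLICATION\b", re.IGNORECASE)
_USE_STATEMENT = re.compile(r"^\s*USE\s+\[[^\]]+\]\s*;?\s*$", re.IGNORECASE)


def clean_script(sql: str) -> str:
    """Return the script with the unsupported constructs removed."""
    kept = []
    for line in sql.splitlines():
        if _USE_STATEMENT.match(line):
            continue  # drop "USE [databasename]" lines entirely
        line = _NOT_FOR_REPLICATION.sub("", line)
        line = _WITH_OPTIONS.sub("", line)
        kept.append(line)
    return "\n".join(kept)
```

Review the cleaned script by hand before running it against the new database; the full list of unsupported features linked above is longer than what this sketch covers.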
Chris Stanley

Tool for Analyzing Survey Data, Survey Data Analysis Software - DataCracker - 1 views

  • Import your data to our survey data analysis software from all major survey programs and file formats: SurveyMonkey, SurveyGizmo, QuestionPro, Qualtrics, Survey Analytics, Snap Surveys, Toluna QuickSurveys, Super Simple Survey, SPSS (.sav), Triple-S (.xml or .sss), IBM SPSS Data Model (.mdd), Excel (.xls or .xlsx), CSV (.csv)
Chris Stanley

Chart and image gallery: 30+ free tools for data visualization and analysis | Computerw... - 1 views

  • The chart below originally accompanied our story 22 free tools for data visualization and analysis (April 20, 2011). We're updating it as we cover additional tools, including 8 cool tools for data analysis, visualization and presentation (March 27, 2012) and Six useful JavaScript libraries for maps, charts and other data visualizations (March 6, 2013). Click through to those articles for full tool reviews.
Chris Stanley

Purdue News - College students working out at campus gyms get better grades - 0 views

  • "Students who worked out at Purdue's gym at least once a week were more likely to earn a higher grade point average than students who visited less or not at all,"
  • the more than 1,820 students who visit Purdue's France A. Córdova Recreational Sports Center at least 16 times a month earned a GPA of 3.10 or higher. The correlation between grades and gym use also is shown with moderate users. Students who used the gym at least seven times a month had an average GPA of 3.06
  • At Purdue, registered dietitians are on staff to discuss nutrition, and students have access to personal trainers and fitness consultants. Representatives from the Student Wellness Office are also onsite.
Chris Stanley

Yale Photogrammar revitalizes and adds new context to the FSA-OWI images - @joycevalenz... - 0 views

  • Exploiting Library of Congress metadata, the Photogrammar team created a web-based platform for organizing, searching, and visualizing the 170,000 photographs from 1935 to 1945
Chris Stanley

Meeting Highlights | Charting the Future - 0 views

  • Student Success Implementation
  • Technology Initiative:
  • (a) predictive analytics,
Chris Stanley

The Popularity of Data Analysis Software | r4stats.com - 0 views

  • R resides in an interestingly large gap between the other domain-specific languages, SAS and SPSS. R has not only caught up with SPSS, but surpassed it with around 50% more job postings. MATLAB has many similarities to R so it’s interesting to see that it has only around half the job postings. Note that these are specific to analytics and MATLAB has many engineering jobs that are not counted in this total.
  • SAS is still far ahead of R in analytics job postings
  • Figure 2a shows the number of articles found for each software package for all the years that Google Scholar can search. SPSS is by far the most dominant package, likely due to its balance between power and ease-of-use. SAS has around half as many, followed by MATLAB and R.
  • Minitab, Systat and JMP are all growing but at a much lower rate than either R or Stata.
  • R still dominates the discussions on the more statistically-oriented forums
Chris Stanley

How to create effective data visualizations - 0 views

  • 5. Enough with the text, already
  • 1. Present data that matter to your audience (and not just you)
  • Data scientists by definition are naturally inquisitive and love to quantify things. That makes them a good fit for the job. The bad news is that they sometimes become a little too enthusiastic about data for data’s sake and will overwhelm their audience with irrelevant information.
  • 2. Tell a story, simply
  • 3. Choose appropriate visualizations
  • 4. Make sure graphics accurately reflect the data
  • "Try to pick a visualization that depicts not only the level of a variable, but puts it in context for how important it is,"
  • "Heat maps and bubble charts are a good example of this. You can see how important a particular region or customer or division is because it takes up more space on the map. You can show other attributes of the variables with color -- e.g., red for underperforming, green for doing well. With a visual like this, managers can quickly see where the problem is, and at the same time they can see how important it is."
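The bubble-chart idea described above (size for importance, color for performance) can be sketched in a few lines of Python. This only prepares the drawing parameters and leaves plotting to whatever charting tool is in use. The function name bubble_specs, the dictionary keys, and the choice to scale bubble area (rather than radius) with importance are all assumptions made for illustration; importance values are assumed to be positive.

```python
import math


def bubble_specs(regions, max_radius=40.0):
    """Turn rows of business data into bubble sizes and colors.

    Each row is a dict with hypothetical keys: name, importance,
    actual, target.
    """
    largest = max(r["importance"] for r in regions)
    specs = []
    for r in regions:
        # Area proportional to importance, so radius goes with the square root.
        radius = max_radius * math.sqrt(r["importance"] / largest)
        # Red for underperforming, green for doing well.
        color = "green" if r["actual"] >= r["target"] else "red"
        specs.append({"name": r["name"], "radius": radius, "color": color})
    return specs
```

Scaling area instead of radius keeps a region with four times the importance from looking sixteen times as large, which relates to the earlier point about graphics accurately reflecting the data.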
Chris Stanley

Partners Say New Azure Machine Learning Service Could Be Microsoft's Secret Weapon In T... - 0 views

  • Azure Machine Learning is a public cloud-based service that lets developers embed predictive analytics into their applications
  • Machine learning software has been around for years but isn't easy to use or deploy, and it's also expensive, Sirosh said. Packaging up machine-learning-as-a-cloud service solves these problems, and by being first to bring it to market, Microsoft has a head start on the likes of Google, Amazon and IBM, he said. "I think, on this particular front, that we are the leaders," Sirosh told CRN.   
  • Hiring Sirosh was something of a coup for Microsoft. He joined last July from Amazon, where he spent close to nine years as a vice president in various machine-learning-related roles.