Where Is My Disk Space?

An HTML 5 File System Visualizer

by Nick Mueller, Zetta.net

Hello new users! The File System Visualizer can be found at wheresmydiskspace.com. Continue reading to learn more about how the tool was developed and about its visualization options.

Before buying more storage, it's a good idea to make sure your existing space isn't filled with redundant or old data, or with hundreds of downloaded cat videos.

Disk capacity keeps increasing, and while prices continue to drop, those savings are offset by the demand for new capacity to store more, and larger, files. That means not only more primary disk space but also twice that amount for backups.

Zetta co-founder Lou Montulli may have the answer to this problem. Recently, Lou combined his experience with browsers and storage to create an open-source tool for analyzing storage usage: the File System Visualizer (www.wheresmydiskspace.com).

Lou was a founding engineer at Netscape in 1994, when he helped create the first commercial web browser, Netscape Navigator. Over the years he has been responsible for many browser-related innovations, and in 2008 he co-founded Zetta.net, where he continues to serve as VP of Engineering and Chief Scientist.

"The tool was conceived as a method for visualizing multiple aspects of any large file set: an existing file system, a backup or an archive," he says. "This can be a great tool to use if you find yourself running low on disk space and need to find files to delete to free up space."

The tool makes heavy use of the Data-Driven Documents (D3) JavaScript library together with jQuery, Dojo, PrettyPhoto, JavaScript and Scalable Vector Graphics (SVG). The project is sponsored by Zetta, and all of the source code for the File System Visualizer is available under a BSD license, which allows anyone to use it, commercially and non-commercially, free of charge.

"Part of the challenge and opportunity of this tool was writing it in JavaScript and using HTML as the user interface," Lou says. "I was part of the team who wrote the very first web browsers, so I was personally motivated to design a tool that takes advantage of some of the great new technologies coming out of HTML5, Mozilla.org and the broader web community."

Getting Started
The File System Visualizer is free to use and doesn't require installing any software. You just need a web browser that supports SVG and has a fast JavaScript engine. Go to www.wheresmydiskspace.com. The home page has a few video demonstrations of the product that you can watch before running the software.

Or you can:

  1. Click the link at the top of the page to go directly to the visualizer.
  2. There you have three options: look at some sample data sets, use a Java applet to scan your local machine and create a manifest file detailing what is in the file system, or load a manifest file created in a previous scan.
  3. If you choose to do a new scan, and there are a large number of folders, the software will prompt you to save the manifest to your disk rather than keeping it in the browser.
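The scanner's job can be sketched in a few lines. The following Python is a minimal illustration, not the visualizer's Java applet: it walks a directory tree and emits one record per folder with the number of files and bytes stored directly in it. The record layout (`path`, `files`, `bytes`) and the helper names `scan_manifest` and `build_sample_tree` are hypothetical; the real manifest format is not described here.

```python
import json
import os
import tempfile


def scan_manifest(root: str) -> list:
    """Return one record per folder under `root` (hypothetical manifest layout)."""
    records = []
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # visit subfolders in a stable, alphabetical order
        files, size = 0, 0
        for name in filenames:
            try:
                # lstat() reports a symlink's own size instead of following it
                size += os.lstat(os.path.join(dirpath, name)).st_size
                files += 1
            except OSError:
                pass  # file vanished or is unreadable; leave it out
        rel = os.path.relpath(dirpath, root).replace(os.sep, "/")
        records.append({"path": rel, "files": files, "bytes": size})
    return records


def build_sample_tree(root: str) -> None:
    """Hypothetical helper: create a tiny folder tree to scan."""
    os.makedirs(os.path.join(root, "videos"))
    with open(os.path.join(root, "videos", "cat.mp4"), "wb") as f:
        f.write(b"\0" * 1000)
    with open(os.path.join(root, "notes.txt"), "wb") as f:
        f.write(b"hello")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        build_sample_tree(root)
        # For very large trees, write this to a file instead of printing it
        print(json.dumps(scan_manifest(root), indent=2))
```

Keeping one flat record per folder, rather than one per file, keeps the manifest small enough to save and reload even for tens of thousands of folders.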

Test Setup
We recently had the File System Visualizer tested on a Windows 7 desktop with a third-generation Intel Core i7 processor and 16 GB of RAM. The scan took approximately 5 minutes. When it finished, a message reported 52,993 folders.

The software can analyze a local disk, or an administrator can run it remotely against any mountable drive. At this point, it runs on Windows (32-bit and 64-bit) and OS X.

Visualizing Your Data
After the scan, the software presents seven different views of the data. Images of the views appear at the top of the page, and you can click any of them to open that view of the data.

Summary Page - This showed that the test computer had 353.1 GB of data in 52,993 folders containing 364,931 items, with an average file size of 967.7 KB.
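The Summary Page figures are simple aggregates. As a rough check, the reported average is consistent with decimal units: 353.1 × 10^9 bytes ÷ 364,931 items ≈ 967.6 KB, which matches the 967.7 KB shown once rounding of the 353.1 GB total is taken into account. The sketch below computes the same numbers from per-folder records; the record layout, the `Summary` class, the reading of "items" as files only, and the choice of 1 KB = 1,000 bytes are all assumptions, since the article does not say how the tool defines them.

```python
from dataclasses import dataclass


@dataclass
class Summary:
    total_bytes: int
    folders: int
    items: int  # assumption: "items" counts files only, not folders

    @property
    def average_kb(self) -> float:
        # Assumption: decimal kilobytes (1 KB = 1,000 bytes)
        return self.total_bytes / self.items / 1000 if self.items else 0.0


def summarize(records: list) -> Summary:
    """Aggregate hypothetical per-folder records into Summary Page totals."""
    return Summary(
        total_bytes=sum(r["bytes"] for r in records),
        folders=len(records),
        items=sum(r["files"] for r in records),
    )


# Plugging in the test machine's totals from the article:
test_pc = Summary(total_bytes=353_100_000_000, folders=52_993, items=364_931)
print(f"{test_pc.average_kb:.1f} KB")  # 967.6 KB, vs. 967.7 KB reported
```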

Visual Tree - This gives a hierarchical tree visualization of the data. On the left is a pull-down box where you can select to view the data by size, by type or by date. There is also a slider where you can select the tree display depth from one to seven levels.

Screenshot of the Tree View

Viewing by size shows a hierarchical view of the file system and the amount of data in each folder with up to seven levels of depth. To look at just the contents of a single folder, rather than the entire file system at once, just click on the dot next to that folder.
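Sizing the tree by data means each folder's number includes everything beneath it, while the depth slider only hides deeper nodes. A minimal sketch of that roll-up, assuming hypothetical input that maps each folder path (with `/` separators and `""` for the root) to the bytes stored directly in that folder; the function name is also hypothetical:

```python
from collections import defaultdict


def folder_totals(direct_bytes: dict, max_depth: int) -> dict:
    """Roll each folder's own bytes up into its ancestors.

    Only folders at most `max_depth` levels below the root are kept, which
    mimics the depth slider; deeper folders still count toward their
    visible ancestors.
    """
    totals = defaultdict(int)
    for path, size in direct_bytes.items():
        parts = path.split("/") if path else []
        # Credit the root (depth 0) and each ancestor down to the depth limit
        for depth in range(min(len(parts), max_depth) + 1):
            totals["/".join(parts[:depth])] += size
    return dict(totals)


direct = {"": 5, "docs": 10, "docs/old": 100, "videos": 1000, "videos/cats": 4000}
print(folder_totals(direct, max_depth=1))
# {'': 5115, 'docs': 110, 'videos': 5000}
```

Focusing on a single folder, as clicking its dot does, amounts to keeping only the paths under that folder before rolling up.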

Viewing by type, the first level divides the data into known types and uncategorized files. At the second level, the uncategorized files are divided by file extension and the known types into groups such as disk images, games, database, software development, fonts, plugins, office types, settings, executables, media, backup and system. For most of those categories the next level shows the file extensions, but some categories (media, office types and encodings) subdivide further before reaching their final level.
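Grouping by type is essentially an extension lookup. The sketch below assumes hypothetical (file name, size) input and a tiny, made-up extension table standing in for the tool's much larger category list; it reproduces only the first two levels of the type view.

```python
import os
from collections import defaultdict

# Hypothetical, much-abbreviated extension table; the visualizer's real
# category list (disk images, games, database, fonts, ...) is far larger.
CATEGORIES = {
    ".iso": "disk images",
    ".mp4": "media",
    ".jpg": "media",
    ".docx": "office types",
    ".exe": "executables",
}


def group_by_type(files: list) -> dict:
    """Two-level type grouping of (file name, size) pairs.

    Level 1 splits known types from uncategorized files; level 2 groups
    known types by category and uncategorized files by extension.
    """
    tree = {"known types": defaultdict(int), "uncategorized": defaultdict(int)}
    for name, size in files:
        ext = os.path.splitext(name)[1].lower()
        if ext in CATEGORIES:
            tree["known types"][CATEGORIES[ext]] += size
        else:
            tree["uncategorized"][ext or "(no extension)"] += size
    return {branch: dict(groups) for branch, groups in tree.items()}


files = [("cat.MP4", 700), ("dog.jpg", 300), ("setup.exe", 50), ("notes.xyz", 5), ("README", 1)]
print(group_by_type(files))
```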

Viewing by date, the first level divides the data into "1 year and older" and "within 1 year" and shows how many GB of data fall into each category. The second level splits the "within 1 year" branch into five subgroups and the "1 year and older" branch into each of the years for which you have data. There is no third level.
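A minimal sketch of the date grouping, assuming hypothetical input of (last-modified time, size) pairs and treating "1 year" as 365 days. The article doesn't say how the five subgroups of the recent branch are defined, so this sketch leaves that branch as a single total rather than guessing.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def group_by_date(files: list, now: datetime) -> dict:
    """Bucket (modified time, size) pairs into the date view's first two levels.

    Older files are split by calendar year. The recent branch is kept as a
    single total because its five subgroups are not specified.
    """
    cutoff = now - timedelta(days=365)  # assumption: "1 year" = 365 days
    recent = 0
    by_year = defaultdict(int)
    for modified, size in files:
        if modified > cutoff:
            recent += size
        else:
            by_year[modified.year] += size
    return {"within 1 year": recent, "1 year and older": dict(by_year)}


now = datetime(2013, 6, 1)
files = [
    (datetime(2013, 1, 15), 10),
    (datetime(2011, 5, 5), 20),
    (datetime(2011, 12, 1), 5),
    (datetime(2009, 3, 3), 1),
]
print(group_by_date(files, now))
```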

Hierarchical List - This view presents the data in list rather than tree format. To get to deeper levels, click the + sign next to any of the categories. In addition to the file names, there are columns for Size in Directory, Total Size and % with children. When you click on the headers for the columns, up and down arrows appear, making it look like the data is sortable by those columns, but it isn't.
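The three columns can be derived from a folder tree in one pre-order walk. In this sketch the `Folder` class and `list_rows` function are hypothetical, and reading the % column as each folder's share of the whole scan is an assumption, since the article doesn't define it.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Folder:
    name: str
    own_bytes: int = 0  # "Size in Directory": files directly inside this folder
    children: list = field(default_factory=list)

    def total(self) -> int:
        # "Total Size": this folder's own bytes plus everything beneath it
        return self.own_bytes + sum(child.total() for child in self.children)


def list_rows(folder: Folder, grand_total: Optional[int] = None, depth: int = 0):
    """Yield (indented name, size in directory, total size, percent) rows.

    Rows come out in pre-order, i.e. the order of a fully expanded list.
    Assumption: the percent column is the folder's share of the whole scan.
    """
    total = folder.total()
    if grand_total is None:
        grand_total = total
    pct = round(100.0 * total / grand_total, 1) if grand_total else 0.0
    yield ("  " * depth + folder.name, folder.own_bytes, total, pct)
    for child in folder.children:
        yield from list_rows(child, grand_total, depth + 1)


root = Folder("C:", 10, [Folder("Users", 40, [Folder("cats", 150)]), Folder("Windows", 200)])
for row in list_rows(root):
    print(row)
```

For brevity `total()` is recomputed recursively for every row; over tens of thousands of folders, a real implementation would compute each total once and cache it.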

Flattened List - This is a sortable, non-hierarchical list of the folders. When viewing by Size, each folder has seven sortable columns in addition to File Name, including Size and Number of Items. The Type and Date views are similarly sortable. None of these views lets you look at a subtree, only at the entire file system. To view a subtree, go to one of the other views, narrow it down to the subtree and view type you want, and then click the Flattened List visualization.
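Because the Flattened List drops the hierarchy, sorting it is an ordinary sort over per-folder records. This hypothetical `flattened_list` function also takes an optional `subtree` prefix, roughly what the workaround of narrowing the view elsewhere first achieves; the record layout is assumed.

```python
from operator import itemgetter
from typing import Optional


def flattened_list(records: list, sort_key: str = "bytes",
                   descending: bool = True, subtree: Optional[str] = None) -> list:
    """Sort per-folder records by one column, optionally restricted to a subtree.

    `records` use a hypothetical layout with a "/"-separated "path" field.
    """
    if subtree is not None:
        records = [
            r for r in records
            # Match the folder itself or anything below it, but not "videosx"
            if r["path"] == subtree or r["path"].startswith(subtree + "/")
        ]
    return sorted(records, key=itemgetter(sort_key), reverse=descending)


records = [
    {"path": "docs", "bytes": 110, "files": 2},
    {"path": "videos", "bytes": 5000, "files": 3},
    {"path": "videos/cats", "bytes": 4000, "files": 40},
    {"path": "videosx", "bytes": 1, "files": 1},
]
print([r["path"] for r in flattened_list(records, "files", subtree="videos")])
```

Matching on `subtree + "/"` rather than a bare prefix keeps a sibling such as `videosx` out of the `videos` subtree.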

Your hard drive in Sunburst view.

Sunburst - A type of pie chart, with rings showing each of the levels of depth. The chart can display each slice as an even size, or can adjust the sizes by the file count or amount of data in the slice. Clicking on any of the slices will move that folder or data point into the center circle, with the rings showing the subfolders or subcategories of that particular subdirectory.
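Each sunburst ring divides its parent's arc among the children, using equal weights, file counts or byte totals. A sketch in degrees, with a hypothetical `slice_angles` function and (name, bytes, file count) tuples as assumed input:

```python
def slice_angles(children: list, mode: str = "size",
                 start: float = 0.0, span: float = 360.0) -> list:
    """Lay out one sunburst ring.

    `children` holds (name, bytes, file_count) tuples. Each child gets an arc
    inside [start, start + span] degrees, weighted by the chosen mode.
    """
    if mode == "even":
        weights = [1] * len(children)
    elif mode == "count":
        weights = [count for _, _, count in children]
    elif mode == "size":
        weights = [size for _, size, _ in children]
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    total = sum(weights)
    arcs, angle = [], start
    for (name, _, _), weight in zip(children, weights):
        sweep = span * weight / total if total else 0.0
        arcs.append((name, angle, angle + sweep))
        angle += sweep
    return arcs


ring1 = slice_angles([("docs", 100, 30), ("videos", 300, 10)])
print(ring1)  # [('docs', 0.0, 90.0), ('videos', 90.0, 360.0)]
# The next ring nests inside the parent's arc, e.g. the subfolders of videos:
ring2 = slice_angles([("cats", 200, 8), ("dogs", 100, 2)], start=90.0, span=270.0)
```

Clicking a slice corresponds to calling the function again with that folder's children and the full 360° circle.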

Tree Map - A box type view of the data. As with the Sunburst, the boxes can be sized equally, or sized by data size or number of files. Clicking on any of the boxes will show the details within that subdirectory or data type.
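A treemap needs a rule for cutting a rectangle into proportional boxes. The simplest is slice-and-dice, shown here as a swapped-in illustration rather than necessarily what this tool does; many treemap libraries instead use a "squarified" layout that keeps boxes closer to square and easier to compare. The function name and the (name, size, children) tuple format are hypothetical.

```python
from typing import Optional


def slice_and_dice(node: tuple, x: float, y: float, w: float, h: float,
                   depth: int = 0, out: Optional[list] = None) -> list:
    """Return (name, x, y, width, height) boxes for a (name, size, children) tree.

    Sizes are totals that include descendants. Children split the parent's box
    in proportion to their share of the parent's size, alternating between
    side-by-side (even depths) and stacked (odd depths) splits. Any area left
    over stands for files stored directly in the parent folder.
    """
    if out is None:
        out = []
    name, size, children = node
    out.append((name, x, y, w, h))
    offset = 0.0
    for child in children:
        frac = child[1] / size if size else 0.0
        if depth % 2 == 0:
            slice_and_dice(child, x + offset * w, y, w * frac, h, depth + 1, out)
        else:
            slice_and_dice(child, x, y + offset * h, w, h * frac, depth + 1, out)
        offset += frac
    return out


tree = ("root", 100, [("a", 50, []), ("b", 25, [("b1", 20, []), ("b2", 5, [])])])
for box in slice_and_dice(tree, 0, 0, 100, 100):
    print(box)
```

Zooming into a box is the same call with that subtree as the root and the full canvas as the rectangle.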

Bubble Chart - This gives two layout options for showing the data: Bubble Chart or Circle Pack. The Bubble Chart shows a bubble for each item in the current category, sized by the amount of data in that folder or file type. The Circle Pack presents a hierarchical view of the bubbles. In either layout, clicking a bubble or circle shows bubbles for that item's subcategories.
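The article doesn't say how bubble sizes are computed. A common convention, assumed here, is to make each bubble's area proportional to its data, so the radius grows with the square root of the size. The hypothetical `bubble_radii` function shows just that scaling; positioning the bubbles (the packing itself) is a separate problem not covered here.

```python
import math


def bubble_radii(sizes: dict, max_radius: float = 100.0) -> dict:
    """Scale bubbles so their *area*, not radius, is proportional to size.

    Radius grows with the square root of size; the largest bubble gets
    `max_radius`. Scaling the radius linearly would make a folder with twice
    the data look four times as big.
    """
    largest = max(sizes.values(), default=0)
    if largest == 0:
        return {name: 0.0 for name in sizes}
    return {name: max_radius * math.sqrt(size / largest) for name, size in sizes.items()}


print(bubble_radii({"videos": 400, "docs": 100, "empty": 0}))
# {'videos': 100.0, 'docs': 50.0, 'empty': 0.0}
```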

Conclusion
The File System Visualizer is a quick and easy way to understand what's on your file system. It's intuitive to use, and within minutes you can start locating what is taking up disk space. You can then delete or archive anything that is no longer needed, or establish policies to prevent wasted space. If additional storage is still needed after that, you can give management a clear visual presentation of how storage is being used in your environment. You can start visualizing your hard drive right now.

Nick is Zetta's Corporate Reporter, and has been writing and telling stories about technology with blogs, social media, and content marketing since the days when the BBS reigned.

More Stories By Derek Kol

Derek Kol is a technology specialist focused on SMB and enterprise IT innovations.