If you didn’t know it tech is rampant and wonton in its generation of buzzwords for all sorts of things, or more specifically for sexing up all sorts of things that are more than the sum of their parts. The singularity, Web 2.0, e-this, i-that, social, mobile, the cloud, big data, and the Internet of Everything (Things) to name a few. Sometimes these things overlap. Sometimes they overlap in a so many ways that a Venn diagram starts to need an expert algebraic topologies to explain. Lately, the buzz has been around the cloud, big data, and the internet of things. As a technology professional and in particular a database professional it is important to me to sort through the really cool things.
One of the things we now have an is internet of the light bulb. This is real unlike the hyper text coffee pot protocol which is a 16 year April fool’s joke the IETF created, or the Linux toaster which is a malware vector. Phillips Hue is a light bulb (system) that represents the culmination of incremental improvements to lighting bringing together a number of preexisting lighting features and adding them to an API. Hue combines, timers, dimmers, color, and circuit switching to provide a system that allows us to further take for granted one of the greatest technical achievements of the last millennium. That of course is the internet of things, at least the command side of it. I wonder if people have it in them to take their beverages according to a queue, or perhaps a locational aware protocol that starts my coffee when I approach the break room during a certain period of time.
Other things there are internets of will have sensors. These will write data somewhere. Lately, it looks like some Hadoop file system will be the standard. I wish I were more abreast of the internals the way a storage engineer would be so I could form a better opinion of why HFL would be a better choice than WAFL for writing streams. I suspect it has to do primarily with Map Reduce (at least public domain Map Reduce) only being implemented on Hadoop and MongoDB (and …). Of course if we need to respond to events in real time we need to put our event triggers upstream from storage.
Last night after our monthly PSSUG meeting our presenter Jason Brimhall found himself waiting for a cab at Microsoft’s Malvern PA Office. Having waited for a taxi in Philly’s suburbs before I felt bad for his wait that might not end so I took it on myself to offer him a ride to the airport Marriot. On the drive I took it upon myself to ask his opinion on different strategies for allocating database files in SQL Server. This is a topic I find myself drawn to and I welcome the opportunities to hear other people’s (especially a Microsoft Certified Master’s) opinion on the subject.
I tend to find myself drawn to a simple approach that increases the number of files proportional to the size of the database. I admit that this is a matter of convenience and portability that has its limits. While Microsoft publishes some file numbers as the maximum (4,294,967,295) and the number above which issues may occur (50,000), a much lower number of files will tend to practical depending on needs. In one case I have encountered I found that at about 200GB I struggled to get reasonable copy and move speeds even with ESEUTIL. This was of course a single site scenario with fiber channel SAN as target and destination. Other scenarios require much smaller file sizes as well as other tools for copying (XCOPY, ROBOCOPY). Copies over a WAN (should be over a VPN\VLAN, but that is a different discussion Sebastian Meine could comment on as a segway to his presentation on new security roles in SQL Server)… copies over a WAN will traverse network segments of varying speeds, few if any of which will ever approach the throughput of fiber SAN. In these cases it is a matter of balancing the need for a manageable number of files and folders and keeping files small enough to copy. But for each of these cases it is important to note that the performance is being measured on the dimensions of speed of database file copy and file manageability (as an intersection of scripting for file and folder creation and the flexibility of the physical data model to make use of files), these are not frequently dimension of performance that our users notice.
Jason pointed out that from the point of view of user any significant query slowness will be perceived as an outage, and was a point added to his case against the separation of data and indexes into separate files specifically relevant to cases of piecemeal file based recovery. Our discussion started with my asking him his opinion on the comparative merits of creating separate database files for each processor (core) vs. using IO affinity masks to delegating IO related processing to a specific processor. I feel it important to note that some NUMA systems have separate IO channels for each NUMA node in which case IO affinity would deprive those member processors those channels (I think). Jason was supportive of allocating files per processor in the more general case. He mentioned that it wasn’t necessary (or always possible) to have separate disks for these files to realize performance.
What are you thoughts?
Are you familiar with the proportional fill algorithm?
What do you think about the impact of using solid state local disk in tempdb in 2012 FCI? or tiered storage?
Proportional fill reading:
What feature does your app provide that cannot be provided via a mobile web site?