Another year is drawing to an end. That means it is time to review the successes and failures on my list of resolutions. It is also time to make new ones. On the plus side I have already met my commitment to at least one blog post per month average. I won’t have to put up a flurry of articles in the next week and a half to meet that resolution. I think I should modify my resolution next year such that I need to post more regularly.
Reviewing my posts I have noticed that I am following that advice that I post about what interests me.Of course, I have been told that I can be interested in things that don’t necessarily interest others. It’s not that people aren’t interested in database administration or SQL server; It’s that most people aren’t interested in more abstruse topics. At a recent board meeting of the Philadelphia SQL Server user’s group we were discussing speakers and topics. We wanted to be clear that we are seeking to bring in speakers that will bring our members to meetings.
To that effect I am going to stipulate for the coming year’s blog posts that I post about more accessible topics. I am also going to make it a point to articulate why my sometimes more advanced seeming efforts are not. One example comes to mind.
When you are trying to set up an automatic restore job and you want that job to start when the backups have finished. In order for a job to be started via a linked server the linked server account must own that job (it might also need to be a member of the target server role modified to be granted execute on sp_startjob). There must also be a connection such that it can read the backup tables to find the placement of the backup files. XP_dirtree of cmdshell can also be used to list existing files but they won’t fail the way attempting backup files that don’t exist (perhaps they got deleted, it would behoove you to notice this). If we were to use SSIS it would only need to be db_creator on the restore machine and db_reader on msdb on the backup machine, as well as have read file system permissions for the location of the backup. This is a smaller service area than would be needed to start remote jobs and read remotely, especially as it would be a single Windows login as opposed to two SQL Logins, and at least one Windows login (more if the SQL Agent account is not a domain account used by both machines).
If you didn’t know it tech is rampant and wonton in its generation of buzzwords for all sorts of things, or more specifically for sexing up all sorts of things that are more than the sum of their parts. The singularity, Web 2.0, e-this, i-that, social, mobile, the cloud, big data, and the Internet of Everything (Things) to name a few. Sometimes these things overlap. Sometimes they overlap in a so many ways that a Venn diagram starts to need an expert algebraic topologies to explain. Lately, the buzz has been around the cloud, big data, and the internet of things. As a technology professional and in particular a database professional it is important to me to sort through the really cool things.
One of the things we now have an is internet of the light bulb. This is real unlike the hyper text coffee pot protocol which is a 16 year April fool’s joke the IETF created, or the Linux toaster which is a malware vector. Phillips Hue is a light bulb (system) that represents the culmination of incremental improvements to lighting bringing together a number of preexisting lighting features and adding them to an API. Hue combines, timers, dimmers, color, and circuit switching to provide a system that allows us to further take for granted one of the greatest technical achievements of the last millennium. That of course is the internet of things, at least the command side of it. I wonder if people have it in them to take their beverages according to a queue, or perhaps a locational aware protocol that starts my coffee when I approach the break room during a certain period of time.
Other things there are internets of will have sensors. These will write data somewhere. Lately, it looks like some Hadoop file system will be the standard. I wish I were more abreast of the internals the way a storage engineer would be so I could form a better opinion of why HFL would be a better choice than WAFL for writing streams. I suspect it has to do primarily with Map Reduce (at least public domain Map Reduce) only being implemented on Hadoop and MongoDB (and …). Of course if we need to respond to events in real time we need to put our event triggers upstream from storage.
Last night after our monthly PSSUG meeting our presenter Jason Brimhall found himself waiting for a cab at Microsoft’s Malvern PA Office. Having waited for a taxi in Philly’s suburbs before I felt bad for his wait that might not end so I took it on myself to offer him a ride to the airport Marriot. On the drive I took it upon myself to ask his opinion on different strategies for allocating database files in SQL Server. This is a topic I find myself drawn to and I welcome the opportunities to hear other people’s (especially a Microsoft Certified Master’s) opinion on the subject.
I tend to find myself drawn to a simple approach that increases the number of files proportional to the size of the database. I admit that this is a matter of convenience and portability that has its limits. While Microsoft publishes some file numbers as the maximum (4,294,967,295) and the number above which issues may occur (50,000), a much lower number of files will tend to practical depending on needs. In one case I have encountered I found that at about 200GB I struggled to get reasonable copy and move speeds even with ESEUTIL. This was of course a single site scenario with fiber channel SAN as target and destination. Other scenarios require much smaller file sizes as well as other tools for copying (XCOPY, ROBOCOPY). Copies over a WAN (should be over a VPN\VLAN, but that is a different discussion Sebastian Meine could comment on as a segway to his presentation on new security roles in SQL Server)… copies over a WAN will traverse network segments of varying speeds, few if any of which will ever approach the throughput of fiber SAN. In these cases it is a matter of balancing the need for a manageable number of files and folders and keeping files small enough to copy. But for each of these cases it is important to note that the performance is being measured on the dimensions of speed of database file copy and file manageability (as an intersection of scripting for file and folder creation and the flexibility of the physical data model to make use of files), these are not frequently dimension of performance that our users notice.
Jason pointed out that from the point of view of user any significant query slowness will be perceived as an outage, and was a point added to his case against the separation of data and indexes into separate files specifically relevant to cases of piecemeal file based recovery. Our discussion started with my asking him his opinion on the comparative merits of creating separate database files for each processor (core) vs. using IO affinity masks to delegating IO related processing to a specific processor. I feel it important to note that some NUMA systems have separate IO channels for each NUMA node in which case IO affinity would deprive those member processors those channels (I think). Jason was supportive of allocating files per processor in the more general case. He mentioned that it wasn’t necessary (or always possible) to have separate disks for these files to realize performance.
What are you thoughts?
Are you familiar with the proportional fill algorithm?
What do you think about the impact of using solid state local disk in tempdb in 2012 FCI? or tiered storage?
Proportional fill reading:
What feature does your app provide that cannot be provided via a mobile web site?