A Summary of Openstack Austin Summit

17 May 2016

To summarize the Openstack Austin Summit:

There is not much news on Cinder (features are being developed, but it is fairly routine). Manila has become mature (and gets more exposure) now. Multi-site Openstack is receiving increasing weight (from Cell V2, Ceph & Swift, and backup/DR, to deployment practices).
The Ceph Jewel release is remarkable (CephFS is production-ready, and RBD mirroring is available for journal-based replication). NVM/SSD technologies are game-changing (NVMe Ceph journals, XtremIO, etc.). DPDK is quickly getting adopted (in OVS, NFV, and monitoring). Hyper-converged native storage solutions for Openstack (Veritas[1][2]) are starting to show up (Ceph, however, is not designed for this).
Kuryr, i.e. container overlay networking, does not seem to have made much progress. Neutron keeps improving DNS, baremetal support, IPAM, and NFV/VNF. On the SDN side, OVN (vs Dragonflow), Opendaylight integration, and Dragonflow are progressing. Service Function Chaining (SFC) is taking shape (along with Tacker, OPNFV, NFVI, ETSI NFV, OASIS TOSCA, etc.). Networking, NFV, SDN, service chaining, and the various solutions & vendors around them are still the most active part of Openstack. Note that VNFs usually need a custom kernel (which is common for proprietary switches); that is why you see Cumulus Linux.
Containers are of course hot, but most of them are supported via PaaS (rather than directly through Openstack) (Murano vs Magnum), or used for containerized Openstack deployment (Kolla, Ansible + containers, running on Kubernetes, etc.). OCI & CNCF are still working hard to make themselves known. Magnum is building Bay Drivers (when will CloudFoundry and Openshift come in?).
IoT is becoming hot (SAP, IBM, HPE, Pivotal, TCP Cloud, SmartCities). IBM is betting on Openstack. Mirantis is gaining increasing weight (and respect) in the community. Ubuntu/Canonical is rising (they gave so many presentations). The Openstack Foundation is spending increasing effort on training activities, including the new Certified Openstack Administrator (COA) exam (99Cloud supports it in China), in preparation for becoming the true industry standard. This Summit also introduced a new Superuser TV series.
Interestingly, super user/developer companies have basically occupied most presentation slots at the Summit (market/committee consolidation?). On the mailing list, someone even proposed removing the voting process for speaker proposals.

Interesting new projects:

Romana for network and security automation
Kingbird for multi-site services
Nation for compute node HA
Convergence to make Heat execution more scalable and handle failures better
Astara for virtualizing Neutron agents and VNFs, with ease of management
Tacker for network service function chaining orchestration
Fuxi, mentioned in the Magnum etherpads, works with Kuryr to enable data volumes
Watcher provides continuous resource optimization (energy-aware optimization, consolidation, rebalancing, etc.) in a closed loop, including monitoring and action/advice.
Higgins/Zun aims to make Openstack one platform for provisioning and managing VMs, baremetal machines, and containers as compute resources. Magnum enables containers through a second framework such as Kubernetes/Mesos/Swarm on top of Openstack, whereas Higgins tries to make containers Openstack-native. The developers come from the original Nova-docker and Magnum projects. It is being renamed to Zun.

Interesting new storage vendors: Scality (S3-compatible, a unified storage solution for Openstack (EMC’s solutions, as I see them, are more use-case specific), dynamic load balancing, no consistent hashing, and it never rebalances data); Veritas (a DAS-based hyper-converged native solution for Openstack, built into the hypervisor (Ceph is not designed for this)).

How to Select Videos to Watch

Each Openstack Summit releases hundreds of presentation videos, and it is no easy work to select the most worthwhile ones to watch. Here are my guidelines:

Check out the officially featured videos (link).
Check out the official summary/recap videos (example from the Tokyo Summit).
Check out the keynote presentations (link). They take place at the beginning of each summit day and present key community events and directions.
Check out the Breakout Sessions. An Openstack Summit is divided into Breakout Sessions and Workshops, Developer & Operator working sessions, Keynote presentations, the Marketplace Expo Hall, Lounges, etc. (see the summit timeline; it is clearer if you attend on-site). Note that these are not “tracks”. Breakout Sessions are usually the more important ones (of course you can watch the other types too). To locate them:
- Open the meeting room map (link) and find the Breakout Session meeting rooms.
- On the presentation schedule page (link), find the talks by meeting room.
Check out the videos on Youtube with high view counts (link). A high view count indicates bigger impact.
Check out the popular videos people liked on Twitter (effective if you follow the right group).
Check out how many attendees signed up to watch a video (example from the Vancouver Summit, see the “Attendees” part). It shows how many people said “I want to watch” this video.
Check out the videos you are interested in by track (link). A track is the topic category of a presentation, for example storage, operations, enterprise IT strategies, etc.
Watch the presentation level (example: beginner) and choose what fits you.
Don’t forget the #vBrownBag videos (link, search “austin”). They are only 15 minutes each, but usually very inspiring. #vBrownBag is not part of the Openstack Foundation; AFAIK it is a cross-conference community that borrows slots at all sorts of summits/conferences.
Check out the Design Summit (link). This is where features for the next Openstack version (Newton) are discussed and planned. I wish there were videos. The Etherpad contents are pretty condensed; the best way to understand what the core developers said is to attend on-site.

Besides, if you can attend an Openstack Summit on-site, listening to the questions asked by the audience (and the answers), asking your own questions, and talking with people are usually even more important.

Featured Videos

After this Openstack Austin Summit, I found that the official site now provides a new, lively video page.

As far as I can see, the Openstack Summit is highly focused on users, especially big users. The most favored content is use cases, practices, experiences, etc. Technical details, black magic, and design discussions are not the main theme, except that core developers routinely come on stage to share the newest updates.

For the real technical stuff, you may need to attend the design summit (it spans the full week, with most events scheduled on the last summit day; example). The core developers summarize their discussions on the design summit etherpads (I wish there were videos too). And remember that the most cutting-edge technical updates always appear on the developer mailing list, where the key is to learn how experts think about and discuss a new problem.

AT&T’s Cloud Journey with OpenStack

AT&T is a veteran super user of Openstack. What they favor is common in the community: an open white-box architecture, multi-site deployment with combined local and global controllers, no vendor lock-in, and agility. But essentially I think the driver is cost reduction, which is the most common motivation. I can see multi-site Openstack maturing and getting adopted now. Check out the differences between zone vs region vs cell vs aggregate (note: an Openstack zone is very different from an AWS zone). Cell V1 is deprecated, while Cell V2 is still being actively developed (link). AT&T recommends Murano, and I personally like its object-oriented orchestration language; Magnum, however, is not seen. And Mirantis, with its Fuel, is increasingly becoming the canonical production-level Openstack distribution.

Doubling Performance in Swift with No Code Changes

It is amazing that Swift, which uses Python for its data path (while there are so many C++/C/Golang competitors), has become such a success today. So tweaking the Python interpreter is a must-do. I remember that Jython runs Python on the JVM, leveraging the JVM’s GC, JIT & HotSpot, performance, and maturity; I’m not sure about its stats, but it seems to have little adoption. The default Python interpreter is CPython. PyPy, used in this video, features a JIT compiler, the well-known technique for speeding up interpreted languages. Using PyPy in Swift to improve performance is straightforward (since Swift is written in Python) and should have come out years ago. Now it has finally made progress, awesome progress, bravo!
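To see the kind of effect the talk describes, one can time the same pure-Python code under both interpreters. The sketch below is my own micro-benchmark, not the Swift benchmark from the video; `checksum_loop` is a hypothetical stand-in for interpreter-bound work, and the timings it prints will vary by machine.

```python
import platform
import time


def checksum_loop(n):
    """Pure-Python integer loop: interpreter-bound work that a JIT can speed up."""
    acc = 0
    for i in range(n):
        acc = (acc * 31 + i) & 0xFFFFFFFF  # keep the value within 32 bits
    return acc


def bench(n=2_000_000):
    """Time checksum_loop and report which interpreter ran it."""
    start = time.perf_counter()
    result = checksum_loop(n)
    elapsed = time.perf_counter() - start
    return platform.python_implementation(), elapsed, result


if __name__ == "__main__":
    impl, elapsed, result = bench()
    print(f"{impl}: {elapsed:.3f}s (checksum={result})")
```

Run the same file with, e.g., `python3 bench.py` and `pypy3 bench.py` and compare the timings. Hot loops like this are where PyPy’s tracing JIT helps most; code that mostly waits on disk or network gains less.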

Canonical - Carrier grade architecture with public cloud economics

NFV is hot and is increasingly pushing the telecom sector to adopt Openstack. But I think they are far from “carrier grade” now; the latter demands HA, security, high throughput & low latency, manageability, and smooth upgrading & patching. For jargon such as Openstack vs OpenDaylight vs Openflow vs Open vSwitch, see here. Generally, this video introduces Juju (integrated with Ubuntu), which eases Openstack deployment and supports various aspects such as containers, hyper-converged architecture, software-defined storage, NFV & SDN, deep learning, and Ceph monitoring. The interesting trend is that Ubuntu is increasingly becoming the canonical platform for Openstack and various open-source software. Although people say CentOS is more stable in production, systemd seems to draw a lot of pushback from the community.

Embracing Datacenter Diversity

This is a keynote. 7500 people attended the Austin Summit on-site (fewer than the 9000 in Tokyo?). A key move at this summit is the Certified Openstack Administrator (COA). We can see Openstack preparing to become a mature, fundamental industrial platform; more and more training activities take place at the summit, and now there is an official Openstack admin certification. In China, 99Cloud immediately established COA training facilities. The video announced the current Openstack Superuser Award winner, NTT (Tokyo), chosen from the nominated candidates: GMO Internet, Go Daddy, Overstock, Universidade Federal de Campina Grande, and Verizon.

AT&T’s OpenStack Journey. Driving Enterprise Workloads Using OpenStack as the Unified Control Plane

AT&T is a veteran user. AT&T Integrated Cloud (AIC) started with Juno and is moving to Mitaka in 2017. Agility, CI/CD, and DevOps are the key benefits Openstack enables; like most adopters, AT&T uses Openstack extensively in development environments, but its production use seems limited. They run KVM and VMware (vCenter) in a hybrid setup. They need to integrate Openstack with many other things, so writing Fuel plugins is a priority, and they also need to integrate Fuel with other management tools such as Ansible. Fuel upgradability is key. For things AT&T needs that are not present in the upstream community, AT&T has to close the gap itself (and contribute back).

HPE - Lifecycle Management of OpenStack Using Ansible

HP’s Openstack distribution, Helion, is a veteran but has not been very successful. HP shut down its public cloud in Oct 2015. This video demonstrates HPE’s lifecycle management of Openstack using Ansible. To be honest, this was a hotspot in earlier Openstack days but is already outdated now (and we have Fuel).

Fireside Chat: Mark Collier & Jim Curry

It is very interesting that this talk digs into venture capitalists’ key concerns about startups based on open source. For startups, identifying the right product and market is hard. Another problem is scaling (from a small business to a big one): for example, how to go to market, how to build the organization and leadership team, and how to think about services vs product. Although VCs are not as familiar with the technical part, they help beyond money. When enterprises want technology, they want it standard and want to know where the support comes from, rather than getting it for free; the former is what Red Hat is doing. Market dynamics are changing; companies are investing more in open source rather than proprietary software. The speaker expressed concern about a trend from building great technology to building for money. Open source vs open core is interesting; although the latter is widely employed today, too many companies have been burned by their open-core models. Customers expect their vendors to make money (they want healthy vendors), but don’t like being held hostage by them (avoiding vendor lock-in is a big concern). The open-core model is slowly dying today (according to the speaker). The next-generation angel is a new creation, which requires entrepreneurs to be under age 40, and a commitment that investors spend enough time with the startups.

Intel Sponsor Keynote

Intel and IBM are major investors in various open-source ecosystems. It is interesting to think about how their strategies differ from those of other veteran IT vendors. Recent breakthroughs from Intel, including DPDK & SPDK, 3D XPoint NVM, Intel PCIe SSDs, and the E5 v4 cloud CPUs, are bringing great momentum to the storage and cloud world. Native GPU access in virtual machines now relies on Intel GVT; remember that Intel VT was one of the foundations of the virtualization age.

Mirantis Sponsor Keynote

I can say that Mirantis made production-level Openstack distributions publicly accessible. The story goes that before Openstack, Mirantis was nearly bankrupt. But Mirantis grabbed the big opportunity and became the canonical Openstack flagship (and received a lot of financing). It doesn’t own a single line of proprietary code; its value comes from its selection of bug fixes, patches, and security enhancements (going further than the community), its solid testing, and its good deployment designs (link). Mirantis is also a top-ranked upstream contributor. This video offers an interesting opinion: Openstack is 1 part technology and 9 parts people and process.

DevOps At Betfair Using Openstack and SDN

This video is organized entirely as a long, solid demo. Betfair shows how they use their tools, with Openstack underneath, to orchestrate package building, network creation, app deployment, and load balancer setup, and to perform rolling upgrades of their app. Curiously, no one actually uses Horizon; they build their own UI. In a word, the demo is a killer use case of Openstack in app lifecycle management.
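The rolling-upgrade part of such a demo boils down to a simple loop: drain a batch of instances from the load balancer, upgrade them, health-check them, and put them back. Below is a minimal sketch under that assumption; `LoadBalancerPool`, `upgrade`, and `healthy` are hypothetical placeholders, not Betfair’s tooling or an Openstack API.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class LoadBalancerPool:
    """Hypothetical stand-in for a load balancer pool: just a set of member hosts."""
    members: set = field(default_factory=set)

    def remove(self, host: str) -> None:
        self.members.discard(host)

    def add(self, host: str) -> None:
        self.members.add(host)


def rolling_upgrade(hosts: List[str], pool: LoadBalancerPool,
                    upgrade: Callable[[str], None],
                    healthy: Callable[[str], bool],
                    batch_size: int = 1) -> List[str]:
    """Upgrade hosts batch by batch so the rest of the pool keeps serving traffic.

    Returns the hosts that failed their health check (they stay out of the pool).
    """
    failed: List[str] = []
    for start in range(0, len(hosts), batch_size):
        batch = hosts[start:start + batch_size]
        for host in batch:
            pool.remove(host)          # drain: stop routing new requests to it
        for host in batch:
            upgrade(host)              # deploy the new version
            if healthy(host):
                pool.add(host)         # back into rotation
            else:
                failed.append(host)    # keep it out of the pool
        if failed:
            break                      # stop rolling out a release that fails checks
    return failed
```

Stopping at the first failed batch keeps a bad release from spreading to the whole pool; a real tool would also roll the failed hosts back.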

Keynotes

What, I can’t find any keynote videos? Are they merged into the featured videos? Or split into a series of regular videos? Weird … (There are still Tokyo keynotes on Youtube, but no Austin ones …)

Recaps

There are no official recaps of the Austin Summit, but I’ve found one from Rackspace and one from HPE.

Rackspace: OpenStack Summit Austin 2016 - Racker Recap

Talks about how exciting the Summit is, its big scale, the great experience, bla, bla, bla … Nothing important.

HPE: OpenStack® Summit 2016 Austin Recap, Day 2, Day 3, Day 4

Too short. A lot of big things are happening … Video ends.

Cinder, Ceph and Storage in Openstack

I’m always interested in Ceph, Cinder, and the various storage technologies in Openstack, on either the data path or the control path. The storage world is evolving quickly: DPDK & SPDK, PCIe SSDs, NVMe, NVDIMM, RDMA adoption, smart NICs, Ceph BlueStore, hyper-converged architectures, software-defined storage (SDS), etc. It is an age in which:

Storage is merging with computing again. You can see it in Ceph (using commodity computing hardware) and in hyper-converged architectures.
Software-defined datacenter is the future. SDS is one of the pieces.
Flash is getting more and more adopted. From SAS/SATA SSDs, PCIe SSD/Flash, NVMe SSD/Flash, and NVDIMM SSD/Flash to persistent memory, flash is quickly climbing up the stack. Storage (and network) is becoming too fast for CPU and memory, so people are finding ways to mitigate memory bandwidth and PCIe bandwidth limits; hence DPDK, SPDK, RDMA, etc. Many new technologies bypass the Linux kernel to achieve lower latency. Also, the kernel page table (and the hardware-assisted MMU) can now be used to address filesystem metadata; see SIMFS, interesting.
Scale-out architecture is king. I have to say one reason is that Intel cannot build much more scaled-up CPUs (and architectures) now, so vendors need the industry to buy into the scale-out strategy. Also, scale-out is friendlier to the cloud fashion and the commodity white-box trend.

How Stuff Works Cinder Replication and Live-Migration

Cinder replication has long been under development, basically troubled because vendors have very different design requirements. Replication V1.0 worked at volume granularity, but was abandoned. Replication V2.x works at full-backend granularity. V2.1 will hopefully be available to use (doc). This article is a good introduction to how replication works, but it doesn’t mention the thaw operation. The video is by NetApp. Its live migration and storage compatibility chart is somewhat useful.
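To make “full-backend granularity” concrete, here is a toy model (hypothetical, not Cinder code) in which failover moves every volume on a backend at once, and freeze/thaw toggles whether management operations are allowed. It only mirrors the ideas behind Cinder’s failover-host, freeze-host, and thaw-host operations; the class, method, and site names are my own.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class ReplicatedBackend:
    """Hypothetical toy model of backend-granularity replication (Cheesecake-style)."""
    primary: str
    secondary: str
    active_site: str = ""
    frozen: bool = False
    volumes: Dict[str, str] = field(default_factory=dict)  # volume id -> site serving it

    def __post_init__(self) -> None:
        if not self.active_site:
            self.active_site = self.primary

    def create_volume(self, vol_id: str) -> None:
        # While frozen, control-plane operations such as create are refused.
        # (This toy does not model the data path at all.)
        if self.frozen:
            raise RuntimeError("backend is frozen: management operations are blocked")
        self.volumes[vol_id] = self.active_site

    def failover(self) -> None:
        # Backend granularity: no per-volume choice, every volume moves together.
        self.active_site = self.secondary
        for vol_id in self.volumes:
            self.volumes[vol_id] = self.secondary

    def freeze(self) -> None:
        self.frozen = True

    def thaw(self) -> None:
        self.frozen = False
```

The limitation Tiramisu addresses is visible here: `failover` has no way to move just one volume or one group of volumes.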

Big Data Rapid Prototyping by Using Magnum with Cinder and Manila

A joint video by NetApp and SolidFire. It introduces using Magnum to orchestrate a container PaaS, using Manila to create a share (filesystem), and mounting it into Docker. Where is the “big data”?

One Does Not Simply Use Multiple Backends in Cinder

Cinder volume types and multi-backend support have been available for a long time. This video teaches you how to use them.
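As a reminder of how the pieces fit: `enabled_backends` lists the backend sections in cinder.conf, each section sets a `volume_backend_name`, and a volume type carrying the extra spec `volume_backend_name` steers volumes to matching backends. The sketch below parses such a snippet and does a simplified version of that matching; the section names (`lvm-1`, `ceph-1`) are assumptions, and `pick_backends` is a hypothetical helper, far simpler than the real scheduler filters.

```python
import configparser

# Example cinder.conf fragment with two backends (section names are illustrative).
CINDER_CONF = """
[DEFAULT]
enabled_backends = lvm-1,ceph-1

[lvm-1]
volume_driver = cinder.volume.drivers.lvm.LVMVolumeDriver
volume_backend_name = LVM

[ceph-1]
volume_driver = cinder.volume.drivers.rbd.RBDDriver
volume_backend_name = CEPH
"""


def backends_by_name(conf_text):
    """Map volume_backend_name -> list of enabled backend sections."""
    parser = configparser.ConfigParser()
    parser.read_string(conf_text)
    result = {}
    for section in parser["DEFAULT"]["enabled_backends"].split(","):
        section = section.strip()
        name = parser[section]["volume_backend_name"]
        result.setdefault(name, []).append(section)
    return result


def pick_backends(volume_type_extra_specs, conf_text):
    """Simplified capability matching for a volume type's extra specs."""
    wanted = volume_type_extra_specs.get("volume_backend_name")
    by_name = backends_by_name(conf_text)
    if wanted is None:
        # No constraint: every enabled backend is a candidate.
        return sorted(s for sections in by_name.values() for s in sections)
    return by_name.get(wanted, [])
```

On a real deployment the volume type side is set with the CLI, e.g. `cinder type-create ceph` followed by `cinder type-key ceph set volume_backend_name=CEPH`.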

Cinder and Docker, Like Peanut Butter and Chocolate

Cinder and Manila are of course volume solutions for containers/Docker, one as block storage and one as a filesystem. Docker now has volume plugins. Kubernetes supports Cinder (doc). The talk is by IBM and Dell, but promotes rackspace/gophercloud at the end.

Leverage the Advantage of Multiple Storage Backends in Glance

This video is by UnitedStack. “More and more users want to leverage the advantages of ceph and enterprise storage. But with the restriction of glance we could only get images in one place and copy to another storage if we boot virtual machines in different backends.” Now we can use Glance multi-location to solve the problem. Another use case is needing more than one Ceph backend to be switched in Glance.
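The point of multi-location is that an image can have one location per backend, so a consumer can pick the copy that already lives in its own Ceph cluster instead of copying. A minimal sketch, assuming RBD locations of the form `rbd://<fsid>/<pool>/<image>/<snapshot>`; `pick_location` is a hypothetical helper, not Glance or Nova code.

```python
from urllib.parse import urlparse


def parse_rbd_location(url):
    """Split 'rbd://<fsid>/<pool>/<image>/<snapshot>' into its parts, or return None."""
    parsed = urlparse(url)
    if parsed.scheme != "rbd":
        return None
    parts = parsed.path.lstrip("/").split("/")
    if len(parts) != 3:
        return None
    pool, image, snapshot = parts
    return {"fsid": parsed.netloc, "pool": pool, "image": image, "snapshot": snapshot}


def pick_location(locations, target_fsid):
    """Return the first image location stored in the target Ceph cluster, if any."""
    for url in locations:
        parts = parse_rbd_location(url)
        if parts is not None and parts["fsid"] == target_fsid:
            return url
    return None  # no local copy: the caller would fall back to downloading and copying
```

With one location per Ceph cluster, a backend whose fsid matches can clone the image locally; only a miss forces the slow download-and-copy path.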

EMC - Enterprise Storage Management for Mixed Cloud Environments

Promotes using CoprHD in Openstack. AFAIK CoprHD can be used to replace Cinder (CoprHD supports the Cinder API), or as a Cinder driver. CoprHD actually has a pretty cool architecture and a much wider feature range, covering block, object, filesystem, replication, and recovery.

Datera - OpenStack Cinder delivering Intent-Defined Infrastructure

Datera introduces its orchestration tool product. The talk spends a lot of time on templates.

Persistent Storage for Containers Using Cinder

emccode/rexray is a software-defined storage controller for container platforms such as Docker and Mesos. Magnum uses Rexray to provide persistent volumes for Mesos. Compared to Cinder, Rexray is more native to Docker, standalone, and simpler (also mentioned here).

Cephfs as a Service with OpenStack Manila

CephFS has finally become production-ready (Jewel release). The integration of CephFS with Manila works, but does not seem mature yet.

Cinder Project Update

Cinder core developers present.

The Replication API: V2.0 is disabled; V2.1 (Cheesecake) fails over the whole backend. It is available to use now, but not mature; for the vendor support list, see here.
Backup supports full, incremental, and non-disruptive backups. Active-active HA is a very nice design with a lot of moving parts, still WIP. Check out the code if you like.
Multi-attach allows a volume to be attached to multiple hosts or VMs, not fully functional yet.
Rolling upgrade works now, but I guess it is not very mature; it includes RPC versioning, versioned objects, API microversions, and online DB schema upgrades. There are also updates for Fibre Channel.
Some new backend drivers were added (now 53 in total); LVM, RBD, and NFS are the reference architectures.
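The API microversion piece of rolling upgrades is essentially a version negotiation: client and server each support a range, and the client uses the highest version both understand. A minimal sketch, assuming Cinder-style `major.minor` strings and the `OpenStack-API-Version: volume X.Y` request header; the helper names are hypothetical.

```python
def parse_version(version):
    """'3.10' -> (3, 10), so versions compare numerically rather than as text."""
    major, minor = version.split(".")
    return int(major), int(minor)


def negotiate(client_min, client_max, server_min, server_max):
    """Pick the highest microversion both sides support, or None if the ranges are disjoint."""
    low = max(parse_version(client_min), parse_version(server_min))
    high = min(parse_version(client_max), parse_version(server_max))
    if low > high:
        return None
    return f"{high[0]}.{high[1]}"


def version_header(version):
    """Request header announcing the negotiated volume API microversion."""
    return {"OpenStack-API-Version": f"volume {version}"}
```

Comparing parsed tuples rather than strings matters: as strings, "3.10" sorts before "3.9". During a rolling upgrade, an older node simply advertises a lower maximum, and clients stay within what it understands.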

In Newton (the next release), we will have Replication V2.x (Tiramisu); continued work on active-active HA, rolling upgrades, and microversions; os-brick helping Cinder on Ironic baremetal; and async operations and reporting. Cinder Replication Tiramisu gives tenants more control over the replication granularity, e.g. a volume or a group of volumes (using Replication Groups).

The Performance Issues of Cinder Under High Concurrent Pressure

AWCloud (海云迅捷) presents. They deployed a 200-node Openstack and tested it with Rally. Boot-from-volume often failed because of Cinder’s low performance. The problems were:

HAProxy returned 504s. It was too slow because its version was too old.
The Cinder-api database connection driver blocks the thread (eventlet monkey patching doesn’t help). The solution is to increase the worker count.
Cinder-volume is too slow to process a large number of requests: create volume, initialize connection, attach. The solution is to run more Cinder-volume workers (private code).
Cinder-volume has race conditions when running multiple workers. The solution is to add locks (private code).
RBD rados calls block the thread, because they are not patched by eventlet.
Downloading or cloning images through Glance is too slow. The solution is to use the RBD store.
The growing number of database entries leads to a sharp decline in performance. The hotspot is the reservations table. The solution is to add a combined index and clean up unnecessary data.
Others: increase rpc_response_timeout, rpc_cast_timeout, and osapi_max_limit.

Results: boot_server_from_volume went from failing at concurrency=200 to all succeeding at concurrency=500; create_volume went from failing at concurrency=1000 to all succeeding at concurrency=2500. Good presentation!
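The reservations-table fix is a classic composite-index change: without an index, every lookup scans the whole table, which hurts more as rows pile up. The sketch below uses SQLite only because it ships with Python; the column names (`deleted`, `expire`) and the query are assumptions for illustration, not Cinder’s exact schema or the presenters’ index.

```python
import sqlite3


def query_plan(with_index):
    """Return SQLite's query plan for a reservations lookup, with or without an index."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE reservations ("
        " id INTEGER PRIMARY KEY, project_id TEXT, deleted INTEGER, expire TEXT)"
    )
    if with_index:
        # Combined index: equality column first, range column second.
        conn.execute(
            "CREATE INDEX idx_reservations_deleted_expire "
            "ON reservations (deleted, expire)"
        )
    rows = conn.execute(
        "EXPLAIN QUERY PLAN "
        "SELECT id FROM reservations WHERE deleted = 0 AND expire < '2016-05-17'"
    ).fetchall()
    conn.close()
    # The last column of each plan row is the human-readable detail.
    return " ".join(str(row[-1]) for row in rows)


if __name__ == "__main__":
    print("without index:", query_plan(False))
    print("with index:   ", query_plan(True))
```

The plan switches from a full table scan to a search on the new index; putting the equality column (`deleted`) before the range column (`expire`) lets one index serve both conditions.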

Expanding DBaaS Workloads with OpenStack Trove and Manila

Presentation by Tesora, NetApp, and Red Hat. At 5m22s there is a summary of why people want Openstack:

97% to standardize on one platform API
92% to avoid vendor lock-in
79% to accelerate innovation
75% for operational efficiency
66% to save money

“Until recently, the OpenStack Trove DBaaS project only used the Cinder block storage service for database storage. With joint development work from NetApp, Red Hat and Tesora, it is now possible to run database workloads on OpenStack using Manila-based file shares.”

Red Hat - Making Ceph the powerhouse you know it can be!

Introduces Red Hat Ceph Storage.

VMware - Charter, PernixData, VMware A case study

Cassandra deployment demo to introduce VMware Integrated OpenStack, VMware NSX, and PernixData. The case study is pretty detailed, with cluster layout design and benchmark results.

There and Back Again - Moving Data Across Your Clouds

Presentation by Mirantis. Migrating data from one storage backend to another, or between clouds. The challenge is usually network limits, and how to avoid impacting SLAs. Approaches can be:

dd from block to block. Simple, slow, and doesn’t allow data updates.
Rsync. It works at file level, not block level.
Use storage backend’s replication. It is vendor dependent.
Just connect the storage backend to the other side.

They use bbcp to accelerate block migration. The command dd | pv | dd looks useful. For Ceph, we have rbd export-diff and rbd import-diff (incremental snapshot transfer), as well as rbd export and rbd import. Sébastien’s blog uses DRBD and Pacemaker. A MOS/Fuel plugin helps deploy existing Ceph as primary storage, i.e. connect instead of move; it is still under development.
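Why incremental transfer helps: only the blocks that changed between two snapshots need to cross the network. The toy below compares two in-memory “snapshots” block by block and replays the changes on a replica. It is a hypothetical illustration of the idea, not the RBD diff format; real `rbd export-diff` does not compare data byte by byte but asks RBD which extents changed since the earlier snapshot.

```python
BLOCK = 4  # tiny block size for illustration; real tools work with much larger extents


def export_diff(old, new, block=BLOCK):
    """Return (offset, data) for every block that changed between two snapshots."""
    assert len(old) == len(new), "this sketch assumes equal-size images"
    diff = []
    for off in range(0, len(new), block):
        if old[off:off + block] != new[off:off + block]:
            diff.append((off, new[off:off + block]))
    return diff


def import_diff(target, diff):
    """Apply changed blocks to a replica that already holds the old snapshot."""
    for off, data in diff:
        target[off:off + len(data)] = data
    return target
```

For example, going from `b"AAAABBBBCCCC"` to `b"AAAAXXXXCCCC"` ships a single 4-byte block instead of the whole 12-byte image; the saving grows with image size when only a small fraction changes.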

Scality - S3 and OpenStack, the best of both worlds

Presented by Scality. Swift is not fully compatible with the AWS S3 API, for example container/object encryption. Scality’s Ring Storage product is their answer.

Amalgamating Manila And Swift for Unified Data Sharing Across Instances

Object storage has become the choice for more and more workloads, but there are still traditional applications that need filesystem access. Swift and Manila together solve the data sharing needs of VMs. Presentation by IBM.

Ceph at Scale - Bloomberg Cloud Storage Platform

Ceph RGW, the object storage, is actually pretty popular; many people are deploying RGW. The POD architecture of Ceph is interesting, even if it may not be really necessary. There is a summary of VM ephemeral storage vs Ceph. For Ceph RGW stack configurations, see here. This video shares their Ceph and RGW configs in detail, for both hardware and software. Ceph is orchestrated with Chef. Their tools are on GitHub. The testing tools for Ceph and RGW:

Ceph: RADOS Bench, COS Bench, FIO, Bonnie++
Ceph RGW: JMeter, generating load by sending requests from a cloud.

Swift Object Encryption

Presentation by IBM and HPE. This talk is about the future, so Swift object encryption is not ready yet. Encryption can be supported at the hardware disk level, the virtual block device level (LUKS, dm-crypt), or the Swift encryption middleware level. BYOK (bring your own key) can be supported only with the last approach. Here are the encryption spec and code.

OpenStack + Open Compute Project == Best of Breed Clouds

Presented by Big Switch. This talk is about OCP hardware. It is still early days, so this talk is pretty “soft”.

Protecting the Galaxy - Multi-Region Disaster Recovery with OpenStack and Ceph

Talking about backup in Openstack, there are two projects: Freezer, focusing on the data level, and Smaug, focusing on the application level. Besides, you can apply the common standard backup practices for DBs, filesystems, /var/lib/xxx, /etc/xxx, /var/log/xxx, etc.

This presentation focuses on Ceph cross-site backup. RBD mirroring (finally, no need for export-diff now) is used to replicate Ceph. The architecture design replicates each level of Openstack. RBD mirroring is available with Ceph Jewel and the upcoming Red Hat Ceph Storage 2.0. RBD mirroring replicates the journal under the hood; it is asynchronous replication. It is supported in Cinder Replication V2.1. The current gap mainly lies in metadata replication. The new project Kingbird provides a centralized service for multi-site Openstack deployments.

A Close Look at the Behaviors of the Multi-Region Swift Clusters

Presented by Inspur (浪潮) & 99Cloud. Write & read affinity greatly improve the multi-site performance of Swift. But due to eventual consistency, new data may not yet have been replicated to the appropriate location when a site failure happens. Basically, this talk covers practices around read & write affinity.
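Swift configures this in the proxy server (`sorting_method = affinity`, `read_affinity = r1z1=100, r1=200`, `write_affinity = r1`), where a lower number means a more preferred location. The sketch below shows how a read_affinity-style string can order replica nodes; the parser and the node dicts are hypothetical, not Swift’s implementation.

```python
def parse_affinity(spec):
    """Parse a read_affinity-style value such as 'r1z1=100, r1=200' into rules.

    Each rule is (region, zone or None, priority); a lower priority value is preferred.
    """
    rules = []
    for item in spec.split(","):
        key, value = (s.strip() for s in item.split("="))
        region_part, _, zone_part = key[1:].partition("z")  # 'r1z1' -> '1', 'z', '1'
        zone = int(zone_part) if zone_part else None
        rules.append((int(region_part), zone, int(value)))
    return rules


def sort_nodes(nodes, spec):
    """Order replica nodes so the most preferred (lowest value) come first."""
    rules = parse_affinity(spec)

    def priority(node):
        matches = [value for region, zone, value in rules
                   if node["region"] == region and (zone is None or node["zone"] == zone)]
        return min(matches) if matches else float("inf")  # unmatched nodes go last

    return sorted(nodes, key=priority)
```

With `"r1z1=100, r1=200"`, a replica in region 1 zone 1 is read first, then other region-1 replicas, and remote regions only as a fallback; that is the locality win, and also why a freshly written object may not be at the preferred site yet.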

SanDisk - The Consequences of Infinite Storage Bandwidth

As SSD speed climbs quickly, the CPU performance and bandwidth available per SSD drop dramatically. CPU/DRAM bandwidth is another bottleneck. SAN 2.0, i.e. NVMe over Fabrics, is an interesting idea:

NICs will forward NVMe operations to local PCIe devices
CPU removed from the software part of the data path
CPU is still needed for the hardware part of the data path
IOPS improve, BW is unchanged
Significant CPU freed for application processing

To me, it looks like the storage industry is evolving along a spiral path. The rise of new NVM/SSD media may bring back the old-style SAN architecture. But this time, the NVMe protocol connects directly on the PCIe bus, compared to the expensive old-style SCSI. Storage media access bypasses the kernel, the CPU, and memory, using direct RDMA; so it’s kind of like a computer controller connected to a bunch of disk arrays: even though the disk array box is actually a computer, its CPU/memory/OS is not used or necessary. New technologies also bring in a lot of proprietary hardware configurations, but they really are much faster than what pure-software white boxes can do now. Finally, rack-scale architecture is heard a lot in connection with the storage market.

OpenIO - OpenIO Object Storage Made Easy

Presented by OpenIO. Commodity hardware + software-defined storage = a hyper-scalable storage and compute pool. It tracks containers rather than objects. A grid of nodes with no consistent hashing, and it never rebalances data. Dynamic load balancing uses compute scores for each service in real time. These designs are interesting. The OpenIO object storage is integrated with Swift at the proxy server level.

Swift Middlewares - What Are They?

The “middleware” here originates from Python’s WSGI server design. It allows you to add customized features to each part of Swift. Middleware can be added to the Swift WSGI servers by modifying their paste configuration files. In essence, middleware is the decorator design pattern that Python WSGI uses to layer features onto a server; it’s useful. Swift itself actually uses a lot of middleware; see its config file.
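A minimal sketch of the pattern: a paste `filter_factory` that wraps the next WSGI app in the pipeline and adds one response header. The middleware, the `header_name` option, and the header itself are hypothetical; real Swift middlewares follow the same factory shape but do far more.

```python
def filter_factory(global_conf, **local_conf):
    """Paste entry point: returns a function that wraps the next app in the pipeline."""
    conf = dict(global_conf, **local_conf)
    header = conf.get("header_name", "X-Example-Middleware")

    def factory(app):
        return HeaderMiddleware(app, header)

    return factory


class HeaderMiddleware:
    """Decorates a WSGI app: adds one response header, passes everything else through."""

    def __init__(self, app, header):
        self.app = app
        self.header = header

    def __call__(self, environ, start_response):
        def custom_start_response(status, headers, exc_info=None):
            headers = list(headers) + [(self.header, "on")]
            return start_response(status, headers, exc_info)

        # Delegate to the wrapped app, intercepting only start_response.
        return self.app(environ, custom_start_response)
```

To load something like this, one would add a `[filter:...]` section whose `paste.filter_factory` points at the module’s `filter_factory`, then list that filter in the server’s `pipeline` before the final app.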

Optimizing Software-Defined Storage for OpenStack

Presented by EMC. Promotes the idea of software-defined storage (SDS) and EMC ScaleIO, and shares best practices for working with SDS. Compared to Ceph, ScaleIO is purpose-built, native block, with fewer performance trade-offs.

Swift 102 Beyond CRUD - More Real Demos

Presented by SwiftStack. A practical talk introducing Swift’s advanced features. Concurrent GETs reduce first-byte latency. To optimize multi-region setups, use read/write affinity, memcache pooling, and async account/container updates. Swift 2.7 now allows segments as small as 1 byte in Static Large Objects (previously the minimum was 1MB).
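For reference, an SLO is uploaded as separate segment objects plus a JSON manifest (a list of `path`, `etag`, `size_bytes` entries) sent with `PUT ...?multipart-manifest=put`. The sketch below only builds the segments and the manifest body; the container/prefix naming scheme is an assumption, and the HTTP calls are left out.

```python
import hashlib
import json


def build_slo_manifest(data, container, prefix, segment_size):
    """Split data into segments and build the JSON body for an SLO manifest PUT.

    Returns (segments, manifest_json), where segments maps object path -> bytes.
    """
    segments = {}
    entries = []
    for index, start in enumerate(range(0, len(data), segment_size)):
        chunk = data[start:start + segment_size]
        path = f"/{container}/{prefix}/{index:08d}"  # zero-padded so names sort in order
        segments[path] = chunk
        entries.append({
            "path": path,
            "etag": hashlib.md5(chunk).hexdigest(),  # lets Swift validate each segment
            "size_bytes": len(chunk),
        })
    return segments, json.dumps(entries)
```

Each segment would be PUT first, then the manifest; a GET on the manifest object streams the segments back in order. The 1-byte minimum mentioned above means `segment_size` no longer has to be at least 1MB.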

Developing, Deploying, and Consuming L4-7 Network Services

This is a hands-on workshop, lasting 1h26m. Its Youtube view count is 278, fairly high on average, so it looks well received. There is a demo of a network service chain: external -> firewall + lb -> lb -> app -> db. The demo is presented on Red Hat Enterprise Linux OpenStack Platform (not with Horizon).

Hey Storage Engineer Tell me About Backups in OpenStack!

Presented by NetApp & SolidFire. Back up volume snapshots from Cinder to Swift, to a NetApp appliance (dedup & compression are good), or to the cloud through a cloud gateway (a cloud-integrated storage appliance). Demo 2 shows the backup workflow of SAP HANA on Manila. Next they introduce Manila Share Replication, used as non-disruptive backup.

How to Integrate OpenStack Swift to Your “Legacy” System

Presented by NTT. Swift is a good solution for backup / disaster recovery. Swift uses an HTTP REST API, but the customer in this video wants NFS or iSCSI for compatibility with their legacy applications. The solution is to mount Swift as a filesystem using Cloudfuse. Note, however, that Swift is optimized for large files rather than lots of small files. There are various issues when trying to use Swift as NFS/iSCSI to solve the backup problem, and this talk discusses them in detail.

Scality - Open Networking and SDS, vendor-level integration amplifies Software Defined Convergence

An introduction to the Scality Ring Storage product. Cumulus Linux is interesting: a networking-focused Linux distribution, deeply rooted in Debian. Note that NFV runs VNFs on commodity servers, so optimizing the kernel is important; this has given rise to dedicated kernel providers such as Cumulus Linux. Scality has a fully distributed, P2P architecture with no center at all.

Scality - OpenStack Unified Storage One Platform to Rule them All

The Scality Ring Storage product is a unified storage platform that supports Swift, Glance, Cinder, and Manila (each has a driver). It supports replication, erasure coding, geo-redundancy, self-healing, etc. At 9m50s there is an Openstack storage use-case diagram by storage type and size.

A unified storage solution for Openstack, interesting; AFAIK some companies choose Ceph for the same purpose. EMC’s solutions are more use-case specific, AFAIK: block is ScaleIO/XtremIO, filesystem is Isilon, and object storage is ECS.

EMC - Accelerating OpenStack deployments with modern all-flash scale-out storage

Promotes EMC XtremIO. The problems it solves: the IO blender effect at large scale, VM provisioning & cloning, and dynamic policy-based operations. XtremIO is all-flash and blazingly fast. Content-based addressing is a key design point of XtremIO. Actually, the best technical videos introducing XtremIO are the one from Storage Field Day and the one from SolidFire. XtremIO is the #1 all-flash market leader with 34% share. At 11m51s there is a comparison graph of scale-up vs scale-out on the rack shelf; scale-up cannot survive a shelf-level failure (e.g. power, switch). Each XtremIO controller provides 150K IOPS, and scaling out to 16 boxes gives 2M IOPS. XtremIO keeps 100% of its metadata in memory, interconnected with an RDMA fabric. XtremIO integrates with Openstack Cinder to provide block storage.
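Content-based addressing means a block’s location is derived from a fingerprint of its data, so identical blocks (e.g. many VMs cloned from one image) are stored once and a clone is mostly a metadata copy. The toy below illustrates the idea; SHA-256 and the 4 KiB block size are my assumptions, not XtremIO’s actual fingerprinting or metadata layout.

```python
import hashlib


class ContentAddressedStore:
    """Toy block store where a block's address is the hash of its content."""

    def __init__(self):
        self.blocks = {}   # fingerprint -> data (each unique block stored once)
        self.volumes = {}  # volume name -> ordered list of fingerprints (logical layout)

    def write(self, volume, data, block_size=4096):
        layout = []
        for start in range(0, len(data), block_size):
            chunk = data[start:start + block_size]
            fingerprint = hashlib.sha256(chunk).hexdigest()
            self.blocks.setdefault(fingerprint, chunk)  # duplicate content is not stored again
            layout.append(fingerprint)
        self.volumes[volume] = layout

    def read(self, volume):
        return b"".join(self.blocks[fp] for fp in self.volumes[volume])
```

Writing the same 12 KiB image to two volumes, where the first two 4 KiB blocks are identical, ends up with only two unique blocks stored; the per-volume layouts (pure metadata) are what differ, which is why keeping all metadata in memory matters so much for this design.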

Monitoring Swift ++ (incl Nagios, Elasticsearch, Zabbix, & more)

Presented by SwiftStack. 8m13s is a nice summary of monitoring components (agent, aggregation engine, visualizer, alerting) and the popular solutions for each of them. 10m11s categorizes the types of data to monitor, and the monitoring lifecycle: measurement, reporting, characterization, thresholds, alerting, root cause analysis, remediation (manual/automated). 19m49s lists the key points to monitor in Swift: cluster data fullness, networking (availability and saturation), proxy state such as CPU and /healthcheck, auditing cycles, and replication cycle timing. The checks can be installed on the load balancer. The rest of the talk is a demo.
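The Swift checks from 19m49s can be sketched as the threshold/alerting step of that lifecycle. This minimal Python sketch assumes the proxies run Swift's healthcheck middleware (which answers `GET /healthcheck`) and that the other numbers (cluster fullness, proxy CPU, replication cycle time) are already collected by your agent; the threshold values and metric names are hypothetical.

```python
import urllib.request

# Hypothetical alert thresholds; tune them for your own cluster.
THRESHOLDS = {
    "cluster_full_pct": 80.0,       # alert well before the cluster fills up
    "proxy_cpu_pct": 90.0,          # per-proxy CPU utilization
    "replication_cycle_s": 3600.0,  # a replication pass slower than this is suspicious
}


def proxy_healthy(proxy_url: str, timeout: float = 2.0) -> bool:
    """Return True if GET <proxy_url>/healthcheck answers HTTP 200."""
    try:
        with urllib.request.urlopen(proxy_url.rstrip("/") + "/healthcheck", timeout=timeout) as resp:
            return resp.status == 200
    except OSError:  # URLError/HTTPError, timeouts and connection errors all derive from OSError
        return False


def evaluate(metrics: dict, thresholds: dict = THRESHOLDS) -> list[str]:
    """Turn one round of measurements into human-readable alerts."""
    alerts = []
    if metrics["cluster_full_pct"] >= thresholds["cluster_full_pct"]:
        alerts.append(f"cluster {metrics['cluster_full_pct']:.0f}% full")
    for proxy, cpu in metrics["proxy_cpu_pct"].items():
        if cpu >= thresholds["proxy_cpu_pct"]:
            alerts.append(f"{proxy}: CPU {cpu:.0f}%")
    for proxy, ok in metrics["proxy_healthcheck"].items():
        if not ok:
            alerts.append(f"{proxy}: /healthcheck failed")
    if metrics["replication_cycle_s"] > thresholds["replication_cycle_s"]:
        alerts.append(f"replication cycle took {metrics['replication_cycle_s']:.0f}s")
    return alerts
```

In practice `proxy_healthcheck` would be filled by calling `proxy_healthy()` against each proxy (or the load balancer), and the alerts handed to whatever notification system is in use.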

Canonical - ZFS, Ceph and Swift for OpenStack and containers wi

Presented by Canonical, the company behind Ubuntu; Ubuntu was quite active at this Summit. Comparing raw disk (3-year refresh) vs. AWS storage prices:

SSD: $12/TB/month
HDD: $1.5/TB/month
EBS SSD: $100/TB/month
EBS HDD: $45/TB/month
S3: $30/TB/month
Glacier: $7/TB/month
S3 transfer out: $90/TB
Glacier transfer out: $10/TB
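To see what these unit prices mean at scale, here is a small Python calculation using the numbers above. The 100 TB / 36-month scenario is an assumption for illustration, and the raw-disk figures exclude servers, power, operations and replication (keeping 3 copies in Ceph or Swift would roughly triple them).

```python
# Prices from the talk, in US$ per TB per month (raw disk assumes a 3-year refresh).
PRICE_PER_TB_MONTH = {
    "SSD": 12.0,
    "HDD": 1.5,
    "EBS SSD": 100.0,
    "EBS HDD": 45.0,
    "S3": 30.0,
    "Glacier": 7.0,
}
# Transfer-out prices, in US$ per TB.
EGRESS_PER_TB = {"S3": 90.0, "Glacier": 10.0}


def total_cost(option: str, stored_tb: float, months: int, egress_tb_per_month: float = 0.0) -> float:
    """Storage cost plus transfer-out cost over the whole period."""
    storage = PRICE_PER_TB_MONTH[option] * stored_tb * months
    egress = EGRESS_PER_TB.get(option, 0.0) * egress_tb_per_month * months
    return storage + egress


for option in PRICE_PER_TB_MONTH:
    print(f"{option:8} 100 TB for 36 months: ${total_cost(option, 100, 36):,.0f}")
```

Under these assumptions raw HDD comes to $5,400 versus $108,000 for S3, before any transfer-out charges.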

8m45s summarizes how recent technologies save cost (is low-power architecture ready to use now?). So how does Ubuntu help reduce storage cost? ZFS, Ceph, and Swift. Deutsche Telekom evaluated Manila and concluded that Manila is enterprise-mature, though some things still need improvement.

Cephfs in Jewel Stable at Last

Finally! CephFS is production-ready in the Jewel release. For earlier history, see CephFS Development Update, Vault 2015.

CephFS has “consistent caching”: clients are allowed to cache, and the server invalidates those caches before a change, so a client never sees stale data. Filesystem clients write file data directly to RADOS. Only active metadata is kept in memory. CephX security can now be applied to file paths. Scrubbing is available on the MDS, and repair tools are available: cephfs-data-scan, cephfs-journal-tool, cephfs-table-tool. The MDS has standby servers, which replay the MDS journal to warm up their cache for fast take-over. CephFS subtree partitioning lets you run multiple active MDSes, and directory fragmentation lets you split a hot directory over many active MDSes (not well tested yet). Snapshots are available now. You can create multiple filesystems, similar to pools or namespaces (also not well tested). Remaining pain points: file deletion pins inodes in memory; the client trust problem (there is no control at all except separating clients into namespaces/tenants); and some tools to expose state are still missing (dumping individual dirs/files, seeing why things are blocked, tracking access to a file).
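A toy Python model of the invalidate-before-change idea behind “consistent caching”. It is not CephFS's real capability protocol (where the MDS grants and revokes capabilities and file data goes straight to RADOS); the class and method names are hypothetical, and a synchronous in-process call stands in for the network round trip in which the server waits for each revoke to be acknowledged.

```python
class MetadataServer:
    """Tracks which clients cache each path and revokes those caches before any change."""

    def __init__(self):
        self.data: dict[str, object] = {}
        self.cachers: dict[str, set] = {}  # path -> clients holding a cached copy

    def read(self, client, path):
        self.cachers.setdefault(path, set()).add(client)
        value = self.data.get(path)
        client.cache[path] = value
        return value

    def write(self, writer, path, value):
        # Invalidate every other client's copy *before* the new value becomes visible.
        for other in self.cachers.get(path, set()) - {writer}:
            other.invalidate(path)
        self.cachers[path] = {writer}
        self.data[path] = value
        writer.cache[path] = value


class Client:
    def __init__(self, mds: MetadataServer):
        self.mds = mds
        self.cache: dict[str, object] = {}

    def invalidate(self, path):
        self.cache.pop(path, None)

    def read(self, path):
        if path in self.cache:
            return self.cache[path]  # served locally; the server guarantees it is current
        return self.mds.read(self, path)

    def write(self, path, value):
        self.mds.write(self, path, value)
```

Because a cached entry can only exist while the server knows about it, a cache hit is never stale; the price is that every write must first contact all current cachers.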

Designing for High Performance Ceph at Scale

Presented by Comcast. The storage nodes use NVMe for the journal (but SATA HDDs for data). For benchmarking: FIO for block, COSBench for object. Remember to test scaled-out performance. Issues encountered:

TCMalloc eats 50% of CPU. The solution is to give it more memory
Tune for NUMA. Map CPU cores to sockets; map PCIe devices to sockets; map storage disks (and journals) to the associated HBA; pin all soft IRQs to their associated NUMA node. Align mount points so that each OSD and its journal are on the same NUMA node.
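A minimal Python sketch of the NUMA-mapping step: it reads each PCIe device's `numa_node` from sysfs and the node's `cpulist`, then proposes a CPU list per device in the format that `/proc/irq/<n>/smp_affinity_list` accepts. It only builds a plan; finding the device's IRQ numbers (e.g. from /proc/interrupts) and writing the affinity (root required) are left out, and which device paths you pass in (e.g. `/sys/class/net/eth2/device`) is up to you.

```python
from pathlib import Path


def parse_cpulist(text: str) -> list[int]:
    """Parse the kernel's cpulist format, e.g. '0-3,8,10-11' -> [0, 1, 2, 3, 8, 10, 11]."""
    cpus: list[int] = []
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.extend(range(int(lo), int(hi) + 1))
        else:
            cpus.append(int(part))
    return cpus


def read_sysfs(device_paths: dict[str, str], sysfs: Path = Path("/sys")) -> tuple[dict[str, int], dict[int, str]]:
    """Read each device's NUMA node and the cpulist of every node that appears."""
    device_nodes = {name: int((Path(p) / "numa_node").read_text()) for name, p in device_paths.items()}
    node_cpulists = {
        node: (sysfs / "devices/system/node" / f"node{node}" / "cpulist").read_text()
        for node in set(device_nodes.values())
        if node >= 0
    }
    return device_nodes, node_cpulists


def affinity_plan(device_nodes: dict[str, int], node_cpulists: dict[int, str]) -> dict[str, str]:
    """Map each device to the CPUs of its own NUMA node."""
    plan = {}
    for name, node in device_nodes.items():
        if node < 0:  # the kernel reports -1 when it has no NUMA information
            continue
        plan[name] = ",".join(str(cpu) for cpu in parse_cpulist(node_cpulists[node]))
    return plan
```

On a real host, `affinity_plan(*read_sysfs({...}))` gives the per-device CPU lists to pin that device's IRQs (and, ideally, its OSD process) to.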

General performance tips:

Use the latest vendor drivers (can give up to a 30% performance increase)
OS tuning: focus on increasing threads, file handles, etc.
Jumbo frames help, particularly on the cluster network
Flow control issues with 40GbE network adapters; watch out for dropped packets
Scan for failing (slow-responding) disks and take them out
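For the last tip, a minimal Python sketch that flags outliers among per-OSD latencies (for example the commit/apply latencies reported by `ceph osd perf`). Collecting the numbers is left out, and the median-absolute-deviation rule with k = 3 is an assumed heuristic, not something stated in the talk.

```python
from statistics import median


def slow_osds(latency_ms: dict[str, float], k: float = 3.0) -> list[str]:
    """Return OSDs whose latency exceeds the median by more than k median absolute deviations."""
    values = list(latency_ms.values())
    med = median(values)
    mad = median(abs(v - med) for v in values) or 1e-9  # avoid dividing by zero on uniform clusters
    return sorted(osd for osd, v in latency_ms.items() if (v - med) / mad > k)
```

Using the median rather than the mean keeps one very slow disk from dragging the baseline up and hiding itself.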

Popular, But No Time to Watch

(Note that several of the popular videos are moved to my “interested” sections.)

Driving the Future of IT Infrastructure at Volkswagen Group

Cisco - Scaling Containers and OpenStack

VMware - IBM + VMware Everything you need to know

DevTest Cloud: The Ultimate OpenStack UseCase

OpenShift and OpenStack: Delivering Applications Together

Canonical - Using containers to create the World's fastest OpenS

OpenStack and the Power of Community-Developed Software

Integrate Active Directory with OpenStack Keystone

Ericsson - Changing the Context with OpenStack Orchestration to Support SDN/NFV

OpenStack and Opendaylight: The Current Status and Future Direction

Designing for NFV: Lessons Learned from Deploying at Verizon

Why Betfair Chose OpenStack - the Road to Their Production Private Cloud

Windows and OpenStack - What's New in Windows Server 2016

Achieving DevOps for NFV: Continuous Delivery on Openstack - Verizon Case Study

A Deep Dive into Project Astara

Managing OpenStack in a Cloud-native Way

Practical OVN: Architecture, Deployment, and Scale of OpenStack

Tap-As-A-Service: What You Need to Know Now

Deploying Neutron Provider Networking on Top of a L3 Provider Network Using BGP-EVPN

Just Interesting

How to Become an Advanced Contributor

By Ericsson. Get familiar with the tools, do the dirty work, do code reviews, focus on a project/feature, and enter a large project through code reviews or priority bugs/features. At a more advanced level, to drive the agenda: know the use cases, solutions, why, alternatives, and usability; find supporters via the mailing list and events; use the Big Tent (to create a new project). Inter-project features and communication are becoming more important these days. Before starting big features, talk with the core devs to make sure they support it (and that it aligns with the project's design decisions). Focus, be professional, be collaborative.

Project Kuryr - Docker Delivered, Kubernetes Next!

The problem Kuryr solves is the overlay^2 network of containers nested in VMs, which carries a great performance penalty. Judging from the video, I think there isn't much actual progress. Magnum plans to integrate with Kuryr. Let's wait.
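A back-of-the-envelope Python sketch of one visible cost of overlay^2: each VXLAN layer eats into the MTU, so containers nested in VMs need more packets (and more encap/decap work) for the same data. It assumes VXLAN over IPv4, a 1500-byte underlay MTU and TCP without options; Kuryr itself is not modeled, and the CPU cost of double encapsulation, probably the bigger penalty, is not captured by these numbers.

```python
import math

# Bytes that VXLAN over IPv4 consumes inside the underlay's IP MTU:
# outer IPv4 (20) + UDP (8) + VXLAN (8) + the encapsulated inner Ethernet header (14).
VXLAN_IPV4_OVERHEAD = 20 + 8 + 8 + 14  # 50 bytes per encapsulation layer
IPV4_TCP_HEADERS = 20 + 20              # inner IPv4 + TCP headers, no options


def inner_mtu(underlay_mtu: int, layers: int) -> int:
    """IP MTU left for the innermost network after `layers` VXLAN encapsulations."""
    return underlay_mtu - layers * VXLAN_IPV4_OVERHEAD


def packets_needed(payload_bytes: int, mtu: int) -> int:
    """TCP segments needed to carry `payload_bytes` at the given IP MTU."""
    return math.ceil(payload_bytes / (mtu - IPV4_TCP_HEADERS))


for layers, label in [(0, "no overlay"), (1, "VM overlay"), (2, "container overlay inside VM overlay")]:
    mtu = inner_mtu(1500, layers)
    print(f"{label:36s} MTU {mtu}  packets per MB: {packets_needed(1_000_000, mtu)}")
```

The MTU loss alone is modest (about 7% more packets for two layers); the larger hit comes from every one of those packets being encapsulated and decapsulated twice.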

Service Function Chaining Technology Analysis and Perspective

Two technologies: NSH-based SFC and MPLS/BGP VPN-based SFC. Comparison at 30m28s. Related platforms: Openstack Tacker as the orchestration platform, the OpenDaylight SDN Controller, the OPNFV Apex installer platform, and custom OVS with the NSH patch. There is quite a lot of diversity in the implementations (but not fragmentation, according to the video).
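A minimal Python sketch of how NSH-based chaining identifies a packet's position: the NSH Service Path Header carries a 24-bit Service Path Identifier (SPI) and an 8-bit Service Index (SI), and each service function decrements the SI, so (SPI, SI) selects the next hop. Only this 4-byte header is modeled (the NSH base and context headers are omitted), and the chain table and function names are hypothetical; the MPLS/BGP VPN approach is not shown.

```python
import struct


def pack_service_path_header(spi: int, si: int) -> bytes:
    """Encode the 4-byte NSH Service Path Header: 24-bit SPI followed by 8-bit SI."""
    if not (0 <= spi < 2**24 and 0 <= si < 2**8):
        raise ValueError("SPI must fit in 24 bits and SI in 8 bits")
    return struct.pack("!I", (spi << 8) | si)


def unpack_service_path_header(raw: bytes) -> tuple[int, int]:
    """Decode (SPI, SI) from the 4-byte header."""
    (word,) = struct.unpack("!I", raw)
    return word >> 8, word & 0xFF


def walk_chain(spi: int, si: int, next_hop: dict[tuple[int, int], str]) -> list[str]:
    """Follow a chain: look up (SPI, SI), visit that service function, which decrements SI."""
    hops = []
    while si > 0 and (spi, si) in next_hop:
        hops.append(next_hop[(spi, si)])
        si -= 1  # each service function decrements the Service Index
    return hops


# Hypothetical forwarding table for service path 42: firewall -> IPS -> NAT.
CHAIN_42 = {(42, 255): "firewall", (42, 254): "ips", (42, 253): "nat"}
```

For example, `walk_chain(42, 255, CHAIN_42)` visits the firewall, the IPS and the NAT in order, and the classifier only has to stamp `pack_service_path_header(42, 255)` on the packet once at the chain's entry.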

Tacker - Building an Open Platform for NFV Orchestration

Tacker orchestrates VNFs. Tacker Multi-Site allows operators to place, manage and monitor VNFs in multiple OpenStack sites. It works closely with OPNFV and standards bodies like ETSI NFV and OASIS TOSCA. 99Cloud is the 3rd top contributor to Tacker. The later slides introduce the Tacker architecture, how it works, and various features. Multi-site VIM support is interesting.

Quantifying the Noisy Neighbor Problem in Openstack

Presented by ZeroStack. This talk presents how workloads interfere with each other in Openstack, based on a several-month study of running workloads in different configurations on ZeroStack. They use micro-benchmarks as well as enterprise workloads such as Hadoop, Jenkins and Redis. The experiment setup is shown in great detail. SSD backends cope with random reads/writes well compared to HDD. Both VMs perform well until storage is saturated, but performance drops significantly after that. Lessons learned: use SSD; use local storage; there is no need for reliable (replicated) storage for Hadoop or Cassandra, which have built-in replication. A single VM cannot saturate a 10Gbps NIC because of CPU saturation; throughput is OVS-bound, and GRE encap/decap consumes a lot of CPU. Suggestions for networking: leverage DPDK and explore VLAN-based solutions. Anyway, the overall observations and conclusions are a bit too plain… I remember that Google's Heracles work did much more in-depth analysis.

Nokia - Combining Neutron, DPDK, Ironic and SRIOV for seamless high-performance networking

DPDK, (Ironic,) and SR-IOV are new technologies that can significantly boost performance. To use them in a VM, you need the SR-IOV VM driver and the DPDK VM driver. However, there are a lot of issues to solve before they work together; the later slides focus on them. (However, I want to know how to enable DPDK and SR-IOV on the compute host or in the VM: this? this? …).
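On the compute host, one common way to create SR-IOV virtual functions is the kernel's sysfs interface (`sriov_totalvfs` and `sriov_numvfs` under the physical function's device directory). Below is a minimal Python sketch of that step only, assuming root privileges and a PF name you supply (e.g. `ens1f0`, hypothetical); the DPDK side (hugepages, binding to a userspace driver) and the Nova/Neutron configuration are not covered. The `sysfs_root` parameter exists so the logic can be tried against a scratch directory.

```python
from pathlib import Path


def set_num_vfs(ifname: str, num_vfs: int, sysfs_root: Path = Path("/sys")) -> int:
    """Create `num_vfs` SR-IOV virtual functions on physical function `ifname`."""
    device = sysfs_root / "class" / "net" / ifname / "device"
    total = int((device / "sriov_totalvfs").read_text())  # hardware/driver maximum
    if not 0 <= num_vfs <= total:
        raise ValueError(f"{ifname} supports between 0 and {total} VFs, got {num_vfs}")
    numvfs = device / "sriov_numvfs"
    # The kernel rejects changing one non-zero VF count to another, so reset to 0 first.
    if int(numvfs.read_text()) != 0:
        numvfs.write_text("0")
    if num_vfs:
        numvfs.write_text(str(num_vfs))
    return num_vfs
```

The resulting VFs then show up as separate PCI devices that can be passed through to VMs or bound to a DPDK-capable driver.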

Telco Cloud Requirements: What VNFs Are Asking For

Presented by Juniper. Value moves up the value stack and away from Telcos. The needs are to enable applications that are closer to customers, and to integrate existing DC technologies with the network & operations. Use cases span L2-L7 networking, security services, 3GPP, CDN, voice and video. The current gap between what is needed and what is available requires various solutions (or compromises, or just waiting). Overall, this is a pretty good video with a deep understanding of Telco needs.

End-To-End Monitoring of OpenStack Cloud

Zenoss promoting their monitoring solution: model, events, metrics. It uses no extra agents (it uses what is already there). There is Ceilometer integration via the Ceilometer collector, and integration with Neutron, etc. Impact analysis generates a dependency graph to show the risk of a failure. I remember that a new project, Openstack Vitrage (started by Nokia), is able to do root cause analysis; interesting, but not receiving much attention yet. I'm not sure how much of the machine learning / detection / prediction is actually ready for production use. As the slides illustrate, the Zenoss monitoring solution is quite comprehensive; I wish it were open source.
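The core of such impact analysis can be sketched as a graph walk: given “X depends on Y” edges, everything reachable from a failed component is at risk. This minimal Python sketch is a generic breadth-first search over a hypothetical topology, not Zenoss's or Vitrage's actual model.

```python
from collections import deque


def impacted_by(failed: str, dependents: dict[str, list[str]]) -> set[str]:
    """Breadth-first walk from a failed component to everything that depends on it, directly or not."""
    impacted: set[str] = set()
    queue = deque([failed])
    while queue:
        node = queue.popleft()
        for child in dependents.get(node, []):
            if child not in impacted:
                impacted.add(child)
                queue.append(child)
    return impacted


# Hypothetical topology: each key lists the components that depend on it.
DEPENDENTS = {
    "switch-1": ["compute-1", "compute-2"],
    "compute-1": ["vm-a", "vm-b"],
    "compute-2": ["vm-c"],
    "vm-a": ["web-service"],
    "vm-c": ["web-service"],
    "rabbitmq": ["nova-api", "neutron-server"],
}
```

Root cause analysis runs the other way: given many alarms, look for the upstream node whose impact set explains most of them.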

Trusted Cloud Solutions

Presented by Red Hat, showing a list of its Openstack platform customers: FICO, Betfair, Verizon, etc. See 0m44s. Nothing related to “trust” technology.

Watcher, a Resource Manager for OpenStack: Plans for the N-release and Beyond

IBM & Intel (and ZTE) present. Watcher governs Openstack and provides resource optimization, e.g. energy-aware optimization, workload consolidation, rebalancing, etc. It includes monitoring in the closed loop. Users can template their own strategies. Watcher can run in advise mode, active mode, and verbose mode. It reminds me of VMware DRS, which uses live migration to consolidate VMs and save power. A good project direction.
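Workload consolidation is essentially bin packing. As an illustration of what a consolidation strategy computes, here is a minimal Python sketch of first-fit decreasing, a classic heuristic; it is not one of Watcher's actual strategies, ignores migration cost, and the VM loads (in vCPUs, say) are hypothetical.

```python
def consolidate(vm_load: dict[str, int], host_capacity: int) -> list[list[str]]:
    """First-fit decreasing: place the biggest VMs first, each on the first host with room."""
    hosts: list[tuple[int, list[str]]] = []  # (used capacity, VMs placed on that host)
    for vm, load in sorted(vm_load.items(), key=lambda item: item[1], reverse=True):
        if load > host_capacity:
            raise ValueError(f"{vm} does not fit on any host")
        for i, (used, vms) in enumerate(hosts):
            if used + load <= host_capacity:
                hosts[i] = (used + load, vms + [vm])
                break
        else:  # no existing host had room: use another host
            hosts.append((load, [vm]))
    return [vms for _, vms in hosts]
```

With loads `{"a": 5, "b": 7, "c": 2, "d": 1, "e": 4}` and a capacity of 10, five VMs fit on two hosts; live-migrating them there would let the remaining hosts be powered down, which is the energy saving the talk describes.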

Interesting, But Watch When Have Time

DPDK, Collectd & Ceilometer: The Missing Link

Deploying OpenStack Using Docker in Production

Ancestry.com in Production with OpenStack and Kubernetes

Split Brain: Overlays as Seen by Linux vs. Networking Folks

Troubleshooting oslo.messaging RabbitMQ issues

Troubleshoot Cloud Networking Like a Pro

Tuning RabbitMQ at Large Scale Cloud

Achieving Five-Nine of VNF Reliability in Telco-Grade OpenStack

Optimising NFV Service Chains on Openstack Using Docker

Nokia - Nokia SDN & NFV: Bringing Dynamic Service Chaining to the Telco Cloud with Nuage Networks & CloudBand

Neutron Quality of Service, New Features And Future Roadmap

Installing, Configuring, and Managing a 300+ OpenStack Node Network In Under An Hour

High Availability for Pets and Hypervisors - State of The Nation

Scalable Heat Engine Using Convergence

Horror Stories: How we keep breaking the Scheduler at Scale!

Dive into Nova Scheduler Performance - Where is the Bottleneck

vBrownBag

vBrownBag sessions are less than 15 minutes each. There were about 70 of them at the Openstack Austin Summit. vBrownBag has dedicated meeting rooms; I think they are far from fully booked, and there are probably still many empty slots available.

Abhijit Dey – Guaranteed Performance with Advanced QoS in OpenStack

Presented by Veritas. End-to-end storage QoS:

Application SLA: max IOPS, min IOPS, workload priority, latency
Efficient utilization of all tiers of storage
Storage congestion control
Resolving distributed IO dependencies
Data management IO prioritization
I added: quotas, throttling, reservation, …

The QoS capability is provided via:

Cinder QoS APIs
Scheduling filters for VM and storage affinity
Commodity storage: Near Storage: implemented in IO stack of hypervisor
- Awareness of direct-attached storage
- Adaptive features/feedback
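A max-IOPS limit of the kind expressed in Cinder QoS specs (e.g. `total_iops_sec`) is commonly enforced with a token bucket. Here is a minimal Python sketch of that technique, not Veritas's implementation; min-IOPS reservation and prioritization are not covered, and the caller passes in the current time (e.g. from `time.monotonic()`) so the behavior is deterministic.

```python
class TokenBucket:
    """Token-bucket limiter: sustained `rate` IOs per second, bursts of up to `burst` IOs."""

    def __init__(self, rate: float, burst: float):
        self.rate = rate
        self.capacity = burst
        self.tokens = burst  # start full so an idle volume can burst immediately
        self.last = 0.0      # timestamp of the previous call, in seconds

    def allow(self, now: float) -> bool:
        """Return True if one IO may be issued at time `now`, consuming a token."""
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # the caller should queue or delay this IO
```

The `burst` size is the design knob: a larger bucket tolerates short spikes from a VM while still holding its long-run average to `rate`.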

OpenFlame by Veritas - a software-defined storage solution for Openstack powered private clouds.

John White & Shishir Agrawal – Juniper Container SRX Intro & Use

Juniper's containerized SRX (vs. the virtualized SRX), a firewall, monitors east-west traffic. Will containerized network functions supersede VNFs one day?

Abhijit Dey – Deep Dive into VM Live Migration in DAS environment

Presented by Veritas. Again an introduction to their storage solution, with a few quite interesting designs. To enable efficient support for VM live migration:

Use direct-attached storage (DAS) on compute node
Make sure the VM always has a local replica
Storage lazily moves with VM during live migration
Nova/Cinder orchestrate the storage movement
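“Storage lazily moves with the VM” can be illustrated with a copy-on-read (post-copy) model: after migration, blocks are fetched from the source host on first access and kept locally, while a background task copies the rest. This minimal Python sketch is a generic illustration, not Veritas's mechanism; the class and method names are hypothetical, and the Nova/Cinder orchestration is not modeled.

```python
class LazyVolume:
    """Volume on the destination host that pulls blocks from the source host on demand."""

    def __init__(self, remote: dict[int, bytes]):
        self.remote = remote                # blocks still living on the source host
        self.local: dict[int, bytes] = {}   # blocks already present on the destination
        self.fetches = 0                    # on-demand remote reads (the latency cost)

    def read(self, block: int) -> bytes:
        if block not in self.local:
            self.local[block] = self.remote[block]  # fetch once, then keep a local replica
            self.fetches += 1
        return self.local[block]

    def write(self, block: int, data: bytes) -> None:
        self.local[block] = data  # new writes land locally; the remote copy becomes stale

    def background_copy(self, max_blocks: int) -> int:
        """Copy up to `max_blocks` not-yet-local blocks; return how many were moved."""
        moved = 0
        for block, data in self.remote.items():
            if moved >= max_blocks:
                break
            if block not in self.local:
                self.local[block] = data
                moved += 1
        return moved

    def fully_local(self) -> bool:
        return all(block in self.local for block in self.remote)
```

The VM can start running on the destination immediately; only its first touches of not-yet-copied blocks pay a remote round trip, and once `fully_local()` holds, the source copy can be released.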

An Openstack-native hyper-converged storage solution; interesting. Ceph, however, is not designed for this. See Veritas OpenFlame.

Ravi Jagannathan – Security Vulnerabilities in OpenStack deployments

Pay attention to the CVE vulnerability database: there are tons of Openstack-specific vulnerabilities disclosed on CVE. Make sure you patch faster than the attackers. This talk walks through the important CVE exploits; scary. I guess a public-Internet Openstack vulnerability scanner will show up soon (or already has?).

Balaji Ethirajulu – Network Analytics, catalyst for NFV & SDN

Ericsson Network Manager - Analytics: insights into VNFs.

Ben Silverman – Dynamic Capacity Planning in Elastic Clouds

Mirantis & Cisco present. Workload testing:

Use as realistic a production environment as you can
Compare results against baseline virtualization and baremetal results
Use incremental adjustments to flavors to find sweet spot in CPU and Memory requirements

Proper flavor definition. Monitoring, metrics and elasticity calculations. Develop triggers for expansion.

Elasticity: the amount of daily growth and shrinkage relative to total capacity

Available capacity management software for Openstack: Talligent (commercial), RightScale Cloud Analytics (commercial), CloudKitty (open source). Overall, this talk provides a lot of good practices for capacity planning. Bookmarked.

Haseeb Akhtar and Toby Ford – Autonomous Network Management

Presented by AT&T, introducing the AIC ECOMP architecture. Basically this is automated management, monitoring & auto-scaling, multi-datacenter hybrid cloud support, and a visual control panel.

Trevor Roberts Jr – What is VMware doing with OpenStack

VMware contributes to the community and provides drivers, NSX, and VIO (VMware Integrated Openstack) for Openstack. But … what most Openstack adopters want is:

97%: to standardize on one platform API
92%: to avoid vendor lock-in
79%: to accelerate innovation
75%: to improve operational efficiency
66%: to save money

And now Openstack shows much more feature diversity (NFV, big data, IoT, containers, etc.) than VMware (unless you also pay for Pivotal).

Joe Arnold – Native File Access for OpenStack Swift

Introduces ProxyFS, which provides filesystem access through Swift object storage. The key is that ProxyFS is integrated into the Swift proxy, rather than built on top of the Swift API. There are some comparisons against the SwiftStack Filesystem Gateway. (Hadoop can also use Swift through its Swift filesystem support, and there is S3QL, which supports file access to Google Storage, Amazon S3, or Swift.) ProxyFS uses log-structured files/directories to store data in Swift. Presented by SwiftStack. Looks promising.
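Why log-structured files suit an object store: objects are immutable, so instead of overwriting in place, each write becomes a new object and an extent map records which object holds the newest bytes for each range. This minimal Python sketch illustrates that general technique, not ProxyFS's actual format; in-memory `bytes` stand in for Swift objects, and compaction of old segments is omitted.

```python
class LogStructuredFile:
    """Each write is appended as a new immutable segment; an ordered extent list locates live bytes."""

    def __init__(self):
        self.segments: list[bytes] = []                # stand-ins for immutable Swift objects
        self.extents: list[tuple[int, int, int]] = []  # (file_offset, length, segment_index)

    def write(self, offset: int, data: bytes) -> None:
        self.segments.append(bytes(data))  # never overwrite in place: append to the log
        self.extents.append((offset, len(data), len(self.segments) - 1))

    def read(self, offset: int, length: int) -> bytes:
        buf = bytearray(length)  # never-written ranges read back as zeros
        # Apply extents oldest to newest so that later writes win.
        for f_off, ext_len, seg in self.extents:
            start = max(offset, f_off)
            end = min(offset + length, f_off + ext_len)
            if start < end:
                buf[start - offset:end - offset] = self.segments[seg][start - f_off:end - f_off]
        return bytes(buf)
```

A real implementation would also merge the extent map and garbage-collect segments whose bytes have all been overwritten.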

Design Summit (Newton)

The Openstack Austin Summit is for the Mitaka release, but the Design Summit is for Newton, looking into the future.

The Openstack Summit is highly focused on users. Most of the content is use cases, practices, experiences, project status, IT strategies, vendor promotions, etc. For real technical stuff, you may need to attend the Design Summit (it spans the full week alongside the main Summit). Core developers rally to discuss important decisions for the new release cycle, and summarize them on etherpads (I wish there were videos too).

Besides, the most cutting-edge technical updates always appear on the developer mailing list, where the key is to learn how experts think about and discuss a new problem. Checking the blueprint status (example) and code reviews (example) also helps.

I mainly focus on the status of Cinder and Magnum.

Cinder

On-going topics:

Replication Next Steps (Tiramisu, in granularity of volume group)
Rolling upgrades - next steps (Good ideas to borrow for other systems)
HA Active/Active (Awesome design, and long work)
Scalable backup (Interesting)
Cinderclient to OpenStackclient parity
Changes to our current testing process
Move Cinder Extensions to Core
Move API docs in tree

There are tons of details. Basically each core dev governs their specific feature. See for yourself (along with the blueprints). So don't say Cinder is too mature to move. A lot of work needs to be done :-)

Magnum

Magnum has a lot of topics too:

The bay driver design (Finally! When will CloudFoundry and Openshift come into Magnum?)
Lifecycle operations for long running bays (Rotating certs, soft/hard reset, dynamic reconfiguration, automated failover/recovery, etc. Learning from Carina.)
Magnum scalability / discussion of async implementation (Waiting to learn from the design)
Container Storage - Support for Container Data Volumes (What? OverlayFS performs 5-7 times better than devicemapper?)
Container Network - Integrate a Kuryr Networking Driver (The overlay^2 network performance penalty is biting)
Ironic Integration - Add Support for Running Containers in Baremetal (Ironic has long been ignored but is quite necessary in enterprise use cases)
Challenges in Magnum Adoption - Experiences and How to Address
Unified Container Abstraction for Different COEs
Magnum HEAT template versioning (To keep Magnum upgrades compatible)
Bay monitoring: health, utilization (Notifications and Ceilometer integration, etc)

See the details of each (along with the blueprints).

Other Sources of Summaries

I like the Openstack Austin Summit observations written by Sammy Liu: the Openstack community is marching into the “tier II” area (big data, NFV, IoT, blockchain, finance & trading, e-commerce core web servers, etc.), while VMware is still basically “tier I”, and in “tier II” its voice is hardly heard. The opinion is very inspiring; but as a comment, I know that CloudFoundry, which is usually deployed with VMware in commercial use, is an early starter in IoT, with GE as the big partner; and Pivotal CF is also a veteran in the big data area (though it is true that the voice of commercial IaaS & PaaS is very small at the Openstack Summit). The “tier II” area is usually served by PaaS, rather than by VMware, which is IaaS.

Another good summary & video recommendation for the networking parts is Neutron Community Weekly Notes. The official summaries for Day 1, Day 2, Day 3, and Day 4 were also released; good.

And usually each Openstack release has release notes that summarize the major changes (e.g. Kilo's). They are very useful. But for Mitaka, the release notes are re-organized by project. They are not as informative as before, but I think they are still very useful for grasping the latest updates.

Finally, if you are really interested in the feature & development progress of each Openstack component, I think checking the blueprint status on Launchpad (example) is the best way (along with the dev mailing list).
